  1. Jul 2022
    1. Author Response:

      Reviewer #3 (Public Review):

      Murphy et al. further develop the linked selection model of Elyashiv et al. (2016) and apply it to human genetic variation data. This model is itself an extension of the McVicker et al. (2009) paper, which developed a statistical inference method around classic background selection (BGS) theory (Hudson and Kaplan, 1995, Nordborg et al., 1996). These methods fit a composite likelihood model to diversity data along the chromosome, where the level of diversity is reduced by a local factor from some initial "neutral" level π0 down to observed levels. The level of reduction is determined by a combination of both BGS and the expected reduction around substitutions due to a sweep (though the authors state that these models are robust to partial and soft sweeps). The expected reduction factor is a function of local recombination rates and genomic annotation (such as exonic and phylogenetically conserved sequences), as well as the selection parameters (i.e. mutation rates and selection coefficients for different annotation classes). Overall, this work is a nice addition to an important line of work using models of linked selection to differentiate selection processes. The authors find that positive selection around substitutions explains little of the variation in diversity levels across the genome, whereas a background selection model can explain up to 80% of the variance in diversity. Additionally, their model seems to have solved a mystery of the McVicker et al. (2009) paper: why the estimated deleterious mutation rate was unreasonably high. Throughout the paper, the authors are careful not only in their methodology but also in their interpretation of the results. For example, when interpreting the good fit of the BGS model, the authors correctly point out that stabilizing selection on a polygenic trait can also lead to BGS-like reductions.
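      As a rough illustration of the "local reduction factor" described above, the sketch below computes a background selection reduction B = π/π0 at a focal neutral site, using one commonly cited form of the classic result (as in Nordborg et al., 1996). It is not the composite likelihood model of Murphy et al. The function names, the (u, t, r) parameterization, and the reading of u as a per-site haploid deleterious mutation rate are assumptions made for illustration only.

```python
import math


def bgs_reduction(sites):
    """Return B = pi / pi0 at a focal neutral site (illustrative sketch).

    `sites` is an iterable of (u, t, r) tuples, one per selected site:
      u: deleterious mutation rate at the selected site (assumed haploid, per generation)
      t: selection coefficient against heterozygous carriers
      r: recombination rate between the selected site and the focal site
    """
    exponent = 0.0
    for u, t, r in sites:
        # Each selected site contributes u*t / (t + r*(1 - t))^2 to the exponent;
        # tighter linkage (small r) gives a larger contribution.
        exponent += u * t / (t + r * (1.0 - t)) ** 2
    return math.exp(-exponent)


def expected_diversity(pi0, sites):
    """Scale the neutral diversity level pi0 by the local reduction factor B."""
    return pi0 * bgs_reduction(sites)
```

      With no selected sites the factor is 1 (no reduction), and the reduction weakens as the recombination distance to the selected sites grows, which is the qualitative behavior the review describes.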

      Furthermore, the authors have carefully chosen their model's exogenous parameters to avoid circularity. The concern here is that if the input data into the model - in particular the recombination maps and segments likely to be conserved - are estimated or identified using signals in genetic variation, the model's good fit to diversity may be spurious. For example, recombination maps are often estimated from linkage disequilibrium (LD) data, which is itself obtained from variation along the chromosome. Murphy et al. use a recombination map based on ancestry switches in African Americans, which should prevent "information leakage" between the recombination map and the BGS model from leading to spuriously good fits. Likewise, the authors use phylogenetic conservation maps rather than those estimated from diversity reductions (such as McVicker et al.'s B maps) to avoid circularity between the conserved annotation track and diversity levels being modeled. Additionally, the authors have carefully assessed and modified the original McVicker et al. algorithm, reducing relative error (Figure A2).

      One could raise the concern that non-equilibrium demography confounds their results, but the authors have a very nice analysis in Section 7 of the supplementary material showing that their estimates are remarkably stable when the model is fit separately in different human populations (Figure A35). Supporting previous work that emphasizes the dependence between BGS and demography, the authors find evidence of such an interaction with a clever decomposition of variance approach (Figure A37). The consistency of BGS estimates across populations (e.g. Figures A35 and A36) is an additional strong bit of evidence that BGS is indeed shaping patterns of diversity; readers would benefit if some of these results were discussed in the main text.

      We appreciate the reviewer’s kind remarks. With regard to the results included in the main text vs. the supplement, we attempted to strike a balance between keeping the main text accessible to a larger readership and providing experts with details they may find useful. We have, however, done our best to write the supplementary analyses clearly.

      I have three major concerns about this work. First, it's unclear how accurate the selection coefficient estimates are given the non-equilibrium demography of humans (pre-Out of Africa split, and thus not addressed by the separate population analyses). The authors do not make a big point about the selection coefficient estimates in the main section of the paper, so I don't find this to be a big problem. Still, some mention of this issue might be helpful to readers trying to interpret the results presented in the supplementary text.

      As the reviewer notes, we chose not to emphasize the inferred distributions of selection coefficients. Our main reason for this choice is the technical issue addressed in Appendix Section 1.5 (L561-564): “Second, thresholding potentially biases our estimates of the distribution of selection effects. While this bias is probably smaller than the bias without thresholding, its form and magnitude are not obvious. This is why we decided not to report the inferred distributions of selection effects in the Main Text.” We agree that if we were to focus on our estimates of the distribution of selection effects, the effects of demographic history would also need to be considered. This is, however, not the focus here.

      Second, I'm curious whether the composite likelihood BGS model could overfit any variance along the chromosome - even neutral variance. At some level, the composite likelihood approach may behave like a sort of smoothing algorithm, albeit with a functional form and parameters of a BGS model. The fact that there is information sharing across different regions with the same annotation class should in principle prevent overfitting to local noise. Still, there are two ways I think to address this overfitting concern. First, a negative neutral control could help - how much variation in diversity along the chromosome can this model explain in a purely neutral simulation? I imagine very little, likely less than 5%, but I think this paper would be much stronger with the addition of a negative control like this. Second, I think the main text should include the R² values from out-of-sample predictions, rather than just the R² estimates from the model fit on the entire data. For example, one could fit the model on 20 chromosomes and use the estimated θΒ parameters to predict variation on the remaining two. The authors do a sort of leave-one-out validation at the window level (Figure A31); however, this may not be robust to linkage disequilibrium between adjacent windows in the way leaving out an entire chromosome would be.

      The two requested analyses were done and their results are described above, in response to essential revisions (p. 2-3 here). In brief, there is no overfitting of neutral patterns or otherwise. We elaborate on why this finding is expected below.
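      The out-of-sample check the reviewer asks for can be sketched as follows: fit on the windows of some chromosomes, predict the held-out chromosomes, and report R² on the held-out windows only. This is a minimal sketch, not the authors' procedure. A one-predictor least-squares line stands in for the BGS model, and the window tuples and all function names are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def r_squared(observed, predicted):
    """Proportion of variance in `observed` explained by `predicted`."""
    mo = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mo) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def leave_chromosome_out_r2(windows, held_out):
    """Fit on all chromosomes except `held_out`, then score on `held_out`.

    `windows` is a list of (chromosome, predictor, diversity) tuples;
    `held_out` is a set of chromosome names excluded from fitting.
    """
    train = [(x, y) for chrom, x, y in windows if chrom not in held_out]
    test = [(x, y) for chrom, x, y in windows if chrom in held_out]
    slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
    predicted = [intercept + slope * x for x, _ in test]
    return r_squared([y for _, y in test], predicted)
```

      Holding out whole chromosomes, rather than individual windows, keeps linked neighboring windows from appearing on both sides of the train/test split, which is the reviewer's stated reason for preferring it.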

      Finally, I feel like this paper would be stronger with realistic forward simulations. The deterministic simulations described in the supplementary materials show the implementation of the model is correct, but it's an exact simulation under the model - and thus not testing the accuracy of the model itself against realistic forward simulations. However, this is a sizable task and efforts to add selection to projects like Standard PopSim are ongoing.

      We agree that forward simulations would be a nice addition, but believe that it is a project in itself. Indeed, a major complication is that when, for computational tractability, purifying selection is simulated in small populations with realistic population-scaled parameters, the reduction in diversity due to selection at unlinked sites has a major effect on neutral diversity levels (see, e.g., Robertson 1961). We hope to address this issue in future work. Meanwhile, we note that the theory that we rely on has been tested against simulations in the past (e.g., Charlesworth et al., 1993; Hudson and Kaplan, 1995; Nordborg et al., 1996).

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Manuscript number: RC-2021-01158

      Doi preprint: https://doi.org/10.1101/2021.11.16.468835

      Corresponding author(s): Salah, MECHERI



      2. Point-by-point description of the revisions


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Whole sporozoite vaccines confer sterilizing protection against Plasmodium infection. However, further improvement of whole sporozoite vaccines is needed and requires a thorough understanding of the immune processes that mediate protection and the deployment of novel strategies that further augment protective immunity while limiting the impact of factors that are detrimental to protection. Work from the Mecheri laboratory and others had previously established that IL-6 signaling plays a critical role in the immune response to a liver stage infection; engagement of IL-6 signaling promotes the initial control of a liver stage infection and enhances the protective adaptive immune response. Given this potent protective role for IL-6, Belhimeur and colleagues design a parasite strain in rodent malaria parasites that encodes and secretes murine IL-6 during liver stage infection. They show that upon infection of wildtype mice, these transgenic parasites i) are unable to transition to blood stage infection, ii) produce IL-6 and iii) induce a durable adaptive immune response that can protect against sporozoite challenge. This study is novel and intriguing. However, a superficial analysis of the transgenic parasite strain, an incomplete analysis of the immune response to infection and the lack of data regarding the possibility of IL-6 mediated immunopathology have dampened this reviewer's enthusiasm for the work.

      **Major Concerns:**

      1) The data in Figure 3b-3d clearly indicate that the IL-6 encoding transgenic parasites exhibit a defect in parasite development within HepG2 cells that is maintained in vivo. The authors propose that an arrest of these parasites in the liver stage precludes their transition to blood stage infection and that this arrest is dependent on IL-6 signaling. To better support that claim the authors should:

      a. Better characterize in vivo liver stage arrest using infected liver tissue analysis with immunofluorescence microscopy to determine when and how precisely IL-6 transgenic parasites are impacted in development.

      Done. New data are shown in Figure 3B, C, D.

      b. Determine if arrested development of IL-6 transgenic parasites is truly dependent on IL-6 signaling using antibody blockade of IL-6 signaling and mice with genetic defects in IL-6 signaling.

      Experiments were done using anti-IL-6 receptor blocking antibodies, but did not work. This is commented on in the text and shown in Supplementary Fig 2.

      2) The authors claim that IL-6 production and secretion into the liver tissue augments the adaptive immune response to liver stage infection. This in turn results in durable adaptive immune responses that protect against infection. However, the mechanistic underpinning of IL-6 signaling in the liver that is induced by their transgenic parasites and the impact on adaptive immune responses is poorly characterized:

      a. There is no evidence that the protective adaptive immune response induced by IL-6 transgenic parasite infection is dependent on IL-6 signaling. Is superior protection and immunogenicity lost in IL-6 signaling deficient animals that are infected with IL-6 transgenic parasites?

      Not addressed, but the point is that IL-6 leads to attenuation.

      b. What elements of the adaptive immune response are impacted? One can imagine that IL-6 mediated killing of infected hepatocytes might introduce more parasite antigen that can be acquired by antigen presenting cells, or that IL-6 mediated pro-inflammatory signaling might regulate the maturation of antigen presenting cells, increased differentiation of helper T cells, the downregulation of regulatory T cell function and frequency and/or the differentiation of effector CD8 T cells into long-lived hepatic memory CD8 T cells. The authors should conduct a more comprehensive analysis of how parasite-encoded IL-6 impacts adaptive immunity.

      Done. An extensive analysis of CD4 and CD8 phenotype and activation status is presented in Fig 9.

      3) While IL-6 transgenic parasites induce a potent and durable adaptive immune response, the authors should show how this compares to published whole sporozoite immunizations. The authors should determine if immunization with IL-6 transgenic parasites is superior to, for example, immunization with radiation-attenuated sporozoites and genetically attenuated sporozoites.

      This is not the point. The work presented here emphasizes the proof of concept that the proposed new strategy works. Follow-up studies will compare this model to previous ones.

      4) IL-6 signaling is a major player in inflammatory diseases and the induction of immunopathology. As such the authors should carefully examine the duration and magnitude of IL-6 protein production in the liver and serum after IL-6 Tg parasite infection and determine if IL-6 signaling promotes liver immunopathology.

      Not done, but this point was discussed in the text. Also, we made it clear in the Materials and Methods section that, given the way the construct was made, IL-6 production is restricted to the first 48h of liver infection, precisely because expression of the IL-6 gene is under the control of the LISP-2 promoter. Therefore, there is no persistence of IL-6 production by liver stage parasites.

      Reviewer #1 (Significance (Required)):

      The paper reports a novel strategy to generate a whole sporozoite vaccine: expression of IL-6 in a transgenic parasite. This could be a significant contribution to the field if additional experiments as outlined in the critique are conducted. The work might also inform vaccine design for other pathogens.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The manuscript describes the construction of a Plasmodium berghei line that expresses murine interleukin-6 in exoerythrocytic (liver stage) parasites and the analysis of mice infected with sporozoites of this parasite line. The authors find that such parasites do not complete development in liver cells and therefore do not produce subsequent infection in red blood cells. The effect of prior infection with these parasites on the ability of the host to resist both wild type and heterologous species challenge is then examined.

      The key assumption that underlies the study is that the observed phenotypes result from parasite expression of bioactive IL-6 that functions to modulate the immune system. Other explanations are not considered, for example the over-expression of secreted IL-6 may prevent the complete maturation of the intracellular parasite by clogging up the parasite secretory pathway. The authors use the 'wild type' parasite as the control but not only does the wild type not express IL-6 it also does not express the human DHFR gene used as a selection system. A much better control parasite would be one that expresses a non-bioactive IL-6 so that the potential effects on parasite maturation can be differentiated from those on the mouse immune system. Another control to be considered would be comparison with a genetically attenuated parasite with a block in late stage development, and which does not produce a host cytokine.

      Interesting comment, but the key novel result is that co-infection studies show a reversed phenotype of IL-6 transgenic parasites, likely due to counteraction of the IL-6 effect by wild type parasites (Supplementary Fig 1).

      Another assumption is that IL-6 is secreted from the infected liver cell and mediates its effects, presumably by binding to its cell surface receptor. The expectation of IL-6 secretion from the parasite is that it would accumulate in the parasitophorous vacuole - how would it get out of the infected host cell? While evidence is provided of IL-6 in the in vitro culture supernatant of infected cells - this might arise from damaged cells in rather artificial conditions. Have the authors considered doing the experiment of concurrent mouse infection with both wild type and recombinant parasites? If the mechanism of parasite killing in infected liver cells is as proposed, then a reduction of wild type parasites in the subsequent asexual blood stage would be expected.

      Experiments done. We discussed both experiments: the IL-6 receptor blocking antibody experiment (Suppl Fig 2) and the mixed infection (Suppl Fig 1).

      Figure 3 indicates that IL-6 TgPbA/LISP2 parasites are as efficient or better than wild type parasites at invading host cells but then they do not develop to maturity. What is the evidence that the key factor in their ability to immunize the host is expression of IL-6 rather than the effect of an attenuated parasite?

      This is an interesting observation made by the reviewer. With the available data, we cannot really tell which of the two possibilities is operating in this system. It could also be that the two options are interconnected.

      In this model malaria infection, it looks like there are two lethal outcomes: one associated with experimental cerebral malaria at relatively low blood stage parasitemia (which I understand is a controversial model for human cerebral malaria) and the second associated with high blood stage parasitemia. Some of the protocols affect which outcome occurs (see for example Fig 6), but this observation is not properly discussed.

      On many occasions in the past, we have seen a discrepancy between anti-parasite immunity and anti-disease protection. In this particular experiment (Fig 6), we explored the dose effect of the IL-6 mutant. What is clear from this model is that at the high dose, 10^4 SPZ, we observe both anti-parasite and anti-disease protection and immunity, whereas at the lower doses, 10^3 and 10^2 SPZ, although there was no efficient anti-parasite immunity, mice did not die from cerebral malaria but much later from hyperparasitemia. We consider that the two low doses of IL-6 transgenic parasites did protect against disease expression.

      For the data presented in Fig 7, why was there a challenge with WT PbA sporozoites before the heterologous Py challenge? If this step is excluded is there still an effect against P. yoelii? Why was the parasite chosen for the heterologous challenge Py17XNL? Since this parasite is largely restricted to reticulocytes in the blood stream would a different effect have been observed if the heterologous challenge parasite was, for example, P. chabaudi?

      Out of scope.

      Although the expectation is that IL-6 expression would not occur in the asexual blood stage, I think it would be important to demonstrate experimentally that this is the case.

      Done. IL-6 transgenic parasites, when inoculated as infected erythrocytes, have no developmental defect and grow normally in infected mice.

      In Fig. 4A the y-axis is labelled IL-6 rRNA when it should be IL-6 mRNA.

      Corrected

      Reviewer #2 (Significance (Required)):

      The significance of the report does depend on whether or not the experimental evidence is sufficient to support the claim that parasite expression of IL-6 is important in generating immunity. There have been a number of studies to show that infection with sporozoites that have been genetically attenuated to not complete subsequent development in the infected liver cell can provide immunity to subsequent infection; what is different about this study is that the authors specifically target the parasite to express a host protein that is likely to be important in acquisition of immunity. Therefore for the study to have high significance they have to show convincingly that it is the expression and activity of IL-6 that is important and I do not think this is the case with the experiments reported. If the authors are correct, then the idea of manipulating the host response by expression of host proteins by the parasite may be an attractive approach to dissect the key elements of immunity to sporozoite infection. At the moment, although there is a lot of focus on developing an attenuated whole sporozoite vaccine against malaria, and this study may provide proof of principle for including a host component in the parasite, there would still be a long way to go before any practical application of this approach.

      The key message was toned down. As a formal demonstration that the expression and activity of IL-6 are directly involved in the ability of IL-6 transgenic parasites to confer protective immunity is lacking, we suggest toning down the message by saying that IL-6 attenuates parasite virulence, the mechanism likely being a detrimental effect of IL-6 signaling on parasite development.

      The audience would be those interested in parasite immunology.

      Reviewer expertise: malaria parasite cell and molecular biology; host immunity.

      **Referees cross commenting**

      I think all reviewers are of the opinion that there needs to be a better demonstration that the observed phenotype is mediated by expression and signaling of IL-6, for example by antibody blockade or using a mouse line with a genetic defect in IL-6 signaling. Looking at all the issues that have been raised by the reviewers and need to be addressed with further experimentation, my feeling is that this will take longer than 6 months.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary** This study explores the expression of murine IL-6 by rodent Plasmodium berghei as a means to generate transgenic parasites whose development in the liver is arrested, which may be used as a genetically attenuated pre-erythrocytic vaccine against malaria. The authors conclude that IL-6-expressing Plasmodium parasites elicit CD8+ T-cell mediated immune responses that protect against a subsequent challenge with infectious sporozoites.

      **Major Comments**

      In Figure 3, the authors show the results of qRT-PCR analysis of mouse livers infected with WT or transgenic parasites. They then use HepG2 cells to assess hepatic parasite numbers and development. Why didn't the authors assess this also in vivo, in liver sections of infected mice?

      Done. New data are presented in Fig 3B, C, D

      Linked to the above, a more complete analysis of the parasite's behavior in HepG2 cells should be provided. The authors write in the discussion that "IL-6 transgenic parasites develop perfectly well in cultured hepatocytic cells". Does this mean that they develop to the production of infectious merozoites? This could be confirmed by allowing the infected cultures to progress for 60-70 hours and then collecting the supernatants of these cultures and injecting them into naïve mice, to understand whether or not infectious merozoites are formed in vitro.

      New analyses demonstrate that IL-6 transgenic parasites actually display a developmental defect at the pre-erythrocytic stage in vivo.

      Figure 3C: The authors mention this result almost in passing but fail to provide an explanation for this observation. Why is the number of transgenic parasite EEFs approximately double that of WT parasite EEFs?

      A new Figure 3 is provided and shows that the EEF density (Fig 3B) was drastically reduced at both 24h and 48h in mice infected with the IL-6 transgenic parasites as compared to those infected with WT PbA parasites, although the differences were not statistically significant. We also examined the size (Fig. 3C) of EEFs and found the same tendency, namely a reduced size and diameter of IL-6 transgenic EEFs as compared to those of WT PbA EEFs, with a statistically significant difference only at 40h.

      Figure 3D: The EEF area units (mm²) on the YY axis are certainly wrong. However, they cannot be µm² either, as 15-30 µm² would be far too small for EEFs at 48 hours post-infection. What is it then?

      New data are now provided in a new Fig 3.

      The authors write "... suggest that the failure of IL-6 Tg-PbANKA/LISP2 parasites to develop in the liver of infected mice is likely due to an active anti-parasite immune response mediated by parasite-encoded IL-6 in vivo". I have several issues with this statement. 1) as mentioned above, the in vitro data cannot be used to draw definitive conclusions about the parasites' behavior in vivo; 2) the transgenic parasites do not "fail to develop in the liver of infected mice". If anything, they develop less than their WT counterparts, which is different from "failing to develop". Clarifying how much they do develop would be important (see next comment).

      We provide new in vivo data on the development of IL-6 transgenic parasites. A new Figure 3 is provided and shows that the EEF density (Fig 3B) was drastically reduced at both 24h and 48h in mice infected with the IL-6 transgenic parasites as compared to those infected with WT PbA parasites, although the differences were not statistically significant. We also examined the size (Fig. 3C) of EEFs and found the same tendency, namely a reduced size and diameter of IL-6 transgenic EEFs as compared to those of WT PbA EEFs, with a statistically significant difference only at 40h. We replaced "failure" with "a defect in development".

      In connection with the above, I would like to know more about the time when the development of IL-6 Tg-PbANKA/LISP2 parasites is arrested in vivo, in the liver. Are these early- or late-arresting parasites? Is the liver stage of infection compromised during parasite development or at egress? To clarify this, the manuscript would benefit from a timecourse analysis of liver sections of mice infected with this parasite, including data on EEF numbers and sizes up to and beyond 48 h after sporozoite inoculation.

      Done. See new figure 3.

      Still linked to the issue of parasite arrest in vivo and the possibility of breakthroughs, the manuscript would benefit from an experiment where mice were injected with a high number of transgenic sporozoites and parasitemia is monitored thereafter, much like what was done in Figure 2D, but starting off with a larger inoculum of at least 5 x 10^5 sporozoites.

      This was done and there was no breakthrough even with doses as high as 10^6 sporozoites.

      While the results shown suggest that secreted IL-6 restricts the parasite's liver stage development in vivo, this could be more definitely demonstrated by performing an infection with the transgenic parasites in the context of blocking or absence of the host's IL-6 receptor.

      This experiment was done but unfortunately did not work (Suppl. Fig 2). That is, the treatment of mice infected with IL-6 transgenic parasites with anti-IL-6 receptor blocking antibodies did not reverse the infection phenotype. This was also discussed in the manuscript.

      **Minor Comments**

      The manuscript needs to be improved in terms of both language and format. Some examples, solely from the abstract, are listed below, but the manuscript needs to be appropriately revised in terms of language, grammar, punctuation and format throughout:

      -Space missing between "P." and "berghei"

      Done

      -Gene names should be italicized

      Done

      -Rephrase "Considering IL-6 as a critical proinflammatory signal..." to "Considering that IL-6 is a critical proinflammatory signal..."

      Done

      -"transgenic IL-6 sporozoites" should be "transgenic IL-6-expressing P. berghei sporozoites"

      Done

      -"impairs Plasmodium infection at the liver stage" should be "impairs the liver stage of Plasmodium infection"

      Done

      INTRODUCTION

      The sentence "Among them, parasites lacking integrity of the parasitophorous vacuole, or late during development, and..." appears to be incomplete and needs rephrasing.

      Done

      The references used in sentence "During the last decade, in search of key mechanisms that determine the host inflammatory response, a set of host factors turned out to be critical for malaria parasite liver stage development (Mathieu et al., 2015); (Demarta-Gatsi et al., 2017; Demarta-Gatsi et al., 2016) (Grand et al., 2020)" do not all relate to the liver stage of infection. The authors need to select references that are relevant for their statement or else change the statement.

      Rephrased

      RESULTS

      I suggest the authors change the title of Results section "Transgenic P. berghei parasites expressing IL-6 during the liver stage lose infectivity to mice" not only to improve the quality of the English language employed but also to better clarify the notion that they are talking about hepatic infectivity.

      In the same section, please correct "timely specific timely".

      Done

      Transfectants are not "verified". If anything, the insertion of the gene in the parasite's genome is verified or, better still, confirmed.

      Done

      Sentence "The two lines behave similarly" is redundant.

      Done

      The legend of Figure 1 must include the definitions of all the acronyms in that figure.

      Acronyms in the whole manuscript are defined elsewhere.

      "IL-6 transgenic sporozoites" is not an appropriate designation. If anything, they should be called "IL-6-expressing P. berghei sporozoites".

      Done

      Figure 2 B: The YY axis should clarify that it refers to sporozoite numbers, as there are many other parasite stages in mosquitoes.

      Done

      Figure 2C: This scheme is hardly necessary. It would suffice to label the plots in D and E with the names of the parasite lines employed rather than "Group 1", "Group 2", "Group 3".

      The scheme is provided for more clarity and easy reading of the accompanying figures.

      Figure 2D, 2E: Why didn't the authors use the same scale on the XX axis of the two plots?

      The qRT-PCR data per se do not substantiate the statement "Therefore, RT-qPCR analysis in the liver confirms that the loss of infectivity of IL-6 Tg-PbANKA/LISP2 SPZ is due to a defect in liver stage development in vivo", as a defect in invasion of hepatocytes cannot be excluded. The term "loss of infectivity" is also misleading. Do the authors mean loss of blood stage infectivity?

      Yes

      Sentence "... all parasites were able to invade and develop inside HepG2 cells." is misleading. The authors probably mean "parasites of both lines".

      Changed

      Figure 4: Why did the authors swap the order of the two experimental groups from one plot to the next? The same order should be used, to avoid confusion! Also, the authors should make the width of the bars similar between the two plots.

      Done

      The authors should consider moving Figure 5 to the Supplementary materials.

      Reviewer #3 (Significance (Required)):

      *Nature and significance of the advance. Compare to existing published knowledge. Audience.*

      This study extends our current knowledge on genetically attenuated malaria vaccine candidates and validates the concept of suicide parasites for immunization against malaria. This paper will be of interest to researchers working on malaria vaccination, as well as all those interested in transgenic Plasmodium parasites, and the biology and immunology of liver stage infection by malaria parasites.

      *Your expertise.*

      The co-reviewer and the reviewer are experts on the liver stage of Plasmodium infection and on pre-erythrocytic malaria vaccination.

      **Referees cross commenting**

      I agree with all of Reviewers 1 and 2's remarks and, upon consideration, I would like to revise my "Estimated time to Complete Revisions" to become between 3 and 6 months

    1. Author Response

      Reviewer #1 (Public Review):

      The general idea of comparing response patterns to stress in the offspring generation is new and very interesting.

      We thank Reviewer 1 for their time and thoughtful comments. We agree that these comparisons are new and very interesting and have added multiple revised analyses to the manuscript based on the reviewer comments that we think will further enhance the impact of and conclusions made in this study.

      However, the data that are presented are in several ways preliminary. The phenotype comparisons are mostly convincing, although statistical treatments are partly unclear, given that each "replicate" includes itself many individuals.

      The statistical treatments for groups of individuals are the same as in Burton et al., 2017, Burton et al., 2020, and Willis et al., 2021, which include the original reports of the intergenerational responses studied here. Replicates that include many individuals are relatively common when working with C. elegans and are usually compared using ANOVA or Student's t-tests (depending on the number of comparisons) to analyze the variation in batch effects as well as differences between populations of animals.

      We believe this ability to assay hundreds or even thousands of animals, in total, for each comparison in this study makes our data substantially stronger and more reliable. However, we are happy to perform any additional statistical tests the reviewer might want to see.

      The transcriptomic data are minimal (only three replicates)

      To address this comment, we compared our original three replicates of RNA-seq from F1 animals from C. elegans parents exposed to P. vranovensis BIGb0446 to a second, independent set of three replicates of F1 animals from C. elegans parents exposed to a second P. vranovensis isolate (BIGb0427; the data for this second P. vranovensis isolate were already part of Fig. 4 of this manuscript).

      By comparing these three new replicates to our findings from the original three replicates, we found that 515 of the 562 genes that exhibited a >2-fold change and were significant at padj <0.01 in the original three replicates were also changed at >2-fold and padj <0.01 in the new three replicates. We believe our finding that 91.6% of genes change >2-fold and remain significant at padj <0.01 even when the number of replicates is doubled (and a different isolate of P. vranovensis is used!) suggests that adding additional replicates would not substantially change the conclusions of this manuscript.
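      As an illustration only (this is not the pipeline used for the manuscript), the replication check described above can be sketched in a few lines of Python. The sketch assumes that each experiment is summarized as a mapping from gene ID to a (log2 fold change, adjusted p-value) pair, that ">2-fold" means an absolute log2 fold change above 1, and that the direction of change is not required to match; the gene names and values are hypothetical.

```python
def significant_genes(results, padj_cutoff=0.01, min_abs_log2fc=1.0):
    """Return the set of gene IDs passing both cutoffs.

    `results` maps gene ID -> (log2 fold change, adjusted p-value).
    A >2-fold change corresponds to |log2 fold change| > 1.
    """
    return {
        gene
        for gene, (log2fc, padj) in results.items()
        if padj < padj_cutoff and abs(log2fc) > min_abs_log2fc
    }


def replication_rate(original, replicate):
    """Fraction of genes in `original` that are also in `replicate`."""
    if not original:
        raise ValueError("original gene set is empty")
    return len(original & replicate) / len(original)


# Toy data (hypothetical gene names and values, for illustration only).
first = {"geneA": (2.3, 1e-5), "geneB": (-1.8, 2e-4), "geneC": (0.4, 1e-6)}
second = {"geneA": (1.9, 3e-4), "geneB": (-0.6, 1e-3), "geneC": (0.5, 1e-8)}

sig_first = significant_genes(first)    # geneA and geneB pass both cutoffs
sig_second = significant_genes(second)  # only geneA passes both cutoffs
toy_rate = replication_rate(sig_first, sig_second)

# The proportion quoted in the text: 515 of 562 genes replicated.
reported_rate = 515 / 562
print(f"toy: {toy_rate:.1%}, reported: {reported_rate:.1%}")
```

      Requiring both cutoffs in each experiment independently is the stricter of the obvious choices; a gene that narrowly misses either threshold in the second experiment counts as not replicated.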

      We would also like to highlight, as above, that because this analysis was done on populations of thousands of similarly staged animals, as opposed to individuals, the variability between replicates is further reduced. In addition, much of our transcriptomic data from each species was then compared across species, and genes were only analyzed if they changed in multiple different species, each of which represents a separate set of three additional replicates [i.e., genes that change in all 4 species analyzed have to exhibit significant (>2-fold, padj <0.01) changes across 12 total replicates].

      Our new findings comparing six replicates did not substantially change the number of genes identified when compared to using three replicates, and the fact that for all of the main conclusions of this manuscript each set of triplicates from one species was then compared across 9 additional replicates from three other species from pools of thousands of animals makes us very confident that our results are robust and highly reproducible.

      and lack comparison to the stress responses in the parental animals.

      We agree with Reviewer 1 that comparisons to parental animals are interesting and important. Comparisons of F1 progeny gene expression patterns to parental animals were not included here because such comparisons were previously published in some of our original reports of these intergenerational effects (For example, see Burton et al., 2020). In summary, we found that most, but not all, of the effects on gene expression in F1 animals were also detected in parental animals. However, the transcriptional responses only turn on in F1 animals post gastrulation and do not appear to be due to the simple deposition of parental mRNAs into embryos (Burton et al., 2020).

      We have updated the text to highlight these findings.

      The analysis of the transcriptome data is limited to counting overlaps between significantly changed genes, without deeper discussion of the genes and pathways that are affected.

      In the revised manuscript we have completely redone all of the transcriptomic analysis to use a stricter set of cutoffs for significance – both padj <0.01 and requiring a >2-fold change in expression based on the helpful comments of Reviewer 1 – which we agree with – see below.

      As part of this new analysis we have now also included a deeper discussion of the genes that exhibited similar changes across species, including using g:Profiler to examine the genes that exhibited changes across all four species.

      In addition, we have now paired our phenotypic and transcriptomic data across species to identify 19 new genes that we predict are highly likely to be involved in intergenerational responses to stress based on their expression patterns across species. These 19 genes come out of highly filtered analyses across species that identified a total of 23 genes that change only in species that adapt to P. vranovensis or osmotic stress and not in species that do not adapt.
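      For readers who want to see the logic of this kind of filter, a minimal sketch follows. It assumes that "change only in species that adapt" means a gene is significantly changed in every adapting species and in none of the non-adapting species; that reading, the function name, and the species and gene labels are assumptions made for illustration and are not taken from the manuscript.

```python
def genes_specific_to_adapting(sig_by_species, adapting):
    """Genes significant in every adapting species and in no other species.

    `sig_by_species` maps species name -> set of significantly changed genes.
    `adapting` is the collection of species names showing the adaptive phenotype.
    """
    adapting = set(adapting)
    unknown = adapting - set(sig_by_species)
    if unknown:
        raise KeyError(f"no expression data for: {sorted(unknown)}")
    if not adapting:
        return set()
    # Genes shared by all adapting species.
    shared = set.intersection(*(sig_by_species[s] for s in adapting))
    # Genes changed in any species that does not adapt.
    others = [genes for s, genes in sig_by_species.items() if s not in adapting]
    excluded = set().union(*others)
    return shared - excluded


# Hypothetical example: two adapting species and one non-adapting species.
sig = {
    "species1": {"g1", "g2", "g3"},
    "species2": {"g1", "g2", "g4"},
    "species3": {"g2", "g5"},
}
candidates = genes_specific_to_adapting(sig, {"species1", "species2"})
print(sorted(candidates))
```

      In this toy example only g1 survives: g2 is excluded because it also changes in the non-adapting species, and g3 and g4 are each changed in just one adapting species.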

      Interestingly, this analysis identified nearly all of the previously known genes involved in intergenerational adaptations to these stresses including rhy-1, cysl-1, cysl-2 and gpdh-1. Thus, we predict the remaining 19 genes that came out of this analysis are highly likely to be involved in the responses to these stresses. Furthermore, in the revised text we highlight that our new list of 19 genes includes multiple conserved factors that are required for animal viability including genes involved in nuclear transport (imb-1 and xpo-2), the CDC25 phosphatase ortholog cdc-25.1, and the PTEN tumor suppressor ortholog daf-18. This new analysis will likely form the basis for future investigations into the mechanisms underlying these exciting intergenerational effects.

      We believe this additional analysis greatly improves this manuscript. We are also happy to include any specific additional analysis the reviewer would like to see.

      The top response genes that are directly tested have been discovered before. Hence, while interesting patterns are evident from the data, this work largely confirms prior work, including that described in Burton et al. 2020.

      We have revised the text to highlight that the aims of this particular study were to determine if multigenerational responses to stress were evolutionarily conserved at any level, as well as to determine the potential costs of such effects and the specificity of the responses. These questions were not addressed in any previous study of multigenerational effects, including Burton et al., 2020. Because of the aims of this study, we believe it was critical to focus on genes that had an established role in these intergenerational responses in C. elegans and to compare and contrast the behavior and requirement of these genes in intergenerational responses in other species. (We note, however, that in this newly revised manuscript we now also report 19 new top response genes; see above.)

      In addition to our original goals, in this study we were able to determine the extent to which intergenerational transcriptional responses are conserved and the extent to which intergenerational transcriptional changes persist transgenerationally (which we find to be effectively not at all using our revised stricter analysis). We believe these findings are not only novel, but perhaps will be surprising to much of the intergenerational and transgenerational field and have a major impact on both how multigenerational studies are interpreted and how they are conducted in the future. This is especially the case for studies in C. elegans which is one of the leading model organisms to study the mechanisms underlying both intergenerational and transgenerational responses to stress.

      For example, we note that several landmark studies of transgenerational effects (persisting into F3 or later generations) in C. elegans performed RNA-seq on F1 progeny (For example, Moore et al., Cell 2019 or Ma et al., Nature Cell Biology 2019). Our new findings reported here suggest that it is possible that none of the transcriptional effects detected in F1 animals will persist in F3 progeny. Furthermore, our studies demonstrate the importance of comparing C. elegans transcriptional effects to related Caenorhabditis species as we found that only a subset of the effects detected in C. elegans are conserved in any other Caenorhabditis species. (Such comparisons are important for determining if and to what extent observations of intergenerational and/or transgenerational effects observed in C. elegans represent conserved phenomena).

      For all of these reasons we believe our data are highly exciting, will be of broad interest to the field, and represent novel and potentially unexpected findings that were not previously reported in any prior work, including Burton et al., 2020.

      Reviewer #2 (Public Review):

      Transgenerational effects (TE) (usually defined as multigenerational effects lasting for at least three generations) generated a lot of interest in recent years but the adaptive value of such effects is unclear. In order to understand the scope for adaptive TE we need to understand i) whether such effects are common; ii) whether they are stress-specific; and iii) if there are trade-offs with respect to performance in different environments. The last point is particularly important because F1, F2 and F3 descendants may encounter very different environments. On the other hand, intergenerational effects (lasting for one or two generations) are relatively common and can play an important role in evolutionary processes. However, we do not know whether intergenerational and transgenerational effects have the same underlying mechanisms.

      This study makes a big step towards resolving these questions and strongly advances our understanding of both phenomena. Much of the previous work on mechanisms of multigenerational effects has been conducted in C. elegans and this work uses the same approach. They focus on bacterial infection, Microsporidia infection, larval starvation and osmotic stress. I did not quite understand why the authors chose to focus on P. vranovensis rather than P. aeruginosa PA14 that has been used in previous studies of transgenerational effects in C. elegans. However, this is a minor point because I guess they were interested in broad transgenerational responses to bacterial infection rather than in strain-specific ones. The authors used different Caenorhabditis species, which is another strength of this study in addition to using multiple stresses.

      We thank the reviewer for these comments. We’d like to briefly highlight that P. vranovensis was also shown to elicit the same transgenerational effects as P. aeruginosa in the bioRxiv version of the same papers that reported transgenerational effects for P. aeruginosa (Kaletsky et al., 2020 – GRb0427 is an isolate of P. vranovensis).

      It is not clear to us why this result was not included in the final published version of this manuscript, but we in fact used P. vranovensis for these studies in part because of this bioRxiv paper and because we failed to detect any robust intergenerational effects using P. aeruginosa PA14 in any of our assays – including at the RNA-seq level (unpublished).

      Nonetheless, we have since confirmed with Coleen Murphy’s lab that they do find P. vranovensis elicits the same transgenerational effect on behaviour as P. aeruginosa. We expect that future investigations into the conditions under which P. vranovensis elicits effects that are lost/erased after 1 generation and the conditions under which effects might be maintained for more than 3 generations will be highly interesting.

      They found 279 genes that exhibited intergenerational changes in all Caenorhabditis species tested, but most interestingly, they show that a reversal in gene expression corresponds to a reversal in response to bacterial infection (beneficial in two species and deleterious in one). This is very intriguing! This was further supported by similar observations of osmotic stress response.

      We thank Reviewer 2 for their excitement and we agree that these findings were highly exciting.

      They also report that intergenerational effects are stress-specific and that they have deleterious effects in mismatched environments and, importantly, when worms were subject to multiple stresses. It is quite likely that offspring will experience a range of environments and that several environmental stresses will be present simultaneously in nature. I really liked this aspect of this work as I think that tests in different environments, especially environments with multiple stresses, are often lacking, which limits the generality of the conclusions.

      Another interesting piece of the puzzle is that beneficial and deleterious effects could be mediated by the same mechanisms. It would be interesting to explore this further. However, this is not a real criticism of this work. I think that the authors collected an impressive dataset already and every good study generates new research questions.

      Given these findings, I was particularly keen to see what comes of transgenerational effects. The general answer was that there aren't many, and the authors conclude that all intergenerational effects that they studied are largely reversible and that intergenerational and transgenerational effects represent distinct phenomena. While I think that this is a very important finding, I am not sure whether we can conclude that intergenerational and transgenerational effects are not related.

      In my view, an alternative interpretation is that intergenerational effects are common while transgenerational effects are rare. Because intergenerational effects are stress-specific, transgenerational effects could be stress-specific as well.

      We agree with reviewer 2 that our findings suggest that intergenerational effects are common and transgenerational effects are either rare in comparison or only occur under specific conditions. We have updated the text to include this interpretation.

      Perhaps different mechanisms regulate intergenerational responses to, say, different forms of starvation (e.g. compare opposing transgenerational responses to prolonged larval starvation (Rechavi et al. doi:10.1016/j.cell.2014.06.020) and rather short adulthood starvation (Ivimey-Cook et al. 2021 https://doi.org/10.1098/rspb.2021.0701)). Perhaps some (most?) forms of starvation generate only intergenerational responses and do not generate transgenerational responses. But some do. Those forms of starvation that generate both intergenerational and transgenerational effects could do so via the same mechanisms and represent the same phenomenon. I am by no means saying this is the case, but I am not sure that the absence of evidence of transgenerational effects in this study necessarily suggests that inter- and trans-generational effects are different phenomena.

      We agree and, similar to above, have updated the text accordingly to state that it is also very possible that transgenerational effects only occur under certain conditions.

      The only real concern was the lack of phenotypic data on F3 beyond gene expression. Ideally, I would like to see tests of pathogen avoidance and starvation resistance in F3. However, given the amount of work that went into this study, the lack of a strong signature of potential transgenerational effects in gene expression, and the fact that most of these effects were shown previously to last only one generation, I do not think this is crucial.

      We thank reviewer 2 for these comments and agree that phenotypic investigations of F3 effects are also very interesting.

      We have previously investigated the phenotypic effects of all of the stresses used in this paper on F3 animals using the assays described here and consistent with our new gene expression findings we previously found that most of these stresses do not exert phenotypic effects in F3 animals (Burton et al. 2020, Willis et al 2021, Hibshman et al., 2016).

      Separately, we have also attempted to investigate the effects of pathogen exposure on pathogen avoidance, as these effects have previously been reported to occur transgenerationally, but to date have been unable to consistently replicate these findings. We expect that this is likely due to what might be subtle differences in conditions between labs (differences in water used for the media prep, air humidity, potential differences in N2 wild-type strains, etc.) because assays such as behavioral avoidance are known to be very sensitive to many different environmental inputs.

      We currently believe that our experiences as they relate to intergenerational and transgenerational effects support the general conclusion of this manuscript that while intergenerational effects are common and easy to initiate across multiple labs (the intergenerational effects studied here have now been successfully reproduced in labs in the US, UK, and Canada), transgenerational effects might be more specific and/or only occur/be initiated under more stringent conditions – perhaps with the aim of avoiding the costs of such multigenerational effects.

      Future studies of exactly when and under what conditions C. elegans initiates intergenerational vs transgenerational effects are likely to be very interesting.

      It would be very interesting to compare gene expression and other phenotypic responses in F1 and F3 between P. vranovensis and PA14. Also, it would be interesting to test the adaptive value of intergenerational and transgenerational effects after exposure to both strains in different environments. This would be very informative and help with understanding the evolutionary significance of transgenerational epigenetic inheritance of pathogen avoidance as reported previously. Why is the response to P. vranovensis erased while the response to PA14 is maintained for four generations? Are nematodes more likely to encounter one species than the other? Again, however, this is not something necessary for this study.

      We completely agree with Reviewer 2 and have indeed attempted these experiments both in Burton et al., 2020 and in unpublished results.

      With regards to the transgenerational F3 effects, as mentioned above, P. vranovensis has been reported to elicit the same transgenerational effect as P. aeruginosa PA14 – at least as reported in the Kaletsky et al., 2020 bioRxiv version of the manuscript from the same studies. (GRb0427 is an isolate of P. vranovensis).

      To date, however, in our laboratory we have been unable to detect any transgenerational effects for either P. vranovensis or P. aeruginosa infection on gene expression data from RNA-seq experiments (data from this manuscript and unpublished data).

      It is not yet clear why this is the case, but we note that the RNA-seq analysis from the transgenerational PA14 studies (published in Moore et al., Cell 2019) was performed on F1 animals and thus was looking at intergenerational effects. To our knowledge, no RNA-seq on F3 progeny from animals exposed to PA14 has ever been published. Thus, as it stands, there are no existing F3 gene expression studies done using PA14 for us to compare our results to, but it remains possible that PA14 does not elicit specific effects on F3 gene expression when analyzed by RNA-seq.

      For F1 effects we have published a gene expression comparison for P. vranovensis and P. aeruginosa F1 effects in a previous manuscript (Burton et al 2020) and will add a mention of this to the text. Briefly, we detected very few F1 effects on gene expression when exposing adults to P. aeruginosa for 24 hours and parental infection by P. aeruginosa did not result in protection for offspring from P. vranovensis infection (Burton et al., 2020). We concluded that the intergenerational adaptation to P. vranovensis was not initiated by P. aeruginosa and was at least somewhat specific to P. vranovensis as well as the new species of Pseudomonas described in this manuscript which does cross protect.

      The main strengths of this paper are i) use of multiple stresses; ii) use of multiple species; iii) tests in different environments; and iv) simultaneous evaluation of intergenerational and transgenerational responses. This study is the first of its kind, and it provides several important answers, while highlighting clear paths for future work.

      Excellent work and I think it will generate a lot of interest in the community, definitely want to see it published in eLife.

      We agree with Reviewer 2 and thank them for their kind comments.

      Reviewer #3 (Public Review):

      In this manuscript, the authors address whether the mechanisms mediating intergenerational effects are conserved in evolution. This question is important not only to frame this phenomenon in an evolutionary context, but to address several interlinked questions: is there a mechanism in common between adaptive versus deleterious effects? What makes some effects last one instead of several generations? What is the ecological relevance for those mechanisms? Using Caenorhabditis elegans as a model of reference, they compare four types of intergenerational effects in three additional Caenorhabditis species.

      The authors used previously characterized models of intergenerational inheritance, focusing on those that are likely to have adaptive significance. This is relevant, because the adaptive relevance of other published examples of inter- and transgenerational inheritance is not clear. They used functional studies to probe for conservation of mechanisms for bacterial infection and resistance to osmolarity stress, which is a major strength of this study. The data supports the claim of conservation in some types of intergenerational inheritance and divergence in others. One major question addressed in this manuscript is whether there is a potential overarching mechanism that confers stress-resistance across generations. Their experiments convincingly show that this is not the case, but that instead, there are stress-specific mechanisms responsible for intergenerational inheritance.

      We agree and thank Reviewer 3 for their kind comments.

    1. Author Response

      Reviewer #1 (Public Review):

      The relationship between genetic disease and adaptation is important for biomedical research as well as understanding human evolution. This topic has received considerable attention over the past several decades in human genetics research. The present manuscript provides a much more comprehensive and rigorous analysis of this topic. Specifically, the authors select a set of ~4000 human Mendelian disease genes and examine patterns of recent positive selection in these genes using the iHS and nSL tests (both haplotype tests) for selection. They then compare the signals of sweeps to control genes. Importantly, they match the control set to the disease genes based upon many different genomic variables, such as recombination rate, amount of background selection, expression level, etc. The authors find that there is a deficit of selective sweeps in disease genes. They test several hypotheses for this deficit. They find that the deficit of sweeps is stronger in disease genes at low recombination rate and those that have more disease mutations. From this, the authors conclude that strongly deleterious mutations could be impeding selective sweeps.

      Strengths

      The manuscript includes a number of important strengths:

      1) It tackles an important question in the field. The question of selection in disease genes has been very well-studied in the past, with conflicting viewpoints. The present study examines this topic in a rigorous way and finds a deficit of sweeps in disease genes.

      2) The statistical analyses are rigorously done. The genome is a confusing place and there can often be many reasons why a certain set of genes could differ from another set of genes, unrelated to the variable of interest. Di et al. carefully match on these genomic confounders. Thus, they rigorously demonstrate that sweeps are depleted in disease genes relative to control genes. Further, the pipeline for ranking the genes and testing for significance is solid.

      3) The Introduction of the manuscript nicely relates different evolutionary models and explanations to patterns that could be seen in the data. As such, the present manuscript isn't just merely an exploratory analysis of patterns of sweeps in disease genes. Rather, it tests specific evolutionary scenarios.

      Weaknesses

      1) The authors did not discuss or test a basic explanation for the deficit of sweeps in disease genes. Namely, certain types of genes, when mutated, give rise to strong Mendelian phenotypes. However, mutations in these genes do not result in variation that gives rise to a phenotype on which positive selection could occur. In other words, there are just different types of genes underlying disease and positive selection. I could think that such a pattern would be possible if humans are close to the fitness optimum and strong effect mutations (like those in Mendelian disease genes) result in moving further away from the fitness optimum. On the other hand, more weak effect mutations could be either weakly deleterious or beneficial and subject to positive selection. I'm not sure whether these patterns would necessarily be captured by the overall measures of constraint which the disease and non-disease genes were matched on.

      We thank the reviewer for suggesting that alternative explanation. It is indeed important that we compare it with our own explanation. To rephrase the reviewer’s suggestion, it is possible that disease genes may just have a different distribution of fitness effects of new mutations. Specifically, mutations in disease genes might have such large effects that they will consistently overshoot the fitness optimum, and thus not get closer to this optimum. This would prevent them from being positively selected. Two predictions can be derived from this potential scenario. First, we can predict a sweep deficit at disease genes, which is what we report. Second, we can also predict that disease genes should exhibit a deficit of older adaptation, not just recent adaptation detected by sweep signals. Indeed, the decrease in adaptation due to (too) large effect mutations would be a generic, intrinsic feature of disease genes regardless of evolutionary time. This means that under this explanation, we expect a test of long-term adaptation such as the McDonald-Kreitman test to also show a deficit at disease genes.

      This latter prediction differs from the prediction made by our favored explanation of interference between deleterious and advantageous variants. In this scenario, the sweep deficit at disease genes is caused by the presence of deleterious, and most importantly currently segregating disease variants. Because the presence of the segregating variants is transient during evolution, our explanation does not predict a deficit of long-term adaptation. We can therefore distinguish which explanation (the reviewer’s or ours) is the most likely based on the presence or absence of a long-term adaptation deficit at disease genes.

      To test this, we now compare protein adaptation in disease and control genes with two versions of the MK test called ABC-MK and GRAPES (refs). ABC-MK estimates the overall rate of adaptation, and also the rates of weak and strong adaptation, and is based on Approximate Bayesian Computation. GRAPES is based on maximum likelihood. Both ABC-MK and GRAPES have been shown to provide robust estimates of the rate of protein adaptation thanks to evaluations with forward population simulations (refs). We find no difference in long-term adaptation between disease and control non-disease genes, as shown in new figure 4. This shows that the explanation put forward by the reviewer of an intrinsically different distribution of mutation effects at disease genes is less likely than an interference between currently segregating deleterious variants with recent, but not with older long-term adaptation. We even show in the new figure 4 that disease genes and their controls have more, not less strong long-term adaptation compared to the whole human genome baseline (new figure 4C). Also, disease genes in low recombination regions and with many disease variants have experienced more, not less strong long-term adaptation than their controls. Therefore, far from overshooting the fitness optimum due to stronger fitness effects of mutations, it looks like these stronger fitness effects might in fact be more frequently positively selected in these disease genes.
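      For orientation, the quantity that MK-type tests aim to estimate can be illustrated with the classic point estimator of the proportion of adaptive amino acid substitutions, α = 1 − (Ds·Pn)/(Dn·Ps), computed from counts of nonsynonymous and synonymous divergence (Dn, Ds) and polymorphism (Pn, Ps). The sketch below implements only this simple estimator with hypothetical counts; it is not ABC-MK or GRAPES, which additionally model the distribution of fitness effects and segregating weakly deleterious variants.

```python
def mk_alpha(dn, ds, pn, ps):
    """Classic MK-based estimate of the proportion of adaptive substitutions.

    dn, ds: nonsynonymous and synonymous fixed differences (divergence).
    pn, ps: nonsynonymous and synonymous polymorphisms.
    Returns alpha = 1 - (ds * pn) / (dn * ps).
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive to estimate alpha")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts, for illustration only.
alpha = mk_alpha(dn=40, ds=100, pn=20, ps=100)
print(f"alpha = {alpha:.2f}")
```

      With these toy counts, the ratio of nonsynonymous to synonymous changes is twice as high in divergence as in polymorphism, so half of the amino acid substitutions are attributed to adaptation. The estimate can be negative when slightly deleterious polymorphisms inflate Pn, which is one reason more elaborate methods are preferred in practice.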

      We now provide these new results P15L418:

      “Disease genes do not experience constitutively less long-term adaptive mutations

      A deficit of strong recent adaptation (strong enough to affect iHS or nSL) raises the question of what creates the sweep deficit at disease genes. As already discussed, purifying selection and other confounding factors are matched between disease genes and their controls, which excludes that these factors alone could possibly explain the sweep deficit. Purifying selection alone in particular cannot explain this result, since we find evidence that it is well matched between disease and control genes (Figures 2 and Figure 4-figure supplement 1). Furthermore, we find that the 1,000 genes in the genome with the highest density of conserved elements do not exhibit any sweep deficit (bootstrap test + block-randomized genomes FPR=0.18; Methods). Association with mendelian diseases, rather than a generally elevated level of selective constraint, is therefore what matters to observe a sweep deficit. What then might explain the sweep deficit at disease genes?

      As mentioned in the introduction, it could be that mendelian disease genes experience constitutively less adaptive mutations. This could be the case for example because mendelian disease genes tend to be more pleiotropic (Otto, 2004), and/or because new mutations in mendelian are large effect mutations (Quintana-Murci, 2016) that tend to often overshoot the fitness optimum, and cannot be positively selected as a result. Regardless of the underlying processes, a constitutive tendency to experience less adaptive mutations predicts not only a deficit of recent adaptation, but also a deficit of more long-term adaptation during evolution. The iHS and nSL signals of recent adaptation we use to detect sweeps correspond to a time window of at most 50,000 years, since these statistics have very little statistical power to detect older adaptation (Sabeti et al., 2006). In contrast, approaches such as the McDonald-Kreitman test (MK test) (McDonald and Kreitman, 1991) capture the cumulative signals of adaptative events since humans and chimpanzee had a common ancestor, likely more than six million years ago. To test whether mendelian disease genes have also experienced less long-term adaptation, in addition to less recent adaptation, we use the MK tests ABC-MK (Uricchio et al., 2019) and GRAPES (Galtier, 2016) to compare the rate of protein adaptation (advantageous amino acid changes) in mendelian disease gene coding sequences, compared to confounding factors-matched non-disease controls (Methods). We find that overall, disease and control non-disease genes have experienced similar rates of protein adaptation during millions of years of human evolution, as shown by very similar estimated proportions of amino acid changes that were adaptive (Figure 5A,B,C,D,E). This result suggests that disease genes do not have constitutively less adaptive mutations. 
This implies that processes that are stable over evolutionary time such as pleiotropy, or a tendency to overshoot the fitness optimum, are unlikely to explain the sweep deficit at disease genes. If disease genes have not experienced less adaptive mutations during long-term evolution, then the process at work during more recent human evolution has to be transient, and has to have limited only recent adaptation. It is also noteworthy that both disease genes and their controls have experienced more coding adaptation than genes in the human genome overall (Figure 5A), especially more strong adaptation according to ABC-MK (Figure 5C). The fact that the baseline long-term coding adaptation is lower genome-wide, but similarly higher in disease and their control genes, also shows that the matched controls do play their intended role of accounting for confounding factors likely to affect adaptation. The fact that long-term protein adaptation is not lower at disease genes also excludes that purifying selection alone can explain the sweep deficit at disease genes, because purifying selection would then also have decreased long-term adaptation. A more transient evolutionary process is thus more likely to explain our results.”
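      As an illustration of the quantity that MK-type analyses compare, the proportion of amino acid substitutions that were adaptive (often written α) has a classic point estimate, α = 1 − (Ds·Pn)/(Dn·Ps), where D and P are divergence and polymorphism counts at nonsynonymous (n) and synonymous (s) sites. The sketch below implements only this simple estimator. The counts are made up for illustration, and the function name is hypothetical. ABC-MK and GRAPES rely on more elaborate models (for example, accounting for weakly selected segregating variants), so this is not the procedure used for Figure 5.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Classic MK point estimate of the proportion of adaptive
    nonsynonymous substitutions: alpha = 1 - (ds * pn) / (dn * ps).

    dn, ds: nonsynonymous / synonymous fixed differences (divergence)
    pn, ps: nonsynonymous / synonymous polymorphisms
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be non-zero for alpha to be defined")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts, for illustration only (not data from the study).
alpha_gene_set = mk_alpha(dn=60, ds=100, pn=30, ps=100)
print(f"alpha = {alpha_gene_set:.2f}")
```

      Under this estimator, two gene sets with the same ratio of nonsynonymous to synonymous divergence and the same ratio of nonsynonymous to synonymous polymorphism yield the same α, which is the sense in which disease genes and their controls are being compared.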

      Then P22L613: “More importantly, the fact that constitutively less adaptation at disease genes combined with more power to detect sweeps in low recombination regions does not explain our results, is made even clearer by the fact that disease genes in low recombination regions and with many disease variants have in fact experienced more, not less long-term adaptation according to an MK analysis using both ABC-MK and GRAPES (Figure 5F,G,H,I,J). ABC-MK in particular finds that there is a significant excess of long-term strong adaptation (Figure 4H, P<0.01) in disease genes with low recombination and with many disease variants, compared to controls, but similar amounts of weak adaptation (Figure 5G, P=0.16). It might be that disease genes with many disease variants are genes with more mutations with stronger effects that can generate stronger positive selection. The potentially higher supply of strongly advantageous variants at these disease genes makes it all the more notable that they have a very strong sweep deficit in recent evolutionary times. This further strengthens the evidence in favor of interference during recent human adaptation: the limiting factor does not seem to be the supply of strongly advantageous variants, but instead the ability of these variants to have generated sweeps recently by rising fast enough in frequency.”

      2) While I think the authors did a superb job of controlling for genome differences between disease and non-disease genes, the analysis of separating regions by recombination rate and number of disease mutations does not seem as rigorous. Specifically, the authors tested for enrichment of sweeps in disease genes vs control and then stratified that comparison by recombination rate and/or number of disease mutations. While this nicely matches the disease genes to the control genes, it is not clear whether the high recombination rate genes differ in other important attributes from the low recombination rate genes. Thus, I worry whether there could be a confounder that makes it easier/harder to detect an enrichment/deficit of sweeps in regions of low/high recombination.

      We thank the reviewer for emphasizing the need for more controls when comparing our results in low or high recombination regions. We have now compared the confounding factors between low recombination disease genes and high recombination disease genes, as classified in the manuscript. As shown in the new supplementary table (Figure 6-figure supplement 1), confounding factors do not differ substantially between low and high recombination disease genes: with one exception, they are all within +/- 25% of each other. It would take a larger difference for any confounding factor to explain the sharp sweep deficit difference observed between the low and high recombination disease genes. The exception is McVicker’s B, which differs by 35% between low and high recombination mendelian disease genes, but this is expected, since B is lower in low recombination regions.

      We now write P20L569: “Further note that only moderate differences in confounding factors between low and high recombination mendelian disease genes are unlikely to explain the sweep deficit difference (Figure 6-figure supplement 1).”

      Regarding the potential confounding effect of statistical power to detect sweeps differing in low and high recombination regions, please see our earlier response to main point 2.

      Reviewer #2 (Public Review):

      This paper seeks to test the extent to which adaptation via selective sweeps has occurred at disease-associated genes vs genes that have not (yet) been associated with disease. While there is a debate regarding the rate at which selective sweeps have occurred in recent human history, it is clear that some genes have experienced very strong recent selective sweeps. Recent papers from this group have very nicely shown how important virus interacting proteins have been in recent human evolution, and other papers have demonstrated the few instances in which strong selection has occurred in recent human history to adapt to novel environments (e.g. migration to high altitude, skin pigmentation, and a few other hypothesized traits).

      One challenge in reading the paper was that I did not realize the analysis was exclusively focused on Mendelian disease genes until much later (the first reference is not until the end of the introduction on pages 7-8 and then not at all again until the discussion, despite referring to "disease" many times in the abstract and throughout the paper). It would be preferred if the authors indicated that this study focused on Mendelian diseases (rather than a broader analysis that included complex or infectious diseases). This is important because there are many different types of diseases and disease genes. Infectious disease genes and complex disease genes may have quite different patterns (as the authors indicate at the end of the introduction).

      We want to apologize profusely for this avoidable mistake. We have now made it clearer from the very start of the manuscript that we focus on mendelian non-infectious disease genes. We have modified the title and the abstract accordingly, specifying mendelian and non-infectious as required.

      The abstract states "Understanding the relationship between disease and adaptation at the gene level in the human genome is severely hampered by the fact that we don't even know whether disease genes have experienced more, less, or as much adaptation as non-disease genes during recent human evolution." This seems to diminish a large body of work that has been done in this area. The authors acknowledge some of this literature in the introduction, but it would be worth toning down the abstract, which suggests there has been no work in this area. A review of this topic by Lluis Quintana-Murci (ref. 1) was cited, but diminished many of the developments that have been made in the intersection of population genetics and human disease biology. Quintana-Murci says "Mendelian disorders are typically severe, compromising survival and reproduction, and are caused by highly penetrant, rare deleterious mutations. Mendelian disease genes should therefore fit the mutation-selection balance model, with an equilibrium between the rate of mutation and the rate of risk allele removal by purifying selection", and argues that positive selection signals should be rare among Mendelian disease genes. Several other examples come to mind. For example, comparing Mendelian disease genes, complex disease genes, and mouse essential genes was the major focus of a 2008 paper (ref. 2), which pointed out that Mendelian disease genes exhibited much higher rates of purifying selection while complex disease genes exhibited a mixture of purifying and positive selection. This paper was cited, but only in regard to their findings of complex diseases. 
A similar analysis of McDonald-Kreitman tables (ref. 3) was performed around Mendelian disease genes vs non-disease genes, and found "that disease genes have a higher mean probability of negative selection within candidate cis-regulatory regions as compared to non-disease genes, however this trend is only suggestive in EAs, the population where the majority of diseases have likely been characterized". Both of these studies focused on polymorphism and divergence data, which target older instances of selection than iHS and nSL statistics used in the present study (but should have substantial overlap since iHS is not sensitive to very recent selection like the SDS statistic). Regardless, the findings are largely consistent, and I believe warrant a more modest tone.

      We thank the reviewer for their recommendation. We should have written more about what is currently well known or unknown about recent adaptation in disease genes, and in more nuanced terms. Instead of writing “Understanding the relationship between disease and adaptation at the gene level in the human genome is severely hampered by the fact that we don't even know whether disease genes have experienced more, less, or as much adaptation as non-disease genes during recent human evolution”, we now write in the new abstract:

      “Despite our expanding knowledge of gene-disease associations, and despite the medical importance of disease genes, their recent evolution has not been thoroughly studied across diverse human populations. In particular, recent genomic adaptation at disease genes has not been characterized as well as long-term purifying selection and long-term adaptation. Understanding the relationship between disease and adaptation at the gene level in the human genome is hampered by the fact that we don’t know whether disease genes have experienced more, less, or as much adaptation as non-disease genes during the last ~50,000 years of recent human evolution.”

      We also toned down the start of the introduction. We now write P3L74:

      “Despite our expanding knowledge of mendelian disease gene associations, and despite the fact that multiple evolutionary processes might connect disease and genomic adaptation at the gene level, these connections are yet to be studied more thoroughly, especially in the case of recent genomic adaptation.”

      Although we agree that others have made extensive efforts to characterize older adaptation or purifying selection at disease genes compared to non-disease genes, we still believe that our results are novel and more conclusive about recent positive selection. Our initial statement was however poorly phrased. To our knowledge, our study is the first to look at the issue using specifically sweep statistics that have been shown to be robust to background selection, while also controlling for confounding factors. These sweep statistics have sensitivity for selection events that occurred in the past 30,000 or at most 50,000 years of human evolution (Sabeti et al. 2006). This is a very different time scale compared to the millions of years of adaptation (since divergence between humans and chimpanzees) captured by MK approaches.

      We also want to note that we did cite the Blekhman et al. paper for their result of stronger purifying selection in our initial manuscript. It is true however that we did not specify mendelian disease genes, which was confusing. We want to apologize again for it:

      From the earlier manuscript: “Multiple recent studies comparing evolutionary patterns between human disease and non-disease genes have found that disease genes are more constrained and evolve more slowly (lower ratio of nonsynonymous to synonymous substitution rate, dN/dS, in disease genes) (Blekhman et al., 2008; Park et al., 2012; Spataro et al., 2017)”

      “Among other confounding factors, it is particularly important to take into account evolutionary constraint, i.e the level of purifying selection experienced by different genes. A common intuition is that disease genes may exhibit less adaptation because they are more constrained (Blekhman et al., 2008)”

      It is important to remember that, as we mention in the introduction, previous comparisons did not take potential confounding factors at all into account. It is therefore unclear whether their conclusions were specific to disease genes, or due to confounding factors. We have now made this point clearer in the introduction, as we believe that we have made a substantial effort to control for confounding factors, and that it is a substantial departure from previous efforts:

      P7L201: “In contrast with previous studies, we systematically control for a large number of confounding factors when comparing recent adaptation in human mendelian disease and non-disease genes, including evolutionary constraint, mutation rate, recombination rate, the proportion of immune or virus-interacting genes, etc. (please refer to Methods for a full list of the confounding factors included).”.

      P9L253: “These differences between disease and non-disease genes highlight the need to compare disease genes with control non-disease genes with similar levels of selective constraint. To do this and compare sweeps in mendelian disease genes and non-disease genes that are similar in ways other than being associated with mendelian disease (as described in the Results below, Less sweeps at mendelian disease genes), we use sets of control non-disease genes that are built by a bootstrap test to match the disease genes in terms of confounding factors (Methods)”.
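      To make the idea of confounder-matched controls concrete, here is a minimal sketch of how a control gene could be drawn for each disease gene so that a confounding factor stays within a relative tolerance. It is an illustration under simplifying assumptions, not the bootstrap test described in the Methods: it matches on a single confounder, whereas the manuscript matches many simultaneously. The function name, the gene labels, the values, and the 25% tolerance are all hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple


def sample_matched_controls(
    disease: Dict[str, float],
    candidates: Dict[str, float],
    tolerance: float = 0.25,
    rng: Optional[random.Random] = None,
) -> List[Tuple[str, str]]:
    """For each disease gene, draw one candidate control gene whose
    confounder value lies within +/- tolerance (relative) of the disease
    gene's value. Controls are sampled with replacement; disease genes
    with no acceptable control are skipped."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    pairs = []
    for gene, value in disease.items():
        pool = [
            ctrl for ctrl, ctrl_value in candidates.items()
            if abs(ctrl_value - value) <= tolerance * abs(value)
        ]
        if pool:
            pairs.append((gene, rng.choice(pool)))
    return pairs


# Hypothetical confounder values (e.g. a recombination rate) per gene.
disease_genes = {"D1": 1.0, "D2": 2.0, "D3": 10.0}
candidate_controls = {"C1": 0.9, "C2": 1.2, "C3": 2.1, "C4": 5.0}
matched = sample_matched_controls(disease_genes, candidate_controls)
print(matched)  # D3 has no control within tolerance and is left out
```

      Repeating the draw many times yields a collection of control sets, against which the number of disease genes in sweeps can be compared.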

      Furthermore, we have now added a comparison of older adaptation in disease and non-disease genes using a recent version of the MK test called ABC-MK, that can take background selection and other biases such as segregating weakly advantageous variants into account. Also controlling for confounding factors, we find no difference in older adaptation between disease and non-disease genes (please see our response to main point 2).

      Therefore, contrary to the reviewer’s claim that the sweep statistics and MK approaches should have substantial overlap, we now show that it is clearly not the case. We further show that the lack of overlap is expected under our explanation of our results based on interference between recessive deleterious and advantageous variants (see our responses to main point 1 and to reviewer 1 weakness 1).

      Previous analyses were using much smaller mendelian disease gene datasets, less recent polymorphism datasets and, critically, did not control for confounding factors. We also note that reference 3 (Torgerson et al. Plos Genetics 2009) does not make any claim about recent positive selection in mendelian disease genes compared to other genes. Their dataset at the time also only included 666 mendelian disease genes, versus the ~4,000 currently known.

      In short, we do think that we have a claim for novelty, but the reviewer is entirely right that we did a poor job of giving due credit to previous important work. These previous studies deserved much better credit than no credit at all. We want to thank the reviewer for sparing us the embarrassment of not citing important work.

      We now cite the papers referenced by the reviewer as appropriate in the introduction, based on the scope of their results:

      P3L93: “Multiple recent studies comparing evolutionary patterns between human mendelian disease and non-disease genes have found that mendelian disease genes are more constrained and evolve more slowly (Blekhman et al., 2008; Quintana-Murci, 2016; Spataro et al., 2017; Torgerson et al., 2009). An older comparison by Smith and Eyre-Walker (Smith and Eyre-Walker, 2003) found that disease genes evolve faster than non-disease genes, but we note that the sample of disease genes used at the time was very limited.”

      P5L134 “Among possible confounding factors, it is particularly important to take into account evolutionary constraint, i.e the level of purifying selection experienced by different genes. A common intuition is that mendelian disease genes may exhibit less adaptation because they are more constrained (Blekhman et al., 2008; Spataro et al., 2017; Torgerson et al., 2009),”

      There are some aspects of the current study that I think are highly valuable. For example, the authors study most of the 1000 Genomes Project populations (though the text should be edited since the admixed and South Asian populations are not analyzed, so all 26 populations are not included, only the populations from Africa, East Asia, and Europe are analyzed; a total of 15 populations are included Figures 2-3). Comparing populations allows the authors to understand how signatures of selection might be shared vs population-specific. Unfortunately, the signals that the authors find regarding the depletion of positive selection at Mendelian disease genes is almost entirely restricted to African populations. The signal is not significant in East Asia or Europe (Figure 2 clearly shows this). It seems that the mean curve of the fold-enrichment as a function of rank threshold (Figure 3) trends downward in East Asian and European populations, but the sampling variance is so large that the bootstrap confidence intervals overlap 1. The paper should therefore revise the sentence "we find a strong depletion in sweep signals at disease genes, especially in Africa" to "only in Africa". This opens the question of why the authors find the particular pattern they find. The authors do point out that a majority of Mendelian disease genes are likely discovered in European populations, so is it that the genes' functions predate the Out-of-Africa split? They most certainly do. It is possible that the larger long-term effective population size of African populations resulted in stronger purifying selection at Mendelian disease genes compared to European and East Asian populations, where smaller effective population sizes due to the Out-of-Africa Bottleneck diminished the signal of most selective sweeps and hence there is little differentiation between categories of genes ("drift noise"). 
It is also surprising to note that the authors find selection signatures at all using iHS in African populations while a previous study using the same statistic could not differentiate signals of selection from neutral demographic simulations (ref. 4).

      We want to thank the reviewer profusely for putting us on the right track thanks to their insightful suggestion. As described in our response to reviewer 1 weakness 1, we have now shown with simulations that the interference of deleterious variants on advantageous variants is strongly decreased during a bottleneck of a magnitude similar to the Out of Africa bottlenecks experienced by East Asian and European populations. This decrease of interference is likely strong enough to not require any other explanation, even if other processes may also be at work, such as a decrease of the sweeps signals as suggested by the reviewer.

      About the Granka et al. paper, the last author of the current manuscript has already shown in a previous paper (ref) that the type of approaches used to quantify recent adaptation is likely to be severely underpowered due to a number of confounding factors, notably including comparing genic and non-genic windows that are not sufficiently far from each other to not overlap the same sweep signals. Our results are also based on much more recent and less biased sets of SNPs used to measure the sweeps statistics.

      The authors find that there is a remarkably (in my view) similar depletion across all but one MeSH disease classes. This suggests that "disease" is likely not the driving factor, but that Mendelian disease genes are a way of identifying where there are strongly selected deleterious variants recurrently arising and preventing positively selected variants. This is a fascinating hypothesis, and is corroborated by the finding that the depletion gets stronger in genes with more Mendelian disease variants. In this sense, the authors are using Mendelian disease genes as a proxy for identifying targets of strong purifying selection, and are therefore not actually studying Mendelian disease genes. The signal could be clearer if the test set is based on the factor that is actually driving the signal.

      Based on the reviewer’s comment, we have now better explained why our results are unlikely to be a generic property of purifying selection alone. As we explain in our response to main point 3, our results cannot be explained by purifying selection alone, because we match purifying selection between disease genes and the controls. Indeed, we now show with additional MK analyses and GERP-based analyses that our controls for confounding factors already account for purifying selection. This is shown by the fact that disease genes and their controls have similar distributions of deleterious fitness effects.

      In addition, we added a comparison that shows that purifying selection alone does not explain our results. Instead of comparing sweeps at disease and non-disease genes, we compared sweeps (in Africa) between the 1,000 genes with the highest density of conserved, constrained elements and other genes in the genome. If purifying selection is the factor that drives the sweep deficit at disease genes, then we should see a sweep deficit among the genes with the most conserved, constrained elements compared to other genes in the genome. However, we see no such sweep deficit at genes with a high density of conserved, selectively constrained elements (bootstrap test + block randomization of genomes, FPR=0.18). See P15L424. Note that for this comparison we had to remove the matching of confounding factors corresponding to functional and purifying selection densities (new Methods P40L1131).

      Again, our results are better explained not just by purifying selection alone, but more specifically by the presence of interfering, segregating deleterious variants. It is perfectly possible to have highly constrained parts of the genome without having many deleterious segregating variants at a given time in evolution.

      The similarity across MeSH classes can be readily explained if what matters is interference with deleterious segregating variants. Because all types of diseases have deleterious segregating variants, then it is not surprising that different MeSH disease categories have a similar sweep deficit. We make that point clearer in the revised manuscript:

      P26L707: “The sweep deficit is comparable across MeSH disease classes (Figure 8), suggesting that the evolutionary process at the origin of the sweep deficit is not disease-specific. This is compatible with a non-disease specific explanation such as recessive deleterious variants interfering with adaptive variants, irrespective of the specific disease type.”.

      One of the most important steps that the authors undertake is to control for possible confounding factors. The authors identify 22 possible confounding factors, and find that several confounding factors have different effects in Mendelian disease genes vs non-disease genes. The authors do a great job of implementing a block-bootstrap approach to control for each of these factors. The authors talk specifically about some of these (e.g. PPI), but not others that are just as strong (e.g. gene length). I am left wondering how interactions among other confounding factors could impact the findings of this paper. I was surprised to see a focus on disease variant number, but not a control for CDS length. As I understand it, gene length is defined as the entire genomic distance between the TSS and TES. Presumably genes with larger coding sequence have more potential for disease variants (though number of disease variants discovered is highly biased toward genes with high interest). CDS length would be helpful to correct for things that pS does not correct for, since pS is a rate (controlling for CDS length) and does not account for the coding footprint (hence pS is similar across gene categories).

      Based on our response to the previous point, it is clear that a high density of coding sequences, or conserved constrained sequence in general are not enough to explain our results. Furthermore, we want to remind the reviewer that we already control for coding sequence length through controlling for coding density, since we use windows of constant sizes.

      The authors point out that it is crucial to get the control set right. This group has spent a lot of time thinking about how to define a control set of genes in several previous papers. But it is not clear if complex disease genes and infectious disease genes are specifically excluded or not. Number of virus interactions was included as a confounding factor, so VIPs were presumably not excluded. It is clear that the control set includes genes not yet associated with Mendelian disease, but the focus is primarily on the distance away from known Mendelian disease genes.

      We are sorry that we were not more explicit from the start of the manuscript. We now make it clearer what the set of disease genes does and does not include throughout the entire manuscript, by repeating that we focus specifically on mendelian, non-infectious disease genes. By non-infectious, we mean that we excluded genes with known infectious disease-associated variants. This does not exclude most virus-interacting genes since most of them are not associated at the genetic variant level with infectious diseases. It is also important to note that the effect of virus interactions is accounted for by matching the number of interacting viruses between mendelian disease genes and controls.

      We write P29L818: “By non-infectious, we mean that we excluded genes with known infectious disease-associated variants. This does not exclude most VIPs since most of them are not associated at the genetic variant level with infectious diseases. It is important to note that the effect of virus interactions is accounted for by matching the number of interacting viruses between mendelian disease genes and controls.”

      Minor comments:

      On page 13, the authors say "This artifact is also very unlikely due to the fact that recombination rates are similar between disease and non-disease genes (Figure 1)." However, Figure 1 shows that "deCode recombination 50kb" is clearly higher in disease genes and comparable at 500kb. The increased recombination rate locally around disease genes seems to contradict the argument formulated in this paragraph.

      We apologize for the lack of precision in this sentence. What we meant is that the recombination rates are not different enough for the hypothetical artifact mentioned to explain our results. We also forgot to remind the reader at this point in the manuscript that we match recombination between disease genes and controls. We now use more precise language:

      P28L772 “The recombination rate at disease genes is also only slightly different from the recombination rate at non-disease genes (Figure 1), and we match the recombination rate between disease genes and controls.”.

      Reviewer #3 (Public Review):

      In this paper, the authors ask whether selective sweeps (as measured by the iHS and nSL statistics) are more or less likely to occur in or near genes associated with Mendelian diseases ("disease genes") than those that are not ("non-disease genes"). The main result put forward by the authors is that genes associated with Mendelian diseases are depleted for sweep signatures, as measured by the iHS and nSL statistics, relative to those which are not.

      The evidence for this comes from an empirical randomization scheme to assess whether genes with signatures of a selective sweep are more likely to be Mendelian disease genes than not. The analysis relies on a somewhat complicated sliding threshold scheme that effectively acts to incorporate evidence from both genes with very large iHS/nSL values, as well as those with weaker signals, while upweighting the signal from those genes with the strongest iHS/nSL values. Although I think the analysis could be presented more clearly, it does seem like a better analysis than a simple outlier test, if for no other reason than that the sliding threshold scheme can be seen as a way of averaging over uncertainty in where one should set the threshold in an outlier test (along with some further averaging across the two different sweeps statistics, and the size of the window around disease associated genes that the sweep statistics are averaged over). That said, the particular approach to doing so is somewhat arbitrary, but it's not clear that there's a good way to avoid that.
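      The sliding threshold idea described above can be sketched as follows: genes are ranked by a sweep statistic, and for each rank threshold the number of disease genes among the top-ranked genes is divided by the average number of control genes that pass the same threshold. This is only a schematic reading of the scheme. The actual analysis also combines iHS and nSL, several window sizes, and block-randomized genomes for significance, none of which is reproduced here. The function name, gene labels, and ranks are hypothetical.

```python
from typing import Dict, List, Sequence


def fold_enrichment_curve(
    ranks: Dict[str, int],
    disease_genes: Sequence[str],
    control_sets: Sequence[Sequence[str]],
    thresholds: Sequence[int],
) -> Dict[int, float]:
    """For each rank threshold t, return (number of disease genes ranked
    in the top t) / (mean number of control genes ranked in the top t).
    Rank 1 is the strongest sweep signal."""
    curve = {}
    for t in thresholds:
        observed = sum(1 for g in disease_genes if ranks[g] <= t)
        expected = sum(
            sum(1 for g in ctrl if ranks[g] <= t) for ctrl in control_sets
        ) / len(control_sets)
        curve[t] = observed / expected if expected > 0 else float("nan")
    return curve


# Hypothetical ranking of eight genes by a sweep statistic.
gene_ranks = {g: i + 1 for i, g in enumerate("ABCDEFGH")}
curve = fold_enrichment_curve(
    gene_ranks,
    disease_genes=["C", "G"],
    control_sets=[["A", "B"], ["D", "E"]],
    thresholds=[2, 5, 8],
)
print(curve)  # values below 1 indicate a deficit relative to controls
```

      A curve that stays below 1 across thresholds corresponds to the depletion reported for disease genes, and summarizing over thresholds avoids committing to a single outlier cutoff.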

      In addition to reporting that extreme values of iHS/nSL are generally less likely at Mendelian disease genes, the authors also report that this depletion is strongest in genes from low recombination regions, or which have >5 specific variants associated with disease.

      Drawing on this result, the authors read this evidence to imply that sweeps are generally impeded or slowed in the vicinity of genes associated with Mendelian diseases due to linkage to recessive deleterious variants, which hitchhike to high enough frequencies that the selection against homozygotes becomes an important form of interference. This phenomenon was theoretically characterized by Assaf et al 2015, who the authors point to for support. That such a phenomenon may be acting systematically to shape the process of adaptation is an interesting suggestion. It's a bit unclear to me why the authors specifically invoke recessive deleterious mutations as an explanation though. Presumably any form of interference could create the patterns they observe? This part of the paper is, as the authors acknowledge, speculative at this point.

      We thank the reviewer for their comments. We are sorry that we did not provide a clear explanation of why recessive deleterious mutations specifically are expected to interfere more than other types of deleterious variants. This was shown by Assaf et al. (2015), and we should have stated it explicitly. Recessive deleterious variants interfere more than additive or dominant ones because they can hitchhike together with an adaptive variant to substantial frequencies before negative selection acts on them, which only happens once a significant number of individuals homozygous for the deleterious mutation appear in the population. In contrast, dominant mutations do not reach the same high frequencies in linkage with an adaptive variant, because they are selected against as soon as they appear in the population.

      We now write P18L496: “In diploid species including humans, recessive deleterious mutations specifically have been shown to have the ability to slow down, or even stop the frequency increase of advantageous mutations that they are linked with (Assaf et al., 2015). Dominant variants do not have the same interfering ability, because they do not increase in frequency in linkage with advantageous variants as much as recessive deleterious do, before the latter can be “seen” by purifying selection when enough homozygous individuals emerge in a population (Assaf et al., 2015).”

      We have also confirmed with SLiM forward simulations that recessive deleterious variants interfere with adaptive variants much more than dominant ones (Table 1).
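      The intuition behind this difference can be illustrated with a simple calculation, separate from the SLiM simulations above. Assuming random mating (Hardy-Weinberg genotype proportions, an assumption of this sketch), a deleterious allele at frequency q is exposed to selection in a fraction q² of individuals if it is fully recessive, and in a fraction 1 − (1 − q)² if it is fully dominant. The function name and the frequencies shown are illustrative only.

```python
def exposed_fraction(q: float, recessive: bool) -> float:
    """Fraction of individuals in which a deleterious allele at frequency q
    is visible to selection, assuming Hardy-Weinberg proportions.

    Fully recessive: only homozygotes are affected (q^2).
    Fully dominant: every carrier is affected (1 - (1 - q)^2)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    if recessive:
        return q * q
    return 1.0 - (1.0 - q) ** 2


# Illustrative allele frequencies reached while hitchhiking with a sweep.
for q in (0.01, 0.1, 0.5):
    rec = exposed_fraction(q, recessive=True)
    dom = exposed_fraction(q, recessive=False)
    print(f"q={q}: recessive exposed={rec:.4f}, dominant exposed={dom:.4f}")
```

      At low frequency, a recessive variant is almost invisible to selection while a dominant one is exposed in nearly every carrier, which is consistent with recessive variants being able to rise with a linked advantageous variant before selection against them takes hold.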

      I'm also a bit concerned by the fact that the signal is only present in the African samples studied. The authors suggest that this is simply due to stronger drift in the history of European and Asian samples. This could be, but as a reader it's a bit frustrating to have to take this on faith.

      We thank the reviewer for pointing out this issue with our manuscript. We have now shown, as detailed above in our response to main point 1, reviewer 1 weakness 1, that a weaker sweep deficit at disease genes in Europe and East Asia is an expected feature under the interference explanation, due to the weakened interference of recessive deleterious variants during bottlenecks of the magnitude observed in Europe and East Asia. We therefore believe that these new results strengthen our previous claim regarding the role of interference between deleterious and advantageous variants. We want to thank the reviewer for forcing us to examine the difference between results in Africa and out of Africa, as the manuscript is now more consistent and our results substantially better explained.

      There are other analyses that I don't find terribly convincing. For example, one of the analyses, which shows that iHS signals are no less depleted at genes associated with >5 diseases than at genes associated with 1, does little to convince me of anything. It's not particularly clear that the number of diseases associated with a given gene should predict the degree of pleiotropy experienced by a variant emerging in that gene with some kind of adaptive function. Failure to find any association here might just mean that this is not a particularly good measure of the relevant pleiotropy.

      We agree with the reviewer that the number of associated diseases may not be a good measure of pleiotropy. Unfortunately, to our knowledge, there is currently no good measure of gene pleiotropy in human genomes. Given that the evidence in favor of interference of deleterious variants is now strengthened, we have chosen to remove this analysis from the manuscript. As we now explain throughout the manuscript, pleiotropy is an unlikely explanation in the first place because disease genes have not experienced less long-term adaptation (see the details on our new MK test results in the response to main point 2).

      P16L447: “We find that overall, disease and control non-disease genes have experienced similar rates of protein adaptation during millions of years of human evolution, as shown by very similar estimated proportions of amino acid changes that were adaptive (Figure 5A,B,C,D,E). This result suggests that disease genes do not have constitutively less adaptive mutations. This implies that processes stable over evolutionary time such as pleiotropy, or a tendency to overshoot the fitness optimum, are unlikely to explain the sweep deficit at disease genes.”.
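
For readers unfamiliar with the quantity being compared, the proportion of adaptive amino acid changes is often summarized by the classic McDonald-Kreitman estimator, alpha = 1 - (Ds * Pn) / (Dn * Ps). The sketch below implements only that textbook formula; the MK-test variant actually used in the manuscript may differ, and the counts shown are hypothetical.

```python
def mk_alpha(dn, ds, pn, ps):
    """Classic McDonald-Kreitman estimate of the proportion of adaptive substitutions.

    dn, ds: nonsynonymous and synonymous fixed differences (divergence).
    pn, ps: nonsynonymous and synonymous polymorphisms.
    """
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when dn or ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)


if __name__ == "__main__":
    # Hypothetical counts for a disease gene set and a matched control set.
    disease = mk_alpha(dn=40, ds=100, pn=20, ps=100)
    control = mk_alpha(dn=42, ds=100, pn=21, ps=100)
    print(f"alpha (disease) = {disease:.2f}, alpha (control) = {control:.2f}")
```

Similar alpha values in the two gene sets, as in this made-up example, are what the quoted passage describes as similar long-term rates of protein adaptation.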

      A last parting thought is that it's not clear to me that the authors have excluded the hypothesis that adaptive variants simply arise less often near genes associated with disease. The fact that the signal is strongest in regions of low recombination is meant to be evidence in favor of selective interference as the explanation, but it is also the regime in which sweeps should be easiest to detect, so it may be just that the analysis is best powered to detect a difference in sweep initiation, independent of possible interference dynamics, in that regime.

      We thank the reviewer for stating these important alternative explanations that needed more attention in our manuscript. In our response to main point 2 above, we explain that higher statistical power in low recombination regions alone is unlikely to explain our results, because we also show that a substantial sweep deficit requires not only low recombination but also the presence of a higher number of disease variants. We also describe in our response to main point 2 how our new MK-test results on long-term adaptation make it very unlikely that Mendelian disease genes experience constitutively less adaptation. We want to thank the reviewer again for pointing out this issue with our manuscript, since it was indeed an important missing piece.

    1. Author Response

      Reviewer #2 (Public Review):

      (1) Much of the cited literature that is used to make the case for their hypothesis is very old and actually refers to active HIV infection and patient studies prior to ART. Also, the literature they cite regarding the role of H2S as an antimicrobial agent seem to be limited to tuberculosis infection.

      We have revised the list of literature and included more relevant references from the post-ART era. The antimicrobial role of H2S has recently been examined comprehensively in the context of tuberculosis. Given the close association of TB with HIV, we think our study is timely and essential. However, we would like to point out that references showing the effect of H2S on infections caused by respiratory viruses are included in the manuscript (7-9). Further, recent findings showing the influence of H2S in the context of SARS-CoV-2 infection are also included in the revised manuscript.

      (2) The choice of the latently infected model cell lines is rather unfortunate. There are much better defined models out there these days than J1.1 or U1 cells, such as the J-LAT cells from the Verdin lab or the various reporter cell lines generated by Levy and co-workers. In particular, U1 cells should not be considered as latently infected, as the virus has a defect in the Tat/TAR axis and is mostly just transcriptionally attenuated. It is unclear why the authors only use J-LAT cells for one of the last experiments.

      As suggested by the reviewer, we have generated new data using J-LAT cells in the revised manuscript. First, we confirmed that PMA-mediated HIV-1 reactivation in J-LAT cells is associated with the down-regulation of cbs, cth, and mpst transcripts (Figure 1-figure supplement 1C-D in the revised manuscript). Additionally, we have performed several other mechanistic experiments in J-LAT cells to validate the data generated in U1 (see the response to #3 below).

      (3) It is further unclear why the authors perform most of the experiments using U1 cells, which are considered promonocytic, but in the end seek to demonstrate the influence of H2S on latent HIV-1 infection in CD4 T cells. Performing all experiments in J1.1 or better J-LAT cells would have seemed more intuitive.

      The choice of U1 was based on our earlier studies showing that U1 cells uniformly recapitulate the association of redox-based mechanisms and mitochondrial bioenergetics with HIV-latency and reactivation (10-12). We have validated key findings of U1 cells in J1.1 and J-Lat cell lines. We genetically and chemically silenced the expression of CTH in J-Lat cells and examined the effect on HIV-1 reactivation. Consistent with U1 and J1.1, genetic silencing of CTH using CTH-specific shRNA (shCTH) reactivated HIV-1 in J-Lat (Figure 2-figure supplement 1F-G in the revised manuscript). Supporting this, pre-treatment of J-Lat with non-toxic concentrations of a well-established CTH inhibitor, propargylglycine (PAG) further stimulated PMA-induced HIV-1 reactivation (Figure 2-figure supplement 1H-I in the revised manuscript). Altogether, using various cell line models of HIV-1 latency, we confirmed that endogenous H2S biogenesis counteracts HIV-1 reactivation.

      (4) The authors suggest that H2S production would control latent HIV-1 infection and reactivation. Regarding the idea that CBS, CTH or possibly MPST would control latent infection as a function of their ability to produce H2S from different sources, there are several questions. First, if H2S is the primary factor, why would the presence of e.g. MPST not compensate for the reduction of CTH? Second, why would J1.1 and U1 cells both host latent HIV-1 infection events, however, their CBS/CTH/MPST composition is completely different? Third, natural variations in CTH expression caused by culture over time are larger than variations caused by PMA activation.

      These questions are important and complex. CBS, CTH, and MPST produce H2S in the sulfur network. CBS and CTH reside in the cytoplasm, whereas MPST is mainly involved in cysteine catabolism and is localized to mitochondria. The lack of compensation of CTH by MPST could be due to the compartmentalization of their activities. Furthermore, CTH and CBS activities are regulated by diverse metabolites, including heme, S-adenosyl methionine (SAM), and nitric oxide/carbon monoxide (NO/CO). In contrast, MPST activity responds to cysteine availability. How substrate/cofactor availability and enzyme choices are regulated in the cellular milieu of J1.1 and U1 is an interesting question for future experimentation.

      Moreover, the tissue-specific expression/activity of CBS and CTH dictates their relative contributions in H2S biogenesis and cellular physiology (13). Some of these factors are likely responsible for differential expression of CBS, CTH, and MPST in J1.1 and U1 cells. Regardless of these concerns, viral reactivation uniformly reduces the expression of CTH in U1, J1.1, and J-Lat. While we cannot completely rule out natural variations in CTH expression over prolonged culturing, in our experimental setup CTH remained stably expressed and consistently showed down-regulation upon PMA treatment as compared to untreated conditions.

      (5) Also, the statement that H2S production as exerted per loss of CTH would control reactivation is not supported by the kinetic data. In latently HIV-1 infected T cell lines or monocytic cell lines, PMA-mediated HIV-1 reactivation at the protein level is usually almost complete after 24 hours, but at this time point the difference between e.g. CTH levels only begins to appear in U1 cells. The data for J1.1. are even less convincing.

      We have measured the kinetics of p24 production and CTH expression in U1 cells. We showed that the levels of p24 gradually increased from 6 h and kept increasing until the last time point, i.e., 36 h post-PMA treatment (Fig. 2D in the revised manuscript). The p24 ELISA detected similar kinetics of p24 increase in the cell supernatant (Fig. 2E in the revised manuscript). The CTH levels show a reduction at 24 h and 36 h. Based on these data, we report that HIV-1 reactivation is associated with diminished biogenesis of endogenous H2S. We have not made any claims that depletion of CTH precedes HIV reactivation. However, our CTH knockdown data clearly showed that diminished expression of CTH reactivates HIV-1 in the absence of PMA, which is consistent with our hypothesis that H2S production is likely to be a critical host component for maintaining viral latency.

      (6) Figure 2F. PMA is known to induce an oxidative stress response, however, in the experiments the data suggest that PMA results in a downregulated oxidative stress response. Maybe the authors could explain this discrepancy with the literature. In fact, both shRNA transductions, scr and CTH-specific seem to result in a lower PMA response.

      In our experiment, PMA treatment for 24 h results in down-regulation of oxidative stress genes. However, the effect of PMA on the oxidative stress responsive genes is time-dependent. In our earlier publication, we showed that 12 h PMA treatment induces oxidative stress responsive genes in U1 cells (12), whereas at 24 h, the expression of genes is down-regulated (10). Genetic silencing of CTH resulted in elevated mitochondrial ROS and GSH imbalance, which is in line with a further decrease in the expression of oxidative stress responsive genes as compared to PMA alone. As a consequence, PMA-treatment of U1-shCTH induced HIV-1 reactivation, which supersedes that stimulated by PMA or shCTH alone.

      (7) Given that the authors in subsequent experiments use GYY4137, which is supposed to mimic the increased release of H2S, the authors should have definitely included experiments in which they would overexpress CTH, e.g. by retroviral transduction. Specifically in U1 cells, which seemingly do not express CBS, overexpression of CBS should also result in a suppressed phenotype.

      We have explored the role of elevated H2S levels using GYY4137. Treatment with GYY4137 suppressed HIV reactivation in multiple cell lines and primary CD4+ T cells. As suggested by the reviewer, overexpression of CTH could be another strategy to validate these findings. However, since the transsulfuration pathway and active methyl cycle are interconnected and share metabolic intermediates (e.g., homocysteine), overexpression of CTH could disturb this balance and may lead to metabolic paralysis. Owing to these potential limitations, we used a slow-releasing H2S donor (GYY4137) to chemically complement CTH deficiency during HIV reactivation. We thank the reviewer for this comment.

      (8) Figure 4F: The authors need to explain how they can measure a 4-fold gag RNA expression change in untreated cells. Also, according to Figure 4A, 300 µM GYY produces much less H2S than 5mM, yet the suppressive effect of 300 µM GYY is much higher?

      The four-fold expression in untreated cells is likely due to leaky control of viral transcription in J1.1 cells (14-16). However, to avoid confusion, we have replotted the results by normalizing the data generated upon PMA-mediated HIV reactivation to the PMA-untreated cells (Figure 4F in the revised manuscript). The suppressive effect of GYY4137 at the lower concentration is intriguing but consistent with the findings that high and low concentrations of H2S have profound and distinct effects on cellular physiology (3,17). One possibility is that the high concentration of H2S induces the mitochondrial sulfide oxidation pathway to avert toxicity. This might modulate mitochondrial activity and ROS, resulting in the suppression of the GYY4137 effect. Consistent with this, higher concentrations of H2S have been shown to cause pro-oxidant effects, DNA damage and genotoxicity (3,18). We have discussed these possibilities in the revised manuscript.

      (9) Initially, the authors argue "that the depletion of CTH could contribute to redox imbalance and mitochondrial dysfunction to promote HIV-1 reactivation"(p. 9). Less CTH would suggest less produced H2S. However, later on in the manuscript they demonstrate that addition of a H2S source (GYY4137) results in the suppression of HIV-1 replication and supposedly HIV-1 reactivation. This is somewhat confusing.

      We show that depletion of endogenous H2S by diminished expression of CTH (U1-shCTH) resulted in higher mitochondrial ROS and GSH/GSSG imbalance. Both of these alterations are known to reactivate HIV-1 and promote replication (10,11,19). The addition of GYY4137 chemically compensated for the diminished expression of CTH, and prevented HIV-1 reactivation in U1-shCTH. These events are expected to suppress HIV-1 replication and reactivation. We have made this distinction clear in the revised manuscript.

      (10) CTH, or for that matter CBS or MPST do not only produce H2S, however, they also are part of other metabolic pathways. It would have been interesting and important to study how these metabolic pathways were affected by the genetic manipulations and also how the increased presence of H2S (GYY4137) would affect the metabolic activity of these enzymes or their expression.

      We fully agree with the reviewer. In fact, our NanoString data show that upon CTH knockdown (U1-shCTH), MPST levels were down-regulated and CBS remained undetectable (Fig. 2F in the revised manuscript). Additionally, GYY4137 treatment induced the expression of CTH but not MPST upon PMA addition (Fig. 5A in the revised manuscript). We have incorporated these findings in the revised manuscript. Given that CBS and CTH catalyze at least eight H2S-generating steps and two cysteine-producing reactions, the modulation of CTH by HIV is likely to have a widespread influence on transsulfuration pathway and active methyl cycle intermediates. Our future strategy is to generate a comprehensive understanding of the sulfur metabolism underlying HIV latency and reactivation. These experiments require multiple biochemical and genetic technologies with appropriate controls. We hope that the reviewer will agree that these experiments should be part of a future investigation. We thank the reviewer for this comment.

      (11) H2S has been reported to cause NFkB inhibition by sulfhydration of p65; as such, the findings here are not particularly novel or surprising. Also, H2S induced sulfhydration is rather not targeted to a specific protein, let alone a HIV protein, making this approach a very unlikely alternative to current ART forms.

      We believe that NF-kB inhibition is not the only mechanism by which H2S exerts its influence on HIV latency. Recent studies point towards the importance of the Nrf2-Keap1 axis in sustaining HIV latency (20). Our data suggest an important role for Nrf2-Keap1 signaling in mediating the influence of H2S on HIV latency. Additionally, recruitment of the epigenetic silencer YY1 is also affected by H2S. Interestingly, YY1 activity is modulated by redox signaling (21), suggesting H2S could be an important regulator of YY1 activity in HIV-infected cells. We have, so far, no evidence for viral proteins targeted by H2S. However, experiments to examine global S-persulfidation of host and HIV proteins are ongoing in the laboratory to fill this knowledge gap. Lastly, our findings raise the possibility of exploring H2S donors together with current ART (not as an alternative to ART) for reducing virus reactivation. We have toned down the clinical relevance of our findings.

      (12) The description of the primary T cell model used to generate the data in Figure 6 is slightly misleading. Also, the idea of this model was originally to demonstrate that "block and lock" by didehydro-cortistatin is possible. In this application, the authors did not investigate whether GYY4137 would actually induce a HIV "block and lock" over an extended period of time.

      As suggested by the reviewer, we have cited the didehydro-cortistatin studies as the basis of our strategy. Our idea was to adapt the primary T cell model to begin understanding the role of H2S in blocking HIV rebound. Our results indicate the future possibility of investigating GYY4137 to lock HIV in deep latency for an extended period of time. However, comprehensive investigation would require long-term experiments and samples from multiple HIV subjects. In the current pandemic times with overburdened Indian clinical settings, we cannot plan these experiments. However, we hope our data form a solid foundation for HIV researchers to perform extended “block and lock” studies using H2S donors.

      (13) However, the authors never provide evidence that endogenous H2S is altered in latently HIV-1 infected cells (which may actually be an impossible task). By the end of the manuscript, the authors have not provided clear evidence that the effects of e.g. CTH deletion would be mediated by the production of H2S, and not by another function of the enzyme. Similarly, the inability of stimuli to trigger efficient HIV-1 reactivation following the provision of unnaturally high levels of H2S is not surprising given reports on the effect of GYY4137 as an anti-inflammatory agent and suppressor of NF-kB activation. Unless the authors were to demonstrate a true "block and lock" effect by GYY4137 the data will likely have limited impact on the HIV cure field.

      It is difficult to measure H2S levels in latently infected primary cells because of the limited sensitivity of the assay and the insufficient number of cells latently infected with HIV-1. However, in the revised manuscript we have clearly shown that cysteine levels are not affected by CTH depletion and that cysteine deprivation does not reactivate HIV-1. These results indicate that the effects of CTH depletion are likely mediated by H2S. This is consistent with our data showing that GYY4137 specifically complements CTH deficiency and blocks HIV-1 reactivation in U1-shCTH. Further, we carried out an in-depth investigation to show that the effect of GYY4137 is not due to impaired activation of CD4+ T cells.

      Lastly, since CTH catalyzes multiple reactions during H2S production, we cannot rule out the effect of other metabolites in this process. However, we think that this is outside the scope of the present study. Our study focuses on understanding how H2S modulates redox, mitochondrial bioenergetics, and gene expression in the context of HIV latency. These insights are likely to positively impact future studies exploring the role of H2S in HIV cure.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors sought to establish a standardized quantitative approach to categorize the activity patterns in a central pattern generator (specifically, the well-studied pyloric circuit in C. borealis). While it is easy to describe these patterns under "normal" conditions, this circuit displays a wide range of irregular behaviors under experimental perturbations. Characterizing and cataloguing these irregular behaviors is of interest to understand how the network avoids these dysfunctional patterns under "normal" circumstances.

      The authors draw upon established machine learning tools to approach this problem. To do so, they must define a set of features that describe circuit activity at a moment in time. They use the distributions of inter-spike intervals (ISIs) and spike phases of the LP and PD neurons as these features. As the authors mention in their Discussion section, these features are highly specialized and adapted to this particular circuit. This limits the applicability of their approach to other circuits with neurons that are unidentifiable or very large in number (the number of spike phase statistics grows quadratically with the number of neurons).

      We agree with the reviewer that the size of the feature vectors as described grows quadratically with the number of neurons. The feature sets we describe are most suited for “identified” neurons – neurons whose identity and connectivity are known and can be reliably recorded from multiple animals. The method described here is best suited for systems with small numbers of identified neurons. For other systems, other feature vectors may be chosen, as we have suggested in the Discussion: Applicability to other systems.
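
The scaling the reviewer refers to can be made concrete with a small sketch. It assumes one ISI distribution per neuron and one phase distribution per ordered pair of distinct neurons (for example, LP spikes relative to PD and PD spikes relative to LP); that bookkeeping is an assumption made for illustration, not a description of the exact feature vector used in the paper.

```python
def inter_spike_intervals(spike_times):
    """Return the intervals between consecutive spikes (input need not be sorted)."""
    ts = sorted(spike_times)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def n_feature_blocks(n_neurons):
    """Count feature blocks under the assumption stated in the lead-in.

    Returns (isi_blocks, phase_blocks): one ISI block per neuron and one
    phase block per ordered pair of distinct neurons.
    """
    isi_blocks = n_neurons
    phase_blocks = n_neurons * (n_neurons - 1)
    return isi_blocks, phase_blocks


if __name__ == "__main__":
    print(inter_spike_intervals([0.0, 0.5, 1.5]))
    for n in (2, 5, 10, 100):
        print(n, n_feature_blocks(n))
```

With two identified neurons the phase blocks are as few as the ISI blocks, whereas for a hundred neurons they dominate the feature vector, which is why the approach is best suited to small circuits.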

      The main results of the paper provide evidence that ISIs and spike phase statistics provide a reasonable descriptive starting point for understanding the diversity of pyloric circuit patterns. The authors rely heavily on t-distributed stochastic neighbor embedding (tSNE), a well-known nonlinear dimensionality reduction method, to visualize activity patterns in a low-dimensional, 2D space. While effective, the outputs of tSNE have to be interpreted with great care (Wattenberg, et al., "How to Use t-SNE Effectively", Distill, 2016. http://doi.org/10.23915/distill.00002). I think the conclusions of this paper would be strengthened if additional machine learning models were applied to the ISI and spike phase features, and if those additional models validated the qualitative results shown by tSNE. For example, tSNE itself is not a clustering method, so applying clustering methods directly to the high-dimensional data features would be a useful validation of the apparent low-dimensional clusters shown in the figures.

      We thank the reviewer for these suggestions, and agree with the reviewer that t-SNE is not a clustering method, and directly clustering on t-SNE embeddings is rife with complexities. Instead we have used t-SNE to generate a visualization that allows domain experts to quickly label and cluster large quantities of data. This makes a previously intractable task feasible, and offers some basic guarantees on quality (e.g., no one data point can have two labels, because labels derive from position of data points in two dimensional space). In addition:

      • We used UMAP, another dimensionality reduction algorithm, to perform the embedding step, and colored points by their cluster labels from the original t-SNE embedding (Figure 3—figure supplement 3). Large sections of the map are still strikingly colored in single colors, suggesting that the manual clustering did not depend on the details of the t-SNE algorithm, but is rather informed by the statistics of the data.

      • We validated our method using synthetic data. We generated synthetic spike trains from different “classes” and embedded the resultant feature vectors using t-SNE. Data from different classes are not intermingled, and form tight “clusters” (Figure 2 -- figure supplement 4).

      • Finally, we attempted to use hierarchical clustering to cluster the raw feature vectors, and were not able to find a reasonable partitioning of the linkage tree that separated qualitatively different spike patterns (Figure at the top of this document). We speculate that this is because the feature vectors may contain outliers. Clustering algorithms that attempt to preserve global distances would then lump the majority of the data into a single cluster in order to differentiate the outliers from the bulk of the data.

      The authors do show that the algorithmically defined clusters agree with expert-defined clusters. (Or, at least, they show that one can come up with reasonable post-hoc explanations and interpretations of each cluster). The very large cluster of "regular" patterns -- shown typically in a shade of blue -- actually looks like an archipelago of smaller clusters that the authors have reasoned should be lumped together. Thus, while the approach is still a useful data-driven tool, a non-trivial amount of expert knowledge is baked into the results. A central challenge in this line of research is to understand how sensitive the outcomes are to these modeling choices, and there is unlikely to be a definitive answer.

      We agree with the reviewer entirely.

      Nonetheless, the authors show results which suggest that this analysis framework may be useful for the community of researchers studying central pattern generators. They use their method to qualitatively characterize a variety of network perturbations -- temperature changes, pH changes, decentralization, etc.

      In some cases it is difficult to understand the level of certainty in these qualitative observations. A first look at Figure 5a suggests that three different kinds of perturbations push the circuit activity into different dysfunctional cluster regions. However, the apparent spatial differences between these three groups of perturbations might be due to animal-level differences (i.e. each preparation produces multiple points in the low-D plot, so the number of effective statistical replicates is smaller than it appears at first glance). Similarly, in Figure 9, it is somewhat hard to understand how much the state occupancy plots would change if more animals were collected -- with the exception of proctolin, there are ~25 animals and 12 circuit activity clusters which may not be a favorable ratio. It would be useful if a principled method for computing "error bars" on these occupancy diagrams could be developed. Similar "error bars" on the state transition diagrams (e.g. Fig 6a) would also be useful.

      We agree with the reviewer. Despite this paper containing data from hundreds of animals, the dataset may not be sufficiently large to perform some necessary statistical checks. We agree with the reviewer that a more rigorous error analysis would be useful, but is not trivially done.
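
One possible way to obtain the kind of "error bars" discussed here is to bootstrap at the level of animals rather than individual data points, so that the interval reflects the number of independent preparations. The sketch below is only an illustration of that idea under simple assumptions (percentile intervals, equal weighting of resampled animals); it is not an analysis from the manuscript, and the state names and data are hypothetical.

```python
import random
from collections import Counter


def occupancy(labels_by_animal, states):
    """Fraction of all time bins spent in each state, pooled over animals."""
    counts = Counter(label for labels in labels_by_animal for label in labels)
    total = sum(counts.values())
    return {state: counts[state] / total for state in states}


def bootstrap_occupancy(labels_by_animal, states, n_boot=1000, ci=0.95, seed=0):
    """Percentile confidence intervals from resampling whole animals with replacement."""
    rng = random.Random(seed)
    samples = {state: [] for state in states}
    for _ in range(n_boot):
        resampled = [rng.choice(labels_by_animal) for _ in labels_by_animal]
        occ = occupancy(resampled, states)
        for state in states:
            samples[state].append(occ[state])
    tail = (1.0 - ci) / 2.0
    intervals = {}
    for state in states:
        values = sorted(samples[state])
        lower = values[int(tail * (n_boot - 1))]
        upper = values[int((1.0 - tail) * (n_boot - 1))]
        intervals[state] = (lower, upper)
    return intervals


if __name__ == "__main__":
    # Hypothetical per-animal sequences of state labels (one label per time bin).
    data = [
        ["regular"] * 8 + ["LP-skipped"] * 2,
        ["regular"] * 10,
        ["regular"] * 5 + ["silent"] * 5,
    ]
    states = ["regular", "LP-skipped", "silent"]
    print(occupancy(data, states))
    print(bootstrap_occupancy(data, states))
```

Because whole animals are resampled, a state seen in only one preparation gets a wide interval, which makes the effective sample size visible in the occupancy plot.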

      Finally, one nagging concern that I have is that the ISIs and spike phase statistics aren't the ideal features one would use to classify pyloric circuit behaviors. Sub-threshold dynamics are incredibly important for this circuit (e.g. due to electrical coupling of many neurons). A deeper discussion about what is potentially lost by only having access to the spikes would be useful.

      We agree with the reviewer that spike times aren’t the ideal feature to use to describe circuit dynamics. This is especially true in the STG, where synapses are graded, and coupling between cells can persist without spiking. However, the required data simply do not exist, because they can only be obtained from intracellular recordings, which are substantially harder to perform (and maintain over challenging perturbations) than extracellular recordings.

      Finally, the signal to the muscles – arguably the physiologically and functionally relevant signal – is the spike signal, suggesting that spike patterns from the pyloric circuit are a useful feature to measure. Nevertheless, this is an important point, and we thank the reviewer for raising it, and we have included it in the section titled Discussion: Technical considerations.

      Overall, I think this work provides a useful starting point for large-scale quantitative analysis of CPG circuit behaviors, but there are many additional hurdles to be overcome.

      Reviewer #2 (Public Review):

      This manuscript uses the t-SNE dimensionality reduction technique to capture the rich dynamics of the pyloric circuit of the crab.

      Strengths:

      • The integration of a rich data-set of spiking data from the pyloric circuit

      • Use of nonlinear dimension reduction (t-SNE) to visualise that data

      • Use of clusters from that t-SNE visualisation to create subsets of data that are amenable to consistent analyses (such as using the "regular" cluster as a basis for surveying the types of dynamics possible in baseline conditions)

      • Innovative use of the cluster types to describe transitions between dynamics within the baseline state and within perturbed states (whether by changes to exogenous variables, cutting nerves, or applying neuromodulators)

      • Some interesting main results:

      o Baseline variability in the spiking patterns of the pyloric circuit is greater within than between animals

      o Transitions to silent states often (always?) pass through the same intermediate state of the LP neuron skipping spikes

      Weaknesses:

      • t-SNE is not, in isolation, a clustering algorithm, yet here it is treated as such. How the clusters were identified is unclear: the manuscript mentions manual curation of randomly sampled points, implying that the clusters were extrapolations from these. This would seem to rather defeat the point of using unsupervised techniques to obtain an unbiased survey of the spiking dynamics, and raises the issue of how robust the clusters are

      We have used t-SNE to visualize the circuit dynamics in a two-dimensional map. We have exploited t-SNE’s ability to preserve local structure to generate an embedding where a domain expert can efficiently manually identify and label stereotyped clusters of activity. As the reviewer points out, this is a manual step, and we have emphasized this in the manuscript. The strength of our approach is to combine the power of a nonlinear dimensionality reduction technique such as t-SNE with human curation to make a task that was previously impossible (identifying and labelling very large datasets of neural activity) feasible.

      To address the question of how robust the manually identified clusters are, we have:

      1) used another dimensionality reduction technique, UMAP, to generate an embedding and colored points by their labels from the original t-SNE map (Figure 3 – figure supplement 3). To a rough approximation, the coloring reveals that a similar clustering exists in this UMAP embedding.

      2) We generated synthetic spike trains from pre-determined spike pattern classes and used the feature vector extraction and t-SNE embedding procedure as described in the paper. We found that this generated a map (Figure 2—figure supplement 4) where classes of spike patterns were well separated in the t-SNE space.
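
A toy version of this validation can be sketched as follows. It generates two hypothetical spike pattern classes (tonic and bursting) and checks that a single summary feature, the coefficient of variation of the ISIs, separates them. The generators, parameter values, and the choice of feature are assumptions made for illustration; the paper's synthetic classes, its full feature vectors, and the t-SNE embedding step are not reproduced here.

```python
import random
import statistics


def tonic_train(rate_hz, duration_s, jitter_s, rng):
    """Regularly spaced spikes with small uniform timing jitter."""
    period = 1.0 / rate_hz
    spikes = []
    t = period
    while t < duration_s:
        spikes.append(t + rng.uniform(-jitter_s, jitter_s))
        t += period
    return sorted(spikes)


def bursting_train(burst_period_s, spikes_per_burst, intra_isi_s, duration_s, rng):
    """Bursts of closely spaced spikes separated by long silent gaps."""
    spikes = []
    start = 0.0
    while start < duration_s:
        for k in range(spikes_per_burst):
            t = start + k * intra_isi_s + rng.uniform(0.0, 0.1 * intra_isi_s)
            if t < duration_s:
                spikes.append(t)
        start += burst_period_s
    return sorted(spikes)


def isi_cv(spikes):
    """Coefficient of variation of the inter-spike intervals."""
    isis = [b - a for a, b in zip(spikes, spikes[1:])]
    return statistics.pstdev(isis) / statistics.mean(isis)


if __name__ == "__main__":
    rng = random.Random(1)
    tonic = tonic_train(rate_hz=5.0, duration_s=20.0, jitter_s=0.005, rng=rng)
    burst = bursting_train(1.0, 5, 0.02, 20.0, rng)
    print(f"tonic ISI CV = {isi_cv(tonic):.3f}, bursting ISI CV = {isi_cv(burst):.3f}")
```

If classes built to differ this way ended up intermingled after feature extraction and embedding, that would indicate a problem with the pipeline rather than with the data.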

      • the main purpose and contribution of the paper is unclear, as the results are descriptive, and mostly state that dynamics in some vary between different states of the circuit; while the collated dataset is a wonderful resource, and the map is no doubt useful for the lab to place in context what they are looking at, it is not clear what we learn about the pyloric circuit, or more widely about the dynamical repertoire of neural circuits

      • in some places the contribution is noted as being the pipeline of analysis: unfortunately as the pipeline used here seems to rely in manual curation, it is of limited general use; moreover, there are already a number of previous works that use unsupervised machine-learning pipelines to characterise the complexity of spiking activity across a large data-set of neurons, using the same general approach here (quantify properties of spiking as a vector; map/cluster using dimension reduction), including Baden et al (2016, Nature), Bruno et al (2015, Neuron), Frady et al (2016, Neural Computation).

      • Some key limitations are not considered:

      o the omission of the PY neuron activity means that the map as given is incomplete: potentially there are many more states, and hence transitions, within or beyond those already found that correspond to changes in PY neuron activity

      We agree with the reviewer that the omission of the PY neurons’ activity means that the map is incomplete. There are likely many more states, and hence many more transitions, than the ones we have identified. In addition, we note that there are other pyloric neurons whose activity is also missing (AB, IC, LPG, VD). However, measuring just LP and PD allows us to monitor the activity of the most important functional antagonists in the system (because they are effectively in a half-center oscillator because PD is electrically coupled to AB). In general, the more neurons one measures, the richer the description of the circuit dynamics will be. Collecting datasets at this scale (~500 animals) from all pyloric neurons is challenging, and we have revised the manuscript to make this important point (see Discussion: Technical considerations).

      o The use of long, non-overlapping time segments (20s) - this means, for example, that the transitions are slow and discrete, whereas in reality they may be abrupt, or continuous.

      We agree with the reviewer. There are tradeoffs in choosing a bin size in analyzing time series – choosing longer bins can increase the number of “states” and choosing shorter bins can increase the number of transitions. We chose 20s bins because they are long enough to include several cycles of the pyloric rhythm, even when decentralized, yet short enough to resolve slow changes in spiking. We have included a statement clarifying this (see Discussion: Technical considerations).
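
      The segmentation into non-overlapping 20s bins can be sketched as follows. This is a minimal illustration, not the authors' code; the function name and the choice to discard an incomplete final bin are assumptions.

```python
def split_into_bins(spike_times, bin_length=20.0, duration=None):
    """Split spike times (in seconds) into non-overlapping bins.

    Spike times inside each bin are expressed relative to the bin start.
    An incomplete final bin is discarded (an assumption of this sketch).
    """
    if duration is None:
        duration = max(spike_times) if spike_times else 0.0
    n_bins = int(duration // bin_length)
    bins = [[] for _ in range(n_bins)]
    for t in spike_times:
        idx = int(t // bin_length)
        if 0 <= idx < n_bins:
            bins[idx].append(t - idx * bin_length)
    return bins

# A 45 s recording yields two complete 20 s bins; the spike at 41 s is dropped.
example_bins = split_into_bins([1.0, 19.5, 20.0, 39.5, 41.0], duration=45.0)
```

      Changing `bin_length` in such a function is the knob behind the tradeoff discussed above.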

      o tSNE cannot capture hierarchical structure, nor has a null model to demonstrate that the underlying data contains some clustering structure. So, for example, distances measured on the map may not be strictly meaningful if the data is hierarchical.

      We agree with the reviewer. t-SNE can manifest clusters when none exist (Section 4 of https://distill.pub/2016/misread-tsne/) and can obscure or merge true clusters. We have restricted analyses that rely on distances measured in the map to cases where there are qualitative differences in behavior (e.g., with decentralization, Fig 7) or have compared distances within subsets of data where a single parameter is changed (e.g., pH or temperature, Fig 5). The only conclusion we draw from these distance measures is that data are more (or less) spread out in the map, which we use as a proxy for variability. We have included a statement discussing the limitations of using t-SNE (Discussion: Comparison with other methods).
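
      One simple way to quantify how spread out a set of points is in a two-dimensional map is the mean pairwise Euclidean distance. The sketch below uses that measure purely as an assumption; the response does not state which distance summary was used.

```python
import math
from itertools import combinations

def mean_pairwise_distance(points):
    """Mean Euclidean distance over all unordered pairs of 2-D points."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

# Hypothetical embeddings of the same preparation under two conditions.
tight = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
spread = [(10.0 * x, 10.0 * y) for x, y in tight]
more_variable = mean_pairwise_distance(spread) > mean_pairwise_distance(tight)
```

      Such a summary only supports the comparative statement that one condition is more or less spread out than another; it does not make individual map distances meaningful.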

      • the Discussion does not include enough insight and contextualisation of the results.

      We have completely rewritten the discussion to address this.

      Reviewer #3 (Public Review):

      Gorur-Shandilya et al. apply an unsupervised dimensionality reduction (t-SNE) to characterize neural spiking dynamics in the pyloric circuit in the stomatogastric ganglion of the crab. The application of unsupervised methods to characterize qualitatively distinct regimes of spiking neural circuits is very interesting and novel, and the manuscript provides a comprehensive demonstration of its utility by analyzing dynamical variability in function and dysfunction in an important rhythm-generating circuit. The system is highly tractable with small numbers of neurons, and the study here provides an important new characterization of the system that can be used to further understand the mapping between gene expression, circuit activity, and functional regimes. The explicit note about the importance of visualization and manual labeling was also nice, since this is often brushed under the rug in other studies.

      Major concern:

      While the specific analysis pipeline clearly identifies qualitatively distinct regimes of spike patterns in the LP/PD neurons, it is not clear how much of this is due to t-SNE itself vs the initial pre-processing and feature definition (ISI and spike phase percentiles). Analyses that would help clarify this would be to check whether the same clusters emerge after (1) applying ordinary PCA to the feature vectors and plotting the projections of the data along the first two PCs, or (2) defining input features as the concatenated binned spike rates over time of the LP & PD neurons (which would also yield a fixed-length vector per 20 s trial), and then passing these inputs to PCA or tSNE. As the significance of this work is largely motivated by using unsupervised vs ad hoc descriptors of circuit dynamics, it will be important to clarify how much of the results derive from the use of ISI and phase representation percentiles, etc. as input features, vs how much emerge from the dimensionality reduction.

      We agree with the reviewer that it is important to clarify how much of our results comes from the data itself and how we parameterize it using ISIs and phases, and how much comes from the choice of t-SNE as a dimensionality reduction algorithm. We have addressed this concern in the following ways:

      1. We used principal components analysis on the feature vectors and measured triadic differences in features such as the period and duty cycle of the PD neuron. We found that triadic differences were lower in the t-SNE embedding than in the first two PCA features, or in shuffled t-SNE embeddings (Figure 2– Figure supplement 2), suggesting that the embedding is creating a useful representation that captures key features of the data.

      2. We have used uMAP to reduce the dimensionality of the feature matrix to two dimensions and found that it too preserved the coarse features of the embedding that we observe with t-SNE. Coloring the uMAP embedding by the t-SNE labels revealed that the overall classification scheme was intact (Fig 3 – figure supplement 3).

      3. We generated a synthetic dataset and applied the unsupervised part of our algorithm to it (conversion to ISIs, phases, etc., then t-SNE). We colored the points in the t-SNE embedding by the category in the synthetic dataset. We found that categories were well separated in the t-SNE plot, and each cluster tended to have a single color. This validates the overall power of our approach and shows that it can recover clustering information in large spike sets (Figure 2—figure supplement 4).

      4. We have run k-means and hierarchical clustering on the feature vectors directly and shown that our method is superior to these naïve clustering algorithms running on the feature vectors. We speculate that this is because these clustering methods attempt to partition the full space using global distances, at the expense of distance along the manifold on which the data is located. Algorithms like t-SNE are biased towards local distances, and discount global distances between points outside a neighborhood, and are thus better suited here.
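
      For readers unfamiliar with the input features discussed in this exchange, the sketch below computes percentiles of the inter-spike interval (ISI) distribution for one spike train, giving a fixed-length vector regardless of spike count. The percentile levels, the interpolation rule and the function name are assumptions; the actual feature set also includes spike phases and is defined in the manuscript.

```python
def isi_percentile_features(spike_times, percentiles=(10, 50, 90)):
    """Return percentiles of the ISI distribution as a fixed-length list.

    Uses linear interpolation between closest ranks. Returns NaNs when
    fewer than two spikes are available (no ISI can be computed).
    """
    spikes = sorted(spike_times)
    isis = sorted(b - a for a, b in zip(spikes, spikes[1:]))
    if not isis:
        return [float("nan")] * len(percentiles)
    features = []
    for p in percentiles:
        pos = (len(isis) - 1) * p / 100.0   # fractional rank
        lo = int(pos)
        hi = min(lo + 1, len(isis) - 1)
        frac = pos - lo
        features.append(isis[lo] + frac * (isis[hi] - isis[lo]))
    return features

# A perfectly regular train has identical ISI percentiles.
regular_features = isi_percentile_features([0, 1, 2, 3, 4])
```

      Vectors of this kind, one per 20s bin, are what a method such as PCA, uMAP or t-SNE would then reduce to two dimensions.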

    1. Author Response

      Reviewer 1

      Panda and co-workers analyzed RS fMRI recordings from healthy patients and from two types of coma: UWS and MCS. They characterized the time-resolved functional connectivity in terms of metastability (time-variance of the Kuramoto order parameter), spatiotemporal patterns via non-negative tensor factorization, and its relationship to the eigenmodes of structural connectivity. Finding greater metastability and non-stationarity of the DMN network in healthy and MCS patients than in UWS patients, they found that the best discriminators to classify the different DoCs are the number of excursions (nonstability) from the DMN, salience and FPN networks extracted by the NNTF analysis. Interestingly, the data-driven NNTF yielded a novel sub-network comprising the FPN and some subcortical structures. The excursions and dwell times from this FPN subnetwork were shown to be significantly lower in the UWS patients than in MCS. Surrogate data testing assures that the different methods and fits are effectively expressing the functional connectivity matrices measured.

      Overall, I think that the results are correct and they advance in the characterization and understanding of the brain under DoC. However, some improvements can be made in the way the results, and the rationale behind them, are presented.

      We thank Prof. Patricio Orio for his assessment.

      While reading the Results section, it is easy to have the impression of a disconnected set of analyses that just happened to be together. In particular, the section about the structural eigenmodes and their relationship with the time-resolved FC seems to have little connection with the rest of the work, except for confirming (yet again) that DoC patients have a less dynamic FC. More elaboration about the relevance of these results, and what they say about DoC (that other dynamical FC analyses don't), is needed both in the introduction and discussion. Although a clear explanation is given in the introduction, the bottom line seems to be yet another measure of metastability. Perhaps, a better explanation of what underlies the 'modulation strength of eigenmodes expression' will be helpful for distinguishing this analysis from others. How novel is the connection that is being done with the structural connectivity and why is this important? Moreover, the eigenmodes analysis has little-to-none importance in the discrimination of patients done at the end; thus, its place within the big picture is hard to evaluate.

      We understand the reviewer’s position. Part one of our work covers time-resolved FC and spatiotemporal networks in DoC. Part two covers the relationship between time-resolved FC and eigenmodes of the structural network. The rationale for including part two is the following: there is a lot of literature that shows that eigenmodes of the structural network can be considered as ‘building blocks’ or basis functions/vectors for spatiotemporal networks at the functional level (Aqil et al., 2021; Atasoy et al., 2016, 2018; Deslauriers-Gauthier et al., 2020; Gabay et al., 2018; Gabay and Robinson, 2017; Robinson et al., 2016; Robinson, 2021; Tewarie et al., 2019, 2020; Wang et al., 2017). Ideally, to link parts one and two, one would take this notion further by analysing whether the magnitude of the eigenmode coefficients differed between UWS, MCS and healthy controls and how this would relate to dwell times or expression of spatiotemporal networks. However, this would lead to an immense multiple testing issue, which would be impossible to overcome with our sample size. An important link between parts one and two of our work is the relationship between change in eigenmode expression and metastability. Our measure for metastability is only a proxy for metastability. Lack of change in eigenmode expressions seems to confirm this result of metastability.

      To allow for better integration of part one and two of our work, we have added to the introduction:

      “These eigenmodes can be considered as patterns of ‘hidden connectivity’ that come to expression at the level of functional networks. It has been postulated that eigenmodes form elementary building blocks for spatiotemporal dynamics (Aqil et al., 2021). There is evidence that the well-known resting state networks can be explained by activation of a small set of eigenmodes (Atasoy et al., 2018).”

      We have also clarified in the result section:

      “As resting-state network activity can be explained by activation of structural eigenmodes, we next analyse the role of fluctuations in eigenmode expression over time.”

      Something that I find counter-intuitive and that may confuse some readers, is the (apparent) contradiction between the diminished metastability in the DoC conditions and the reduced dwell times (Figure S1; also "the inability to sequentially dwell for prolonged times in a different set of eigenmodes", as stated in the Discussion). Fewer excursions and shorter dwell times can only mean that some networks are just less visited and maybe this would be enough to distinguish between conditions. Further explaining this will help to understand better the implications of the work.

      We understand the reviewer’s point, however we disagree that diminished metastability is in contradiction with the findings on dwell times. We show that dwell times are reduced in the posterior DMN, FPN and sub-FPTN networks, however, there is very long dwelling in the residual network in DoC. Hence, the brain resides in fewer network states in DoC, which is in agreement with reduced metastability. Our proxy for metastability is the standard deviation of the Kuramoto order parameter. Whenever there are more visits to network states, or switching between network states as is the case for healthy controls in our data, this would lead to phase uncoupling followed by phase synchronization, which would hence boost the standard deviation of the Kuramoto order parameter (a proxy for metastability).
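
      The proxy described above can be made concrete with a short sketch. Given the instantaneous phases of all regions at each time point, the Kuramoto order parameter is the magnitude of the mean phase vector, and metastability is taken as its standard deviation over time. The function names and the use of the population standard deviation are assumptions of this sketch, not details taken from the manuscript.

```python
import cmath
import math
import statistics

def kuramoto_order(phases):
    """Magnitude of the mean unit phase vector: 1 = full synchrony, 0 = incoherence."""
    return abs(sum(cmath.exp(1j * theta) for theta in phases) / len(phases))

def metastability(phase_series):
    """Standard deviation over time of the Kuramoto order parameter."""
    r = [kuramoto_order(phases) for phases in phase_series]
    return statistics.pstdev(r)

# Toy example with two regions.
always_locked = [[0.0, 0.0]] * 10                      # synchrony never changes
switching = [[0.0, 0.0], [0.0, math.pi]] * 5           # alternates sync / anti-phase

low = metastability(always_locked)    # close to 0
high = metastability(switching)       # close to 0.5
```

      The toy example mirrors the argument in the reply: a system that keeps switching between synchronized and desynchronized configurations has a larger standard deviation of the order parameter than one that stays in a single configuration.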

      We agree with the reviewer that the sentence starting “the inability to sequentially dwell for prolonged…” is confusing. We have now removed this statement.

      We have now added to the result section:

      “These findings of very short dwell times in the posterior DMN, FPN and sub-FPTN and long dwell time in the residual network can be considered as a contraction of the functional network repertoire in DoC, which is in agreement with a loss in metastability in these patients.”
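
      To illustrate what a dwell time is in this context, the sketch below assumes that each time point has already been assigned to a single dominant network and measures the length of each uninterrupted run. The discretization into one label per time point, the label names and the repetition time are assumptions made for illustration only; the manuscript derives network expression from NNTF and may define dwell times differently.

```python
from itertools import groupby

def dwell_times(state_sequence, tr=1.0):
    """Map each state to the durations (in seconds) of its uninterrupted runs."""
    dwell = {}
    for state, run in groupby(state_sequence):
        dwell.setdefault(state, []).append(sum(1 for _ in run) * tr)
    return dwell

# Hypothetical label sequence sampled every 2 s.
sequence = ["DMN", "DMN", "FPN", "residual", "residual", "residual", "DMN"]
durations = dwell_times(sequence, tr=2.0)
```

      A repertoire contracted towards the residual network would show up here as few, long runs of the residual label and short or absent runs of the other labels.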

      Finally, some comments about the connection(s) of these analyses with the commonly used FCD analysis (based on sliding windows of pair-wise correlations) will be useful, to put better this work into the big picture of time evolution of the functional connectivity.

      We have now discussed sliding window-based analysis in the context of our work in the methodology section.

      “Lastly, we have used a high temporal resolution method to estimate time-resolved connectivity at every time point instead of a sliding window-based method. Previous studies using sliding window approaches have provided novel insights into brain dynamics of loss of consciousness, such as the brain co-occurrence of functional connectivity patterns, which is known as brain states and its temporal (i.e., rate of pattern occurrence (probability) and between pattern transition probabilities) alteration in loss of consciousness in DoC patients (Demertzi et al., 2019) and anaesthesia induced loss of consciousness (Barttfeld et al., 2014a; Uhrig et al., 2018). However, sliding window approaches have limited sensitivity to non-stationarity in the fMRI BOLD signals (Hindriks et al., 2016) and lack to provide spatial alteration of classical brain functional network. The exploration of the spatiotemporal aspects of well-known resting state networks is an important step forwards for better understanding the relation between brain function and consciousness, in a way that is impossible to achieve at the whole brain level. In addition, recent work on time-resolved connectivity shows that brief periods of co-modulation in BOLD signals are an important driving factor for functional connectivity (Esfahlani et al., 2020; Hindriks et al., 2016).”

      Reviewer 2

      The study is of high significance, rigor, and novelty. Despite the many studies of repertoire, dynamic connectivity, etc., in the study of consciousness, there is (surprisingly, as I confirmed with a literature search) a dearth of application of these approaches to disorders of consciousness. The manuscript is well-written and transparent about its limitations. The author should consider the following recommendations:

      We thank the reviewer for his/her assessment of our work.

      1) There is frequent reference to "subcortical" and related networks, but I see no description in the text of which subcortical structures are involved. Panel N of figure 2 is helpful but I think that more explicit detail is important, especially given the specific predictions of mesocircuit theory.

      We have provided details for the subcortical networks presented in the Panel N of Figure 2. In the manuscript we provide a textual description of the brain areas that are part of the network. To improve the clarity of the description of the network, we also now refer to it as “subcortical fronto-temporoparietal (Sub-FTPN)”.

      In the result section, it now reads: “This modulated subcortical fronto-temporoparietal network consist of the following brain regions: bilateral thalamus, caudate, right putamen, bilateral anterior and middle cingulate, inferior and middle frontal areas, supplementary motor cortex, middle and inferior temporal gyrus, right superior temporal, bilateral inferior parietal and supramarginal gyrus.”

      2) Similarly, although the global neuronal workspace does posit a critical role for recurrent frontal-parietal networks, can the authors be more specific about the nodes of the proposed workspace and what they found empirically?

      As mentioned above, we have provided more details about the regions that are part of the “subcortical fronto-temporoparietal” network. As the reviewer rightfully noted, this network also shows some overlap with the Global Neuronal Workspace. We refer to that in more detail in the discussion, highlighting how our functional networks overlap with and differ from the two networks (i.e., one feedforward only, one with recurrent activity), and from the predictions of the mesocircuit model. For more detail, please refer to the reply to point 1 of “Recommendations for the authors”.

      3) The classification sensitivity/specificity did not, in my opinion, add much to the manuscript, especially since the number of patients is not remotely close to what would be required for a population-based diagnostic approach. If the authors chose to include this with any reference to diagnosis (highlighted in the introduction and elsewhere), I would encourage a comparison with similar data from other clinical or neuroimaging-based diagnostic approaches. However, I think the value of the study resides more with mechanistic understanding than diagnosis.

      We agree with your suggestions that the primary aim of our work is to provide a mechanistic understanding of loss of consciousness. Therefore, we have removed the classification part from the paper and explain our findings focusing on mechanism of pathological unconsciousness rather than its potential as a clinical diagnostic tool. This change has required several textual edits throughout the manuscript.

    1. When memes or the subjects of a meme are used for commercial purposes without permission, the meme creator may sue, as the effect of the commercial use on the market value of the original meme usually prevents a finding of fair use. In 2013, the owners of the cats featured in the “Nyan Cat” and “Keyboard Cat” memes won a lawsuit against Warner Bros. and 5th Cell Media for respectively distributing and producing a video game using images of their cats.

      Big corporations use other creators' work more often than we think. It is unreal to think that people's work can be stolen from the internet easily and sometimes it could be hard to prove. Fortunately, these two cases were able to win their lawsuit.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript investigates a role for YAP in replication. Previous work from this group has shown that Yap knock-down leads to accelerated S-phase and an abnormal progression of DNA replication in the frog eye. Here they extend this to show that YAP depletion accelerates S-phase and DNA replication in the frog embryo, and that YAP binds a DNA replication regulator called Rif1. Combing assays suggest that YAP acts on origin firing. This is an interesting new aspect of YAP function. I am not an expert on DNA replication, however, I feel that the manuscript would have been improved if more mechanistic insight was gained into how Rif1 and YAP interact, and how that interaction influences replication timing.

      In the revised version of the manuscript, we have strengthened our conclusion that Yap regulates the dynamics of DNA replication. We now provide additional experiments in addition to DNA combing and nascent strand analysis by agarose gel electrophoresis: Rhodamine-dUTP incorporation/nucleus, 32P-dCTP incorporation, western blotting for replication fork proteins. All show that DNA synthesis and origin activation are increased after Yap depletion.

      Moreover, in the revised manuscript we also directly compared the effects of YAP depletion to those of Rif1 depletion alone (page 7, New Figure 4). As for Yap depletion, we first quantified rhodamine-dUTP incorporation after Rif1 depletion by direct fluorescence microscopy that demonstrated a clear increase of DNA synthesis, consistent with Alver et al. 2017. Second, we performed DNA combing experiments after Rif1 depletion in egg extracts that show a marked increase in DNA replication and fork density like those seen after Yap depletion, spanning from very early to mid S-phase. We therefore found that Rif1 depletion and Yap depletion qualitatively show the same main effects: an increase of DNA synthesis and fork density, that are more pronounced in early S-phase. We also noticed quantitative differences in the direct fluorescence after rhodamine incorporation of whole nuclei and fork density, with stronger effects after Rif1 depletion compared to Yap depletion. This suggests that there might be an additional mechanism for Rif1 in regulating origin activation.

      The title of the manuscript is "A non-transcriptional function of YAP orchestrates the DNA replication program". It is not clear that YAP "orchestrates" DNA replication - for this to be true, it would have to be signal responsive. Since the authors did not reveal any links to YAP activity (such as YAP phosphorylation or nuclear/cytoplasmic distribution) it is not "orchestrating" DNA replication.

      We have replaced “orchestrates” by “regulates”.

      Figure 1 shows that YAP is recruited onto chromatin after MCM2 and MCM7 and at the same time as PCNA and the start of DNA synthesis. Addition of geminin, an inhibitor of Cdt and MCM loading inhibits YAP loading onto chromatin. YAP immuno depletion leads to premature DNA synthesis or replication. Fig 1 B is quite confusing- the labeling in Figure 1B is likely incorrect.

      We apologize for this confusion. This has been corrected and the Figure 1B is now properly labelled.

      Figure 2 investigates if YAP depletion affects origin firing or fork speed, using DNA combing. Fig 2A shows that there is increased activated replication origins and decreased distance between origins. The authors say that the increase of fork density is more pronounced than the decreased distance, suggesting YAP is regulating the activation of origins. The number of replicates is low. This is especially true for the conclusion that eye length is unaltered -it appears that there is a subset of eye length that is increased in 2F, which might reach significance if triplicates were performed.

      As the referee points out, both the observed increase of fork density and the decrease of origin distances argue that origin activation is increased after Yap depletion. The fact that the increase of the fork density seems more pronounced than the local decrease of the distance between neighbouring origins allows a more detailed interpretation, namely that whole clusters of origins are activated on top of origins inside already active clusters. This can be observed in the two independent experiments probing many fibers for eye distances and eye numbers.

      Concerning Figure 2F, the scatter plot gives the impression that there are more eyes with larger sizes after Yap depletion, but please note that there are also more EL measured, as stated in the legend (Mock n=182 versus Yap n=311). To highlight this parameter, we added these numbers below the scatter plot in the revised Figure 2F, as we have done consistently for all of the experiments presented in the revised Figures. The means of the two EL distributions are numerically different but since both distributions are not Gaussian (tested by D'Agostino and Pearson test), only non-parametric tests can apply (Mann-Whitney or Kolmogorov-Smirnov test). The results of the two non-parametric tests show that the distributions are not significantly different, as mentioned in the legend. However, we cannot rule out that after Yap depletion some larger eyes may arise from fusions of forks or from a higher fork speed, but again, the tests, applied to a high number of measurements, show no significant statistical differences.
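
      For readers who want to see what the Mann-Whitney comparison involves, the sketch below computes the U statistic from the pooled ranks of two samples of unequal size and a two-sided p-value from the normal approximation. It is a textbook illustration under stated assumptions (no tie or continuity correction in the variance, made-up toy data), not the analysis software used for Figure 2F.

```python
import math

def mann_whitney_u(x, y):
    """Return (U, two-sided p) using average ranks and a normal approximation.

    The variance term ignores tie and continuity corrections, so the
    p-value is only approximate, especially for small or heavily tied samples.
    """
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0      # ranks are 1-based; ties share the mean rank
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    n1, n2 = len(x), len(y)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return u, p

# Toy data only: completely separated samples versus identical samples.
u_sep, p_sep = mann_whitney_u([1, 2, 3], [4, 5, 6])
u_same, p_same = mann_whitney_u([1, 2, 3], [1, 2, 3])
```

      Because the test works on ranks, it does not assume Gaussian distributions and it accommodates groups of different sizes, which is the situation described for the eye length measurements.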

      The authors conducted AP-MS on egg extracts to identify proteins that co-IP with YAP. One of many proteins identified was RIF1 Figure 3 shows a co-IP with RIF1 and YAP. It is a very weak co-IP.

      We agree that the Rif1/Yap co-IP is weak, but it is reproducible in several independent experiments with different extracts. There could be many reasons for this. Co-IPs with a high molecular weight partner like Rif1 (250 kDa) are generally tedious (poor gel migrations and WB transfer). Further, Rif1 has been described as having a subnuclear localisation and as associating with the nuclear lamina and heterochromatin. These characteristics are known to make proteins highly insoluble. These technical limitations have been reported for the mouse Rif1 for instance (Sukackaite R. et al. Sci Rep 2017 May 18;7(1):2119). In fact, similar “weak co-IPs” were also obtained between Rif1 and Nanog (Wang J. et al. Nature 2006 (444), 364–368) as well as with PP1 (Hiraga S. et al. EMBO Rep. 2017 Mar;18(3):403-419). Finally, it could also be that this interaction is not permanent but dynamic, making it difficult to capture in a Co-IP. Taken together, these parameters mean that the identification of the interaction is in itself challenging. What we did manage to provide is a reciprocal co-IP using the endogenous proteins, which we believe best reflects native conditions.

      Figure 4 shows that YAP levels increase during development and that depletion of YAP or RIF1 leads to increased cell division. The authors use Trim-away to deplete YAP and RIF1 and find that depletion of either leads to an increased number of small cells. The YAP depletion shown in Fig 4B is clear, as is the increased number of small cells in YAP depletion or RIF1 depletion.

      Figure 4 supplement 1 is arguing that trim away and morpholino combined are more effective. Quantitation of the western blots in panel A is needed for this to be convincing.

      The quantification is now presented in new Figure 5-figure supplement 1A. At the 2-cell stage, we observe some fluctuations in the amounts of Yap between samples, the origin of which we do not fully understand. At the 4-cell stage, a reduction in Yap is observed regardless of the depletion strategy used. It is from the 8-cell stage onwards that differential effects between the depletion methods can be appreciated. From this stage onwards, the quantifications confirm that the TRIM-Away and morpholino combined are more effective than taken separately.

      Figure 5 shows that RIF1 is expressed in the eye in RSC and that loss of RIF1 leads to a small eye. Panel B shows that by western blot analysis RIF1 antibody is specific. However, antibodies can have very different abilities in western vs staining. The RIF1 and YAP antibodies should be validated in staining. Also, the staining in Fig5C is at low resolution for both YAP and RIF1 and the identification of foci is unclear.

      This is indeed an important issue. To address this point, we performed immunostaining on retinal sections from embryos depleted of the target protein and compared the fluorescent signal obtained in control versus depleted samples. We show that upon depletion of Yap or Rif1, the signal from the immunostaining is severely reduced for Yap or Rif1, respectively, which attests to the specificity of the antibodies used in this study. We have added an additional supplementary Figure to show this control (Figure 6-figure supplement 1).

      We agree with the reviewers that the quality of the images could be improved. We now provide confocal images with a better resolution (Figure 6C).

      For Rif1, we observe a clear nuclear staining, rather non-homogenous which is consistent with data reported in the literature. Indeed, Rif1 localisation has been shown to be highly dynamic during the cell cycle and also during S-phase (Cornacchia D. et al. EMBO J. 2012). Some brighter foci could be observed at specific phases (such as G1-phase) but overall, the general pattern appears rather “granular” and restricted to the nucleus. This is what we are also observing. Interestingly, Rif1 does not appear to colocalize with the replication fork or with the replicative helicase MCM3 (Cornacchia D. et al. EMBO J. 2012). The replication foci observed in this study are therefore to be understood independently of the Rif1 localisation pattern.

      For Yap, we do not detect any granular expression but observe rather homogeneous nuclear and cytoplasmic staining, which is also consistent with reported data showing YAP nucleo-cytoplasmic shuttling (see for instance Manning S.A. et al. Curr Biol. 2018). STED microscopy might be necessary for higher resolution.

      It is difficult to see the points the authors wish to communicate in Figure 6. There is almost no Edu in the YAP-MO, which questions the ability to recognize the different patterns in this region of the eye.

      Our observations show that there are fewer EdU positive cells in the Yap-MO but not “no EdU”. The fluorescence intensity in the green-labelled nuclei in Figure 7C after Yap MO does not appear different from that in the control-MO. Under these conditions, there is no reason to think that one pattern is more difficult to recognise than the other one.

      Reviewer #2 (Public Review):

      This paper is of potential interest within the field of DNA replication, as it identifies a novel role for YAP protein in DNA replication dynamics. However, the conclusions are not supported by properly controlled data. Several aspects of data analysis and representation need to be revised.

      In this manuscript, the authors characterized YAP function in the control of DNA replication dynamics, taking advantage of the Xenopus laevis system.

      They found that YAP is recruited to replicating-chromatin and showed that its chromatin enrichment depends on the assembly of pre-RC proteins. In addition, they show that the immuno-depletion of YAP leads to increased DNA synthesis and origin activation, revealing YAP's possible role in the regulation of replication dynamics.

      The authors were also interested in finding YAP potential partners that could mediate its function. They identified Rif1, a major regulator of replication timing, as a novel YAP interactor during DNA replication.

      As RIF1 expression in vivo is restricted to the stem cell compartment of the Xenopus retina, similar to YAP, the authors assessed whether Rif1 could regulate the spatial-temporal program of DNA replication in stem cells. They showed that depletion of Rif1 at early stages of Xenopus embryos development leads to alterations in replication foci of retinal stem cells, resembling the effect observed following YAP down-regulation.

      Finally, they studied the impact of YAP and RIF1 down-regulation at early stages of development, showing that their absence results in the acceleration of cell division rate of Xenopus embryos, where RNA transcription is absent. Based on these results they concluded that YAP has a role in S-phase independent from transcription.

      The higher rate of DNA synthesis observed in the absence of Yap in Figure 1D is not very evident from the gels in Figure 1, supplement 3B. The timing of the experiments is continuously changing throughout the figures. It is therefore difficult to compare them. Also, comparisons across different gels are difficult to interpret. Most importantly, relative quantification on gel images cannot support the claim of increased DNA synthesis in the absence of YAP. To accurately quantify the replication of DNA added to the extract, the total amount of DNA synthesized must be quantified.

      Although we do not agree that relative quantification on gel images cannot support the claim of increased DNA synthesis in the absence of Yap, we thank the reviewer for his suggestion since we now provide additional data clearly strengthening our conclusion.

      Many studies published in high-standard journals by different Xenopus replication laboratories have quantified DNA synthesised after 32P-dCTP incorporation and separation by agarose gel electrophoresis (Shechter et al, 2004; Trenz et al, 2008; Guo et al, 2015; Walter & Newport, 1997; Suski et al, 2022, Nature). Nevertheless, as the referee suggested, we quantified the total amount of DNA synthesized in three new independent experiments. These new results, presented on page 5, lines 34-39 and shown in Figure 1G, support our conclusion, as they also show that Yap depletion increases total DNA synthesis. Please note that the DNA combing results presented in Figure 2 also show that replication is increased after Yap depletion. Finally, we added another set of experiments to Figure 1 to further confirm these findings. We used the incorporation of Rhodamine-dUTP followed by quantification of the fluorescence intensity within nuclei. This nuclear fluorescence-based method is frequently used in proliferation assays to assess nucleotide incorporation resulting from DNA replication in other organisms. Our new results demonstrate that DNA synthesis is increased 1.5-fold in six biological replicates and represent a third independent method, in addition to DNA combing and 32P-dCTP incorporation, showing that DNA synthesis is increased upon YAP depletion. These new results are now presented on page 5, lines 27-24 and shown in Figure 1D-F.

      As explained in the MM section page 14 in the original manuscript, the replication extent (percent of replication) differs for a specific time point from one extract to another, because each egg extract prepared from one batch of eggs replicates nuclei with its own replication kinetics. To overcome this problem and to compare different independent experiments performed using different egg extracts, the data points of each sample were normalized to maximum incorporation value.
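
      The normalization described above can be illustrated with a minimal sketch. It assumes each extract's time course is a simple list of incorporation values and that each is divided by its own maximum; the function name and the numbers are hypothetical and are not taken from the manuscript.

```python
def normalize_to_max(incorporation):
    """Scale one extract's time course so that its maximum equals 1.0."""
    peak = max(incorporation)
    if peak <= 0:
        raise ValueError("time course has no positive incorporation value")
    return [value / peak for value in incorporation]


# Hypothetical raw incorporation values (arbitrary units) at the same
# time points in two independent egg extracts with different kinetics.
extract_a = [5.0, 20.0, 40.0, 50.0]
extract_b = [2.0, 6.0, 9.0, 10.0]

normalized_a = normalize_to_max(extract_a)
normalized_b = normalize_to_max(extract_b)
```

      After this step both time courses end at 1.0, so their shapes can be compared even though the absolute incorporation differs between extracts.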

      It is also necessary to analyze the dynamics and the abundance of chromatin-bound replication proteins associated with the active replication fork after Yap depletion using chromatin binding assays. This would further confirm the increase in the fork density observed by DNA combing experiments.

      We thank the referee for this suggestion and have added a western blot of chromatin-bound proteins after Yap depletion. This shows that two replication proteins associated with the active replication fork, namely Cdc45 and PCNA, are enriched after Yap depletion compared to the control at the beginning of S-phase. This observation further supports the DNA combing results showing that more forks are active after YAP depletion. These new data are now presented on page 6, lines 25-32 and displayed in Figure 2H.

      We would like to stress here that with these additional methods added to the revised version, five different methods in total (Rhodamine-dUTP incorporation/nucleus, 32P-dCTP incorporation - total synthesis, 32P-dCTP incorporation - nascent strand analysis, DNA combing, western blotting for replication fork proteins) show that DNA synthesis and origin activation are increased after Yap depletion.

      The quantification of the amount of YAP in Figure 1B is confusing. The legend of the chart states "Control in light grey and presence of geminin in black", but the bar colors are of different shades of grey. It is not clear how to evaluate them.

      We apologize for this confusion. This has been corrected and the Figure 1B is now properly labelled.

      The efficiency of depletion for both Rif1 and YAP is different in Figure 4B and Figure 4A, supplement 1.

      We agree with the referee that the efficiency of depletion is different in both figures. This is explained by the fact that the extent of the depletion varies from experiment to experiment. We work with different batches of in vitro fertilized embryos and extracts, so these differences simply reflect the technical/biological variability.

      Moreover, the combined use of the TRIM-away approach with injections of MO led to a stronger and prolonged YAP depletion but also triggered toxicity in the tadpoles, which display severe abnormalities.

      It is important to point out that abnormal development is not always attributable to a toxic effect. Many losses of gene function result in malformations without being ascribed to toxicity or unspecific effects. However, we agree with the reviewers on the need to present a rescue experiment, which is now shown in new Figure 5C and new Figure 5-figure supplement 1B. In addition, we also provide gain-of-function (GOF) data for YAP in early embryos. In brief, we find that Yap GOF leads to outcomes opposite to those of its depletion: embryos at the same stage of development have fewer and larger cells than the control. Furthermore, we show that the effects of Yap depletion, i.e. embryos with more and smaller cells than the control at the same developmental stage, are rescued by the injection of MO-resistant Yap mRNA to restore the protein level. This is true for both embryonic divisions (new Figure 5C) and development, as we obtained normal-looking neurulae after Yap rescue (new Figure 5-figure supplement 1B). Overall, these data now clearly show that Yap is both sufficient and necessary to maintain the rate of embryonic divisions and that this phenotype is specific, since it can be rescued by expressing Yap alone. These new data are presented on page 8, lines 2-10.

      Reviewer #3 (Public Review):

      The article by Garcia et al clearly describes a set of experiments establishing Yap as a novel regulator of DNA replication dynamics. Its characterization as both a RIF1 interaction partner as well as playing its own role in replication initiation will likely have a significant impact on the field, as currently little is known about how DNA replication during early embryonic cell divisions is regulated.

      The authors aim to identify a non-transcriptional function of YAP through the use of the Xenopus in vitro replication system and Yap depletion. Strengths of the paper include the particularly appropriate use of the Xenopus in vitro replication system, as well as the combined use of Trim-Away and morpholino oligonucleotides to deplete Yap and Rif1. Moreover, these experiments were elegantly complemented by single-molecule molecular combing and in vivo studies. By identifying Yap as a novel regulator of DNA replication dynamics, the authors achieved their aim. The characterization of Yap as both playing a role in replication initiation and as a Rif1 interaction partner will likely have a significant impact on the field, as currently little is known about how DNA replication during early embryonic cell divisions is regulated. A weakness of the paper is that some of the representative data do not appear to be very representative of the entire data set.

      We replaced the representative data in Figure 2A with data that we think better reflect the main conclusions of the entire data set.

    1. Reviewer #1 (Public Review):

      1: The authors formulate competing hypotheses on the behavioral impact of alpha oscillations using signal detection theory (SDT) (Intro and Fig. 1). SDT is indeed well suited for this, as it is used to compute the orthogonal behavioral metrics d' (discriminability) and criterion (bias). However, soon the authors write:

      "The higher d' for conservative trials may be due to the more skewed mapping between the false alarm (FA) rate to its Z-value in our d' computation. Specifically, when criterion (or the decision boundary) intersects the noise distribution at its right tail, small changes in FA rate are nonlinearly exaggerated after Z-transformation. As we did not observe a difference in accuracy between conservative and liberal trials, which is a more robust measure of perceptual discriminability when target presence rate equals 50%, we argue that the observed statistically significant d' difference is equivocal."

      And also:

      "For the binning analyses, we mainly focused on the percentage correct (i.e., accuracy), and hit and FA rates, because these metrics scale linearly (as opposed to d', which scales nonlinearly as the hit rate increases or FA rate decreases linearly) and are well defined for both behavioral data and MVPA outputs."

      And indeed from Fig. 3 onwards they do not really use SDT anymore, which is confusing given the Introduction and Fig. 1. I think it's also problematic, as accuracy, hit-rate and fa-rate are not orthogonal and are therefore much less suited to arbitrate between their competing hypotheses. As a result, I'm not convinced the paper accomplishes what it sets out to do in the Introduction.
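
      For reference, the SDT quantities at issue in this point can be sketched in a few lines. This is a minimal illustration using the standard equal-variance Gaussian formulas, d' = Z(H) - Z(FA) and c = -(Z(H) + Z(FA)) / 2, not the authors' analysis code; the rates used are hypothetical.

```python
from statistics import NormalDist

# Z-transform: inverse CDF of the standard normal distribution.
_z = NormalDist().inv_cdf


def sdt_metrics(hit_rate, fa_rate):
    """Return (d_prime, criterion) under the equal-variance Gaussian model."""
    z_hit, z_fa = _z(hit_rate), _z(fa_rate)
    d_prime = z_hit - z_fa
    criterion = -0.5 * (z_hit + z_fa)
    return d_prime, criterion


# The same 0.01 change in FA rate moves Z(FA) far more in the tail of the
# noise distribution than near its centre (hypothetical rates).
mid_shift = _z(0.50) - _z(0.49)
tail_shift = _z(0.02) - _z(0.01)

# A symmetric observer (hits and correct rejections equally likely) is unbiased.
d_prime, criterion = sdt_metrics(0.80, 0.20)
```

      Here `tail_shift` is several times larger than `mid_shift`, which is the nonlinearity the quoted passage refers to when the criterion sits in the right tail of the noise distribution.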

      2: Related, if indeed the authors choose to deviate from SDT, they should put the metric "% yes-choices" on equal footing with accuracy. For example, in Fig. 3A, we can see that alpha oscillations predict a reduction of hit-rate as well as fa-rate; this suggests that the main effect is actually on choice bias (% yes-choices) rather than accuracy. If that's true, then the title of this manuscript is misleading.
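
      The arithmetic behind this point can be made concrete with a small sketch. It assumes a 50% target presence rate, as stated in the quoted passage above; the hit and false-alarm rates are hypothetical and are not read off Fig. 3A.

```python
def accuracy(hit_rate, fa_rate, p_target=0.5):
    """Proportion correct: hits on target trials plus correct rejections."""
    return p_target * hit_rate + (1 - p_target) * (1 - fa_rate)


def yes_rate(hit_rate, fa_rate, p_target=0.5):
    """Proportion of 'yes' choices across target-present and absent trials."""
    return p_target * hit_rate + (1 - p_target) * fa_rate


# Hypothetical (hit rate, FA rate) pairs: both rates drop by 0.10.
low_alpha = (0.80, 0.30)
high_alpha = (0.70, 0.20)

acc_low, acc_high = accuracy(*low_alpha), accuracy(*high_alpha)
yes_low, yes_high = yes_rate(*low_alpha), yes_rate(*high_alpha)
```

      With equal drops in hit and false-alarm rates, accuracy is unchanged while the proportion of yes-choices falls, which is why the two metrics need to be reported side by side.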

      3: Have the authors considered testing for non-monotonic effects of alpha oscillations on cortical computation and behavior?

      4: The authors use challenging and sophisticated methods, but these are introduced very casually. For example:

      "To obtain a more fine-grained picture of the alpha power modulation of behavior, we applied generalized linear mixed models (GLMMs; see Methods) to account for both between-subjects and within-subject trial-by-trial response variability, and to estimate the effects of alpha oscillatory power on d' and criterion simultaneously."

      And:

      "To evaluate the quality of visual information coding, we used multivariate pattern analysis (MVPA), operationalizing the quality of visual representation as the neural classifier's classification performance. We used the priming trials to train binary classifiers to classify target-present vs. absent trials in a time-resolved manner, [...]".

      It would help a lot if the authors could unpack their rationale some more. For example, why did they consider between-subjects effects, and could they show some scatter plots with between-subjects correlations before turning to the GLMM? Also, what is the question the authors wanted to answer that required training the classifier in a time-resolved manner (which I like, on a personal note)?

      5: Throughout, the label "liberal trials" is odd, given that group-average criterion > 0 on those trials (Fig. 2C).

      6: It would be nice to explicitly bridge to the literature on (pupil-linked) arousal predicted shifts in decision-making, and to findings on the relationship between alpha oscillations and (pupil-linked) arousal.

    1. But I think we may go still further. The right to regulate the use of wealth in the public interest is universally admitted. Let us admit also the right to regulate the terms and conditions of labor, which is the chief element of wealth, directly in the interest of the common good.

      On July 5, 1935, President Roosevelt signed the Wagner Act, also known as the National Labor Relations Act. The Act included many things such as entitlement to wages and benefits, hours of work, overtime arrangements and overtime compensation, and leave for illness, maternity, vacation or holiday. Labor and working conditions. (n.d.). https://firstforsustainability.org/risk-management/understanding-environmental-and-social-risk/environmental-and-social-issues/labor-and-working-conditions/ National Labor Relations Act (1935). (2021, November 22). National Archives. https://www.archives.gov/milestone-documents/national-labor-relations-act

    1. Well, this was a true early morning treat! You reeeeally botched that one. Like 180 degrees misinterpreted it. That thread is about how Luhmann developed a personal approach that worked for him (as we all do and should), and that there is no one way to work/do a zettelkasten. Ie. We all must (and inevitably will) interpret Luhmann's take on zettelkasten method (and any other tools/method/etc we encounter) in light of what our needs are. What's super dope, is that my whole jam in this ZK world is about showing the thread/lineage of these techniques and helping people specifically wrestle with some of the principles and practices Luhmann employed so that in the end they can apply them in whatever way they see fit. And yet, somehow....you actually miss that? Also, this.... (you) "We approach these methods from such a top down manner, in part, because our culture has broadly lost the thread of how these note taking practices were done historically. Instead of working with something that has always existed and been taught in our culture, and then using it to suit our needs, we're looking at it like a new shiny toy or app and then trying to modify it to make it suit our needs." ... Is this.... (me) "We're coming at [zettelkasten] top-down. We're appropriating something and trying to retrofit it in a desire to "be better." In doing so, we're trying "clean it up a bit." I'm critiquing this approach 😂 I'm saying we come at it top-down bc we see it as a reified object (which is incorrect) that is set in stone, when in fact those who present the "one true way" are actually presenting a "cleaned up version" of Luhmann's very personal approach and calling it "official." Again, I'm critiquing that! I am, by design and punk ethos, kinda against "official." Silly, dude. The whole thread is about not looking at it as a "shiny new toy" and seeing it as a more fluid aspect of note-taking and personal practice. 
It's about recognizing that the way to recreate Luhmann is to be flexible, interpret these methods for yourself. Why? Bc that's exactly what Luhmann did. "Let the principles and practices guide your zettelkasten work. Throw them in a box with your defined workflow issues. Let them hash it out. Shake the box and let them tell you the "kind" of zk you should be working with." (thread the day before the above mentioned) Also, and you're gonna love this.... Here's you above.... "People have been using zettelkasten, commonplace books, florilegium, and other similar methods for centuries, and no one version is the "correct" one." And here's me..... "The most well-known slip-boxes in the world have been employed by writers in service of their writing. Variations of the system date back to the 17th c., [3] and modern writers such as, Umberto Eco, Arno Schmidt, and Hans Blumenberg are all known for employing some version of the slip-box to capture, collect, organize, and transform notes into published work. Of course, today, the most famous zettelkasten is the one used...." Sound familiar? It's me citing you, ya dum dum 😂 Footnote numero tres.... https://writing.bobdoto.computer/zettelkasten-linking-your-thinking-and-nick-milos-search-for-ground/ Such a funny thing to see this fine Friday morning! ☀

      Sadly I think we're talking past each other somehow; I broadly agree with all of your original thread. Perhaps there's also some context collapse amidst our conversations across multiple platforms which doesn't help.

      Maybe my error was in placing my comment on your original thread rather than a sub branch on one of the top several comments? I didn't want to target anyone in particular as the "invented by Luhmann myth" is incredibly widespread and is unlikely to ever go away. It's obvious from some of the responses I've seen from your thread here in r/antinet that folks without the explicit context of the history default to the misconception that Luhmann invented it. This misconception tends to reinforce the idea that there's "one true way" (the often canonically presented "perfect" Luhmann zettelkasten, rather than the messier method that he obviously practiced in reality) when, instead, there are lots of methods, many of which share some general principles or building blocks, but which can have dramatically different uses and outcomes. My hope in highlighting the history was specifically to give your point more power, not take the opposite stance. Not having the direct evidence to the contrary, you'll notice I hedged my statement with the word "seems" in the opening sentence. I apologize to you that I apparently wasn't more clear.

      I love your comparison of LYT and zettelkasten by the way. It's reminiscent of the sort of comparison I'm hoping to bring forth in an upcoming review of Tiago Forte's recent book. His method—ostensibly a folder based digital commonplace book, which is similar to Milo's LYT—can be useful, but he doesn't seem to have the broader experience of history or the various use cases to be able to advise a general audience which method(s) they may want to try or for which ends. I worry that while he's got a useful method for potentially many people, too many may see it and his platform as a recipe they need to follow rather than having a set of choices for various outcomes they may wish to have. Too many "thought leaders" are trying to "own" portions of the space rather than presenting choices or comparisons the way you have. Elizabeth Butler is one of the few others I've seen taking a broader approach. A lot of these explorations also mean there are multiple different words to describe each system's functionality, which I think only serves to muddy things up for potential users rather than make them clearer. (And doing this across multiple languages across time is even more confusing: is it zettelkasten, card index, or fichier boîte? Already the idea of zettelkasten (in English speaking areas) has taken on the semantic meaning "Luhmann's specific method of keeping a zettelkasten" rather than just a box with slips.)

    1. Author Response

      Reviewer 1

      Strengths:

      This manuscript combines experimental, exploratory, and observational methods to investigate the big question in the innovation literature: why do some animals innovate over others, and how does information about innovations spread? By combining a variety of methods, the manuscript tackles this question in a number of ways, and finds support for previous work showing that animals can learn about foods via social olfactory inspection (i.e., muzzle to muzzle contact), and also presents data intended to investigate the role of dispersing animals in innovation and information spread.

      Using data from a previously-published experiment, the manuscript illustrates how investigators can address numerous interesting questions while limiting the disturbances to wild animals. The manuscript's attempt at using exploratory analysis is also exciting, as exploratory analyses provide a useful tool for behavior research; indeed, Tinbergen insisted that behavior must first be described.

      Weaknesses:

      The manuscript's introduction is a bit unclear as to how the fact that dispersing males may be an important source of information ties to innovations in response to disruptions due to climate change, humans, or new predators, if at all. An introduction regarding the role of dispersed animals in introducing novel behaviors and social transmission would better prepare readers for the questions presented in the manuscript. As it stands now, the manuscript only provides one sentence discussing the theoretical relevance of investigating the role of dispersing animals in innovations.

      We have added some information about this to the introduction (lines 66 – 69 and 121-123) and maintain our discussion of it in the discussion.

      Additionally, while the manuscript attempts to use exploratory analysis, it does not provide enough theoretical background as to why certain questions were asked while the data were explored. While the discussion provides some background as to the role of dispersing males in innovation, the introduction provides little background, and thus does not properly frame the issue. It is unclear how dispersing males became of interest and why readers should be interested in them. As the manuscript reads now, it may be that dispersing males became interesting only as a result of the exploratory analysis, except that the predictions explicitly mention dispersing males. Thus, the manuscript at present makes it difficult to know if the questions surrounding immigrant males resulted from the exploratory analysis, or were questions the analyses were intended to answer from the beginning. If this question only came out after first reviewing the results, then this needs to be made clear in the introduction. I see no issue with reporting observations that were the result of investigations into earlier results, but it needs to be reported in a way that can be replicated in future research: I need to know the decision process that took place during the data exploration.

      We hope this is clearer from our new research aims (lines 125-173)

      The manuscript never clearly defines what counts as an immigrant male; presumably, in this species, all adult males in the group should be immigrants, as females are the philopatric sex. Sometimes, the manuscript uses "recently" to modify immigrant males, but doesn't define exactly what counts as recent, except to say that the males that innovated were in their respective groups for fewer than 3 months, but never explains why three months should be an important distinction in adult male tenure.

      We realise that how we wrote about this previously was not clear and was perhaps misleading. We noticed that the males that innovated had been in the group for less than three months. We do not know if this is necessary for them to innovate or not. We also added to the discussion a description of the male in AK19 who had been in the group for four months and did not innovate, as he had many other traits which we would expect to exclude him from criteria for innovation (e.g. very old, post-prime, and inactive – died within months of the experiment).

      Due to the above weaknesses, the provided predictions are a bit murky. It is not clear how variation between groups in accordance with who innovated, or initiated eating a novel food, or demographics is related to the central issue. The manuscript does contribute to the literature by looking at changing rates of muzzle contact over exposure to a novel food source, and provides a good extension of previous findings; that, if muzzle contacts help animals learn about new foods, then rates of muzzle contacts involving novel foods should decrease as animals become familiar with the food. However, this point isn't explicit in the manuscript.

      This is now addressed in the new aims paragraph (lines 125-173)

      Finally, it is also unclear as to why changing rates of muzzle contact AND whether certain individual level variables like knowledge, sex, age, and/or rank might influence muzzle contacts during opportunities to innovate.

      We are not sure exactly what the reviewer means here, but hope that the substantial revisions we have made now address their concern.

      As for the methods, the manuscript doesn't provide enough details as to why certain decisions were made. For example, no reason is given as to why only the first four sessions after an animal ate were considered, why the first three months of tenure (but not four, as seen in one group that didn't innovate) was considered to be a critical time in which immigrant males may innovate, why (including the theoretical reasons) the structure of models for one analysis was changed (dropping one variable, adding interactions), or even how the beginning and ending of a trial was decided, despite reporting that durations varied widely, from 5 minutes to two hours.

      Please see: above about the male with 4 month tenure; and top of document for description of our updated models.

      The discussion contains results that are never elsewhere presented in the manuscript- (2a) Individual variation in uptake of a novel food according to who ate first).

      It was just an error in the sub-title in the discussion – this is now amended. But all the other corresponding details were already there, in the list of research aims in the introduction and in the results as well.

      Finally, the largest issue with the manuscript is that its results are not as convincing as the conclusions made. An issue with all the analyses is that some grouping variables are included in some analyses but not others, despite the fact that all of the analyses contain multiple groups (necessitating group as a grouping variable) and multiple observations of the same individuals (i.e., immigrant males tested in multiple groups, necessitating animal identity as a random effect), and that individual exposure to the experiment is not accounted for when considering whether animals ate the food in the allotted period (an important consideration given the massive differences in trial times), making these results difficult to interpret in their current forms. As for the results regarding muzzle contact, the analyses have a number of issues that make it difficult to determine if the claims are supported. These issues include not explaining why rank calculated a year before the experiments took place was valid or if rank was calculated among all group members or within age and sex classes, not explaining how rank was normalized, and not conducting any kind of formal model comparisons before deciding the best model.

      Mostly addressed at top of this document. Regarding rank calculations: rank was not calculated a year before the experiments, it was calculated using a year’s worth of data up to the beginning of the experiments – and ranks were calculated among all group members - we have made this clearer in the methods. We also explained our method of normalisation, and noted that it was an error to include non-normalised rank in one of the models – this has now been rectified.

      As for the results regarding immigrant males and innovation, little is done to address the fact that these results are from very few observations and no direct analyses. It is possible that something that occurs relatively often but in small sample sizes, like dispersing animals, could have immense power in influencing foraging traditions, and observation is a necessary step in understanding behavior. However, the manuscript doesn't consider any alternative hypotheses as to why it found what it found. No other possible difference between the groups was considered (for example, the groups that rapidly innovated appear to be considerably smaller than the groups that did not), making the claim that immigrant males were what allowed groups to innovate unconvincing. This is particularly true given that some groups in this study population have experimental histories (though this goes unmentioned in the current manuscript), which likely influenced neophobia, especially given work by the same research group showing that these animals are more curious compared to their unhabituated counterparts.

      We have added more discussion of alternative hypotheses to the discussion (line numbers mentioned above).

      Regarding the comment about rapid innovation in smaller groups – we are not sure what the reviewer means here – all groups except BD were similar sized. The second largest group, NH, had one of the quickest innovations and a smaller group (KB) innovated only at the third exposure. Unless the reviewer instead refers to the spread of the innovation here? This is also not quite what we see in the data – BD is the largest group and one of the fastest to spread, and KB is the smallest group and the slowest to spread. Regarding the groups' experimental histories, all five studied groups have already been used in field experiments. The group (LT) with the least experimental history was the one with the greatest proportion of individuals eating the novel food at the first and over the four exposures (see Fig. 2), while one of the groups with the most experimental history (NH) was one with a smaller proportion of individuals eating the food across the experiment. This is discussed in the discussion (lines 370-380).

      Reviewer 2

      I have separated my issues with the manuscript into three sub-headings (Conceptual Clarity, Observational Detail and Analysis) below.

      1) Conceptual clarity

      There are a number of areas where it would greatly benefit the manuscript if the authors were to revisit the text and be more specific in their intentions. At present, the research questions are not always well-defined, making it difficult to determine what the data is intended to communicate. I am confident all of these issues could be fixed with relatively minor changes to the manuscript.

      For example, Line 104: Question 1 is not really a question, the authors only state that they will "investigate innovation and extraction of eating the food", which could mean almost anything.

      We re-wrote the research questions paragraph and results with this advice in mind – hope it is clearer now. We keep the innovation part just descriptive and hope this is less problematic now.

      Question 2a (line 98) is also very vague in its wording, and I'm left unclear as to what the authors were really interested in or why. This is not helped by Line 104 which refuses to make predictions about this research question because it is "exploratory". Empirical predictions are not simply placing a bet on what we think the results of the study will be, but rather laying out how the results could be for the benefit of the reader. For instance, if testing the effects of 10 different teaching methods on language acquisition-rate: Even if we have no a priori idea of which method will be most effective, we can nevertheless generate competing hypotheses and describe their corresponding predictions. This is a helpful way to justify and set expectations for the specific parameters that will be examined by the methods of the study. In fact, in the current paper, the authors had some very clear a priori expectations going into this study that immigrant males would be vectors of behavioural transmission (clear, that is, from the rest of the introduction, and the parameters used in their analysis, which were not chosen at random).

      We have now updated the whole research aims (lines 125-173).

      The multiple references to 'long-lived' species in the abstract (line 16) and introduction (lines 39, 56) are a bit confusing given the focus of this study. Although such categorisations are arbitrary by nature (a vervet is certainly long-lived compared to a dragonfly), I would not typically put vervet monkeys (or marmosets, line 62) in the same category as apes (references 8 and 9) or humans (line 62) in this regard.

      When we use “long-lived” in the introduction, we explain that we mean animals with slow generational turnover, for whom genetic adaptation is too slow to keep pace with very rapid environmental change. Within the distinctions the reviewer makes here, we feel that vervets and marmosets are much more similar to apes than to dragonflies in this respect, and we think the comparisons we make are valid in this context (though we agree that they would not be appropriate for other purposes). We have modified the sentence in the introduction (lines 40-42) and hope this is clearer now. The study in reference 9 is about crop-raiding, which is something vervets can also learn to do within one generation. In addition, reference 8 is cited because it provides one of the earlier and long-standing definitions of innovation, which we use here – we are not comparing vervets to apes directly, but we do not think a different definition of innovation is required.

      This contributes a little towards the lack of overall conceptual focus for the manuscript: beginning in this fashion suggests the authors are building a "comparative evolutionary origins" story, hinting perhaps at the phylogenetic relevance of the work to understanding human behaviour, but the final paragraph of the study contextualises the findings only in terms of their relevance to feeding ecology and conservation efforts. I would recommend that the authors think carefully about their intended audience and tailor the text accordingly. This is not to say that readers interested in human evolution will not be interested in conservation efforts, but rather that each of these aspects should be represented in each stage of the manuscript (otherwise - conservationists may not read far into the Introduction, and cultural evolution fans will be left adrift in the Conclusion).

      We agree that the line running through the whole paper needed to be clearer and have tried to improve this.

      2) Observational detail

      There are a number of areas of the manuscript which I found to be lacking in sufficient detail to accurately determine what occurred in these experimental sessions, making the data difficult to interpret overall. All of this additional information ought to be readily available from the methods used (the experiments were observed by 3-5 researchers with video cameras (line 341)) and is all of direct relevance to the research questions set out by the authors.

      We added more details about the experiment in the method section.

      While I appreciate that it will take quite a bit of work to extract this information, I am certain that it would greatly improve the robustness and explanatory power of this study to do so.

      The data on who was first to innovate/demonstrate successful extraction of the food in each group (Question 1) and subsequent uptake (Question 2), as well as the actual mechanism by which that uptake occurred (the authors strongly imply social learning in their Discussion, but this is never directly examined) is difficult to interpret based on the information presented. Some key gaps in the story were:

      We did not intend to claim that muzzle contact was the specific mechanism by which individuals learned to extract and eat peanuts – we rather use this experiment to evaluate the function of muzzle contact in the presence of a novel food.

      We did not record observation networks in all groups during experiments and cannot obtain accurate ones from all our videos – we hope this is clearer in our text now. Our group’s previous study (Canteloup et al., 2021) already shows social transmission of the opening techniques using data from two of our groups (NH and KB).

      • Which/how many individuals encountered the food and in what order? I.e., were migrants/innovators simply the first to notice the food?

      No, and we have now added some information about other individuals approaching the box and inspecting the peanuts before innovation took place.

      • Did any individuals try and fail to extract the food before an "innovator" successfully demonstrated?
      • How many tried and failed to extract the nuts before and after observing effective demonstrators?

      We have added the number of individuals that inspected the peanuts (visually and with contact).

      • Were individuals who observed others interact with the food more likely to approach and/or extract it themselves?
      • Did group-members use the same methods of extraction as their 'innovators'?

      Yes – this is the topic of Canteloup et al. 2021 – and these data are not presented again here. That study covered two of the groups presented here (KB and NH), with up to 10 exposures in each, and presents a fine-grained analysis of the peanut-opening techniques used by the monkeys. We hope this is clearer now in the text where we refer to this paper.

      • How many tried and succeeded without having directly observed another individual do so (i.e. 'reinvention' as per Tennie et al.)?

      For this, and the above points: We did not record an observation network for the groups added in this study and are not able to answer this – it is not the focus of this study. For this reason, we do not make claims along these lines in the present study, and are cautious with our social-learning-related language. Whilst we examine the role of muzzle contact in acquiring information about a novel food, we do not expect this behaviour to be a necessary prerequisite for being able to extract and eat this food – indeed many individuals who learned to eat did not perform muzzle contacts. This aspect of the study is about using this novel food situation to explore whether muzzle contact serves information acquisition – which our evidence suggests it does.

      Moreover, the processing of this food is not complex and is similar to that of natural foods in their environment, and we do expect individuals to be capable of reinventing it easily (this point, in relation to Tennie’s hypothesis, is discussed in Canteloup et al. 2021). The point here is that their natural tendency is to be neophobic towards unknown food, so they do not readily eat it until they see a conspecific doing so. We also used this opportunity, though with a very small sample size, to investigate which individuals would overcome that neophobia and be the first to eat successfully.

      The connective tissue between the research questions set out by the authors is clearly social learning. In short: the thesis is that Migrants/Innovators bring a novel behaviour to the group, then there is 'uptake' (social learning), which may be influenced by demographic factors and muzzle-contact (biases + mechanisms). Given this focus (e.g. lines 224-264 of the Discussion), I would expect at least some of the details above to be addressed in order to provide robust support for these claims.

      See above – the reason we talk about ‘uptake’ rather than social learning is that we really see this as a case of social disinhibition of neophobia, rather than more detailed social learning such as copying or imitation, as it would be in a tool-use setting, for example (though Canteloup et al. 2021 found evidence that the specific methods used to open peanuts are socially transmitted).

      Question 2a (Lines 136-146): This data is hard to interpret without knowing how much of the group was present and visible during these exposures.

      Please see response to reviewer 1 on this.

      For example: 9% uptake in NH group does not sound impressive, but if only 10% of the total group were present while the rest were elsewhere, then this is 90% of all present individuals. Meanwhile if 100% of BD group were present and only experienced 31% uptake, then this is quite a striking difference between groups.

      Experiments were done at sunrise at the monkeys’ sleeping sites in AK, LT, NH and KB, where most of the group was present in the area; we added more detail on this point in the Method section (lines 615-619).
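
      The reviewer's arithmetic can be made concrete with a minimal sketch. The group size and counts below are hypothetical and are not taken from the study; the function name `uptake_among_present` is likewise only illustrative.

```python
def uptake_among_present(n_eating: int, n_present: int) -> float:
    """Fraction of the individuals present at an exposure that ate the food."""
    if n_present <= 0:
        raise ValueError("n_present must be positive")
    if n_eating > n_present:
        raise ValueError("cannot have more eaters than individuals present")
    return n_eating / n_present


# Hypothetical group of 100 individuals in which 9 ate the novel food.
group_size = 100
n_eating = 9

uptake_of_total = n_eating / group_size                  # uptake relative to the whole group
uptake_if_10_present = uptake_among_present(n_eating, 10)    # only 10 individuals were present
uptake_if_all_present = uptake_among_present(n_eating, 100)  # the whole group was present

print(uptake_of_total, uptake_if_10_present, uptake_if_all_present)
```

      The same count of eaters therefore reads very differently depending on the denominator, which is why the number of individuals present matters for interpreting uptake.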

      Of course, there is also an issue of how many individuals can physically engage with the novel food even if they want to - the presence of dominant individuals, steepness of hierarchy within that group, etc, will significantly influence this (and is all of interest with regards to the authors' research questions).

      We discuss this with respect to the result showing that higher rank individuals were more likely to extract and eat the food at the first exposure and over all four exposures.

      Muzzle-contact behaviour: The authors use their data to implicate muzzle-contact in social learning, but this seems a leap from the data presented (some more on this in the Analysis section).

      We hope our distinction between information acquisition and information use is clearer now.

      For example: - What is the role of kinship in these events?

      We did not analyse kinship here, but we see a lot of targeting towards adult males, and we do not have reliable kinship data for them. We also checked (see response to reviewer 3) the muzzle contacts initiated by knowledgeable adult females, and they are mostly towards adult males, not towards related juveniles (see new figure 4D and lines 497-500).

      • Did they occur when the juvenile had free access to the food (i.e. not likely to be chased off by a feeding adult)?

      We recorded muzzle contacts visible within 2m of the box, so individuals were not necessarily eating at the box when they engaged in muzzle contacts. However, the majority of the muzzle contacts that we could record took place directly at the edge of the box, at the location where the food is accessed, so an individual would be unlikely to be there if it could not access the food. It is possible that they were there and not eating, but they would not have been chased off, otherwise they could not have engaged in muzzle contacts there. But it is not entirely clear what the reviewer’s point is here.

      • Did they primarily occur when adults had a mouthful of food? (i.e. could it simply be attempted pilfering/begging)

      This is not typical of this species. Very few specific individuals remove food from others’ mouths, and they do it with their hands, usually beginning by grooming the victim’s face and cheek pouches, before prising the mouth open and removing food from the cheek pouches.

      • What proportion of PRESENT (not total) individuals were naïve and knowledgeable in each group for each trial (if 90% present were knowledgeable, then it is not surprising that they would be targeted more often)?

      We agree somewhat with this statement, but given the multiple ways we show the effect of knowledge – both at the individual level and the group level (effect of exposure number i.e. overall group familiarity) – we feel we present enough evidence to establish the link between knowledge of the food and muzzle contacts. We find that the model showing the interaction between exposure number and number of monkeys eating on the overall rate of muzzle contacts actually addresses this issue, because we see that when many monkeys are eating during later exposures, when many were indeed knowledgeable, the rate of muzzle contacts is massively decreased. Moreover, if 90% of the individuals present are knowledgeable, then only 10% of the individuals present are naïve, and we show both that knowledgeable individuals are targeted, but also that naïve individuals are initiators.

      • Did these events ever lead to food-sharing (In other words, how likely are they to simply be begging events)?

      We do not observe food-sharing in vervets.

      • Did muzzle-contact quantifiably LEAD to successful extraction of the food? If the authors wish to implicate muzzle-contact in social learning, it is not sufficient to show that naïve individuals were more likely to make muzzle-contact, they must also show that naïve individuals who made more muzzle-contact were more likely to learn the target behaviour.

      We disagree here, because there is a distinction between information acquisition and information use: obtaining olfactory information about a novel resource that conspecifics are eating is not the same as learning a complex tool-use behaviour for which detailed observation of a model is required. We are not claiming that muzzle contact is THE mechanism by which the monkeys learn how to eat the food, but we do believe that the clear separation between naïve individuals initiating and knowledgeable individuals being targets, together with the decrease in the rate of this behaviour as groups’ familiarity with the food increases, is good evidence that this behaviour functions to acquire information about a novel food.

      3) Analysis

      There are a number of issues with the current analysis which I strongly recommend be addressed before publication. Some of these are likely to simply require additional details inserted to the manuscript, whereas others would require more substantial changes. I begin with two general points (A & B), before addressing specific sections of the manuscript.

      A) My primary issue with each of the analyses in this manuscript is that the authors have fit complex statistical models for each of their analyses with no steps to ascertain whether these models are a good fit for the data. With a relatively small dataset and a very large number of fixed effects and interactions, there is a considerable risk of overfitting. This is likely to be especially problematic when predictor variables are likely to be intercorrelated (age, sex and rank in the case of this analysis).

      We have now checked for overfitting in our models.

      The most straightforward way to resolve this issue is to take a model-comparison approach, fitting either a) a full suite of models (including a 'null' model) with each possible permutation of fixed effects and interactions (since the authors argue their analysis is exploratory) or b) a smaller set of models which the authors find plausible based on their a priori understanding of the study system. These models could then be compared using information criteria to determine which structure provides the best out-of-sample predictive fit for the data, and the outputs of this model interpreted. Alternatively, a model-averaging approach can be taken, where the effects of each individual predictor are averaged and weighted across all models in the set. Both of these approaches can be performed easily using the R package 'MuMIn'. There are also a number of tutorials that can be found online for understanding and carrying out these approaches.

      Please see our answer at the beginning of the document, detailing how we have updated our models.
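
      As an illustration of the information-criterion comparison the reviewer describes, the sketch below computes AIC, AIC differences and Akaike weights for a small candidate set. It is not the authors' analysis and does not use MuMIn: it assumes the models have already been fitted elsewhere, and the model names, log-likelihoods and parameter counts are hypothetical.

```python
import math


def akaike_table(models):
    """Rank candidate models by AIC.

    models maps a model name to (log_likelihood, n_parameters).
    Returns a dict mapping each name to (aic, delta_aic, akaike_weight).
    """
    aic = {name: 2 * k - 2 * loglik for name, (loglik, k) in models.items()}
    best = min(aic.values())
    # Relative likelihood of each model given the data, exp(-delta / 2).
    rel = {name: math.exp(-0.5 * (value - best)) for name, value in aic.items()}
    total = sum(rel.values())
    return {name: (aic[name], aic[name] - best, rel[name] / total) for name in aic}


# Hypothetical fitted models: (maximised log-likelihood, number of parameters).
candidates = {
    "null": (-120.0, 2),
    "age + sex": (-112.0, 4),
    "age + sex + rank": (-111.5, 5),
}

table = akaike_table(candidates)
for name, (aic, delta, weight) in sorted(table.items(), key=lambda item: item[1][0]):
    print(f"{name:18s} AIC={aic:7.1f}  dAIC={delta:5.1f}  weight={weight:.3f}")
```

      In this made-up example the extra rank term improves the log-likelihood too little to pay for its additional parameter, so the simpler model ranks first. Model averaging would weight each model's coefficient estimates by these Akaike weights.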

      B) It does not seem that interobserver reliability testing was carried out on any of the data used in these analyses. This is a major oversight which should be addressed before publication (or indeed any re-analysis of the data).

      We have added this now and mention it above already.
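
      The response does not say here which agreement statistic was used. As one common option for categorical codings, a minimal sketch of Cohen's kappa for two observers follows; the codings are hypothetical and the function name is illustrative.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: chance-corrected agreement between two coders."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must rate the same non-empty set of events")
    n = len(coder_a)
    # Observed proportion of events on which the coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance from each coder's marginal frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical codings of six muzzle-contact events by two observers.
observer_1 = ["initiator", "target", "target", "initiator", "target", "initiator"]
observer_2 = ["initiator", "target", "initiator", "initiator", "target", "initiator"]

kappa = cohens_kappa(observer_1, observer_2)
print(round(kappa, 3))
```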

      Line 444: Much more detail is needed here. What, precisely, was the outcome measure? Was collinearity of predictors assessed? (I would expect Age + Rank to be correlated, as well as Sex + Rank).

      This is now addressed (please see details above) – we use VIFs to assess multicollinearity of predictors in our models and find they are all satisfactory (see R code).
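
      For readers unfamiliar with variance inflation factors, the sketch below shows one standard way to compute them for numeric predictors: the VIF of predictor j is the j-th diagonal element of the inverse of the predictors' correlation matrix. This is an illustration only, not the authors' R code, and the predictor values are hypothetical. Categorical predictors need a generalised form that is not shown.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centered = [[x - m for x in col] for col, m in zip(columns, means)]
    norms = [sum(x * x for x in col) ** 0.5 for col in centered]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
         for cj, nj in zip(centered, norms)]
        for ci, ni in zip(centered, norms)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(columns):
    """Variance inflation factor of each predictor column."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical numeric codings for four individuals.
age = [1.0, 2.0, 3.0, 4.0]
rank = [1.0, 2.0, 3.0, 5.0]     # strongly correlated with age
other = [1.0, -1.0, -1.0, 1.0]  # uncorrelated with age

print(vifs([age, rank]))
print(vifs([age, other]))
```

      A VIF close to 1 indicates that a predictor is nearly uncorrelated with the others; large values indicate that its coefficient is poorly identified.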

      Line 452. A few comments on this muzzle-contact analysis:

      The comments below are a little confusing as some seem to refer to the muzzle-contact rate model (previously line number 452), and some seem to refer to the initiator/receiver model. We have tried to figure out which comments refer to which, and answer accordingly.

      "We investigated muzzle contact behaviour in groups where large proportions of the groups started to extract and eat peanuts over the first four exposures"

      What was the criteria for "a large proportion"?

      All groups are now included in this analysis.

      The text for this muzzle-contact analysis would indicate that this model was not fit with any random effects, which would be extremely concerning. However, having checked the R code which the authors provided, I see that Individual has been fit as a random effect. This should be mentioned in the manuscript. I would also strongly recommend fitting Group (it was an RE in the previous models, oddly) and potentially exposure number as well.

      The model about muzzle contact rate never contained individual as a random effect because individuals are not relevant in this model – it is the number of muzzle contacts occurring during each exposure. However, the reviewer might refer here to the model that we forgot to provide the script for. Nonetheless, we have substantially revised this model, it now (Model 3) includes all groups, and has group as a random effect.

      Following on from this, if the model was fit with individual as a random effect it becomes confusing that Figure 3 which represents this data seemingly does not control for repeated measures (it contains many more datapoints than the study's actual sample size of 164 individuals). This needs to be corrected for this figure to be meaningfully interpretable.

      Figure 3 is not related to the model described in (original) line 452.

      The numbers were referring to the number of muzzle contacts, and this was written in the figure caption. However, we no longer present these details on the new figure (see Fig 4).

      Finally, would it make sense to somehow incorporate the number of individuals present for this analysis? Much like any other social or communicative behaviour, I would predict the frequency of occurrence to depend on how many opportunities (i.e. social partners) there are to engage in it.

      We have now included the number of monkeys eating in our muzzle contact rate model (Model 3) because, upon further thought, we found that this was the issue that had led us to exclude exposures and include only the groups where many monkeys were eating. We have resolved this by including all groups and not dropping exposures; instead, we include an interaction between the number of monkeys eating and exposure number. We feel this addresses our hypothesis much more satisfactorily. We hope these updates also address the reviewer's concerns adequately.

      Line 460: "For BD and LT we excluded exposures 4 and 3, respectively, due to circumstances resulting in very small proportions of these groups present at these exposures"

      What was the criterion for a satisfactory proportion? Why was this chosen?

      See above – this is now addressed.

      Line 461: "We ran the same model including these outlier exposures and present these results in the supplementary material (SM3)."

      The results of this supplemental analysis should be briefly stated. Do they support the original analysis or not?

      We no longer present the analysis in this way. We substantially revised the model examining muzzle contact rate and now include the number of individuals eating in the model rather than excluding groups where this number was low. The results of the new model show good support for our hypothesis.

      Line 465: "Due to very low numbers of infants ever being targets of muzzle contacts, we merged the infant and juvenile age categories for this analysis."

      This strikes me as a rather large mistake. The research question being asked by the authors here is "How does age influence muzzle-contact behaviour?"

      Then, when one age group (infants) is very unlikely to be a target of muzzle-contact, the authors have erased this finding by merging them with another age category (juveniles). This really does not make sense, and seriously confounds any interpretation of either age category.

      Yes, we agree with this point, and no longer do that. Instead, we remove the infant data from this model, which is now Model 6, because the small sample size meant they introduced a large amount of error into the model. We show the process in the R code, and we describe our reasons in the text (lines 713-719). Since we are now only comparing within age- and sex-categories (see below), we do not find that this decision introduces any bias.

      Lines 466-474: Why was rank removed for the second and third models? Why is Group no longer a random effect (as in the previous analysis)? The authors need to justify such steps to give the reader confidence in their approach.

      This is now addressed and discussed in descriptions of our new models.

      Furthermore - because of the way this model is designed, I do not think it can actually be used to infer that these groups are preferentially targeted, merely that adult female and adult males are LESS likely to target others than to be targeted themselves, which is a very different assertion.

      Because the specific outcome measure was not described here, this only became apparent to me after inspecting Figure 3, where outcome measure is described as "Probability of (an individual) being a target rather than initiator" - so, it can tell us that adults are more often targeted rather than initiating, but does not tell us if they are targeted more frequently than juveniles (who may get targeted very often, but initiate so often that this ratio is offset).

      We thank the reviewer for noticing this as we had indeed chosen an inappropriate model for what we were intending to measure – this has been addressed now with two additional models (Models 4 and 5; see details at the top of document). We nonetheless found the aspects of this model to still be highly interesting, so have re-framed it to focus on them.
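
      The reviewer's distinction between being targeted more often than initiating and being targeted more often than other classes can be shown with a small worked example. The counts below are hypothetical and chosen only to make the contrast visible.

```python
def target_share(times_targeted: int, times_initiated: int) -> float:
    """Probability that an individual's involvement in a muzzle contact is as target."""
    total = times_targeted + times_initiated
    if total == 0:
        raise ValueError("no muzzle contacts recorded")
    return times_targeted / total


# Hypothetical counts of muzzle contacts per age class.
counts = {
    "adult": {"targeted": 20, "initiated": 5},
    "juvenile": {"targeted": 40, "initiated": 60},
}

shares = {
    age_class: target_share(c["targeted"], c["initiated"])
    for age_class, c in counts.items()
}

for age_class, c in counts.items():
    print(age_class, "targeted", c["targeted"], "times; target share", shares[age_class])
```

      Here adults have the higher target share, yet juveniles are targeted twice as often in absolute terms, which is exactly the pattern a target-versus-initiator outcome cannot distinguish.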

      Lines 467-473: "Our first simple model included individuals' knowledge of the novel food at the time of each muzzle contact (knowledgeable = previously succeeded to extract and eat peanuts; naïve = never previously succeeded to extract and eat peanuts) and age, sex and rank as fixed effects. Individual was included as a random effect. The second model was the same, but we removed rank and added interactions between: knowledge and age; and knowledge and sex. The third model was the same as the second, but we also added a three-way interaction between knowledge, age and sex."

      This is a good example of some of the issues I describe above. What is the justification for each of these model-structures? The addition and subtraction of variables and interactions seems arbitrary to the reader.

      For Model 6, we no longer include rank at all, because we had no hypothesis-driven reason to (see lines 723-725). We now begin with the three-way interaction and remove only this term, because it is not significant and because its complexity caused convergence problems. We show this in the R script. We retain only the two separate interactions, and we do not include group as a random effect in this model, both because of the complexity and because we do not think there is a theoretical requirement for it here (this is explained in lines 730-735 of the manuscript). We report the results of the three-way interaction in the supplementary material (SM3, Table S2).

      Reviewer 3

      In this study, the authors introduce a novel food that requires handling time to five vervet monkey groups, some of which had previous experience with the food. Through the natural dispersal of males in the population, they show that dispersing individuals transmit behavioral innovations between groups and are often also innovators. They also examine muzzle contact initiations and targets within the groups as a way to determine who is seeking social information on the new food source and who is the target of information seeking. The authors show that knowledgeable adults are more often the target of muzzle contacts compared to young individuals and those that are not knowledgeable.

      This is a very interesting study that provides some novel insights. The methods employed will be useful to others that are considering an experimental approach to their field research. The data set is good and analyzed appropriately and the conclusions are justified. However, there are several areas where the paper could be improved for readers in terms of its clarity.

      1) It wasn't until the Discussion that it became clear to me that the actual physiological and personality traits of dispersers were being linked with innovation. From the Title, Abstract, and Introduction, it seemed as though the focus was on dispersing males bringing their experience with a novel food to a new group to pass it on. I think it needs to be made clear much earlier in the manuscript that the authors are investigating not only the transmission of behavioural adaptation but also how the traits of dispersers might make them more likely to innovate.

      We have now addressed this above.

      2) Early in the paper on line 28, the authors state that continued initiation of muzzle contacts by adult females could have been an effort to seek social information. This is true but another interpretation is that females were imparting or giving social information. It seems important here and elsewhere (lines 322-323) to consider and report the target of these initiations. If these were directed at more knowledgeable individuals, it supports the idea that this was social information seeking. If muzzle contacts were directed to younger or unknowledgeable individuals, it would imply a form of teaching, which is possible but perhaps unlikely, so I think the authors need to be totally clear here.

      We thank the reviewer for pointing this out. We looked into our data and now present figure 4D, showing that almost all knowledgeable adult females’ muzzle contacts were targeted towards knowledgeable adult males, and we discuss this in the discussion (lines 499-500).

      3) The argument made on lines 344-350 needs more fleshing out to be convincing or it should be deleted. The link between number of dispersers, social organization, and large geographic range seems a little muddled. There are many dispersing individuals in species that are not typically in large multi-male, multi-female social organizations. Indeed, in many species both sexes disperse. Think of pair living birds where both sexes disperse and geographic range can be enormous. There are also no data or references presented here to show that species in multi-male, multi-female social organizations do have larger geographic ranges than those that are not in these social organizations. It seems to me that, even if this is the case, niche is more important than social organization, for instance not being dependent on forests to constrain much of your range.

      We have removed this section.

    2. Reviewer #2 (Public Review)

      I have separated my issues with the manuscript into three sub-headings (Conceptual Clarity, Observational Detail and Analysis) below.

      1) Conceptual clarity

      There are a number of areas where it would greatly benefit the manuscript if the authors were to revisit the text and be more specific in their intentions. At present, the research questions are not always well-defined, making it difficult to determine what the data is intended to communicate. I am confident all of these issues could be fixed with relatively minor changes to the manuscript.

      For example, Line 104: Question 1 is not really a question, the authors only state that they will "investigate innovation and extraction of eating the food", which could mean almost anything.

      Question 2a (line 98) is also very vague in its wording, and I'm left unclear as to what the authors were really interested in or why. This is not helped by Line 104 which refuses to make predictions about this research question because it is "exploratory". Empirical predictions are not simply placing a bet on what we think the results of the study will be, but rather laying out how the results could be for the benefit of the reader. For instance, if testing the effects of 10 different teaching methods on language acquisition-rate: Even if we have no a priori idea of which method will be most effective, we can nevertheless generate competing hypotheses and describe their corresponding predictions. This is a helpful way to justify and set expectations for the specific parameters that will be examined by the methods of the study. In fact, in the current paper, the authors had some very clear a priori expectations going into this study that immigrant males would be vectors of behavioural transmission (that is clear from the rest of the introduction, and the parameters used in their analysis, which were not chosen at random).

      The multiple references to 'long-lived' species in the abstract (line 16) and introduction (lines 39, 56) are a bit confusing given the focus of this study. Although such categorisations are arbitrary by nature (a vervet is certainly long-lived compared to a dragonfly), I would not typically put vervet monkeys (or marmosets, line 62) in the same category as apes (references 8 and 9) or humans (line 62) in this regard. This contributes a little towards the lack of overall conceptual focus for the manuscript: beginning in this fashion suggests the authors are building a "comparative evolutionary origins" story, hinting perhaps at the phylogenetic relevance of the work to understanding human behaviour, but the final paragraph of the study contextualises the findings only in terms of their relevance to feeding ecology and conservation efforts. I would recommend that the authors think carefully about their intended audience and tailor the text accordingly. This is not to say that readers interested in human evolution will not be interested in conservation efforts, but rather that each of these aspects should be represented in each stage of the manuscript (otherwise - conservationists may not read far into the Introduction, and cultural evolution fans will be left adrift in the Conclusion).

      2) Observational detail

      There are a number of areas of the manuscript which I found to be lacking in sufficient detail to accurately determine what occurred in these experimental sessions, making the data difficult to interpret overall. All of this additional information ought to be readily available from the methods used (the experiments were observed by 3-5 researchers with video cameras (line 341)) and is all of direct relevance to the research questions set out by the authors.

      While I appreciate that it will take quite a bit of work to extract this information, I am certain that it would greatly improve the robustness and explanatory power of this study to do so.

      The data on who was first to innovate/demonstrate successful extraction of the food in each group (Question 1) and subsequent uptake (Question 2), as well as the actual mechanism by which that uptake occurred (the authors strongly imply social learning in their Discussion, but this is never directly examined) is difficult to interpret based on the information presented. Some key gaps in the story were:

      - Which/how many individuals encountered the food and in what order? I.e., were migrants/innovators simply the first to notice the food?
      - Did any individuals try and fail to extract the food before an "innovator" successfully demonstrated?
      - How many tried and failed to extract the nuts before and after observing effective demonstrators?
      - Were individuals who observed others interact with the food more likely to approach and/or extract it themselves?
      - Did group-members use the same methods of extraction as their 'innovators'?
      - How many tried and succeeded without having directly observed another individual do so (i.e. 'reinvention' as per Tennie et al.)?

      The connective tissue between the research questions set out by the authors is clearly social learning. In short: the thesis is that Migrants/Innovators bring a novel behaviour to the group, then there is 'uptake' (social learning), which may be influenced by demographic factors and muzzle-contact (biases + mechanisms). Given this focus (e.g. lines 224-264 of the Discussion), I would expect at least some of the details above to be addressed in order to provide robust support for these claims.

      Question 2a (Lines 136-146): This data is hard to interpret without knowing how much of the group was present and visible during these exposures.

      For example: 9% uptake in NH group does not sound impressive, but if only 10% of the total group were present while the rest were elsewhere, then this is 90% of all present individuals. Meanwhile if 100% of BD group were present and only experienced 31% uptake, then this is quite a striking difference between groups.
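
      The arithmetic behind this point can be sketched as follows. This is only an illustration under assumed numbers: the group size of 100 and the function name are hypothetical and are not taken from the manuscript.

```python
def uptake_among_present(n_extractors: int, n_present: int) -> float:
    """Proportion of the individuals present who extracted the food."""
    if n_present <= 0:
        raise ValueError("n_present must be positive")
    if n_extractors > n_present:
        raise ValueError("extractors cannot exceed individuals present")
    return n_extractors / n_present


# Hypothetical group of 100 in which 9 individuals (9% of the group) extract.
group_size = 100
n_extractors = 9

# Uptake expressed against the whole group.
uptake_of_total = n_extractors / group_size

# The same 9 extractors expressed against who was actually there.
uptake_if_10_present = uptake_among_present(n_extractors, 10)
uptake_if_all_present = uptake_among_present(n_extractors, group_size)

print(uptake_of_total, uptake_if_10_present, uptake_if_all_present)
```

      The same count of extractors gives very different impressions depending on the denominator, which is why the number of individuals present at each exposure needs to be reported.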

      Of course, there is also an issue of how many individuals can physically engage with the novel food even if they want to - the presence of dominant individuals, steepness of hierarchy within that group, etc, will significantly influence this (and is all of interest with regards to the authors' research questions).

      Muzzle-contact behaviour: The authors use their data to implicate muzzle-contact in social learning, but this seems a leap from the data presented (some more on this in the Analysis section).

      For example:
      - What is the role of kinship in these events?
      - Did they occur when the juvenile had free access to the food (i.e. not likely to be chased off by a feeding adult)?
      - Did they primarily occur when adults had a mouthful of food? (i.e. could it simply be attempted pilfering/begging)
      - What proportion of PRESENT (not total) individuals were naïve and knowledgeable in each group for each trial (if 90% present were knowledgeable, then it is not surprising that they would be targeted more often)?
      - Did these events ever lead to food-sharing (In other words, how likely are they to simply be begging events)?
      - Did muzzle-contact quantifiably LEAD to successful extraction of the food? If the authors wish to implicate muzzle-contact in social learning, it is not sufficient to show that naïve individuals were more likely to make muzzle-contact, they must also show that naïve individuals who made more muzzle-contact were more likely to learn the target behaviour.

      3) Analysis

      There are a number of issues with the current analysis which I strongly recommend be addressed before publication. Some of these are likely to simply require additional details inserted to the manuscript, whereas others would require more substantial changes. I begin with two general points (A & B), before addressing specific sections of the manuscript.

      A) My primary issue with each of the analyses in this manuscript is that the authors have fit complex statistical models for each of their analyses with no steps to ascertain whether these models are a good fit for the data. With a relatively small dataset and a very large number of fixed effects and interactions, there is a considerable risk of overfitting. This is likely to be especially problematic when predictor variables are likely to be intercorrelated (age, sex and rank in the case of this analysis).

      The most straightforward way to resolve this issue is to take a model-comparison approach, fitting either a) a full suite of models (including a 'null' model) with each possible permutation of fixed effects and interactions (since the authors argue their analysis is exploratory) or b) a smaller set of models which the authors find plausible based on their a priori understanding of the study system. These models could then be compared using information criteria to determine which structure provides the best out-of-sample predictive fit for the data, and the outputs of this model interpreted. Alternatively, a model-averaging approach can be taken, where the effects of each individual predictor are averaged and weighted across all models in the set. Both of these approaches can be performed easily using the R package 'MuMIn'. There are also a number of tutorials that can be found online for understanding and carrying out these approaches.
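
      To make the comparison step concrete, here is a minimal sketch of ranking a candidate set by AIC and converting the differences into Akaike weights. It is written in plain Python rather than with 'MuMIn', and the model names, log-likelihoods and parameter counts are invented for illustration; they are not results from this study. With a small dataset, a small-sample correction such as AICc would usually be preferable to the plain AIC used here.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aic_values: dict[str, float]) -> dict[str, float]:
    """Relative support for each model, normalised to sum to 1."""
    best = min(aic_values.values())
    rel = {name: math.exp(-0.5 * (value - best)) for name, value in aic_values.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


# Hypothetical fits: (maximised log-likelihood, number of estimated parameters).
candidate_fits = {
    "null": (-120.0, 2),
    "age + sex": (-112.0, 4),
    "age * sex * rank": (-110.5, 9),
}

aic_values = {name: aic(ll, k) for name, (ll, k) in candidate_fits.items()}
weights = akaike_weights(aic_values)
best_model = min(aic_values, key=aic_values.get)

# Print the candidate set from best (lowest AIC) to worst.
for name in sorted(aic_values, key=aic_values.get):
    print(f"{name:20s} AIC={aic_values[name]:7.1f} weight={weights[name]:.3f}")
```

      In this invented example the most complex model has the highest likelihood but is penalised for its extra parameters, so the intermediate model is preferred. The same weights are what a model-averaging approach would use to combine estimates across the set.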

      B) It does not seem that interobserver reliability testing was carried out on any of the data used in these analyses. This is a major oversight which should be addressed before publication (or indeed any re-analysis of the data).
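
      One common way to quantify agreement between two observers coding the same events into categories is Cohen's kappa. The sketch below is an illustration only: the behaviour codes and the two observers' records are made up, and the authors may prefer a different reliability statistic for their data.

```python
from collections import Counter


def cohens_kappa(codes_a: list[str], codes_b: list[str]) -> float:
    """Chance-corrected agreement between two observers' categorical codes."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both observers must code the same non-empty set of events")
    n = len(codes_a)
    # Observed proportion of events on which the observers agree.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance from each observer's category frequencies.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both observers used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two observers to the same ten video events.
observer_1 = ["extract", "extract", "none", "muzzle", "none",
              "extract", "muzzle", "none", "none", "extract"]
observer_2 = ["extract", "none", "none", "muzzle", "none",
              "extract", "muzzle", "none", "extract", "extract"]

kappa = cohens_kappa(observer_1, observer_2)
print(round(kappa, 4))
```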

      Line 444: Much more detail is needed here. What, precisely, was the outcome measure? Was collinearity of predictors assessed? (I would expect Age + Rank to be correlated, as well as Sex + Rank).

      Line 452. A few comments on this muzzle-contact analysis:

      "We investigated muzzle contact behaviour in groups where large proportions of the groups started to extract and eat peanuts over the first four exposures"

      What were the criteria for "a large proportion"?

      The text for this muzzle-contact analysis would indicate that this model was not fit with any random effects, which would be extremely concerning. However, having checked the R code which the authors provided, I see that Individual has been fit as a random effect. This should be mentioned in the manuscript. I would also strongly recommend fitting Group (it was an RE in the previous models, oddly) and potentially exposure number as well.

      Following on from this, if the model was fit with individual as a random effect it becomes confusing that Figure 3 which represents this data seemingly does not control for repeated measures (it contains many more datapoints than the study's actual sample size of 164 individuals). This needs to be corrected for this figure to be meaningfully interpretable.

      Finally, would it make sense to somehow incorporate the number of individuals present for this analysis? Much like any other social or communicative behaviour, I would predict the frequency of occurrence to depend on how many opportunities (i.e. social partners) there are to engage in it.

      Line 460: "For BD and LT we excluded exposures 4 and 3, respectively, due to circumstances resulting in very small proportions of these groups present at these exposures"

      What was the criterion for a satisfactory proportion? Why was this chosen?

      Line 461: "We ran the same model including these outlier exposures and present these results in the supplementary material (SM3)."

      The results of this supplemental analysis should be briefly stated. Do they support the original analysis or not?

      Line 465: "Due to very low numbers of infants ever being targets of muzzle contacts, we merged the infant and juvenile age categories for this analysis."

      This strikes me as a rather large mistake. The research question being asked by the authors here is "How does age influence muzzle-contact behaviour?" Then, when one age group (infants) is very unlikely to be a target of muzzle-contact, the authors have erased this finding by merging them with another age category (juveniles). This really does not make sense, and seriously confounds any interpretation of either age category.

      Lines 466-474: Why was rank removed for the second and third models? Why is Group no longer a random effect (as in the previous analysis)? The authors need to justify such steps to give the reader confidence in their approach.

      Furthermore - because of the way this model is designed, I do not think it can actually be used to infer that these groups are preferentially targeted, merely that adult females and adult males are LESS likely to target others than to be targeted themselves, which is a very different assertion.

      Because the specific outcome measure was not described here, this only became apparent to me after inspecting Figure 3, where outcome measure is described as "Probability of (an individual) being a target rather than initiator" - so, it can tell us that adults are more often targeted rather than initiating, but does not tell us if they are targeted more frequently than juveniles (who may get targeted very often, but initiate so often that this ratio is offset).
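
      A small worked example shows how the two quantities can diverge. The counts below are invented purely to illustrate the logic and do not come from the study's data.

```python
def prob_target_given_involved(times_target: int, times_initiator: int) -> float:
    """P(individual was the target | individual was involved in a muzzle contact)."""
    total = times_target + times_initiator
    if total == 0:
        raise ValueError("no muzzle contacts recorded")
    return times_target / total


# Hypothetical pooled counts of muzzle-contact roles per age class.
counts = {
    "adult": {"target": 30, "initiator": 10},
    "juvenile": {"target": 60, "initiator": 140},
}

ratios = {
    age: prob_target_given_involved(c["target"], c["initiator"])
    for age, c in counts.items()
}

for age, c in counts.items():
    print(f"{age:9s} targeted {c['target']:3d} times, "
          f"P(target | involved) = {ratios[age]:.2f}")
```

      Here juveniles are targeted twice as often as adults in absolute terms, yet their probability of being a target rather than an initiator is much lower because they initiate so frequently. The modelled outcome therefore cannot, on its own, show which class is targeted most.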

      Lines 467-473: "Our first simple model included individuals' knowledge of the novel food at the time of each muzzle contact (knowledgeable = previously succeeded to extract and eat peanuts; naïve = never previously succeeded to extract and eat peanuts) and age, sex and rank as fixed effects. Individual was included as a random effect. The second model was the same, but we removed rank and added interactions between: knowledge and age; and knowledge and sex. The third model was the same as the second, but we also added a three-way interaction between knowledge, age and sex."

      This is a good example of some of the issues I describe above. What is the justification for each of these model-structures? The addition and subtraction of variables and interactions seems arbitrary to the reader.

    1. Author Response

      Reviewer #1 (Public Review):

      This is an extremely well-done study, revealing a fascinating phenotype of the mes-4 mutant, which they show upregulates X-linked genes, leading to PGC death. These X-linked genes are mostly oogenesis genes, upregulation of which likely impedes normal proliferation of PGCs. The results are very concrete, support their conclusion, and contribute significantly to the field. I do not have any major concerns except for a couple of conceptual issues. First, the title 'germline immortality' does not seem to be well aligned with the results. It is not wrong that PGCs die in the mes-4 mutant, and thus the germline is 'mortal': however, the term 'germline immortality' implies multi-generational passages of germline, and the data in the present study, where mutant PGCs just die in the offspring, do not necessarily point to 'germline immortality' per se. So, I suggest changing the title to better reflect the contents of the paper.

      Good point. We changed germline immortality to germline survival and/or development throughout the paper.

      Second, although the authors speculate (in the discussion) why X activation is toxic to germ cells (discussing that upregulated X-linked genes are oogenesis genes, whose precocious activation is toxic to PGCs), there is not sufficient discussion as to why the effect is mostly limited to X chromosome, and why mes-4 is specifically involved in this. Is it because all oogenesis genes are concentrated on X chromosome? (likely not). Are autosomal genes that are upregulated in mes-4 mutant also oogenesis genes? Is this related to dosage compensation? I would like to see fuller discussions as to why X chromosome requires special regulation, also discussing the role of mes-4 in this context. I understand that the authors might have refrained from expanding discussions on matters that do not have any data, but without this discussion, I feel that many readers will be left wondering 'why?'.

      As noted in Point #5 above, we added to Discussion whether up-regulation of X genes in mes-4 mutant PGCs and EGCs reflects a defect in dosage compensation or a defect in keeping the oogenesis program (which is enriched for X-linked genes) quiet in the nascent germline (see lines 604-630). Based on new analyses showing up-regulation of oogenesis genes (on the X and autosomes) in mes-4 and PRC2 nascent germlines and the points in Discussion, we favor the view that the essential function of MES-4 and PRC2 is to repress X-linked oogenesis genes in PGCs and EGCs (see Figures 6 and 7, associated figure supplements, and lines 389-417).

      Reviewer #2 (Public Review):

      This manuscript makes substantial progress in resolving a long-standing mystery regarding the precise role of the histone methyltransferase MES-4 in promoting germline development. MES-4 maintains the histone modification H3K36me3 and germ cell survival, but prior evidence was unable to distinguish among several possibilities for target pathways. This paper utilizes a transcriptional profiling approach at the critical time of germline development to definitively demonstrate that the essential function of MES-4 is to repress X gene expression in germ cells. This result is surprising because X repression is an indirect effect of MES-4 activity (MES-4 does not localize to the X), while the direct effect of maintaining germline gene expression is not essential. To buttress this finding, the authors also utilize a series of elegant genetic experiments to independently test whether expression from the X is sufficient to cause germ cell degeneration. They then go further to identify a single X-linked target, lin-15b, as a primary contributor to the inappropriate X-linked gene expression in mes-4 mutants, by showing that loss of lin-15b activity rescues both the germline degeneration and X mis-expression of mes-4 mutants. Finally, the authors demonstrate that PRC2, the H3K27me3 histone methyltransferase and MRG-1, a candidate H3K36me3 effector protein, are also involved in promoting X silencing through lin-15b.

      The manuscript's strengths lie in the development or application of novel techniques, including the profiling of individual pairs of PGCs (a non-trivial advancement), as well as some very well-designed and conceptually innovative genetic assays. These were used to address specific and important gaps in knowledge regarding the phenotype of mes-4, which had been elusive despite having been studied for almost 30 years. Although specific to C. elegans in some ways, the findings are clearly relevant to conserved regulatory events, such as epigenetic memory mechanisms and establishment of opposing chromatin states. Thus, this work provides a substantial advance in the field overall.

      One limitation of this study is the lack of clarity about the conclusions regarding the relationship between the two H3K36me3 histone methyltransferases mes-4 and met-1, and between X vs autosomal gene expression. The authors do not precisely state what genes (X or A) are affected in the met-1 and mes-4 mutants. Ultimately, this confusion muddles the final message of X chromosome upregulation being the critical contributor to the mes-4 germline degeneration phenotype. The experiment presented in figure 3B indicates that loss of mes-4 or met-1 is sufficient to prevent germline development even when the Xs are repressed, indicating that failure to activate autosomal gene expression is also an underlying cause of the degeneration. Perhaps this cannot be definitively concluded without directly assessing met-1 and met-1;mes-4 mutant PGCs (or EGCs) for gene expression changes. If technically possible, this would be a very valuable experiment to directly examine autosomal gene expression changes in the double mutant.

      We profiled met-1 PGCs and observed very few mis-regulated genes (Figure 7 – supplemental figure 1). We tried to profile met-1; mes-4 double mutant PGCs, which completely lack both MET-1 and MES-4 and inherit chromosomes that lack H3K36me3. That was not feasible, due to the high level of embryonic lethality and rapid deterioration of PGCs dissected from met-1; mes-4 double mutant larvae. Notably, this demonstrates that germlines that lack both maternal K36me3 HMTs are sicker than those that lack just 1 of the HMTs. The high degree of embryo lethality suggests an essential function for MET-1 and MES-4 in the soma. As requested, we generated and included a list of X and autosomal genes mis-regulated in met-1, mes-4, and other mutant PGCs (see Figure 7—figure supplement 1).

      The sterility of hermaphrodites with a met-1; mes-4 mutant XspXsp germline and lacking either maternal MES-4 or maternal MET-1 may be due to mis-regulation of autosomal genes, or it may reflect that the X chromosomes are not repressed in met-1; mes-4 XspXsp germlines that lack H3K36me3. To test that, we would need to profile those XspXsp PGCs. It is not feasible to identify mutant F1 larvae with Xsp/Xsp PGCs immediately after hatching, which is required for transcript profiling. We think that the main message from analyzing met-1; mes-4 mutant XspXsp germlines (that inherited H3K36me3 marking is not critical for germline development but re-establishment of marking is important and requires both enzymes) does not require our delving into the cause of sterility of mutant XspXsp germlines lacking MET-1.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Autophagy of the endoplasmic reticulum (ER-phagy) is a fundamental process that is essential for maintaining cellular homeostasis and quality control. We recently identified a novel mechanism regulating ER-phagy in both plants and animals that is based on the ubiquitin-like protein modifiers ATG8 and UFM1, and the ER-associated protein, C53. Here, we use a combination of evolutionary, biochemical, and physiological experiments to investigate the evolution and regulation of this process. We reveal the dynamic evolution of UFM1 and the ubiquity of C53-mediated autophagy across eukaryotes. Leveraging these results, we then identify an ancestral molecular toggle switch, mediated by shuffled ATG8-interacting motifs (sAIMs), that controls C53-mediated autophagy through competitive binding between UFM1 and ATG8. These findings provide new insights into the evolution of UFM1, reveal a conserved mechanism for the regulation of ER-phagy, and raise new and exciting hypotheses about the diversity and function of the UFMylation pathway. We believe that this work will be of interest to those studying autophagy and cellular stress response but will also serve as an interesting example of the benefits of combining evolutionary analyses with biochemical and cellular experiments.

      Our manuscript has been reviewed by three reviewers through ReviewCommons, whose comments, and our responses, can be found below. Two of the reviewers (Reviewer 1 and 3) were supportive of our work and its significance whereas Reviewer 2 questioned the novelty of our findings.

      Each of the reviewers’ comments can be addressed through a few supporting experiments as well as an improved manuscript which clarifies the novelty and significance of our results. While being supportive of our work, Reviewer 1 requested minor additional experiments to support our mechanistic conclusions and Reviewer 3 suggested that we expand our characterizations of C53 function to additional eukaryotic supergroups. These experiments are straightforward to perform, the materials and protocols to accomplish them are already established, and our overall conclusions are robust to the resulting outcomes.

      In contrast, Reviewer 2 did not suggest any additional experiments but rather challenged the novelty of our results as well as some of our interpretations. In particular, Reviewer 2 was uncertain of how our phylogenomic analyses built upon a previous study, published in 2014, which used comparative genomics to identify ubiquitin-related machinery across eukaryotes. Although it was an oversight to not reference this study (we cited a more recent article showing the same results), we were aware of their conclusions that UFMylation was present in the last eukaryotic common ancestor but absent in Fungi. We now clearly outline, both below and within the manuscript, our key phylogenomic results. These were acquired after implementing more advanced and comprehensive comparative genomic searches which allowed us to identify dynamic patterns in UFMylation evolution and permitted co-evolutionary analyses which were not only important for informing our experimental hypotheses but generated new functional questions. Our phylogenomic analyses are also linked to biochemical and physiological data, providing, for the first time, experimental support for our conclusions regarding UFMylation evolution. Similarly, Reviewer 2 suggested that our mechanistic results were an incremental extension of our previous work. Although our current work does of course build on our initial identification of C53-mediated autophagy, this manuscript provides novel insights into the importance and function of this process by revealing its ubiquity across eukaryotes and by characterizing the mechanistic details of its regulation. Ultimately, we disagree with Reviewer 2 but appreciate that this misunderstanding likely resulted from a lack of context and clarity in our manuscript which we have now resolved.

      As outlined in detail below, we will address the reviewers' concerns through additional experiments, analyses, and improvements to the text.

      Thank you for considering our manuscript. We look forward to hearing from you.

      Description of the planned revisions

      We thank the reviewers for carefully evaluating our manuscript and for providing us with an opportunity to respond to their suggestions and criticisms. As you can see below in our point-by-point response, we address each of the points raised by the reviewers through the addition of supporting experiments, analyses, and an improved text. Altogether, we think these additional experiments and textual changes will significantly improve the manuscript. Therefore, we would like to thank all the reviewers and editors for their time and input.

      Referee #1

      Evidence, reproducibility and clarity

      In this manuscript Picchianti et al. provide novel insights into the interaction of C53 with UFM1 and ATG8. Initially, the authors show that protein modification by UFM1 exists in the unicellular organism Chlamydomonas reinhardtii. To that end they demonstrated that pure Chlamydomonas UBA5, UFC1 and UFM1 proteins can charge UFC1. Then, they showed that C53 interacts with ATG8 and UFM1. Specifically, they found that the sAIMs are essential for the interaction with UFM1, while substituting these motifs with a canonical AIM prevents the binding of UFM1 but not of ATG8. Since binding of C53 to ATG8 recruits the autophagy machinery, the authors suggest that ufmylation of RPL26 releases UFM1 from C53, which allows the binding of ATG8. Overall, the authors demonstrate that C53, which forms a complex with UFL1, connects protein ufmylation and autophagy through its ability to bind both UBLs. Here the authors revisited the assumption that only multicellular organisms have the UFM1 system. Using bioinformatic tools they show that it also exists in unicellular organisms. They also show that in some organisms the E3 complex of UFL1, UFBP1 and C53 exists but UBA5, UFC1 and UFM1 do not. This is a very interesting observation that suggests an additional role for this complex.

      In Fig 1C the authors show that in Chlamydomonas RPL26 undergoes ufmylation. Please use IP against RPL26 and then a blot with anti-UFM1. From the current experiment it is not clear how the authors know that it is indeed RPL26 that undergoes ufmylation.

      RPL26 is highly conserved across eukaryotes, so by comparing our western blots with previous studies (Walczak et al., 2019; Wang et al., 2020), we concluded that these bands corresponded to UFMylated RPL26. However, we agree with the reviewer that we need to confirm the identity of RPL26 with additional assays. Since the submission of the manuscript, we tested RPL26 antibodies in Chlamydomonas and showed that they work well. So, we will update our figure with the confirmation westerns.

      In the second part of the manuscript the authors characterize the interaction of C53 with ATG8 and UFM1. This is a continuation of their previously published work (Stephani et al, 2020). Here the reviewer thinks that further data on the binding of these proteins to C53 are required. Specifically, defining the Kd of these interactions using ITC or another biophysical method can contribute to the study.

      We agree with the reviewer. To obtain the KD values, we will perform ITC experiments with C53 wild type, a C53 sAIM mutant and a C53 cAIM variant titrated with ATG8 and UFM1.

      The authors suggest that, under normal conditions, C53 binds UFM1 and this keeps it inactive. The reviewer thinks that this claim needs further support. Using IP (maybe with a crosslinker) the authors can show that C53, in normal conditions, binds more UFM1 than ATG8. Also, since the interaction of UFM1 with C53 is noncovalent, it would be nice to show how alterations in UFM1 expression levels can affect the activation of C53.

      We thank the reviewer for this suggestion. Since the submission of the manuscript, we have obtained UFM1 overexpression lines. We will pull on C53 using our C53 antibody and check for ATG8 levels in wild type and UFM1 overexpressing lines under normal and stress conditions. We think this will show how alterations in UFM1 levels can affect C53 activation.

      Finally, the authors suggest that ufmylation of RPL26 allows binding of ATG8 to C53 and this, in turn, leads to C53 activation. Can the authors show that in cells lacking UBA5, under normal conditions or with tunicamycin treatment, ATG8 does not activate C53 because UFM1 does not leave C53?

      In Stephani et al., we showed that C53-mediated autophagy requires the UFMylation machinery. In ufl1 and ddrgk1 mutants, C53 becomes insensitive to ER stress. However, to supplement these results, we will perform autophagic flux assays using the native C53 antibody to test autophagic degradation of C53 in a uba5 and ufc1 mutant under normal and tunicamycin stress conditions. The uba5 mutant that we have is a knockdown, so that’s why we will include the ufc1 mutant in our experiments.

      Significance

      This manuscript advances our understanding of the connection of ufmylation to autophagy which is mediated by C53.

      Thank you!

      Referee #2

      Evidence, reproducibility and clarity

      The manuscript from Picchianti et al. seeks to define the role of CDK5RAP3 (hereinafter referred to as C53) during autophagy and its interplay with UFMylation. Together with UFL1 and DDRGK1, C53 is a component of a trimeric UFM1 E3 ligase complex that modifies the 60S ribosomal protein RPL26 at the endoplasmic reticulum (ER) surface upon ribosomal stalling (among other proposed functions that are not addressed). Several previous studies have implicated the UFMylation pathway in autophagy or ER-phagy although a non-autophagic fate for UFM1-tagged ribosomal subunits has also been reported. A previous study from the same authors (PMID: 32851973) identified an intrinsically disordered region (IDR) in C53 that is necessary and sufficient for interaction between C53 and autophagy receptor, ATG8. They reported that this IDR comprises four non-canonical ATG8 interacting motifs (AIM), named shuffled AIMs (sAIMs) and showed that combinatorial mutagenesis of sAIM1, sAIM2, and sAIM3 abrogates ATG8 binding. A similar effect was observed for plant C53, though an additional canonical AIM (cAIM) in the C53 IDR had to be mutated to completely abolish C53 and ATG8 interaction. The earlier study reported that C53 IDR also interacts with UFM1, and this interaction can be disrupted in vitro by adding increasing concentrations of ATG8, suggesting that ATG8 and UFM1 may compete with one another for C53 binding. The present paper attempts to build on this previous work by using phylogenomics to infer a coevolutionary relationship between UFMylation machinery and sAIMs in C53, which, the authors argue, constitutes further evidence of the primary importance of a role for UFMylation in ER homeostasis. The manuscript includes a lot of biochemical data using variations of in vitro and in vivo pull-down experiments to define the roles of individual AIMs in mediating the binding of C53 to ATG8 and to UFM1. They also use NMR spectroscopy in an attempt to define the structural basis of the UFM1 and ATG8 binding to C53, concluding that plant C53 interacts with UFM1 mainly through sAIM1, while interaction with ATG8 requires cAIM as well as sAIM1 and sAIM2. Finally, the authors attempt to contextualize these findings by conducting studies on Arabidopsis mutants, showing that replacing sAIMs with cAIMs causes increased sensitivity to ER stress and apparently increases formation of C53 intracellular puncta that may colocalize with ATG8. From these data the authors concluded that the dual ATG8 and UFM1 binding of C53 IDR regulates C53 recruitment to autophagosomes in response to ER stress.

      Major Issues:

      1) The phylogenomics analysis conclusion that UFM1 is common in unicellular lineages and did not evolve in multicellular eukaryotes is not novel, as another comprehensive analysis of UFM1 phylogeny, published eight years ago - in 2014 - by Grau-Bové et al. (PMID: 25525215), also reported that UFM1, UBA5, UFC1, UFL1 and UFSP2 were likely present in LECA and lost in Fungi. Although the phylogenomic analysis by Picchianti et al. is also extended to DDRGK1 and C53 proteins, and some parasitic and algal lineages, their findings are incremental. Their proposed coevolution of sAIM and UFM1 is based on presence-absence correlation observed within five species (i.e., Albugo candida, Albugo laibachii, Piromyces finnis, Neocallimastix californiae, Anaeromyces robustus). However, this coevolutionary relationship must be further investigated by substantially increasing the taxonomic sampling within the UFM1-lacking group.

      We were aware that previous studies had investigated the distribution of UFMylation proteins across eukaryotes and that these analyses had predicted the presence of UFMylation in LECA and subsequent loss in Fungi. We included a more recent citation noting this (Tsaban et al. 2021) but apologise for not citing Grau-Bové et al. (2014), which we have now included. We must emphasize that our results are not incremental. Although we had made a point of emphasizing the presence of UFM1 in LECA, this was to counter a recent and highly cited paper in the field which claimed that UFMylation evolved in plants and animals (Walczak et al. 2019). Below we note the novel and important results from our phylogenomic analyses:

      1. We used improved taxonomic sampling and more advanced comparative genomics methods to identify UFMylation components sensitively and specifically across eukaryotes. This involved the inclusion of additional eukaryotic genomes, phylogenetic annotation of orthologs, and genomic searches to complement proteome predictions. These methods are essential for accurately identifying UFMylation components and yield more robust results than using sequence similarity clustering (Tsaban et al. 2021) or un-curated Pfam HMMER search results (Grau-Bové et al. 2014).

      2. By placing our UFMylation reconstructions in a modern phylogenetic context we were not only able to support previous observations which noted the presence of UFM1 in LECA and its loss in Fungi (Grau-Bové et al. 2014) and Plasmodium (Tsaban et al. 2021), but also to identify novel patterns in the evolution of UFMylation. This included the observation of recurrent losses in diverse but trophically-related lineages (such as algae and parasites) and revealed the retention of certain UFMylation components in the absence of UFM1. We identified the frequent co-retention of UFL1 and DDRGK1 following UFM1 loss in multiple eukaryotic groups, including Fungi, which were previously thought to be devoid of UFMylation machinery. These previously uncharacterized patterns suggest that these proteins could have alternative functions and may be functionally associated with life history. These results therefore expand on and add complexity to our understanding of the evolution of UFMylation.

      3. By conducting a comprehensive and accurate survey of UFMylation components we were able to use our data to examine co-evolutionary trends between C53 and UFM1, which would have been incomplete and inaccurate using previously curated datasets. As the reviewer noted, only five species were identified that encoded C53 but lacked UFM1. This is not a reflection of insufficient taxon sampling, but rather the strong co-evolution between C53 and UFM1 (i.e., when UFM1 is lost, C53 is almost always lost as well). We attempted to identify additional cases by searching hundreds of fungal and oomycete genomes as well as those from other eukaryotes, but no other species were found. We agree with the reviewer that additional taxa would have made our analyses stronger, but importantly, we do not rely on genomic correlations to infer function. Rather, we use these correlations to generate functional hypotheses which we then tested experimentally. In this way, we do not rely on the strength of our correlations.

      We have now revised the manuscript to include additional context (including citations) and have improved the clarity of the text to better convey the novelty of our findings.

      2) The manuscript presents an overwhelming amount of biochemical and structural data obtained from a variety of protein binding techniques (i.e., NMR spectroscopy, in vitro GST-pulldown, fluorescence microscopy-based on-bead binding assays, and native mass-spectrometry). The results are poorly explained and not organized in a logical manner. Moreover, no attempt was made to explain the rationale behind using one technique over the other or how one method complements another to build a stronger conclusion than any individual approach. Given that none of the methods employed report quantitative measurement of binding affinities between C53 IDR and UFM1 or ATG8, it is not clear how the data presented in this manuscript contribute to our understanding of the proposed competition model for UFM1 and ATG8 binding to C53 IDR. To conclude that an interaction is "stronger" or "weaker" it is necessary to measure equilibrium binding constants. Fortunately, there are suitable techniques, including surface plasmon resonance (SPR), microscale thermophoresis (MST), fluorescence anisotropy, or calorimetry that are available to dissect these complex competitive binding interactions and to build models.

      We thank the reviewer for their suggestion. Although we attempted to describe the rationale behind each experiment (please see lines 135-137 for the on-bead binding assays and lines 154-157 and 177-181 for NMR), we agree that the volume of data and variety of techniques warrant additional explanation. We will revise the manuscript to further explain our rationale for using each of the different approaches. As we noted above in our response to reviewer 1, we will also perform relevant ITC binding assays to quantify the interaction between C53, ATG8, and UFM1.
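
      To illustrate why equilibrium constants matter for a competition model, the following minimal sketch computes occupancy of a single shared binding site with and without a competitor. The one-site model, the assumption that free concentrations approximate total concentrations, the function names, and all KD values and concentrations are hypothetical choices for illustration; none are measurements from the manuscript.

```python
def fraction_bound(ligand, kd):
    """Fraction of sites occupied by a ligand at equilibrium (one-site model).

    Assumes the free ligand concentration is approximately the total
    concentration (ligand in large excess over the binding site).
    """
    return ligand / (kd + ligand)


def fraction_bound_with_competitor(ligand, kd, competitor, kd_competitor):
    """Occupancy by `ligand` when a competitor binds the same site.

    In the simple competitive model the competitor raises the apparent KD
    of the ligand by the factor (1 + [competitor] / KD_competitor).
    """
    apparent_kd = kd * (1.0 + competitor / kd_competitor)
    return ligand / (apparent_kd + ligand)


# Hypothetical values in micromolar, for illustration only.
alone = fraction_bound(ligand=10.0, kd=10.0)
competed = fraction_bound_with_competitor(
    ligand=10.0, kd=10.0, competitor=5.0, kd_competitor=5.0
)
print(f"occupancy alone: {alone:.3f}, with competitor: {competed:.3f}")
```

      In this simplified picture, which partner occupies the site depends on both KD values and both concentrations, which is why statements about "stronger" or "weaker" binding need measured constants.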

      3) The NMR studies have the potential to dissect the types of dynamic binding inherent in unstructured proteins. However, the abundant NMR data presented combined with the aforementioned binding studies, remarkably, do not seem to significantly advance our understanding of how the system is organized or even how UFM1 and ATG8 bind C53, beyond the rather vague and somewhat circular conclusion stated in the abstract: "...we confirmed the interaction of UFM1 with the C53 sAIMs and found that UFM1 and ATG8 bound the sAIMs in a different mode." Or on line 165 "Altogether these results suggested that ATG8 and UFM1 bind the sAIMs withn C54 IDR, albeit in a different manner".

      We agree that NMR has the potential to dissect the complex binding interactions between UFM1, ATG8, and C53, but disagree with the reviewer’s interpretation that our NMR data fail to achieve this. Our NMR data: 1. Revealed the structural basis of the interaction of the C53 IDR with ATG8 and UFM1 at atomic resolution by showing that UFM1 binds preferentially to sAIM1 in fast-intermediate exchange (Fig. 4 and Fig. S7B), whereas ATG8 binds cAIM in slow-intermediate exchange and, once cAIM is occupied, binds sAIM1 and sAIM2 with lower affinity in fast-intermediate exchange (Fig. 4 and Fig. S7D). 2. Determined conformational changes in the C53 IDR upon binding of ATG8, but not UFM1 (Fig. S7E), which lead to increased dynamics in distinct regions of the C53 IDR. These data could explain how initial binding of ATG8 would trigger C53-dependent recruitment of the tripartite complex to autophagosomes. 3. Identified how UFM1 binds to an atypical hydrophobic patch in the C53 sAIM, similar to what was shown for the UBA5 LIR/UFIM. In summary, our results shed light on how both UBLs interact with C53: sAIM1 is the highest-affinity binding site for UFM1, while ATG8 binds cAIM preferentially before occupying sAIM1 and sAIM2. To provide more detail on the interaction between C53 and the UBLs at the atomic level, we will perform molecular docking studies using the restraints obtained from the experimental NMR data.

      4) The functional assays performed in Arabidopsis do not support the competitive model between UFM1 and ATG8 for binding to C53 during C53-mediated autophagy. The fluorescence microscopy images do not provide convincing evidence of colocalization between C53 and ATG8. In fact, in contrast to the claims made in the text or the quantification, mCherry-C53 fluorescence does not seem to localize in discrete puncta and its signal does not seem to overlap with ATG8A.

      We disagree with the reviewer’s interpretation of these results, although we acknowledge that there is some subtlety in interpreting the co-localization data. Importantly, Arabidopsis has 9 ATG8 isoforms and C53 can bind to most of them with varying affinities (see Stephani et al.). Because of this, we do not expect C53 puncta to fully colocalize with ATG8A puncta. Additionally, the C53 puncta are smaller and more subtle than ATG8 puncta, which label the entire autophagosome. To address this, we will quantify the effect by performing colocalization analyses under normal and stress conditions. We will also upload all the raw images as supporting material, so that anyone can independently assess our images.

      Minor Issues: 1. The authors might choose to avoid teleological arguments such as (line 135): "As the phylogenomic analysis suggested that eh sAIMs have been retained to mediate C53-UFM1 interaction..."

      We thank the reviewer for this suggestion and will modify the text accordingly.

      2. The authors refer on multiple occasions to C53 "autoactivation" without defining what they mean by this. Do they propose that C53 UFMylates itself?

      We refer to C53 activity as the ability to recruit the autophagy machinery and initiate cargo sequestration and degradation in the vacuole. We attempted to explain this in lines 57-61 but we will reword it more clearly, as suggested by the reviewer.

      3. The paper might want to avoid preachy philosophical statements like "Our evolutionary analysis also highlights why we should move beyond yeast and metazoans and instead consider the whole tree of life when using evolutionary arguments to guide biological research." (333-335). While this is indeed a laudable goal, given the rather limited insights from this study, it is unclear how this paper exemplifies the notion.

      We added this statement as we were intrigued by our evolutionary analyses’ ability to link C53 to UFM1 (an association which took years to identify experimentally) and generate useful functional hypotheses about the interaction between C53 sAIMs and UFM1. As we mentioned above, we also wanted to highlight this point in reference to a recent prominent study in the field which drew conclusions after only considering animals, plants, and fungi (Walczak et al., 2019). We believe this point is important and underappreciated by some cell biologists, but we will modify the text to make it more generic: “This work highlights the utility of using evolutionary analyses and eukaryotic diversity to generate mechanistic hypotheses for cellular processes”.

      Significance

      Overall, while the manuscript contains an abundance of new data, the overall conclusion of the work, stated in the title: "Shuffled ATG8 interacting motifs form an ancestral bridge between UFMylation and C53-mediated autophagy" does not constitute a significant advance beyond other published phylogenomic analysis (below) and the two previous papers by the same authors, including the 2020 paper "A cross-kingdom conserved ER-phagy receptor maintains endoplasmic reticulum homeostasis during stress" (PMID: 32851973) and the 2021 paper "C53 is a cross-kingdom conserved reticulophagy receptor that bridges the gap between selective autophagy and ribosome stalling at the endoplasmic reticulum" (PMID: 33164651). While a regulatory interaction between UFMylation and autophagy is of potential importance, the data in this manuscript do not constitute a major advance and fail to provide new mechanistic insight to explain the role of C53 IDR in autophagy and its interplay with UFMylation.

      We disagree with the reviewer’s suggestion that our work does not constitute a significant advance. We outlined above in detail the novel insights that were obtained from our phylogenomic analysis which involved using improved methods to reveal a much more dynamic and informative picture of UFMylation evolution than has been described previously. Likewise, this manuscript builds substantially on our previous mechanistic work. In our 2020 paper (which is summarized in the mentioned 2021 review article), we identified C53 as an ER-associated protein that binds ATG8 through sAIMs and interacts with the phagophore after RPL26 UFMylation. This work linked C53 activity to ER-phagy and highlighted its importance in plant and animal stress response. However, key questions remained unanswered prior to our current work such as whether this mechanism is conserved across eukaryotes, especially in unicellular species, how C53 activity is regulated, and how UFM1 and ATG8 interact with C53. Our current manuscript builds on this work with the following key results: 1. We use a combination of phylogenomic and experimental analyses to demonstrate that C53 function is conserved across eukaryotes. 2. We reveal a mechanism whereby UFM1 and ATG8 compete for binding at the sAIMs in the C53 IDR and characterize how each of these ubiquitin-like proteins interacts in an alternative way (see the NMR results described above). 3. We show how the sAIMs are required for the regulation of C53-mediated autophagy and reveal the importance of UFM1-ATG8 competition in preventing C53 autoactivation, which causes unnecessary autophagic degradation and impairs cellular stress responses.

      These insights are fundamental for understanding the mechanisms regulating C53-mediated autophagy which were unknown before this work. We will therefore adjust our manuscript to more clearly and explicitly explain how our data build on previous observations so that the novelty and significance of our results are clearer.

      Referee #3

      Evidence, reproducibility and clarity

      Picchianti and colleagues have investigated a conserved molecular framework that orchestrates ER homeostasis via autophagy. For this, they have carried out phylogenomics and large-scale gene family analyses across eukaryote diversity as well as a barrage of molecular lab work. The amount of work carried out as well as the overall quality of the study is impressive.

      Thank you!

      I have only a few comments that should be very easy to tackle. (1) Maybe I missed it, but please upload all alignments used for phylogenetics and phylogenomics for reproducibility to e.g. Zenodo, Figshare or other suitable OA databases.

      We included the alignments in the supplementary data, but as suggested, we will upload all the source data including the scripts and the alignments to Zenodo.

      (2) "Why these non-canonical motifs were selected during evolution, instead of canonical ATG8 interacting motifs remains unknown" --> Maybe there is no "why" and these were not selected at all. Could be random... drift, non-adaptive constructive neutral evolution. I am not saying that asking "why" in evolutionary biology is wrong. It, however, often does not yield satisfactory answers--or any answer at all.

      The reviewer is completely right that “why” is not the right way to frame an evolutionary question. Thank you for pointing this out. We will revise the text and make sure that we remove these kinds of deterministic statements.

      (3) The authors make a case for UFMylation in LECA and I am fully sympathetic with this. However, getting rid of misfolded/problematic proteins and subcellular entities is something that prokaryotes, to a certain degree, must also have mastered (and still do). Are inclusion bodies or export their only answers (I don't know)? Of course, in eukaryotes with all their intracellular complexity this is likely more of an issue. Given the scope of this manuscript (i.e. shedding light on that ancient framework, deep evolutionary roots in eukaryote evolution etc.) it would be very interesting to read the authors' thoughts on this and also pinpoint the prokaryote/eukaryote divide in light of the machinery discussed here.

      Thank you for this suggestion. We did indeed check whether any of the UFMylation machinery were present in prokaryotes and only found homologs of UFSP2. These results are consistent with Grau-Bové et al. (2014) who conducted an equivalent analysis and concluded that UFMylation machinery were derived during eukaryogenesis. We will make reference to this in the revised manuscript.

      Significance

      This study not only impresses with the volume of experiments and data, but also the courage to show conservation of a molecular framework by working with such a range of distantly-related eukaryotes. The results and conclusions from this study should be interesting to anyone working in the broad fields of cellular stress and/or autophagy--both extremely timely topics.

      We thank the reviewer for understanding our take-home message and the advances made. We especially thank the reviewer for understanding the challenge of connecting in silico genomic data with in vivo and in vitro experiments.

      CROSS-CONSULTATION COMMENTS

      Referee #2: The challenge in providing a fair review of this manuscript is to clearly define what contributions are novel, significant advances. It is difficult to tell from the way the manuscript is written, as it is unclear how the new data, which are voluminous, actually advance the model already put forth by the same authors in two previous publications. It is also unfortunate that the authors overlooked the 2014 phylogenomics paper. There clearly are some new pieces of information here, but the overall increment in knowledge is rather minimal. Response from Referee #3: I agree that the authors somehow steamroll the reader with a wealth of data. But I think this can be addressed by the authors by requesting a lot more justification and by giving them the opportunity to put the significant advances into their own words. This is, in my opinion, quite doable in the course of a revision. Overall I have to say that I am very sympathetic with the cross-eukaryote reactivity approach that the authors have taken. It is quite intriguing.

      We thank the reviewers for this useful exchange. We agree that our manuscript was not clear enough to emphasize the novelty of our results which likely resulted from the volume and diversity of the experiments and analyses that were presented. We have now revised the manuscript to improve the context and rationale for the study, the intent and hypotheses behind each experiment, and the novel results and insights obtained in each section.

      Response from Referee #2: I agree that the cross-eukaryote approach is intriguing. Shouldn't we be concerned that the 2014 publication already made two of their key points (i.e., present in LECA, loss in Fungi)? What is the incremental insight from this paper? I'd appreciate an opinion from an evolutionary biologist as to how strongly one can conclude functional co-evolution from such correlative data, especially given the rather small number of supporting examples. Is it also necessary to consider counter-examples, i.e., species that have sAIMs but no UFM1 (I believe that they found a few such cases)?

      Importantly, we do not conclude functional co-evolution from our correlative data. Instead, we used these correlations to generate hypotheses that we tested with various experiments in different model systems. For example, the apparent correlation between C53 sAIMs and UFM1 prompted us to test whether or not UFM1 and sAIMs interact. Regardless of sample size or statistical significance, phylogenomic analyses can never demonstrate functional links, only correlations, which is why we combined these two approaches. Although only a few species encoded C53 without UFM1, each of these contained C53 cAIMs and lacked sAIMs (Figure 2c). There are species with UFM1 that lack C53 but this makes sense as UFM1 is used in other processes besides ER-phagy. We have revised the text to make our approach and reliance on certain data clearer.

      Response from Referee #3: Well, with these deep evolutionary questions this is always a challenge. Where does one stop to sample more homologs for one's analyses (one from each supergroup [which are no longer recognised by the community])? In that sense, the authors are right to make the parsimonious base assumption that if X and Y interact in species A and B (no matter how distantly they are related) then X and Y interacted in the last common ancestor of A and B. That being said, if I had designed this study, I would have sampled more broadly for my in vitro cross-eukaryote approach. But this too, I think, could be carried out by the authors in a reasonable timeframe. Specifically, they have now sampled from Amorphea and Archaeplastida; they should add one from TSAR, one Haptista, one Cryptista, and one CRuM. If they synthesised the proteins via a company, they could have the constructs in a few weeks for about 1K Euro - I do not think that this would be an unreasonable request.

      We agree that testing C53 function in additional species would strengthen our understanding of the conservation of this pathway across eukaryotes, as it cannot be assumed that orthologous proteins will function in the same way across all species. To our knowledge there is no other work showing experimentally that the UFMylation pathway is working in a single-celled organism. We focussed our efforts on the unicellular green alga, Chlamydomonas due to its relative experimental tractability. However, testing this was not trivial as it required us to establish expression and purification protocols, isolate Chlamydomonas mutants, optimize physiological stress assays, and perform the experiments.

      Nevertheless, we agree that we could expand our in vitro assays with C53 orthologs from additional species. As suggested by reviewer 3, we will now synthesize 6 more C53 isoforms from two TSAR representatives (the alveolate, Tetrahymena thermophila, and the stramenopile, Phytophthora sojae), as well as a representative from Haptista (Emiliania), Cryptista (Guillardia), Diplomonada (Trypanosoma), and CRuMs (Rigifila). We will test their interaction with human and plant ATG8 and UFM1 proteins. We have also added two species from CRuMs into our phylogenomic analysis.

      The list of experiments that we can do to address the reviewers’ concerns:

      1. Repeat the experiment in Figure 1C, probing with α-RPL26.
      2. To calculate KD values, perform ITC experiments with C53 wild-type, C53 sAIM mutant and C53 cAIM variant titrated with ATG8 and UFM1.
      3. Perform CoIP experiments using the C53 antibody in wild-type and UFM1-overexpressing lines and test for ATG8 association under normal and stress conditions.
      4. Test autophagic degradation of C53 in uba5 and ufc1 mutants under normal and tunicamycin stress conditions by performing autophagic flux assays using the native C53 antibody.
      5. Perform molecular docking studies to examine the structural rearrangements of C53 that lead to ATG8 and UFM1 binding.
      6. Revisit the figures from the co-localization experiments in Figure 5G and perform additional co-localization analyses, such as the Pearson coefficient, under normal and stress conditions. We will also upload all the raw images as supporting material, so that anyone can independently assess our images.
      7. Upload all the source data for the phylogenomic analyses, including scripts and alignments, to Zenodo.
      8. Test the interaction of 6 newly synthesised C53 isoforms from (1) an alveolate (tsAr, Ciliate), (2) a stramenopile (tSar, Phaeodactylum), (3) a haptophyte (Emiliania), (4) a cryptophyte (Guillardia), (5) a diplomonad (Trypanosoma) and (6) a CRuM with human and plant ATG8 and UFM1 proteins.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      The manuscript from Picchianti et al. seeks to define the role of CDK5RAP3 (hereinafter referred as C53) during autophagy and its interplay with UFMylation. Together with UFL1 and DDRGK1, C53 is a component of a trimeric UFM1 E3 ligase complex that modifies the 60S ribosomal protein RPL26 at the endoplasmic reticulum (ER) surface upon ribosomal stalling (among other proposed functions that are not addressed). Several previous studies have implicated the UFMylation pathway in autophagy or ER-phagy although a non-autophagic fate for UFM1-tagged ribosomal subunits has also been reported.

      A previous study from the same authors (PMID: 32851973) identified an intrinsically disordered region (IDR) in C53 that is necessary and sufficient for interaction between C53 and the autophagy receptor ATG8. They reported that this IDR comprises four non-canonical ATG8 interacting motifs (AIMs), named shuffled AIMs (sAIMs), and showed that combinatorial mutagenesis of sAIM1, sAIM2, and sAIM3 abrogates ATG8 binding. A similar effect was observed for plant C53, though an additional canonical AIM (cAIM) in the C53 IDR had to be mutated to completely abolish the C53-ATG8 interaction. The earlier study reported that the C53 IDR also interacts with UFM1, and that this interaction can be disrupted in vitro by adding increasing concentrations of ATG8, suggesting that ATG8 and UFM1 may compete with one another for C53 binding.

      The present paper attempts to build on this previous work by using phylogenomics to infer a co-evolutionary relationship between UFMylation machinery and sAIMs in C53, which, the authors argue, constitutes further evidence of the primary importance of a role for UFMylation in ER homeostasis. The manuscript includes a lot of biochemical data using variations of in vitro and in vivo pull-down experiments to define the roles of individual AIMs in mediating the binding of C53 to ATG8 and to UFM1. They also use NMR spectroscopy in an attempt to define the structural basis of UFM1 and ATG8 binding to C53, concluding that plant C53 interacts with UFM1 mainly through sAIM1, while interaction with ATG8 requires cAIM as well as sAIM1 and sAIM2. Finally, the authors attempt to contextualize these findings by conducting studies on Arabidopsis mutants, showing that replacing sAIMs with cAIMs causes increased sensitivity to ER stress and apparently increases formation of C53 intracellular puncta that may colocalize with ATG8.

      From these data the authors concluded that the dual-ATG8 and UFM1 binding of C53 IDR regulates C53 recruitment to autophagosomes in response to ER stress.

      Major Issues:

      1. The phylogenomics analysis conclusion that UFM1 is common in unicellular lineages and did not evolve in multicellular eukaryotes is not novel, as another comprehensive analysis of UFM1 phylogeny, published eight years ago - in 2014 - by Grau-Bové et al. (PMID: 25525215), also reported that UFM1, UBA5, UFC1, UFL1 and UFSP2 were likely present in LECA and lost in Fungi. Although the phylogenomic analysis by Picchianti et al. is also extended to DDRGK1 and C53 proteins, and some parasitic and algal lineages, their findings are incremental. Their proposed coevolution of sAIM and UFM1 is based on presence-absence correlation observed within five species (i.e., Albugo candida, Albugo laibachii, Piromyces finnis, Neocallimastix californiae, Anaeromyces robustus). However, this coevolutionary relationship must be further investigated by substantially increasing the taxonomic sampling within the UFM1-lacking group.
      2. The manuscript presents an overwhelming amount of biochemical and structural data obtained from a variety of protein binding techniques (i.e., NMR spectroscopy, in vitro GST-pulldown, fluorescence microscopy-based on-bead binding assays, and native mass-spectrometry). The results are poorly explained and not organized in a logical manner. Moreover, no attempt was made to explain the rationale behind using one technique over the other or how one method complements another to build a stronger conclusion than any individual approach. Given that none of the methods employed report quantitative measurement of binding affinities between C53 IDR and UFM1 or ATG8, it is not clear how the data presented in this manuscript contribute to our understanding of the proposed competition model for UFM1 and ATG8 binding to C53 IDR. To conclude that an interaction is "stronger" or "weaker" it is necessary to measure equilibrium binding constants. Fortunately, there are suitable techniques, including surface plasmon resonance (SPR), microscale thermophoresis (MST), fluorescence anisotropy, or calorimetry that are available to dissect these complex competitive binding interactions and to build models.
      3. The NMR studies have the potential to dissect the types of dynamic binding inherent in unstructured proteins. However, the abundant NMR data presented combined with the aforementioned binding studies, remarkably, do not seem to significantly advance our understanding of how the system is organized or even how UFM1 and ATG8 bind C53, beyond the rather vague and somewhat circular conclusion stated in the abstract: "...we confirmed the interaction of UFM1 with the C53 sAIMs and found that UFM1 and ATG8 bound the sAIMs in a different mode." Or on line 165 "Altogether these results suggested that ATG8 and UFM1 bbind the sAIMs withn C54 IDR, albeit in a different manner".
      4. The functional assays performed in Arabidopsis do not support the competitive model between UFM1 and ATG8 for binding to C53 during C53-mediated autophagy. The fluorescence microscopy images do not provide convincing evidence of colocalization between C53 and ATG8. In fact, in contrast to the claims made in the text or the quantification, mCherry-C53 fluorescence does not seem to localize in discrete puncta and its signal does not seem to overlap with ATG8A.

      Minor Issues:

      1. The authors might choose to avoid teleological arguments such as (line 135): "As the phylogenomic analysis suggested that eh sAIMs have been retained to mediate C53-UFM1 interaction..."
      2. The authors refer on multiple occasions to C53 "autoactivation" without defining what they mean by this. Do they propose that C53 UFMylates itself?
      3. The paper might want to avoid preachy philosophical statements like "Our evolutionary analysis also highlights why we should move beyond yeast and metazoans and instead consider the whole tree of life when using evolutionary arguments to guide biological research." (333-335). While this is indeed a laudable goal, given the rather limited insights from this study, it is unclear how this paper exemplifies the notion.

      Referees cross-commenting

      Referee #2

      The challenge in providing a fair review of this manuscript is to clearly define what contributions are novel, significant advances. It is difficult to tell from the way the manuscript is written, as it is unclear how the new data, which are voluminous, actually advance the model already put forth by the same authors in two previous publications. It is also unfortunate that the authors overlooked the 2014 phylogenomics paper. There clearly are some new pieces of information here, but the overall increment in knowledge is rather minimal.

      Response from Referee #3

      I agree that the authors somehow steamroll the reader with a wealth of data. But I think this can be addressed by the authors by requesting a lot more justification and by giving them the opportunity to put the significant advances into their own words. This is, in my opinion, quite doable in the course of a revision. Overall I have to say that I am very sympathetic with the cross-eukaryote reactivity approach that the authors have taken. It is quite intriguing.

      Response from Referee #2

      I agree that the cross-eukaryote approach is intriguing. Shouldn't we be concerned that the 2014 publication already made two of their key points (i.e., present in LECA, loss in Fungi)? What is the incremental insight from this paper?

      I'd appreciate an opinion from an evolutionary biologist as to how strongly one can conclude functional co-evolution from such correlative data, especially given the rather small number of supporting examples. Is it also necessary to consider counter-examples, i.e., species that have sAIMs but no UFM1 (I believe that they found a few such cases)?

      Response from Referee #3

      Well, with these deep evolutionary questions this is always a challenge. Where does one stop to sample more homologs for one's analyses (one from each supergroup [which are no longer recognised by the community])? In that sense, the authors are right to make the parsimonious base assumption that if X and Y interact in species A and B (no matter how distantly they are related) then X and Y interacted in the last common ancestor of A and B. That being said, if I had designed this study, I would have sampled more broadly for my in vitro cross-eukaryote approach. But this too, I think, could be carried out by the authors in a reasonable timeframe. Specifically, they have now sampled from Amorphea and Archaeplastida; they should add one from TSAR, one Haptista, one Cryptista, and one CRuM. If they synthesised the proteins via a company, they could have the constructs in a few weeks for about 1K Euro - I do not think that this would be an unreasonable request.

      Significance

      Overall, while the manuscript contains an abundance of new data, the overall conclusion of the work, stated in the title: "Shuffled ATG8 interacting motifs form an ancestral bridge between UFMylation and C53-mediated autophagy" does not constitute a significant advance beyond other published phylogenomic analysis (below) and the two previous papers by the same authors, including the 2020 paper "A cross-kingdom conserved ER-phagy receptor maintains endoplasmic reticulum homeostasis during stress" (PMID: 32851973) and the 2021 paper "C53 is a cross-kingdom conserved reticulophagy receptor that bridges the gap between selective autophagy and ribosome stalling at the endoplasmic reticulum" (PMID: 33164651). While a regulatory interaction between UFMylation and autophagy is of potential importance, the data in this manuscript do not constitute a major advance and fail to provide new mechanistic insight to explain the role of C53 IDR in autophagy and its interplay with UFMylation.

    1. Reviewer #2 (Public Review):

      The authors utilize the publicly available dHCP dataset to ask an interesting question: how do postnatal experience and prenatal maturation influence the development of the visual system? The authors report that experience and prenatal maturation contribute differentially to different aspects of development. Namely, the authors quantify cortical thickness, myelination, and lateral symmetry of function as three different metrics of development. The homotopy and preterm infant analyses are strengths that, on their own, could have justified reporting. However, I have concerns about the analytic approaches that were used and the conclusions that were drawn. Below I list my major concerns with the manuscript.

      PMA vs. GA vs. PT

      1. The authors seek to understand the contribution of experience and prenatal development, yet I am unsure why the authors focused on the variables they did. There are three variables of interest used throughout this study: Gestational age at birth (GA), postnatal time (PT), and postmenstrual age at the time of scan (PMA). The last metric, PMA, is straightforwardly related to GA and PT since PMA = GA + PT. In most (but not all) of the manuscript, the authors use PMA and PT, with GA used without justification in some cases but not in others.

      It is unclear why PMA is used at all: PMA is necessarily related to PT and GA, making these variables non-independent. Indeed, the authors show that PMA and PT are highly correlated. The authors even say that "the contribution of postnatal experience to the development was not clarified because PMA reflects both prenatal endogenous effect and postnatal experience." So, why not use GA at birth instead of PMA? Clearly, GA is appropriate in some cases (e.g., Figure S4 or in some of the ANOVA applications), and to me, it seems to isolate the effect the authors care about (i.e., duration of prenatal development). Perhaps there is some theoretical justification for using PMA, but if so, I am unaware.

      That said, I expect that replacing all analyses involving PMA with GA will substantially change the results. I do not see this as a bad thing as I think it will make the conclusions stronger. As is, I am left unsure about what the key takeaways of this paper are.

      2. Using GA instead of PMA will have several benefits: 1) It will be much simpler to think of these two variables since they contrast the duration of fetal maturation and time postnatally. 2) This will help the partial correlation analyses performed since the two variables will be closer to independent. It will also mean that the negative relationships observed between PT and cortical thickness when controlling for PMA (e.g., Figure 2h) might disappear (reversed signs for partial correlations are common when two covariates are correlated). 3) This will allow the authors to replace Figure 1a with a more informative plot. Namely, they could use a scatter of GA and PT, giving insight into the descriptive statistics of both dimensions.
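
      The sign-reversal concern in point 2 can be illustrated with a minimal simulation sketch. Everything here is an assumption made for illustration (simulated data, the hypothetical helper names `pearson`, `residuals`, `partial_corr`, and `simulate`, and a toy outcome that depends on GA only); it is not the authors' analysis. Because PMA = GA + PT, holding PMA fixed forces PT and GA to trade off, so an outcome driven purely by GA acquires a negative partial correlation with PT.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(y, z):
    """Residuals of y after a simple linear regression on z."""
    mz, my = _mean(z), _mean(y)
    slope = sum((a - mz) * (b - my) for a, b in zip(z, y)) / sum(
        (a - mz) ** 2 for a in z
    )
    return [(b - my) - slope * (a - mz) for a, b in zip(z, y)]


def partial_corr(x, y, z):
    """Correlation of x and y after regressing z out of both."""
    return pearson(residuals(x, z), residuals(y, z))


def simulate(n=2000, seed=0):
    """Toy cohort (assumed values, in weeks): the outcome depends on GA only."""
    rng = random.Random(seed)
    ga = [rng.gauss(39.0, 1.5) for _ in range(n)]          # gestational age
    pt = [rng.expovariate(1.0 / 1.5) for _ in range(n)]    # postnatal time
    pma = [g + p for g, p in zip(ga, pt)]                  # PMA = GA + PT
    outcome = [0.1 * g + rng.gauss(0.0, 0.05) for g in ga]
    return ga, pt, pma, outcome
```

      In this toy setting, `pearson(pt, outcome)` and `partial_corr(pt, outcome, ga)` are both near zero, while `partial_corr(pt, outcome, pma)` is strongly negative even though PT has no effect on the outcome.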

      3. I suspect that one motivation for the use of PMA over GA is for the analysis in Figure 6. In this analysis, the authors pick a group of term infants with a PMA equal to the preterm infants. Since PMA is the same, the only difference between the groups (according to the authors) is the amount of postnatal experience. However, this is not the only difference between the groups since they also vary in GA (and now PT and GA are negatively correlated almost perfectly). I don't know how to interpret this analysis since both the amount of prenatal maturation and postnatal experience vary between the groups.

      Justification of conclusions and statistical considerations

      I had concerns about some of the statistical tests and conclusions that the authors made. I refer to some of these in other sections (e.g., the homotopy analyses), but I raise several here.

      4. I am not sure what evidence the authors are using to make this claim: "we found that the cortical myelination and overall functional connectivity of ventral cortex developed significantly with the PMA but was not directly influenced by postnatal time." Postnatal time is significantly correlated with cortical myelination, as shown in Figures 2g, 2h, 3b, 3c, and postnatal time is significantly correlated with functional connectivity, as shown in Figures 4h, 5c, 5d, and 5e. Hence, this general claim that "the development of CT was considerably modulated by the postnatal experience while the CM was heavily influenced by prenatal duration" doesn't seem to be supported: both myelination and thickness are affected by postnatal experience and prenatal duration (as measured by PMA). A similar sentiment is expressed in the abstract. Perhaps the authors suggest different patterns in the strength of change for PMA vs. PT across these metrics, but if so, then statistical tests need to support that conclusion, and the claims need to reflect that sentiment.

      Interestingly, Figure S4 presents a compelling ANOVA that does support this conclusion. Still, this result is relegated to the supplement, and it also uses GA, rather than PMA, making it hard to reconcile with the other claims made in the main text. Moreover, it uses ANOVAs, which dichotomize a continuous variable. Here and elsewhere in the manuscript (e.g., Figures 3d, 3e), the authors split the infants into quartiles and compare them with ANOVAs. Their use for visualization is helpful, but it is unclear what the statistical motivation for this is, rather than treating these as continuous variables, as is possible with linear mixed-effects models. Moreover, it is unclear why the authors excluded half the data from the study (i.e., quartiles 2 and 3) in this ANOVA when all four quartiles could be used as factors.

      5. It is unclear what the evidence is to support the following claim: "Both CT and CM show higher correlation with PMA in the posterior than anterior region, and higher correlation in the medial than lateral part within the anatomical mask (Figure 2a and Figure S2b-c [sic])" From Figure 2 or Figure S2, I don't see a gradient. From Figure S3, there might be a trend in some plots, but it is hard to interpret since it is non-monotonic. More generally, is there a statistical test to support this claim?

      6. "and the interaction [sic] was more prominent in CM (simple effect: t = 10.98, p < 10^-9) that in than CT (t = 2.07, p < 0.05)." Does 'more prominent' mean it is 'significantly stronger'? If not, then the authors should adjust this claim.

      7. Are the authors Fisher Z transforming their correlations? In numerous places, correlation values seem to be added together or used as the input to other correlation analyses. It is unclear from the methods whether the authors are transforming their correlation values to make that use appropriate.
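
      For reference, a minimal sketch of the transform raised in point 7, using only the Python standard library. The function names are hypothetical; the transform itself is the standard z = atanh(r). Correlations are mapped to z before averaging (or before entering further linear analyses) and mapped back with tanh afterwards.

```python
import math


def fisher_z(r):
    """Fisher z-transform of a correlation coefficient (|r| < 1)."""
    return math.atanh(r)


def inverse_fisher_z(z):
    """Map a z value back to the correlation scale."""
    return math.tanh(z)


def mean_correlation(rs):
    """Average correlations on the z scale, then transform back."""
    zs = [fisher_z(r) for r in rs]
    return inverse_fisher_z(sum(zs) / len(zs))
```

      Averaging on the z scale generally differs from the arithmetic mean of the raw coefficients: for example, `mean_correlation([0.9, 0.1])` is larger than 0.5.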

      Homotopy analyses

      The homotopy section is a strength of the paper, but I have doubts about the approach taken to analyze this data and some of the conclusions drawn. I don't expect any of my suggestions to change the takeaway of this section, but I do think they are essential criticisms to address.

      8. I do not think that the non-homotopic control condition is appropriate. In Arcaro & Livingstone (2017), the authors had 3 categories for this analysis: homotopic pairs (e.g., left V1 vs. right V1), adjacent pairs (e.g., left V1 vs. right V2), and distal pairs (e.g., left V1 vs. right PHA1). In the homotopy analysis performed by Li and colleagues, they compare homotopic pairs with all other pairs. I don't think that is generous to the test since non-homotopic pairs include adjacent pairs that should be similar and distal pairs that shouldn't be similar. This may explain why some non-homotopic distribution overlaps with the homotopic distribution in Figure 4c.

      9. Regardless of this decision, I think the authors should reconsider their statistical test. I think the authors are using a between samples t-test to compare the 34 homotopic pairs with the hundreds of non-homotopic pairs. This is statistically inappropriate since the items are not independent (i.e., left V1 vs. right V1 is not independent of left V1 vs. right V2, which is also not independent of left V3 vs. right V2). This means the actual degrees of freedom are much lower than what is used. Moreover, I am unsure how the authors do this analysis across participants since this test can be done within participants. The authors should clarify what they did for this analysis and justify its appropriateness.
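
      One way to respect the dependence described in point 9 is to reduce each participant to a single number (for example, mean Fisher-z homotopic correlation minus mean Fisher-z non-homotopic correlation) and test those differences across participants. The sketch below is one possible choice, a sign-flip permutation test; it is not the test the authors used, and the function name and inputs are hypothetical.

```python
import random


def sign_flip_pvalue(diffs, n_perm=5000, seed=0):
    """Two-sided sign-flip permutation test on per-participant differences.

    diffs: one value per participant (e.g. homotopic minus non-homotopic
    mean connectivity). Under the null hypothesis of no difference, each
    sign is equally likely to be positive or negative.
    """
    rng = random.Random(seed)
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    hits = 0
    for _ in range(n_perm):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / n) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

      Here the unit of analysis is the participant, so the degrees of freedom are not inflated by the hundreds of mutually dependent region pairs.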

      10. Could the authors speculate on why the correlations in homotopic regions are so much lower than what Arcaro and Livingstone (2017) found? I can think of a few possibilities: higher motion in infants, less rfMRI data per participant, different sleep/wake states, and different parcellation strategies. Regarding the last explanation, I think this is a real possibility: the bilateral correlation may be reduced if the Glasser atlas combines functionally heterogeneous patches of the cortex. Hence, the authors should consider this and other possible explanations.

      11. The authors assume that the homotopic analyses mean that there are lateral connections between hemispheres (e.g., "Furthermore, the connections among the ventral visual cortex have developed during this early stage. Specifically, the homotopic connections between bilateral V1 and between bilateral VOTC both increased with GA, indicating an increased degree of functional distinction"). While this might be true, it doesn't need to be. Functional connectivity can be observed between regions that lack anatomical connectivity. Instead, two regions could both be driven by another region. In this case, the thalamus might drive symmetrical activity in the visual cortex.

      Miscellaneous

      12. I am not sure what the motivation of this line is: "Moreover, those studies did not fully control the visual experience in the first few weeks of the subjects, thus cannot give a clear conclusion whether the innate functional connectivity is unrelated to postnatal visual experience." Arcaro, Schade, Vincent, Ponce, & Livingstone (2017) did control the visual experience of subjects. Moreover, the research here doesn't control infant experience in the way this sentence implies: it implies an experimental manipulation (i.e., fully control) rather than the statistical control that is done here. Consider rephrasing.

      13. I am not sure why this claim is made: "Area V1 was selected because this region is the most basic region for visual processing and probably is the most experience-dependent area during early development". Is there evidence supporting this claim? Plasticity is found throughout the visual cortex, and I think which region is most plastic depends on the definition of plasticity. For instance, most people have the same tuning properties to gabor gratings (e.g., a cardinality bias), but there is enormous variability in face tuning across cultures.

      14. The abstract says 783 infants were included in this study, but far fewer are actually used. The authors should report the 407 number in the abstract if any number at all.

      15. Any comparisons of preterms and terms ought to be given the caveat that the preterm environment can be very different than the term environment: whereas a term infant goes home and sees friends and family without restriction, the preterm environment can be heavily regulated if they are in a NICU. Authors should either provide details about the environments of the preterms in their study, or they should consider how differences in the richness of visual experience - regardless of quantity - may affect visual development.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Referee #1

      Evidence, reproducibility and clarity

      1. This manuscript constructs a gene expression model with various factors. Specifically, the effect of cell size on gene expression is considered, which is often ignored by previous studies. One interesting finding is that the absolute number of the gene products and the concentration can have different distributions. Some predictions of the models are validated by experimental data on E. coli and yeast. This manuscript uses the mean-field approximation for cell volume, which has good accuracy when the number of stages is large. The usage of the power spectrum has a satisfactory effect on studying the concentration oscillation.

      Response: Thank you for the positive comments.

      1. Overall the paper was very difficult to follow and digest easily because of all the different factors and mechanisms invoked. It is mainly an issue of providing sufficient details for each of the factors and organizing them in a systematic and logical way. Although there is a supplementary appendix, it was hard to keep track of all the elements in the main manuscript. Perhaps something like Fig 1 of the Appendix can be presented in the main body to outline all the ingredients and how they affect each other.

      Response: In the revised manuscript, we moved Supplementary Fig. S1 in the previous version into the main text to outline all the ingredients and how they affect each other (see page 8, Fig. 2). Moreover, we provided many details for each of the biological factors and tried to organize them in a more systematic and logical way (see pages 3-7).

      1. It might be good to provide a more detailed description of the goal (studying gene product number and concentration under different parameters) after introducing the full and the reduced models. A table of symbols would also be helpful.

      Response: In the revised manuscript, we added a table explaining the meaning of all model parameters (see page 4, Table 1). Moreover, we provided a detailed description of the goal of the present paper after introducing the full and reduced models (see page 7).

      1. Some technical details in the Methods section are in fact helpful in understanding the conclusions. They can be moved to the Results section.

      Response: In the revised manuscript, we moved many technical details in Methods and Supplementary Notes to the main text to help the readers better understand the conclusions (see pages 5-10).

      1. One concern is that the central concept of this manuscript, “stage”, is not thoroughly discussed. This concept should have some significant biological meaning, not just be coined for mathematical convenience.

      Response: In the revised manuscript, we explained in detail the biological meaning of the effective cell cycle stages (see page 4). Specifically, recent studies have revealed that in many cell types, the accumulation of some activator to a critical threshold is used to promote mitotic entry and trigger cell division, a strategy known as the activator accumulation mechanism. In E. coli, the activator was shown to be FtsZ; in fission yeast, it is believed to be a protein upstream of Cdk1, the central mitotic regulator, such as Cdr2, Cdc25, and Cdc13. Biophysically, the N effective stages can be understood as different levels of the key activator. Moreover, we pointed out that the power law form for the rate of cell cycle progression may come from cooperativity of the key activator that triggers cell division.

      1. Fig. 1(b) is a little strange. For the left panel, the x-axis (stage) is discrete, then the volume (y-axis) should be a step function, not a straight red line.

      Response: In the revised manuscript, we added some red dots in the stage-volume plot to show the dependence of the mean cell volume v_k on cell cycle stage k for the mean-field model (see page 3, Fig. 1). Moreover, we emphasized that the joining of these dots by a straight red line is simply a guide to the eye.

      Significance

      1. The main advance is a more complete model of gene expression under more realistic organism growth conditions.

      Response: Thank you for acknowledging the results of the manuscript.

      Referee #2

      Evidence, reproducibility and clarity

      1. Jia et al. introduce a modeling framework to represent stochastic gene expression, with an explicit representation of cell volume growth, cell cycle progression (and its dependency on cell volume) and gene dosage compensation. The model is very elegant and general in that it can represent a variety of situations, simply as a matter of parametrization. Under a simplifying assumption, the authors derive a number of metrics (including the stationary distribution of the gene product and the power spectrum of gene product fluctuation dynamics), for both the absolute number and the concentration of gene product molecules. They use their model and derivations to examine under which conditions cells can achieve homeostasis in the concentration of the expressed gene product, despite changes in cell volume and gene copy number following replication. They also present and discuss the conditions giving rise to specific features (i.e. bimodality in the stationary distribution, a peak in the power spectrum) and examine these features in experimental data to infer the underlying homeostasis strategies. The model is rather general and powerful. The simplifying assumption seems reasonable (and the authors investigate to some extent its limitations, i.e. Fig. 2). The conclusions are overall convincing.

      Response: Thank you for the positive comments.

      Major comments 1. My main concern is that the metrics that the authors use to assess concentration homeostasis (i.e. the γ parameter and the presence / absence of a peak in the power spectrum) do not seem quite appropriate to describe how much variability / fluctuations in concentration are driven by cell cycle effects. Indeed, the γ parameter measures how much the *average* concentration in each cell cycle stage varies throughout the cell cycle. However, this variability should be compared to the total variability due to both cell cycle effects and stochastic bursting dynamics. A given level of cell-cycle dependency (say γ = 0.2) could be very visible if gene expression is weakly noisy (e.g. B low and ⟨n⟩ high) and completely invisible if gene expression is highly bursty (large B and small ⟨n⟩). In the latter situation, cell-cycle effects would be meaningless for the cell to minimize. In essence, reusing the authors' notation, I think γ/φ^{1/2} would be a more relevant metric to observe.

      Response: In the revised manuscript, we showed that the total concentration noise φ can be decomposed as φ = φext + φint, where φext is the extrinsic noise which characterizes the fluctuations between different stages due to cell cycle effects and φint is the intrinsic noise which characterizes the fluctuations within each stage due to stochastic bursty synthesis and degradation of the gene product (see page 11). Based on the above decomposition, we introduced a new metric γ = φext/φ, which characterizes the accuracy of concentration homeostasis. Clearly, the new metric γ reflects the relative contribution of cell cycle effects to the total concentration variability. All discussions about concentration homeostasis are based on the new metric γ in the revised manuscript. Moreover, all figures have been updated by using this new metric.
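
      A decomposition of this kind can be sketched with the law of total variance. The sketch below assumes that noise means variance divided by squared mean, and that the stage occupancy weights, per-stage mean concentrations, and per-stage variances are given as inputs; these names and definitions are assumptions for illustration and may differ from those in the manuscript.

```python
def noise_decomposition(weights, means, variances):
    """Split total noise (variance / mean^2) into between-stage and
    within-stage parts via the law of total variance.

    weights:   relative time (or probability) spent in each stage
    means:     mean concentration within each stage
    variances: concentration variance within each stage
    Returns (phi_ext, phi_int, gamma) with gamma = phi_ext / (phi_ext + phi_int).
    """
    total_w = sum(weights)
    w = [x / total_w for x in weights]
    mean = sum(wi * mi for wi, mi in zip(w, means))
    # Between-stage (extrinsic) variance: spread of the stage means.
    var_ext = sum(wi * (mi - mean) ** 2 for wi, mi in zip(w, means))
    # Within-stage (intrinsic) variance: average of the stage variances.
    var_int = sum(wi * vi for wi, vi in zip(w, variances))
    phi_ext = var_ext / mean ** 2
    phi_int = var_int / mean ** 2
    gamma = phi_ext / (phi_ext + phi_int)
    return phi_ext, phi_int, gamma
```

      With identical stage means the ratio is 0 (all variability is within-stage), and with zero within-stage variance it is 1, which matches the intended reading of γ as the share of variability attributable to cell cycle effects.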

      1. Similarly, when inspecting the peak in the power spectrum, the weight of the Lorentzian function(s) creating the peak should be compared to the stationary component (λ_N, u_N in the authors' notation).

      Response: We cannot quite understand why the weights u_k of the Lorentzian functions should be compared to the stationary component u_N. In fact, all the weights u_k except u_N are complex numbers and we are not sure about the meaning of u_k/u_N. However, in the revised manuscript, we emphasized that the power spectrum G(ξ) is normalized so that G(0) = 1 throughout the paper (see page 13). To better understand concentration oscillations and their relation to homeostasis, we depicted both γ and H as a function of B and ⟨n⟩ (see Supplementary Fig. S5). As expected, the off-zero peak becomes lower as B increases and as ⟨n⟩ decreases, since both changes correspond to an increase in concentration fluctuations which counteracts the regularity of oscillations; noise above a certain threshold can even completely destroy oscillations. Furthermore, we found that γ and H have a similar dependence on B and ⟨n⟩. This again shows that the occurrence of concentration oscillations is intimately related to the visibility of cell cycle effects in concentration fluctuations.

      A complementary analysis including these two points and a discussion of the relative contribution of cell-cycle effects and bursting dynamics to the total variability/fluctuation of concentrations would be important to include.

      Response: In the revised manuscript, we made some complementary analysis and discussion about the relative contribution of cell cycle effects and stochastic birth-death dynamics in the total variability of concentrations (see pages 11-14).

      Minor comments 3. The dashed line on Fig. 3a is defined as $\kappa = \sqrt{2}^{\,1-\beta}$. First, is this empirical or does it come from a derivation? Second, it seems incomplete since it should depend on w. Intuitively, this line should correspond to the value of κ that would best mimic balanced biosynthesis in the case where β ≠ 1. In other words, κ should be such that $\langle \rho B_0 / V(t) \rangle_{\text{pre-replication}} = \langle \kappa \rho B_0 / V(t) \rangle_{\text{post-replication}}$, which yields $\kappa = 2^{w(1-\beta)} \cdot \frac{1-w}{w} \cdot \frac{2^{w(\beta-1)} - 1}{2^{(1-w)(\beta-1)} - 1}$. This indeed simplifies to $\kappa = \sqrt{2}^{\,1-\beta}$ when w = 0.5.

      Response: Thank you for providing such a beautiful derivation. In the revised manuscript, we added this derivation into the main text (see pages 12-13). Moreover, we also made it clear that this relation can also be obtained from the perspective of power spectrum (see page 14).
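
      The balanced-biosynthesis condition can be checked numerically with a short sketch. It rests on assumptions that are not spelled out in the exchange above: cell volume grows exponentially and doubles over one cycle (V(t) ∝ 2^t with t in units of the cycle duration), the synthesis rate per unit volume scales as V(t)^(β−1), and κ is the ratio of the pre-replication average of that quantity to its post-replication average. The function names are hypothetical.

```python
def kappa_closed_form(w, beta):
    """Closed-form kappa under the stated assumptions; equals 1 when beta = 1."""
    if beta == 1.0:
        return 1.0
    a = beta - 1.0
    return ((1.0 - w) / w) * 2.0 ** (w * (1.0 - beta)) * (
        (2.0 ** (w * a) - 1.0) / (2.0 ** ((1.0 - w) * a) - 1.0)
    )


def kappa_numeric(w, beta, steps=20000):
    """Same ratio obtained by midpoint-rule averaging of V(t)^(beta - 1)."""

    def average(lo, hi):
        h = (hi - lo) / steps
        total = sum(
            2.0 ** ((beta - 1.0) * (lo + (i + 0.5) * h)) for i in range(steps)
        )
        return total * h / (hi - lo)

    # Pre-replication window is [0, w]; post-replication window is [w, 1].
    return average(0.0, w) / average(w, 1.0)
```

      For w = 0.5 the closed form reduces to 2^((1−β)/2), which is the dashed-line expression, and for other values of w it agrees with the direct numerical average.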

      1. η is used in the caption of Fig. 2, which is cited on page 4. But it is defined only 2 sections later, on page 6.

      Response: In the revised manuscript, we gave the definition of η in both Table 1 and the caption of Fig. 3 (Fig. 2 in the old version). Please see page 4, Table 1 and page 9, Fig. 3.

      1. w is used in the main text, but only defined in the caption of Fig. 3.

      Response: In the revised manuscript, we gave the definition of w in both Table 1 (see page 4) and on page 7.

      1. w is defined as “the proportion of cell cycle before replication”. Is this in terms of cell cycle stages (i.e. w = N0/N) or actual time?

      Response: In the revised manuscript, we made it clear that w represents the proportion of cell cycle duration before replication, which should be distinguished from the proportion N0/N of cell cycle stages before replication (see page 7). This is because the transition rate between cell cycle stages is an increasing function of cell size, which means that earlier (later) stages have longer (shorter) durations.

      1. Fig. 3 indicates that power spectra are normalized so that G(0) = 1, but G(0) = 10 on the first two graphs.

      Response: Corrected as suggested (see page 12, Fig. 4). Thank you.

      1. Page 11: “bimodality in the concentration distribution is significantly less apparent”. I would suggest rephrasing “bimodality in the concentration distribution is absent” since there should be no reference to “significance” and bimodality is either present or absent (binary), not less apparent.

      Response: Corrected as suggested. Thank you.

      Referees cross-commenting

      1. Regarding the comment from reviewer 3 that "a direct validity test should use data sets of at least two types (total, nascent RNA, etc)". I almost made a related comment in my review, but then I held it off: the issue with using nascent RNA data is that their model does not allow an ON state. They assume that gene products are produced in instantaneous bursts, which is a fair assumption if the lifetime of gene products is large compared to the time the gene stays ON. This is ok if the considered "gene products" are mRNA or proteins, but not nascent RNAs (for which the lifetime is the time to transcribe the gene). I did not make this comment in the end because I think the model is useful regardless. To comply with reviewer 3's request, maybe the authors could use distributions of mRNA and protein products, but I'm not sure that such data exists (since they need cell-cycle-resolved data).

      Response: It is not possible to validate our model with nascent mRNA data because the model in its present form cannot predict nascent mRNA fluctuations. This is because unlike mature mRNA, nascent mRNA cannot be assumed to decay via first-order kinetics. A detailed response is provided below to the original comment made by Referee 3. Regarding the comment on the use of cell-cycle-resolved data measuring mRNA and protein expression – while we agree it would make an excellent test of our model, we could not find such a dataset in the literature. We point out that our model, in its present form, is interesting as it is, as a detailed biological model of mature mRNA and protein number / concentration fluctuations in growing cells. Its predictions are yet to be fully confirmed and hence may stimulate the development of further experimental single-cell studies.

      Significance

      1. The advance of this paper is essentially technical. The authors present a model that incorporates and unifies previously studied effects (cell volume homeostasis, concentration homeostasis, bursting transcription). There is no major conceptual novelty, but the combination of these different aspects and the derivations that the authors present are very valuable and might be applicable to interpret data in various species.

      Response: Thank you for acknowledging the results of the manuscript.

      Referee #3

      Evidence, reproducibility and clarity

      1. The manuscript analyses a phenomenological model of stochastic gene expression. The model couples bursty transcription with cell growth, division and DNA replication. The cell cycle is divided into a large number of stages whose exponential lifetimes depend on the cell volume. It is argued that concentrations of gene products are distributed according to mixed Gamma distributions, whereas the copy numbers follow mixed negative binomial distributions. The number of modes can be different for concentrations and copy numbers; for instance, the copy numbers can be unimodal while concentrations are bimodal. The case when the mean concentration does not depend on the cell cycle stage is called perfect homeostasis. It is argued that perfect homeostasis leads to a Gamma distribution of the gene product concentration and that deviations from a Gamma distribution result mainly from deviations of the concentration from perfect homeostasis. It is also proposed that concentration homeostasis is difficult to obtain. These qualitative predictions of the model are tested using two data sets, one for E. coli and another for fission yeast.

      Response: Thank you for acknowledging the results of the manuscript.

      Major comments 1. A huge number of states called "cell cycle stages" have exponential lifetimes. In my opinion, this sequence of stages is just a technicality for keeping the model within a discrete Markovian framework. More natural choices are possible, such as piecewise deterministic Markov processes, age structured diffusions, etc. The biological significance (if there is any) of such states should be explained.

      Response: In the revised manuscript, we explained in detail the biological meaning of the effective cell cycle stages (see page 4). Specifically, recent studies have revealed that in many cell types, the accumulation of some activator to a critical threshold is used to promote mitotic entry and trigger cell division, a strategy known as the activator accumulation mechanism. In E. coli, the activator was shown to be FtsZ; in fission yeast, it is believed to be a protein upstream of Cdk1, the central mitotic regulator, such as Cdr2, Cdc25, and Cdc13. Biophysically, the N effective stages can be understood as different levels of the key activator. Moreover, we pointed out that the power law form for the rate of cell cycle progression may come from cooperativity of the key activator that triggers cell division.

      1. The timescales of stochastic gene expression are not correctly taken into account. It is considered that during an exponential stage the bursting approximation describes gene expression in terms of Gamma distributions for concentrations and in terms of negative binomial distributions for copy numbers. This approximation is only valid if the lifetime of a stage is much larger than the time needed to generate a burst. For RNA, this condition cannot be fulfilled for a large number of states N and/or for two-state promoters with a relatively long ON state. For the protein and/or in the case of translational bursting, the condition is even more difficult to fulfil. I agree with Reviewer 2 that once the master equation is accepted the results make sense. But my criticism is different and concerns the master equation itself. In this equation the burst is considered instantaneous, whereas it needs finite time in reality. Concerning nascent mRNA, ON/OFF etc., I disagree. The notion of instantaneous burst with well defined burst size and burst frequency on a stage has a meaning if the lifetime of this stage (which is not mRNA or protein lifetime) is short. The model validity should be clearly stated.

      Response: Thank you for pointing out this important issue. When we talk about the validity of the model, we should stick to the full model, instead of the mean-field model. This is because once the full model makes sense, the mean-field model must work well when N ? 15, as we have shown in Fig. 3 and Supplementary Fig. S3. Hence our reply is based on the validity of the full model. We will reply to the above comments from the following three aspects. First, we agree with the referee that in our model, we assume that the gene product is produced in instantaneous bursts with the reaction scheme G ρpk (1−p) −−−−−−→ G + kM, k ≥ 1, M d −→ ∅, (1) where the mean burst size scales as V (t) β . Of course, in reality there is a finite time for the bursts to occur. A more general assumption is that within each cell cycle, the gene expression dynamics is characterized by the following three-stage model: G ρ −→ G ∗ , G∗ r −→ G, G∗ sV (t) β −−−−→ G ∗ + M, M u−→ M + P, M v −→ ∅, P d −→ ∅, (2) where the first two reactions describe the switching of the gene between an inactive state G and an active state G∗ the middle two reactions describe transcription and translation, and the last two reactions describe the degradation of the mRNA M and the protein P. Here the synthesis rate of mRNA depends on cell volume via a power law form with power β ∈ [0, 1]. Dosage compensation can be modeled by a decrease in the gene activation rate (for each gene copy) from ρ to κρ/2 upon replication. Previous studies have revealed that the bursting of mRNA and protein has different biophysical origins: transcriptional bursting is due to a gene that is mostly inactive, but transcribes a large number of mRNA when it is active (r ? ρ and s/r is finite), whereas translational bursting is due to rapid synthesis of protein from a single short-lived mRNA molecule (v ? d and u/v is finite). 
Under the above timescale separation assumptions, both mRNA and protein are produced in a bursty manner with the reaction scheme described by Eq. (1). The burst frequency for mRNA and protein are both ρ before replication and κρ after replication. The mean burst size for mRNA is (s/r)V (t) β and the mean burst size for protein is (su/rv)V (t) β , both of which have a power law dependence on cell volume (see pages 5-6). In Supplementary Figs. S1 and S2, we compare the mRNA and protein distributions for the bursty model with the reaction scheme given by Eq. (1) and the three-stage model with the reaction scheme given by Eq. (2), where both models under consideration have a cell cycle and cell volume description. It can be seen that the distributions for the two models are very close to each other under the above timescale separation assumptions with the bursty model being more accurate as r/ρ and v/d increase. Moreover, we find that the accuracy of the bursty model is insensitive to the value of the number of stages N. Here the values of N are chosen so that the ratio of the average time spent in each stage (T /N, where T ≈ (log 2)/g is the mean cell cycle duration) and the mean burst duration time (1/ρ) ranges from ∼ 0.5 − 2. This shows that the effectiveness of the bursty model does not require that the lifetime of a cell cycle stage is sufficient long. Due to mathematical complexity, we only focus on the bursty model in the present paper. The consistency between the gene product distributions for the two models justifies our bursty assumption. Second, while we assume bursty expression here, our model naturally covers non-bursty expression since the latter can be regarded as a limit of the former. Hence all the conclusions in the present paper are applicable to both bursty and non-bursty expression. In the revised manuscript, we emphasized this point (see page 4 for a detailed explanation). 
Last but not least, if the lifetime of the gene product is much shorter than the lifetime of each cell cycle stage, then the gene expression dynamics will rapidly relax to a quasi-steady state for each stage. In this case, the gene product fluctuations at each stage can be characterized by a gamma distribution in terms of concentrations and by a negative binomial distribution in terms of copy numbers, and hence the distribution of concentrations (copy numbers) for a population of cells is naturally a mixture of N gamma (negative binomial) distributions. However, the strength of our analytical distribution (see page 10, Eq. (8)) is that it serves as an accurate approximation when N ≫ 1 without making any timescale assumptions. The effectiveness of our analytical distributions is validated in Supplementary Fig. S3 for three different cases: (i) the degradation rate d of the gene product is much smaller than the cell cycle frequency f; (ii) d and f are comparable; (iii) d is much larger than f. In the revised manuscript, we also emphasized these points (see page 10).
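
      The quasi-steady-state picture in the last point can be sketched numerically. The snippet below is a minimal illustration, not the analytical distribution of Eq. (8): it evaluates the density of a mixture of N gamma distributions, one component per cell cycle stage. The function names, the stage weights, and the shape and scale values are hypothetical placeholders chosen only to make the example run.

```python
import math

def gamma_pdf(x, shape, scale):
    """Density of a gamma distribution with the given shape and scale."""
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)

def gamma_mixture_pdf(x, weights, shapes, scales):
    """Density of a mixture of gamma distributions (one component per stage)."""
    return sum(w * gamma_pdf(x, a, s) for w, a, s in zip(weights, shapes, scales))

def gamma_mixture_mean(weights, shapes, scales):
    """Mean of the mixture: weighted sum of the component means shape * scale."""
    return sum(w * a * s for w, a, s in zip(weights, shapes, scales))

# Hypothetical example with N = 4 stages, equally weighted.
# The shape doubles in the last two stages to mimic a doubled burst
# frequency after replication; all numbers are assumptions.
N = 4
weights = [1.0 / N] * N
shapes = [5.0, 5.0, 10.0, 10.0]
scales = [0.20, 0.18, 0.16, 0.14]

mixture_mean = gamma_mixture_mean(weights, shapes, scales)
density_at_one = gamma_mixture_pdf(1.0, weights, shapes, scales)
```

      In a real application the weights would be the fractions of cells in each stage and the component parameters would follow from the stage-specific burst frequency and burst size.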

      1. DNA replication is a stochastic event and does not occur after a fixed number of exponential stages as it is considered in this model. Concerning replication: in the model this occurs after exactly N0 steps. In reality, replication occurs somewhere between the start of S and G2/M. N0 is in fact a random variable. Probably a new mean field assumption is needed here with some justification, but I have seen nothing in the paper.

      Response: We agree with the referee that replication of the whole genome occurs in the S phase, which occupies a considerable portion of the cell cycle and thus cannot be assumed to occur after a fixed number of exponential stages. However, our model is for a single gene, and since the replication time of a particular gene is much shorter than the total duration of the S phase, it is reasonable to treat it as instantaneous. In addition, recent experiments have shown that the time elapsed from birth to replication for a particular gene occupies an approximately fixed proportion of the cell cycle, which is called the stretched cell cycle model. This is consistent with our assumption that replication of the gene of interest occurs after exactly N0 stages. Although replication occurs after a fixed number of stages, the time of replication is still stochastic, since each stage has a random lifetime. In the revised manuscript, we emphasized these points (see pages 4-5).
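
      The last point, that a fixed number of stages still gives a random replication time, can be illustrated with a small simulation. This is a simplified sketch: it assumes every stage has the same constant exit rate, whereas in the model the stage rates depend on cell volume. The function name and the values of `n0` and `stage_rate` are hypothetical.

```python
import math
import random

def sample_replication_time(n0, stage_rate, rng):
    """Time from birth to replication as a sum of n0 exponential stage lifetimes.

    Assumes a constant exit rate for every stage (a simplification).
    """
    return sum(rng.expovariate(stage_rate) for _ in range(n0))

rng = random.Random(0)
n0, stage_rate = 20, 10.0  # hypothetical values
times = [sample_replication_time(n0, stage_rate, rng) for _ in range(5000)]

mean_time = sum(times) / len(times)
var_time = sum((t - mean_time) ** 2 for t in times) / (len(times) - 1)
cv = math.sqrt(var_time) / mean_time

# Under this simplification the sum is Erlang distributed, with mean
# n0 / stage_rate and coefficient of variation 1 / sqrt(n0).
expected_mean = n0 / stage_rate
expected_cv = 1.0 / math.sqrt(n0)
```

      With constant rates, increasing the number of stages narrows the distribution of replication times but never makes it deterministic.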

      1. The results in the Methods were derived heuristically and their relation to the master equation (12) is not explicit (except for the part concerning moments and their power spectrum). Furthermore, one would like to have some estimates of the biases introduced by the mean field approximation. Concerning biases introduced by the mean field approximation: Figure 2 is a numerical simulation, some analytical estimates could be better. As Figure 2 looks rather convincing, I reclassify this as minor comment.

      Response: We agree with the referee that the derivation of moments is rigorous, but the derivation of the analytical distribution given in Methods is not rigorous and cannot be directly obtained from the master equation. In the revised manuscript, we emphasized that the analytical distribution is not exact but serves as a very good approximation (see pages 10 and 22). We showed that the analytical distribution agrees well with stochastic simulations when the number of cell cycle stages N ≥ 15 (see page 9, Fig. 3 and Supplementary Fig. S3). The logic behind our approximate distribution is that while the gene product may have a complex distribution of concentrations (copy numbers), when the number of cell cycle stages is large, the distribution must be relatively simple within each stage and thus can be well approximated by a simple gamma (negative binomial) distribution (see page 22). Due to the complexity of our model, it is very difficult to provide any analytical estimates of the bias introduced by the mean-field approximation. Often the bias of an approximation can be estimated when the approximation emerges from a systematic method such as van Kampen’s system-size expansion (see Ref. [21]). However, our mean-field model cannot be seen as the zeroth-order term of some expansion, and hence it is not possible to calculate the next-order correction that would be needed to estimate the error. Nevertheless, we have tested very large swathes of parameter space and found that the mean-field approximation always works well when N ≥ 15, which is the physiologically relevant regime for most types of cells (see discussion on P. 7).

      1. The model is not minimal and depends on a huge number of parameters. It is not clear how these parameters were found and if overfitting was avoided. One may have doubts about the identifiability of the parameter N. What difference is between N = 59 and N = 60 (the value of N for the cyanobacterium)?

      Response: In the revised manuscript, we used synthetic data to show that all the parameters involved in our model (except d and β, which can be determined based on a priori knowledge) can be accurately estimated from cell-cycle resolved lineage data of cell volume and gene expression (see Supplementary Note 7). We provided details of the parameter inference method, compared the input parameters with the estimated ones, and verified that they are identifiable (see Supplementary Table 1). We did not use real data to test our inference method because we could not find cell-cycle resolved lineage data for mRNA or proteins. As we noted, this is in principle possible via cell-cycle fluorescent markers. We also note that parameter inference for less detailed but similar models has been carried out in our previous papers: the parameters related to cell volume dynamics have been inferred in E. coli (see Ref. [51]) and fission yeast (see Ref. [52]) using the method of distribution matching, and the parameters related to gene expression dynamics have been estimated in E. coli (see Ref. [40]) using the method of power spectrum matching. Moreover, for our purpose, i.e. to investigate the effect of cell cycle and cell volume on gene expression, we do believe that our model is minimal. 
We captured cell growth with only one parameter g, the degree of balanced biosynthesis with one parameter β (β = 0 corresponds to the case where the synthesis rate is independent of cell volume and β = 1 corresponds to the case where the synthesis rate scales linearly with cell volume), the variability in cell cycle duration with only one parameter N, gene replication with only one parameter N0, gene dosage compensation with only one parameter κ (κ = 1 corresponds to perfect dosage compensation and κ = 2 corresponds to no dosage compensation), and the variation of size control strategy across the cell cycle with two parameters α0 and α1 (αi → 0 corresponds to timer, αi = 1 corresponds to adder, and αi → ∞ corresponds to sizer). The biological meaning of the cell cycle stages was clarified in the revised manuscript (see page 4). For our purpose, we believe that our model cannot be simpler.

      1. The authors should make clear which cell biology aspects are important, which are less important, and which were neglected in the context of their problem. Thus, in their model, cell cycle acts on gene expression mainly by duplication of burst sources and thus by increase of burst frequency after replication. Another important source of gene expression variability during the cycle, the mitotic transcription repression, is neglected.

      Response: In the revised manuscript, we clarified which cell biology aspects are important for gene expression dynamics (see page 17). Specifically, in our model, cell cycle and cell volume act on gene expression mainly by (i) the dependence of the burst size on cell volume; (ii) the increase in the burst frequency upon replication; (iii) the change in size control strategy upon replication; (iv) the partitioning of molecules at division. Point (iv) strongly affects copy number fluctuations, while it has little influence on concentration fluctuations. In addition, in the revised manuscript, we also elucidated the limitations of our model including mitotic transcription repression and others (see pages 19-20).

      1. The validity test of the model is indirect. It was tested that the concentration distribution deviates from Gamma and that the deviation correlates positively to the lack of accuracy of the concentration homeostasis. However, many models can have this behaviour. A direct validity test should use data sets of at least two types (total, nascent RNA, etc.) allowing direct estimates of some model parameters (such as burst size and frequency using nascent RNA). Concerning parsimony, I think that the authors should test it. Are all the parameters identifiable? Is there any overfitting? They could use parameter uncertainty, comparison of training /testing errors, etc. Some details about the parameter fitting method should be provided.

      Response: Regarding the parameter fitting and identifiability we have provided a detailed response to a previous comment above. However we emphasize that for the generation of Fig. 7, we did not need to estimate all model parameters from data. Hence in the previous version of the manuscript, no such estimation was done — we simply extracted the homeostasis accuracy γ, the height H of the off-zero peak of the power spectrum, and the Hellinger distance D of the concentration distribution from its gamma approximation directly from data. Finally, we point out that our model can be used to predict the dynamics of mature mRNAs, but it cannot be used to describe the dynamics of nascent mRNAs. This is because nascent mRNAs do not decay via a first-order reaction but their removal, i.e. their detachment from the gene which leads to mature mRNA, is better approximated by a reaction with a fixed decay time. This models the elongation time of nascent transcripts which does not suffer from much noise because the RNAP velocity is to a good approximation constant along the gene. See e.g. the following two papers for details: H. Xu, S. O. Skinner, A. M. Sokac, I. Golding, Stochastic kinetics of nascent RNA. Phys. Rev. Lett. 117, 128101 (2016). S. Braichenko, J. Holehouse, R. Grima. Distinguishing between models of mammalian gene expression: telegraph-like models versus mechanistic models. J. R. Soc. Interface 18, 20210510 (2021). Because of the fixed delay, the delay telegraph model (the telegraph model with a delayed degradation reaction) is non-Markovian and very different from the usual Markovian telegraph model which describes the dynamics of mature mRNA within each cell cycle. See e.g. the Supplementary Information of the following paper: X. Fu, et al. Accurate inference of stochastic gene expression from nascent transcript heterogeneity. bioRxiv (2021). 
Given the mathematical complexity introduced by a fixed delay, using it to describe the dynamics of nascent mRNA within each cell cycle leads to a non-Markovian model that is even more analytically intractable than the present one for mature mRNA. While an interesting research question, this is clearly far removed from the scope of our current manuscript.
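
      The Hellinger distance D mentioned above can be computed from binned data along the following lines. This is a generic sketch of the discrete Hellinger distance between two histograms; how the concentration data were actually binned and how the gamma approximation was fitted are not stated here, so the function name and the example histograms are assumptions.

```python
import math

def hellinger_distance(p, q):
    """Discrete Hellinger distance between two histograms on the same bins.

    Both inputs are normalized to sum to one before the distance is taken,
    so raw counts or probabilities can be passed. The result lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("histograms must use the same bins")
    total_p, total_q = sum(p), sum(q)
    squared = sum(
        (math.sqrt(a / total_p) - math.sqrt(b / total_q)) ** 2
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / 2.0)

# Hypothetical binned distributions: observed data vs. a gamma approximation
observed = [0.10, 0.35, 0.30, 0.15, 0.10]
gamma_fit = [0.12, 0.33, 0.28, 0.17, 0.10]
D = hellinger_distance(observed, gamma_fit)
```

      A value of D near 0 indicates that the gamma approximation is close to the observed distribution, and a value near 1 indicates almost no overlap.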

      Minor comments 8. The introduction could be more pedagogical. Right now it is just an accumulation of loosely related and sometimes abruptly introduced statements. For instance, we understand that the authors want to oppose their approach to other extant approaches. However, extant approaches should be better reviewed, some of them are aged structured and perfectly suited for analysing cell cycle data. It would be useful for the reader that an example of observation explained by their model and not explained by other models (age structured or not) is discussed in detail. The model of this work does not explain size control, it just assumes that this holds, and does not discuss cell population aspects. A more nuanced positioning of this approach with respect to the literature would be useful for judging its value.

      Response: In the revised manuscript, we rewrote the introduction part to make it more pedagogical (see pages 1-2). In particular, we compared three popular models describing the cell size dynamics and the associated size homeostasis. The advantages and disadvantages of the three models were discussed.

      1. The meaning of N should be discussed from the very start when the model is introduced.

      Response: In the revised manuscript, we explained in detail the biological meaning of the effective cell cycle stages (see page 4). Specifically, recent studies have revealed that in many cell types, the accumulation of some activator to a critical threshold is used to promote mitotic entry and trigger cell division, a strategy known as activator accumulation mechanism. In E. coli, the activator was shown to be FtsZ; in fission yeast, it was believed to be a protein upstream of Cdk1, the central mitotic regulator, such as Cdr2, Cdc25, and Cdc13. Biophysically, the N effective stages can be understood as different levels of the key activator. Moreover, we pointed out that the power law form for the rate of cell cycle progression may come from cooperativity of the key activator that triggers cell division.

      1. The authors call constitutive expression the situation when the mean copy number does not depend on the volume. This choice should be clarified as in general constitutive as opposed to specific, localised or transitory expression refers to non-regulated gene expression. It seems to me that in this context, expression is only partially constitutive (independent on the volume).

      Response: In the present paper, constitutive expression means that the gene product is produced one at a time and is not produced in a bursty manner. It does not mean that the mean copy number does not depend on the volume. In the revised manuscript, we provided a more detailed discussion about how constitutive expression can be viewed as a limit of bursty expression (see page 4).

      1. In figure 1b and for exponential growth the y axis should be log(volume) instead of volume. The mean field approximation is called both “of novel type” (Discussion) and “which has a long history of successful use in statistical physics” (p4). If something is novel, then one should clearly explain why.

      Response: In fact, the y-axis in Fig. 1(b) should be volume instead of log(volume). This is because the x-axis represents the cell cycle stage instead of the real time. Note that for the adder strategy (α0 = α1 = 1), it follows from Eq. (3) on page 7 that the mean cell volume at stage k is vk = v1 + (k − 1)M0/N0, which depends linearly on k. This explains why the red curves in Fig. 1(b) are straight lines instead of exponential curves. In the revised manuscript, we also explained why the mean-field approximation used is novel (see page 7). Specifically, we pointed out that the mean-field approximation is not made for the whole cell cycle; rather, we make the approximation for each stage, and thus different stages have different mean cell volumes. This type of piecewise mean-field approximation, as far as we know, is novel and has not been used in the study of concentration fluctuations before.
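
      The linear dependence on the stage index quoted above can be checked directly. The sketch below simply evaluates vk = v1 + (k − 1)M0/N0 for consecutive stages; the function name and the numerical values of v1, M0 and N0 are hypothetical.

```python
def adder_mean_volume(k, v1, m0, n0):
    """Mean cell volume at stage k for the adder strategy, using the
    formula v_k = v_1 + (k - 1) * M0 / N0 quoted in the response."""
    return v1 + (k - 1) * m0 / n0

# Hypothetical values: v1 = 1.0, M0 = 0.5, N0 = 10
volumes = [adder_mean_volume(k, v1=1.0, m0=0.5, n0=10) for k in range(1, 11)]

# Successive differences are constant (M0 / N0), i.e. the curve is a straight
# line when plotted against the stage index rather than against real time.
increments = [b - a for a, b in zip(volumes, volumes[1:])]
```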

      1. The word “cyclo-stationarity” is used without much definition. If this just means a stationary distribution of the gene products, why not simply use “stationarity” instead? What does “cyclo” mean? A number of properties were called “rare” but it is not clear on what grounds.

      Response: In the revised manuscript, we removed the term “cyclo-stationarity” and simply assumed that the copy number and concentration distributions of the gene product at each cell cycle stage have reached the steady state (see page 8). In addition, for each property that was called “rare”, we explained the reasons in detail (see pages 14 and 17).

      1. I did not find a proof that the copy number distribution has fewer modes than the concentration distribution.

      Response: In fact, it is very difficult to prove that the concentration distribution has fewer modes than the copy number distribution. However, we have tested very large swathes of parameter space and found that the number of modes of the concentration distribution is always less than or equal to that of the copy number distribution. In the revised manuscript, we emphasized this point (see page 16).

      Significance

      1. The strength of this work is that it incorporates in a stochastic gene expression model a number of ideas on size control and dosage compensation that were discussed elsewhere from a cell population point of view. However, the proposed model is based on a number of artificial choices that are difficult to justify biologically: a huge number of discrete cell cycle states and inappropriate handling of the timescales characterizing stochastic gene expression. Furthermore, the model is not minimal but depends instead on a huge number of parameters. I found the paper difficult to read, and the presentation of the results is not suitable for biologists, who would need more details on the justification of the modelling choices and on the experimental validation of the model.

      Response: All these points have been addressed in previous replies.

      1. For mathematicians, the calculations are rather standard and may seem trivial.

      Response: Our model is complex due to the coupling between gene expression dynamics, cell volume dynamics, and cell cycle events. It is far more complex than standard models of gene expression (see e.g. Refs. [2,84,85]) because of the large amount of biology encapsulated in it and we presented a first analytical- and simulation-based analysis of concentration fluctuations when concentration homeostasis is broken.

      The computations of many quantities in the present paper are non-trivial. First, we showed that the generalized added volumes before and after replication both have an Erlang distribution. Using this property, we computed the mean cell volume in each cell cycle stage, which is needed in the mean-field approximation. Furthermore, the computations of the power spectrum of concentration fluctuations are also highly non-trivial. The analytical expression of the power spectrum allows us to precisely determine the onset of concentration homeostasis. While the computations of moments of concentration fluctuations are standard, we used the moments to construct an analytical concentration distribution which serves as an accurate approximation when N is large. Our concentration distribution is generally valid when concentration homeostasis is broken and goes far beyond recent models for growing cells which require concentration homeostasis and which do not take into account DNA replication, dosage compensation and size control mechanisms that vary with the cell cycle phase (e.g. Ref. [26]).

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      The manuscript analyses a phenomenological model of stochastic gene expression. The model couples bursty transcription with cell growth, division and DNA replication. The cell cycle is divided into a large number of stages whose exponential lifetimes depend on the cell volume. It is argued that concentrations of gene products are distributed according to mixed Gamma distributions, whereas the copy numbers follow mixed negative binomial distributions. The number of modes can be different for concentrations and copy numbers; for instance, the copy numbers can be unimodal while concentrations are bimodal. The case when the mean concentration does not depend on the cell cycle stage is called perfect homeostasis. It is argued that perfect homeostasis leads to a Gamma distribution of the gene product concentration and that deviations from a Gamma distribution result mainly from deviations of the concentration from perfect homeostasis. It is also proposed that concentration homeostasis is difficult to obtain. These qualitative predictions of the model are tested using two datasets, one for E. coli and another for fission yeast.

      Major comments:

      The model encompasses a number of artificial choices:

      • A huge number of states called "cell cycle stages" have exponential lifetimes. In my opinion, this sequence of stages is just a technicality for keeping the model within a discrete Markovian framework. More natural choices are possible, such as piecewise deterministic Markov processes, age structured diffusions, etc. The biological significance (if there is any) of such states should be explained.
      • The timescales of stochastic gene expression are not correctly taken into account. It is considered that during an exponential stage the bursting approximation describes gene expression in terms of Gamma distributions for concentrations and in terms of negative binomial distributions for copy numbers. This approximation is only valid if the lifetime of a stage is much larger than the time needed to generate a burst. For RNA, this condition cannot be fulfilled for a large number of states N and/or for two-state promoters with a relatively long ON state. For the protein and/or in the case of translational bursting, the condition is even more difficult to fulfil.
      • DNA replication is a stochastic event and does not occur after a fixed number of exponential stages as it is considered in this model.
      • The results in the Methods were derived heuristically and their relation to the master equation (12) is not explicit (except for the part concerning moments and their power spectrum). Furthermore, one would like to have some estimates of the biases introduced by the mean field approximation.
      • The model is not minimal and depends on a huge number of parameters. It is not clear how these parameters were found and if overfitting was avoided. One may have doubts about the identifiability of the parameter N. What difference is between N=59 and N=60 (the value of N for the cyanobacterium)?
      • The authors should make clear which cell biology aspects are important, which are less important, and which were neglected in the context of their problem. Thus, in their model, cell cycle acts on gene expression mainly by duplication of burst sources and thus by increase of burst frequency after replication. Another important source of gene expression variability during the cycle, the mitotic transcription repression, is neglected.
      • The validity test of the model is indirect. It was tested that the concentration distribution deviates from Gamma and that the deviation correlates positively to the lack of accuracy of the concentration homeostasis. However, many models can have this behaviour. A direct validity test should use datasets of at least two types (total, nascent RNA, etc.) allowing direct estimates of some model parameters (such as burst size and frequency using nascent RNA).

      Minor comments:

      The introduction could be more pedagogical. Right now it is just an accumulation of loosely related and sometimes abruptly introduced statements. For instance, we understand that the authors want to oppose their approach to other extant approaches. However, extant approaches should be better reviewed, some of them are aged structured and perfectly suited for analysing cell cycle data. It would be useful for the reader that an example of observation explained by their model and not explained by other models (age structured or not) is discussed in detail. The model of this work does not explain size control, it just assumes that this holds, and does not discuss cell population aspects. A more nuanced positioning of this approach with respect to the literature would be useful for judging its value.

      The meaning of N should be discussed from the very start when the model is introduced.

      The authors call constitutive expression the situation when the mean copy number does not depend on the volume. This choice should be clarified as in general constitutive as opposed to specific, localised or transitory expression refers to non-regulated gene expression. It seems to me that in this context, expression is only partially constitutive (independent on the volume).

      In figure 1b and for exponential growth the y axis should be log(volume) instead of volume.

      The mean field approximation is called both "of novel type" (Discussion) and "which has a long history of successful use in statistical physics" (p4). If something is novel, then one should clearly explain why.

      The word "cyclo-stationarity" is used without much definition. If this just means a stationary distribution of the gene products, why not simply use "stationarity" instead? What does "cyclo" mean?

      A number of properties were called "rare" but it is not clear on what grounds.

      I did not find a proof that the copy number distribution has fewer modes than the concentration distribution.

      Referees cross-commenting

      Part 1

      I agree with Reviewer 2 that once the master equation is accepted, the results make sense. But my criticism is different and concerns the master equation itself. In this equation the burst is considered instantaneous, whereas in reality it needs a finite time.

      Part 2 (response to Part 2 of Rev2)

      • concerning replication: in the model this occurs after exactly N_0 steps. In reality, replication occurs somewhere between the start of S and G2/M. N_0 is in fact a random variable. Probably a new mean field assumption is needed here with some justification, but I have seen nothing in the paper.
      • concerning biases introduced by the mean field approximation: Figure 2 is a numerical simulation, some analytical estimates could be better. As Figure 2 looks rather convincing, I reclassify this as minor comment.
      • concerning nascent mRNA, ON/OFF etc. I disagree. The notion of instantaneous burst with well defined burst size and burst frequency on a stage has a meaning if the lifetime of this stage (which is not mRNA or protein lifetime) is short. The model validity should be clearly stated.
      • concerning parsimony, I think that the authors should test it. Are all the parameters identifiable? Is there any overfitting? They could use parameter uncertainty, comparison of training /testing errors, etc. Some details about the parameter fitting method should be provided.

      Significance

      The strength of this work is that it incorporates in a stochastic gene expression model a number of ideas on size control and dosage compensation that were discussed elsewhere from a cell population point of view. However, the proposed model is based on a number of artificial choices that are difficult to justify biologically: a huge number of discrete cell cycle states and inappropriate handling of the timescales characterizing stochastic gene expression. Furthermore, the model is not minimal but depends instead on a huge number of parameters.

      I found the paper difficult to read, and the presentation of the results is not suitable for biologists, who would need more details on the justification of the modelling choices and on the experimental validation of the model. For mathematicians, the calculations are rather standard and may seem trivial. I am a systems biologist with a background in mathematics and theoretical physics.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      Jia et al. introduce a modeling framework to represent stochastic gene expression, with an explicit representation of cell volume growth, cell cycle progression (and its dependency on cell volume) and gene dosage compensation. The model is very elegant and general in that it can represent a variety of situations, simply as a matter of parameterization. Under a simplifying assumption, the authors derive a number of metrics (including the stationary distribution of the gene product and the power spectrum of gene product fluctuation dynamics), for both the absolute number and the concentration of gene product molecules. They use their model and derivations to examine under which conditions cells can achieve homeostasis in the concentration of the expressed gene product, despite changes in cell volume and gene copy number following replication. They also present and discuss the conditions giving rise to specific features (i.e. bimodality in the stationary distribution, a peak in the power spectrum) and examine these features in experimental data to infer the underlying homeostasis strategies.

      Major comments:

      The model is rather general and powerful. The simplifying assumption seems reasonable (and the authors investigate to some extent its limitations, i.e. Fig. 2). The conclusions are overall convincing.

      1. My main concern is that the metrics that the authors use to assess concentration homeostasis (i.e. the γ parameter and the presence/absence of a peak in the power spectrum) do not seem quite appropriate to describe how much variability/fluctuation in concentration is driven by cell cycle effects. Indeed, the γ parameter measures how much the average concentration in each cell cycle stage varies throughout the cell cycle. However, this variability should be compared to the total variability due to both cell cycle effects and stochastic bursting dynamics. A given level of cell-cycle dependency (say γ=0.2) could be very visible if gene expression is weakly noisy (e.g. B low and <n> high) and completely invisible if gene expression is highly bursty (large B and small <n>). In the latter situation, cell-cycle effects would be meaningless for the cell to minimize. In essence, re-using the authors' notation, I think γ / ϕ^(1/2) would be a more relevant metric to observe.
      2. Similarly, when inspecting the peak in the power spectrum, the weight of the Lorentzian function(s) creating the peak should be compared to the stationary component (λ_N, u_N in the authors' notation).

      A complementary analysis including these two points, together with a discussion of the relative contribution of cell-cycle effects and bursting dynamics to the total variability/fluctuation of concentrations, would be important to include.

      Minor comments:

      1. The dashed line on Fig. 3a is defined as κ = sqrt(2)^(1-β). First, is this empirical or does it come from a derivation? Second, it seems incomplete since it should depend on ω. Intuitively, this line should correspond to the value of κ that would best mimic balanced biosynthesis in the case where β≠1. In other words, κ should be such that <ρB' / V(t)>_prereplication = <κρB' / V(t)>_postreplication, which yields κ = 2^(ω(1-β)) * (1-ω)/ω * [2^(ω(β-1))-1]/[2^((1-ω)(β-1))-1]. This indeed simplifies to κ = sqrt(2)^(1-β) when ω=0.5.
      2. η is used in the caption of Fig. 2, which is cited on page 4. But it is defined only 2 sections later, on page 6.
      3. ω is used in the main text, but only defined in the caption of Fig. 3.
      4. ω is defined as "the proportion of cell cycle before replication". Is this in terms of cell cycle stages (i.e. ω=N_0/N) or actual time?
      5. Fig. 3 indicates that power spectra are normalized so that G(0)=1, but G(0)=10 on the first two graphs.
      6. Page 11: "bimodality in the concentration distribution is significantly less apparent". I would suggest rephrasing "bimodality in the concentration distribution is absent" since there should be no reference to "significance" and bimodality is either present or absent (binary), not less apparent.
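
      The ω-dependent expression in minor comment 1 can be checked numerically. The sketch below is an illustration under stated assumptions: exponential volume growth, a burst size proportional to V^β, and time averages of V^(β-1) taken before and after replication. The prefactor is written as (1-ω)/ω, which is what this averaging gives and which keeps κ positive. The function name is hypothetical.

```python
import math

def kappa_matching(omega, beta):
    """Value of kappa that equalizes the time-averaged production rate of
    concentration before and after replication (sketch, see assumptions above).

    omega: fraction of the cell cycle before replication, 0 < omega < 1.
    beta:  power-law exponent of the burst size dependence on volume.
    """
    if not 0.0 < omega < 1.0:
        raise ValueError("omega must lie strictly between 0 and 1")
    if beta == 1.0:
        # Limit of the expression as beta -> 1 (balanced biosynthesis)
        return 1.0
    e = beta - 1.0
    prefactor = 2.0 ** (omega * (1.0 - beta)) * (1.0 - omega) / omega
    ratio = (2.0 ** (omega * e) - 1.0) / (2.0 ** ((1.0 - omega) * e) - 1.0)
    return prefactor * ratio

# At omega = 0.5 the expression reduces to sqrt(2)^(1 - beta)
for beta in (0.0, 0.3, 0.7):
    difference = kappa_matching(0.5, beta) - math.sqrt(2.0) ** (1.0 - beta)
    print(beta, abs(difference) < 1e-12)

# For other values of omega the result differs from sqrt(2)^(1 - beta)
kappa_early_replication = kappa_matching(0.3, 0.0)
```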

      Referees cross-commenting

      Part 1.

      I agree with reviewer 1 that a table of symbols would be helpful. On reviewer 3's second Major Comment, I don't think that the "the lifetime of a stage [has to be] much larger than the time needed to generate a burst". From how the authors write and solve the master equation, I don't think that such a separation of timescales is necessary. The authors should indeed clarify this, and if reviewer 3 is correct, then that is indeed a major limitation. On reviewer 3's comment "DNA replication [...] does not occur after a fixed number of exponential stages", I don't think I agree with this statement. Cell cycle progression relies on an ensemble of biochemical reactions. Representing this as a set of exponential waiting-time distributions with different means is probably amongst the most general and agnostic ways of representing this. Whether these exponential waiting times only depend on cell volume is another question. This actually links back to reviewer 3's first Major comment and reviewer 1's comment that the concept of "stage" should be better discussed.

      Regarding the need for "estimates of the biases introduced by the mean field approximation" (reviewer 3), I guess that's the goal of figure 2. Maybe reviewer 3 should make more explicit what she/he would like to see.

      Regarding the comment from reviewer 3 that "a direct validity test should use datasets of at least two types (total, nascent RNA, etc)": I almost made a related comment in my review, but then held off. The issue with using nascent RNA data is that their model does not allow an ON state. They assume that gene products are produced in instantaneous bursts, which is a fair assumption if the lifetime of gene products is large compared to the time the gene stays ON. This is ok if the considered "gene products" are mRNAs or proteins, but not nascent RNAs (for which the lifetime is the time to transcribe the gene). I did not make this comment in the end because I think the model is useful regardless. To comply with reviewer 3's request, maybe the authors could use distributions of mRNA and protein products, but I'm not sure that such data exist (since they need cell-cycle-resolved data).

      I disagree with the statements that "the proposed model is based on a number artificial choices that are difficult to justify biologically" and that "the model is not minimal but depends instead on a huge number of parameters." In my opinion, the model is elegantly simple and captures the mechanisms under study (i.e. the effect of cell cycle and cell volume on stochastic gene expression). It is expressed so that it covers a broad range of situations, i.e. it reduces to simpler models as a matter of choosing parameter values (e.g. β=0 ⇒ transcription independent of cell cycle; α → ∞ ⇒ cell cycle depends only on size; ...). I do not think that a series of exponential distributions for cell cycle progression is inappropriate; it is the most agnostic and general way of representing an ensemble of biochemical reactions that would be meaningless to describe explicitly. Instead, only their dependency on cell volume is taken into account (and in a very general way, i.e. parameters 'a' and α). It is fair to ask the authors to clarify the concept of "stage", but I see this model as being as simple as possible, but not simpler, for the authors' purpose.

      Finally, I agree that the paper is probably "not suitable for biologists" but disagree that "for mathematicians, the calculations are rather standard and may seem trivial."

      Part 2. Resp. to reviewer 3 on the master equation (Part 1 of Rev3):

      Ok, I understand your comment better. What you mean by "the time needed to generate a burst" is the time during which the gene produces RNAs, not the lifetime of the gene product (which is 1/d). That's true. It is essentially the same idea as what I wrote in my previous comment about nascent RNA data not being well captured by the model. Again, I think this is fine for "gene products" that are somewhat stable (not the case for nascent RNAs, but ok for mRNAs and proteins). This is fine by me as long as the authors make this limitation of their model more explicit.

      Part 3. Response to Reviewer 3 (Part 2 of Rev 3)

      • concerning replication: Note that the mean field approximation is on cell volume, not on stage progression ("To simplify this model, [...] we ignore volume fluctuations at each stage but retain fluctuations in the time elapsed between two stages", p3). So the time at which replication occurs is already a random variable in the model. It is the sum of all the exponentially distributed random variables corresponding to stages 1 to N_0. The resulting replication time, measured from the start of the cell cycle, can therefore be anything from nearly deterministic (N_0 very high) to very variable (N_0 very low).
      • concerning nascent mRNA, ON/OFF etc.: I'm not sure I get your objection, but it is probably best to let the authors respond to your original comment.
      • concerning parsimony: Ok, you're right. The authors should test it.
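
      The point about replication timing can be illustrated with a short simulation. This is a minimal sketch, not the authors' model: it assumes that all N_0 pre-replication stages share the same rate (in the manuscript the rates depend on cell volume), and the function name and parameters are hypothetical. Under that assumption the replication time is Erlang-distributed with a coefficient of variation of 1/sqrt(N_0).

```python
import random
import statistics


def replication_time_cv(n0: int, mean_total: float = 1.0,
                        samples: int = 20000, seed: int = 0) -> float:
    """Estimate the coefficient of variation of the replication time.

    The replication time is modeled as the sum of n0 exponential waiting times.
    Assumption: every stage has the same rate n0 / mean_total, so that the mean
    replication time equals mean_total regardless of n0.
    """
    rng = random.Random(seed)
    rate = n0 / mean_total
    times = [sum(rng.expovariate(rate) for _ in range(n0))
             for _ in range(samples)]
    return statistics.stdev(times) / statistics.mean(times)


if __name__ == "__main__":
    for n0 in (1, 4, 16, 64):
        cv = replication_time_cv(n0)
        print(f"N_0={n0:3d}: simulated CV={cv:.3f}, 1/sqrt(N_0)={n0 ** -0.5:.3f}")
```

      With few stages the replication time is highly variable, and with many stages it approaches a fixed fraction of the cell cycle, which is the range described in the bullet on replication.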

      Significance

      The advance of this paper is essentially technical. The authors present a model that incorporates and unifies previously studied effects (cell volume homeostasis, concentration homeostasis, bursting transcription). There is no major conceptual novelty, but the combination of these different aspects and the derivations that the authors present are very valuable and might be applicable to interpreting data in various species.

      The paper is suitable for a physics/mathematics/computational audience. It is rather technical and would not be understood by readers with only a biology background.

      Field of expertise of the reviewer: Gene regulation, single-molecule imaging, stochastic modeling.

    1. The hypocrisy and the cruelty are maddening.

      I have a general idea of Amanda Knox's story but I had never heard any specific details about the story like names or places or how she was treated. I find that with most aspects of society, especially with online activities, people do tend to go for the crazy and outlandish stories. Once most people make up their minds about a person or story, it can be hard to change their viewpoints. No matter how many times Amanda may want to show the proof that she is an innocent person caught in the wrong place at the wrong time, those people who paint her in a certain light will never change their viewpoints. Another story I can think of that shares some similarities is the Gypsy Rose case. Now Gypsy was active in the crime whereas Amanda was not active in her alleged crime. The main similarities between the two stories are how the media grabbed hold of them and that there are shows, movies, characters, etc., that are based on these real-life people and the real things that happened to them. There are plenty of people who want to hold Gypsy as accountable as her then-boyfriend and others who think she was innocent but a product of her surroundings. The way this little girl, as we were told she was at the time of the crime, was painted as a monster is insane to think about. But if that can happen to a young woman, then anything can be thought about a mid-twenties adult woman in a foreign country. The way the public romanticizes or dehumanizes a person for actions they may or may not have taken can be insane to think about. The people who get treated this way almost never get to go back to being a normal everyday person.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We wish to thank all three reviewers for their thorough examination of our manuscript and their constructive criticism, which allowed us to increase its quality. You will see that, following their recommendations, we have included a good amount of new data in the manuscript. Specifically, we added a new figure with experiments proposed by the reviewers (now Fig. 4), as well as Figs. S3 and S4. In addition, we expanded one paragraph of our Discussion to comment on a very recent article published by Huang et al. in Nature Structural and Molecular Biology with conclusions pertaining to the interplay of Rpd3 and Gcn5 in PHO5 gene regulation. Below we include the point-by-point response (in blue) with the changes we have implemented to address their specific points. All the additions and changes in the manuscript are made in red.

      Referee #1

      Evidence, reproducibility and clarity

      In this manuscript, Novačić et al. investigate the mechanism of the non-coding transcription-driven regulation of the phosphate-responsive PHO5 gene. The authors employ a CRISPRi system to discern the direct contribution of the antisense non-coding transcription (CUT025) expressed during phosphate-rich conditions to transcriptional repression of the yeast PHO5 gene, thereby challenging a previous study from the Svejstrup lab that proposed a positive role for non-coding transcription in control of the PHO5 gene. They propose a model where non-coding transcription represses PHO5 by mediating recruitment of the Rpd3 histone deacetylase, leading to altered chromatin structure at the PHO5 promoter due to reduced recruitment of the RSC chromatin remodelling complex. Overall, the data presented in the manuscript are of good quality, and the experiments are well controlled and nicely presented. The manuscript is well written. My specific comments are below:

      1. I am somewhat confused by the data presented in Figure 5. While there is a similar impact on the chromatin structure seen in the rrp6Δ and air1Δ air2Δ strains (Fig 5C) that corresponds to a more "closed" configuration of chromatin, it is not consistent with the H3 ChIP data that show higher nucleosome occupancy across the PHO5 UAS in rrp6Δ but loss of nucleosomes in the double mutant (or is there perhaps a mistake in plotting the data?)

      We now realize that the data were plotted confusingly, and we apologize for it. While doing the H3 ChIP experiment we only prepared the +Pi samples for the air1Δ air2Δ double mutant. In the figure we only included this one data point for the double mutant, which could lead to the false conclusion that at other timepoints there are no histones at its PHO5 promoter region. We decided to remove this data point from the figure to avoid the confusion and only keep the air1Δ air2Δ data for the ClaI assay. We believe that this should not be an issue as this data point is not critical for the conclusions we are making.

      1. To further explore the direct link between nc transcription, Rpd3 and the rrp6-mediated effect, I suggest testing the effect on PHO5 induction of rpd3 and rrp6 deletions in the CRISPRi CUT025 background.

      We performed this experiment and now include it as Fig. S3 in the manuscript. As expected, expressing the CRISPRi system only made a difference when Rpd3 was present.

      1. It seems that most noticeable effect of blocking nc transcription by an elegant approach that utilizes CRISPRi system on the phosphatase activity is seen between 0-1.5h of induction. I suggest taking additional time points at 30-45 min.

      We took additional timepoints and the results were incorporated as the new Fig. 5E. The CRISPRi effect resulting in higher acid phosphatase activity was still most noticeable after 1.5 h of induction. This was mostly in line with the fact that the difference in PHO5 mRNA levels was most pronounced after 30 min of induction (Fig. 5D), as the time needed to achieve a measurable protein level after induction can lag significantly for secretory proteins, such as acid phosphatase. Secretory proteins are cotranslationally translocated into the ER, after which they traverse the secretory pathway and undergo modifications before being finally exported to the periplasm where their activity can be measured. Consequently, the increase in acid phosphatase activity upon induction is only measurable after at least an hour.

      1. How do the authors explain that the effects of the exosome mutations are reversed and phosphatase activity is increased at a later time point (20 h, Fig 2A)? I suggest using a more distinct colour for the dis3 mutants.

      That effect is indeed somewhat surprising. We hypothesize that the effects we are seeing after 20 h reflect the specific conditions of prolonged induction, i.e. keeping the chromatin open or semi-open for a very long period of time, which do not necessarily reflect the early gene induction period that we are using as a read-out of the effect of different mutations on acid phosphatase expression kinetics. We previously noticed a similar effect with chromatin remodeler-related mutants (e.g. rsc2Δ, unpublished result from the S. Barbarić group), which speaks in favour of the prolonged induction conditions resulting in a chromatin state with its own specialized cofactor requirements. We therefore consider the chromatin state after prolonged induction a topic for another study; however, we now comment on this effect in the manuscript. The dis3 mutants are now shown in more distinct colours.

      1. Figure 5A -label "H3 ChIP"

      The label was added.

      1. Error bars are quite high in Fig 1C, perhaps it is worth repeating the experiment

      Since significant differences in PHO5 mRNA levels can be seen between wt and rrp6Δ mutant cells at 0.75 and 3 h of induction, we feel that the higher error bars at 5 h of induction do not justify repeating the experiment, especially since the values are bound to converge to a similar one after a longer induction period, as demonstrated in Fig. 1D.

      Significance

      Of significant interest for a general audience.

      Referee #2

      Evidence, reproducibility and clarity

      The authors study the PHO5 locus, which is known to have an antisense transcript that has previously been shown to be important for activation of Pho5 sense transcription. The authors challenge this idea with extensive analyses. They show that Pho5-AS represses sense transcription, and thus fits in the category of AS repressors instead of activators. They show correlative data that when antisense goes down, sense goes up. They show that increased antisense levels lead to decreased sense levels. They use mutants of decay pathways to increase the levels of antisense transcription. Moreover, they used CRISPRi to repress the antisense transcript. Lastly, they show that histone deacetylation represses Pho5 sense. The data in the manuscript are convincing and well presented. One thing that needs further clarification is the strategy to increase antisense levels by deletion mutants of decay or depletion of decay pathways. While it is clear that this stabilizes Pho5-AS and decreases Pho5-sense, it is not clear that this causes an increase in transcription. Perhaps it is possible that the antisense transcript itself has a repressive effect. If one really wanted to increase antisense transcription, then the antisense promoter should be increased in strength. On the other hand, the CRISPRi experiment is very convincing. I am surprised how well the CRISPRi system works; it is thought to be not so efficient at blocking elongating polymerase and good at blocking initiation.

      We thank the reviewer for this feedback. We performed additional experiments which you will find described below. Based on the results, we would like to keep the point about AS transcription causing the effect.

      Major comments:

      • Are the key conclusions convincing? Perhaps the conclusion that increased transcription leads to repression is not completely convincing. The authors use mutants in rrp6, exosome, and nrd1 to increase Pho5-AS transcription elongation. However, I have always been under the impression that these mutants stabilize the transcript. And the authors acknowledge this in their manuscript. So how do you discriminate between increased stability and increased elongation? I support the conclusion that inhibition of Pho5-AS leads to increased Pho5-S. However, an increase in elongation is not directly demonstrated. While still possible, it is equally possible that a more stable Pho5-AS transcript has a repressive effect on Pho5-S.
      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? See above.
      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. If the authors want to keep the message that increased transcription of Pho5-AS leads to more repression, they may need to consider additional experiments. For example, increasing transcription from the antisense promoter.

      We performed the proposed experiment and now include it in the manuscript as Fig. 4AB. Briefly, we inserted the strong constitutive TEF1 promoter in the antisense configuration downstream of the PHO5 gene ORF, so that it drives AS transcription. The results of this experiment very clearly show the inverse relationship between PHO5 mRNA and AS transcript levels under +Pi conditions. Importantly, this strong constitutive AS transcription had an even more pronounced effect on PHO5 gene expression than deletion mutant backgrounds (in which, like in wt cells, the AS promoter is presumably weak), and did not allow the full level of PHO5 gene expression to be reached. To verify that the AS RNA itself does not have a regulatory role, but rather that the act of its transcription represses the corresponding gene, we performed an additional experiment with appropriate diploid strains. The design of this experiment is a standard way to test whether an AS transcript can work in trans (for example see Nevers et al. 2018 NAR Fig. 6). This experiment is now included as Fig. 4C. Together, the results of these experiments paint a clear picture of AS transcription, and not AS level/stability itself, driving the repression of the PHO5 gene.

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. To me this is an optional experiment, but it would benefit the manuscript
      • Are the data and the methods presented in such a way that they can be reproduced? Yes
      • Are the experiments adequately replicated and statistical analysis adequate? Yes

      Minor comments:

      • Specific experimental issues that are easily addressable.
      • Are prior studies referenced appropriately? Yes
      • Are the text and figures clear and accurate? Yes
      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions? No

      Significance

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field. The manuscript challenges previous work where it was claimed that Pho5-AS is important for activation of Pho5-S. As such, it is important work. In the field of noncoding transcription, Pho5-AS fits in a class of AS transcripts that has been well described.
      • Place the work in the context of the existing literature (provide references, where appropriate). See above.
      • State what audience might be interested in and influenced by the reported findings. Researchers in the fields of transcription and chromatin, and more specifically in yeast gene regulation.
      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. Chromatin, transcription, yeast.

      Referee #3

      Evidence, reproducibility and clarity

      Novačić et al. present a manuscript entitled "Antisense non-coding transcription represses the PHO5 model gene via remodeling of promoter chromatin structure", which is a locus-specific follow-up to previous studies from the Soudet and Stutz groups on genome-wide analysis of transcription interference mediated by antisense transcripts in S. cerevisiae. Critically, the authors here employ a CRISPRi approach to prevent antisense transcription from reaching the PHO5 promoter and in doing so show that the kinetics of PHO5 induction are increased, as would be predicted from their previous model. Additionally, they show predicted epistasis between rpd3 and rrp6, and between gcn5 and rrp6, on PHO5 expression, consistent with their model. Comments are relatively minor but should be addressed.

      Introduction p3. "This mechanism was subsequently explored genome-wide in yeast, which revealed a group of genes that in the absence of Rrp6 accumulate AS RNAs and are silenced in an HDAC-dependent manner (14)." This sentence appears awkward; perhaps move "in the absence of Rrp6" to after "AS RNAs"?

      Corrected as proposed.

      p3 "Under a high phosphate concentration Pho4 undergoes phosphorylation by the cyclin-dependent-kinase (Pho80-Pho85)" Since "the" is used, don't use parentheses around Pho80-Pho85.

      Corrected as proposed.

      Methods. Give the amount/concentration of glycine used in quenching formaldehyde for ChIP. Give the exact wash conditions and buffers, not "extensively".

      All of those details are now provided in the manuscript.

      Figure 4C. Describe schematic in legend

      It is now described.

      Figure 4D. Indicate time of induction in legend.

      This was lacking for Figs. 4B-C (now 5B-C) so we added it there.

      Figure 5A. air∆ data are missing from later time points?

      Please see our first response to Reviewer 1. We removed the air1Δ air2Δ double mutant data, as we only had one data point for it in this assay.

      Figure 6. The legend needs to indicate what the Pi conditions are. Since PHO5 is expressed, they appear to be low Pi. An issue that needs to be discussed is that rpd3∆ appears to decrease expression of PHO5 AS. Is this simply because of increased PHO5 expression? Does rpd3∆ have any effects on AS in high Pi? This is important to interpret whether the effects of rrp6 and rpd3 are epistatic or additive.

      We thank the Reviewer for bringing this to our attention. To explore the effect of rpd3Δ on PHO5 AS level, we quantified the PHO5 AS transcript by RT-qPCR with cells grown in (chemically defined) high Pi medium, which we now include in Fig. 7A. We find that rpd3Δ mutation has practically no effect on PHO5 AS transcript level both in the wt and the rrp6Δ mutant background. This result speaks in favor of rrp6Δ and rpd3Δ being epistatic rather than additive.

      Figure 7. The Sth1-CHEC data are hard to interpret. Some sort of quantification might be required, as the effects are not clear from the browser track, nor is it clear from the browser track that the results are reproducible. Examination of Sth1-AA effects in a gcn5∆ background might provide more compelling evidence that the effect on RSC is via acetylation. Otherwise it is a bit hard to say, as RSC could be functioning in parallel to the acetylation-dependent pathways implicated.

      We agree that the presumption that histone acetylation recruits RSC to the PHO5 gene promoter had to be tested. We therefore include the experiment involving Sth1-AA depletion in the gcn5Δ background as Fig. 8A. This experiment was complicated by the fact that RSC is highly abundant (and at the same time essential for cell viability), but we resolved this by starting to deplete RSC two hours before gene induction. These results position RSC and Gcn5 in the same pathway. In contrast, more complete Sth1 depletion severely impaired viability of the rrp6Δ mutant, making it hard to interpret the effect, so we now include this result as Fig. S4.

      To show the effect of AS transcription on RSC recruitment to the PHO5 promoter more quantitatively, we re-analyzed the Sth1-CHEC data (for two independent biological replicates) and now include the log2 values for the changes in Sth1 binding in the text of the manuscript.

      Significance

      The work is focused and narrower in impact but important because direct tests of locus-specific effects are performed, validating models from previous genomic analyses.

      Referees cross-commenting

      I think the other reviews are very reasonable. I would just suggest to the authors that they think carefully about the reviews and decide what they think is most valuable to improving the work/presentation

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Revision Plan

      1. General Statements

      We really appreciate the reviewers' positive comments and suggestions on our submitted manuscript. We think we will be able to resolve the issues raised by the reviewers by adding new data and revising the text as detailed below.

      2. Description of the planned revisions

      Reviewer #1:

      Major comments

      Localization analysis of a transiently expressed MAP70 transgene with inactivating phosphosite mutations would be important to see whether the identified conserved phosphosites are relevant for MAP70 interaction with MTs. This experiment could be performed rapidly using transient expression in BY-2 cells.

      We agree on the importance of this analysis. Therefore we are currently preparing fluorescent markers of Nt-MAP70-2-like and its phospho-blocked (Ala) version to coexpress with MT and nuclear markers in BY-2 cells. We estimate that we need three more months to complete this experiment.

      The authors propose that PP2 blocks phragmoplast formation by preventing phosphorylation of class II Kinesin-12 proteins. In support, the authors show that PP2 treatment correlates with a decrease in KIN12A phosphopeptide count (not fully abolished) and its failure to localize to emerging phragmoplasts in BY-2 cells and Physcomitrium. As class II Kinesin-12 proteins have been previously implicated in phragmoplast assembly, this is a fairly reasonable hypothesis, but it would benefit from the analysis of transgenic KIN12A variants carrying inactivating (A) or potentially activating (D/E) phosphosite mutations. Is loss of phosphorylation sufficient to prevent phragmoplast localization? Can an activated variant rescue PP2-induced KIN12A localization and cell division defects? As above, using transient expression in BY-2 cells would be a fast approach to tackle these questions.

      We are currently preparing fluorescent markers of phospho-blocked (Ala) and phospho-mimic (Asp) versions of KIN12A (PAKRP1) to coexpress with MT and nuclear markers in BY-2 cells. We will check whether they localize to phragmoplast and also test PP2 effects. We would need three more months to complete these analyses.

      Reviewer #2:

      Major comments

      • The manuscript would strongly benefit from being revised by a native English speaker. There are many unusual or awkward formulations, in particular in the abstract.

      We apologize for the unnatural sentences. After adding new data and correcting the manuscript, we will ask a native English speaker to revise it.

      Reviewer #3:

      Major comments

      The major concern is lack of evidence to connect MAP70 and MT disruption upon treatment with PD-180970, in contrast to PP2, which was shown to affect localization of Kinesin-12. I wonder if authors could use taxol to stabilize MTs, then observe the localization of MAP70 with application of PD-180970?

      As we responded to reviewer 1, we are preparing the fluorescent marker of Nt-MAP70-2-like to coexpress with MT and nuclear markers in BY-2 cells. By using this multi-color marker, we will test whether PD-180970 affects the localization of MAP70 on MTs, also using taxol. However, in our experience, taxol is not a very effective inhibitor and may not work in our transient expression system in BY-2 cells. In that case, we will analyze whether the phospho-mimic (Asp) version can prevent MT disruption in the presence of PD-180970 to assess the relation between PD-180970, MAP70 and MT disruption.

      I have another concern about the action of PD-180970. PD-180970 appears to affect ubiquitously indispensable proteins for MTs. If PD-180970 disrupts MTs by inhibiting phosphorylation of some MAPs, it must need time for turnover of the proteins phosphorylated before PD-180970 was applied. In the proteomics experiment, the authors treated the cells with the compounds for 8-9 hr. On the other hand, in BY-2 cells, PD-180970 disrupted MTs only 30 min after application. I wonder if proteins were replaced during the 30 min. Could the authors examine how long it takes to affect interphase MTs? If PD-180970 disrupts MTs in 5-10 min like oryzalin, it is unlikely that inhibition of phosphorylation of proteins like MAP70 caused MT disruption. Rather, it may inhibit some proteins that have activity to disrupt microtubules but are usually inactivated by phosphorylation, or inhibit something directly without phosphorylation.

      We agree that there is no evidence that PD-180970 disrupts MTs by inhibiting phosphorylation of MAP70. In our live-imaging system, in which reagents are added to liquid cultivation medium, the time from the reagent application to the arrival to each cell varies. Therefore, in order to accurately measure the time required for the inhibitor to take effect, it is necessary to design a new assay system, such as using fluorescent dyes to monitor the reagent's diffusion. In addition, since some reactions mediated by protein phosphorylation occur rapidly, minute-order observations might not be sufficient. Therefore, as an alternative strategy to assess the direct involvement of MAP70 phosphorylation on MT stabilization, we will examine whether PD-180970 induces MT disruption using strains expressing the phospho-blocked (Ala) and phospho-mimic (Asp) versions of MAP70 described above.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1:

      Minor comments

      The authors identified the analogs PD-166326 and PP1 as potent inhibitors of cell division. For completeness, it would be interesting to include a description of these arrest phenotypes and how they compare with that of PD-180970 or PP2.

      We have added the effects of all tested compounds on Arabidopsis embryos in Fig. S3C and Table S1. Based on these data and the results from tobacco BY-2 cells, we have compared the effects of PD-166326 and PD-180970, and of PP1 and PP2, in Results.

      Although there are two more obvious candidates in the phosphoproteome datasets on which the authors focus, there is very little discussion of the other top hits and whether they might be involved in cell division. On a related note, there is no discussion of the specificity of these compounds and the likelihood of phenotypes unrelated to cell division.

      We have added the information on “Similar proteins in Arabidopsis” and “Description and putative functions” for all identified candidates for PD-180970 and PP2 in Table S2 and S3, respectively. Referring to this information, we have added sections to Results describing the possible contributions of these candidates to MT organization and phragmoplast formation. In addition, we have described the specificity of these compounds and the phenotypes unrelated to cell division in the section on the results for Arabidopsis roots (Fig. S2A).

      1st results section:

      "...developed into the globular stage without causing morphological defects..."

      Should omit the word "causing" or replace with "any/detectable"

      We have omitted the word "causing".

      Reviewer #2:

      Even if the identification of the kinase(s) targeted by these two compounds is missing, the characterisation of at least two downstream effectors of these elusive kinase(s) inhibited by PD-180970 and PP2 is an important step forward. I would recommend making this point very clear in the writing (e.g. already in the abstract). Upon a superficial reading, the reader could assume that MAP70s and PAKRP1s are the direct molecular targets of these compounds.

      We appreciate the very positive comments. To clarify this point, in addition to the following responses to each suggestion, we have changed the last sentence of the abstract to “These properties make PD-180970 and PP2 useful tools for transiently controlling plant cell division at key manipulation nodes that are conserved in diverse plant species”.

      Major comments

      • I would modify the title to shift the emphasis from the methodology to the biological targets identified.

      We have changed the title to “Identification of novel compounds inhibiting microtubule organization and phragmoplast formation in diverse plant species”.

      • Concerning MAP70s the authors claim that there is little functional data about this family. Yet, a recent paper (https://www.science.org/doi/10.1126/sciadv.abm4974) identifies MAP70-5 as necessary for the proper organisation of CMTs in the endodermis and its ability to actively remodel to accommodate emergence of the lateral root primordium in Arabidopsis thaliana. This could provide a functional context to test several of the predictions that the authors list in the discussion.

      We have cited this paper in Results and Discussion, as “MAP70-5 was reported to increase MT length in vitro and to reorganize cortical MTs to alter the endodermal cell shape for lateral root initiation, suggesting that MAP70-5 mediates dynamic change of MT arrays”.

      Minor comments

      • The narrative would be improved by moving the section "PD-180970 and PP2 do not irreversibly damage viability" before the phosphoproteomic section.

      We have moved the “irreversibly” section to before the “phosphoproteomics” section.

      Reviewer #3:

      Minor comments

      In the supplemental data, the authors show only 12 or 14 candidate targets. It would be interesting to know how other MAPs, including homologues of MAP70 and Kinesin-12 in BY-2 cells, were scored in the phosphoproteomics assay. I suggest the authors show longer lists of proteomics hits including other MAPs. It would be valuable information for the research community.

      We apologize for not providing the complete dataset. We have added Dataset S1, containing the total protein sequences that we predicted from published RNA-seq data of BY-2 cells, and all proteins identified in the phosphoproteomics assay for PD-180970 and PP2 in Datasets S2 and S3, respectively. We have moved the lists of top candidates to Tables S2 and S3.

      In the Abstract, the authors should mention that the two compounds reduced the phosphorylation level of diverse proteins including MAP70 and Kinesin-12. This is a very important result and, otherwise, it may cause misunderstanding of the activity of the compounds. In addition to this, it is better to replace the following sentence: "presumably by inhibiting MT-associated proteins (MAP70)" with "presumably by inhibiting phosphorylation of MT-associated proteins (MAP70)."

      To avoid such a misunderstanding, we have changed the descriptions in Abstract to “Phosphoproteomic analysis showed that these compounds reduced phosphorylation level of diverse proteins. In particular, PD-180970 inhibited phosphorylation of the conserved serine residues in MT-associated proteins (MAP70). PP2 significantly reduced the phosphorylation of class II Kinesin-12, and impaired its localization at the phragmoplast emerging site”. Due to this change, the suggested sentence was eliminated. In the Discussion, we have also mentioned the reduction of phosphorylation of various proteins by stating, "we found that PD-180970 and PP2 reduced the phosphorylation levels of diverse proteins". These parts may be further modified depending on the results of the phospho-blocked (Ala) and phospho-mimic (Asp) analyses.

      Page 7, line 1: it would be better to insert "of MAP70 family" after "in the conserved MT-binding domain" because the MT-binding domains are unique to the MAP70 family. I could not understand why this is (2nd line) "consistent with PD-180970 severely disrupting all the tested MT structures". At the current stage, there is no evidence that dephosphorylation of MAP70 caused the microtubule disruption. I suggest the authors remove the sentence (", which was~MT structures").

      We agree with both points and have corrected them as the reviewer suggested.

    1. the exhibition of Miss Clack’s character.

      I think this is an interesting way to phrase this. Taken by itself, we may have taken Clack's narrative as the truth, but when examined beside the other narratives, her unreliability is exposed and her character traits (and flaws) become clear.

    1. Author Response

      Reviewer #1 (Public Review):

      Viola et al. compared the electron transfer efficiency of two types of oxygenic far-red photosystem II (PSII) with the "conventional" PSII and analyzed how these far-red PSII use the limited energy from infrared photons to carry out photosynthesis. Oxygenic photosynthesis is an energy-intensive process, and a large headroom is also needed for preventing harmful back-reactions from occurring, which can produce singlet oxygen. This research investigated how the far-red PSII managed to do their work with limited energy.

      The authors measured and compared the forward reactions of different kinds of PSII (Chl-a-PSII, Chl-d-PSII and Chl-f-PSII), including the flash-induced chlorophyll fluorescence decay and S-states turnover. These results led to a conclusion that the forward reaction quantum efficiency was not changed between "conventional" PSII and far-red PSII. However, the back-reactions of three types of PSII are different based on the measurements of the prompt fluorescence decay, delayed luminescence decay, and thermoluminescence band locations. The authors concluded that the two far-red PSII (Chl-d-PSII and Chl-f-PSII) have a different strategy for utilizing infrared light. Indeed, the authors showed that Chl-d-PSII containing cyanobacteria produced more singlet oxygen than other types, and this result was explained by the energy profile in the electron transfer chain.

      The major strength of this research is the authors made a direct comparison of different far-red PSII under the same conditions. It's exciting to have a side-by-side comparison between two types of far-red PSII. In addition, the authors also measured the singlet oxygen produced from all types of PSII which clearly showed the differences in the routes of recombination.

      We thank the reviewer for the interest demonstrated in our work and for the thoughtful comments, which we have addressed below.

      However, there are some concerns:

      1) The flash-induced fluorescence decay, thermoluminescence, delayed luminescence and S-states turnovers of the Chl-d-PSII and Chl-f-PSII have been characterized before (ref 5, 26, 39), but from intact cells compared to isolated membranes in this study, and similar conclusions have been achieved. The authors mentioned four reasons (lines 115-120, see the manuscript for the authors' arguments "i." to "iv.") why it's important to use isolated membranes. However, in my opinion, these reasons are not sufficiently strengthened:

      i. The transmembrane potentials from cells can be collapsed by adding uncouplers;

      ii. The authors mentioned the quinone pool in the cells is uncontrollable, but the authors didn't actually measure or manipulate the quinone pool in the membrane (e.g., the ratio of QB/QB-/empty-pocket in the samples);

      iii. The phycobilisomes can be controlled by different conditions through state transitions;

      iv. The isolation of membranes may not remove membrane-related quenching mechanisms (e.g., PSII quenching in State II, spillover, etc.).

      We do not agree with the reviewer on this point. We consider the use of membranes (or isolated PSII) as being the best solution to limit the effects listed at the end of the Introduction and to provide consistency between the different measurements, some of which cannot be performed in intact cells (i.e., the UV absorption measurements). More specifically:

      i) The effectiveness of uncouplers in dissipating the membrane potential is likely to vary between species (e.g., Chroococcidiopsis cells form aggregates encapsulated by a protective layer of excreted polymers) and should be assessed by directly measuring the membrane potential. ElectroChromic Shift-based measurements of the membrane potential in cyanobacteria have only been demonstrated in Synechocystis sp. PCC6803 and Synechococcus elongatus sp. PCC7942 (Viola et al. 2019, https://doi.org/10.1073/pnas.1913099116) and still need to be adapted to the far-red species used here. Additionally, commonly used uncouplers such as CCCP and FCCP are ADRY reagents, which interfere with PSII water splitting by directly reducing TyrZ (Ghanotakis et al. 1982, https://doi.org/10.1016/0005-2728(82)90115-3), and would affect all the measurements presented in this work.

      ii) In the dark, the redox state of the PQ pool in cyanobacterial cells has been observed to be kept in a highly reduced state by respiration, with potential consequences on the QB/QB- ratio. This could well vary between species, based on their different physiologies and growth conditions. In isolated cyanobacterial membranes and PSII, the QB/QB- ratio is expected to be around 50% after a short dark adaptation. This seems to be the case in our samples, based on the flash-dependent oscillations of the S2QB- and S3QB- thermoluminescence shown in Appendix 2 compared to the literature (Rutherford et al. 1982, https://doi.org/10.1016/0005-2728(82)90061-5), assuming an initial ~75% S1 population, as confirmed by the flash-dependent oxygen evolution and UV absorption. This is now mentioned in Appendix 2.

      iii) The control of state transitions requires specific illumination regimes incompatible with the conditions required for our experiments. Moreover, state transitions remain largely uncharacterised in the far-red species used in the present work. In some of these species, the situation is further complicated by the presence of both visible and far-red light-absorbing phycobilisomes that have a different spatial distribution in the cell (MacGregor-Chatwin et al. 2022, https://doi.org/10.1126/sciadv.abj4437).

      iv) Non-photochemical energy quenching in cyanobacteria seems to occur in phycobilisomes, due to the action of the Orange Carotenoid Protein (OCP). Both OCP and the phycobilisomes, if present in cyanobacterial cells (and that depends on the strains), are removed when membranes are isolated. It’s been proposed that direct quenching of the PSII core occurs in Synechococcus elongatus 7942 cells in state II (Choubeh et al. 2018, https://doi.org/10.1016/j.bbabio.2018.06.008), but since the mechanism has not been elucidated, no conclusion can be made on whether this could occur in membranes. The same is true for spill-over. Additionally, neither of the two mechanisms could be better controlled in cells than in membranes, so there would be no advantage here from working in vivo.

      In addition, the authors reached a conclusion that the Chl-f-PSII containing species should suffer from fluctuation light-induced membrane potential spikes, but don't actually measure this in physiologically relevant preparations. It will be more beneficial to use intact cells instead of an isolated membrane. I suggest the authors either restrict their conclusions to what the isolated membranes clearly show or make measurements in intact cells.

      The proposal that the far-red forms of PSII (both Chl-d-PSII and Chl-f-PSII) should suffer from increased charge recombination induced by spikes of membrane potential in fluctuating light is not new (see for example Nürnberg et al. 2018, https://doi.org/10.1126/science.aar8313), and is based on the observations made in plant PSII (Davis et al. 2016, https://doi.org/10.7554/eLife.16921) and assumed to be universal in oxygenic photosynthesis. In PSII, the transfer of electrons from the primary donor chlorophyll to QA occurs vectorially in the membrane, against the trans-membrane electric field, thanks to these electron transfer steps being exergonic. Spikes in the electric field due to sudden intensity fluctuations increase the probability of backward electron transfer. If the overall drop in the energy of the electron from the primary donor to QA is smaller (in a long wavelength PSII), it should result in a higher probability of backward transfer for a given trans-membrane electric field, and therefore a greater susceptibility to spikes in the electric field. We did not measure these effects and we do not claim to have done so. As already mentioned in the answer to point i) above, doing so would require the development of ElectroChromic Shift-based measurements of the membrane potential in the cyanobacterial species containing far-red photosystems. This is a separate research project beyond the scope of the present work.
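      The energetic argument above can be illustrated with a simple Boltzmann weighting. This is only a minimal sketch, not the authors' analysis: the function name, the gap values and the size of the field spike are all hypothetical placeholders chosen for illustration, and the model assumes that the trans-membrane field simply subtracts from the free-energy gap that stabilises the electron on QA.

```python
import math

K_B_T_MEV = 25.7  # thermal energy kT at about 298 K, in meV


def relative_back_transfer(gap_mev, field_mev=0.0):
    """Boltzmann weight for thermally repopulating the upstream state.

    gap_mev   : free-energy drop stabilising the electron (meV)
    field_mev : ASSUMPTION, energy by which the trans-membrane field
                lowers that gap (meV)
    """
    return math.exp(-(gap_mev - field_mev) / K_B_T_MEV)


# HYPOTHETICAL gaps, not measured values from the manuscript
gap_visible = 300.0   # "conventional" PSII
gap_far_red = 250.0   # long-wavelength PSII with a smaller energy drop
spike = 50.0          # hypothetical field spike in fluctuating light

for label, gap in (("visible", gap_visible), ("far-red", gap_far_red)):
    resting = relative_back_transfer(gap)
    spiked = relative_back_transfer(gap, spike)
    print(f"{label}: resting {resting:.2e}, during spike {spiked:.2e}")
```

      In this toy model a smaller gap always gives a larger back-transfer weight at a given field, which is the qualitative point made in the response; the absolute numbers carry no meaning.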

      In conclusion, we believe that our statement justifying the use of isolated membranes at the end of the Introduction is valid.

      2) The authors measured the fluorescence decays as part of the evidence to show the stability of S2QA-. I have several concerns about these measurements:

      i. In figure 2B, the WL C. thermalis (blue) trace has a unique decay phase with a lifetime of about 0.2s, which the authors denoted as S2QA- recombination. Could the author elaborate on how this phase was assigned to this state?

      All decay kinetics in presence of DCMU are bi-phasic (with an additional faster phase in the WL and FR C. thermalis samples, attributed to a small fraction of centres where DCMU did not bind). In the manuscript we did originally assign both phases as arising from S2QA- recombination, but it is true that the middle phase, that is slightly faster in WL C. thermalis, is too fast to originate from that. This phase can rather be ascribed to TyrZ•(H+)QA- recombination occurring in a fraction of intact PSII centres before the full stabilization of charge separation, as shown in Debus et al. 2000 (https://doi.org/10.1021/bi992749w), or in centres lacking a Mn-cluster. We have now modified the paragraph regarding the fluorescence decay in presence of DCMU accordingly (L. 142-145): “The shorter lifetime (~0.22-1 s) of the middle decay phase (amplitude 15-20%) was compatible with it originating from TyrZ•(H+)QA- recombination occurring either in centres lacking an intact Mn-cluster (24) or in intact centres before charge separation is fully stabilised, as proposed in (23).”.

      A luminescence decay phase with a similar lifetime was initially ascribed, incorrectly, only to TyrZ•(H+)QA- recombination occurring in centres devoid of an intact Mn-cluster, in Appendix 5. This has now been rectified.

      ii. In figure S1 (the full version of 2B), all the fluorescence traces seem to rise at the end of the measurements. Could the authors check whether the measuring light intensity was actinic?

      This rise is significant only in the A. marina dataset (now Figure 2-figure supplement 1), and given the low signal to noise ratio in the last points of the fluorescence curve, we consider this small anomaly to be a measuring artefact. The rise is absent in the other traces in Figure 2-figure supplement 1 and in Figure 2B, except for the last point of the A. marina dataset in Fig. 2B. The corresponding Source data show that a rise in the last point of the measurements is only present in one of the three A. marina replicates (#2), while the non-decaying fluorescence is present in all A. marina samples and discussed in the text. Except for this last anomalous point, the decay curves of the A. marina replicate #2 do not differ significantly from the other two replicates. This clearly suggests an artefact, and is not consistent with the measuring light being actinic. A clarifying sentence has been added in the legend of Figure 2-figure supplement 1.

      iii. In figure S2, it seems to me that the fluorescence decay of Synechocystis + DCMU (Green open squares) was slower than the WL C. thermalis and is similar to the FRL C. thermalis in figure 2B. If the Synechocystis + DCMU is indeed similar to FR C. thermalis, would that be consistent with the authors' conclusions?

      When fitting the Synechocystis+DCMU fluorescence decay kinetics (in what is now Appendix 1-figure 1), we obtain two decay phases with, respectively: an amplitude of ~12% and lifetime of ~0.22 s, and an amplitude of ~81% and lifetime of ~7.9 s. These values are similar to those reported for WL C. thermalis in Table 1, with an overall fluorescence decay faster than in FR C. thermalis. Nonetheless, because of the limited number of Synechocystis biological replicates, we limit ourselves to a qualitative comparison. The luminescence decay kinetics are also faster in Synechocystis (as in WL C. thermalis) than in FR C. thermalis (now Figure 5- figure supplement 2).

      These data are consistent with our conclusions: the energy gap between QA- and Phe in Chl-f-PSII is at least as large as in Chl-a-PSII, or could even be larger, as suggested by the slower S2QA- recombination measured by fluorescence (Figure 2) and luminescence (Figure 3) decay.
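      The two-phase fit quoted above for Synechocystis+DCMU can be written out as a bi-exponential decay. The following is a minimal sketch under stated assumptions: the quoted lifetimes are treated as 1/e time constants, and the remaining ~7% of the amplitude is treated as a non-decaying offset; neither choice is confirmed by the response, and the function name is illustrative.

```python
import math


def biexp_decay(t, phases, offset=0.0):
    """Fluorescence remaining at time t (s), in % of the initial amplitude.

    phases : list of (amplitude_percent, lifetime_s) pairs
    offset : non-decaying component in percent (ASSUMPTION)
    """
    return offset + sum(a * math.exp(-t / tau) for a, tau in phases)


# Amplitudes and lifetimes quoted in the response for Synechocystis + DCMU
phases = [(12.0, 0.22), (81.0, 7.9)]
# ASSUMPTION: whatever is left of 100% does not decay on this timescale
offset = 100.0 - sum(a for a, _ in phases)

for t in (0.0, 0.22, 1.0, 7.9, 30.0):
    print(f"t = {t:5.2f} s -> {biexp_decay(t, phases, offset):6.2f} %")
```

      Swapping in the values reported for the other samples would allow the same qualitative comparison of overall decay speed that the response describes.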

      iv. It's known that DCMU will alter the redox potential of QA/QA- in plants. Would it have similar effects to the PSII studied in this research? If so, it will be meaningful to include these effects in the energy diagram in fig 7.

      Yes, we do expect DCMU to change the QA/QA- redox potential in our samples, as it does in plants and other cyanobacteria, although the actual effect in different PSII types would need to be measured. The energy gap values in now Figure 8 are only estimates based on literature values and on the relative changes reported here, they are not calculated from any of our data and do not specifically refer to the experimental conditions we used, including the use of DCMU. For this reason, we think that adding the effects of DCMU in the diagram would not be particularly useful and could be confusing.

      3) The authors didn't use WL C. thermalis for measuring oxygen evolution and the authors claimed that the PSII content in WL C. thermalis is too low. Is that a technical issue (e.g., cannot purify PSII enriched membranes) or a biological issue (i.e., white light condition produced less PSII)? In Fig S9C, the oxygen generated from WL C. thermalis is comparable to FR C. thermalis. Could the author explain how they reached the conclusion that PSII in WL C. thermalis was low? In addition, the author should also provide evidence showing that the samples of WL C. thermalis do not have significant PSII activity under far-red light.

      We did measure the flash dependence of oxygen evolution in WL C. thermalis membranes, and we did observe oscillations with visible flashes (but not with far-red flashes, as expected). However, the data were not good enough to be able to perform any significant analysis. Unfortunately, in the case of WL C. thermalis, we have not been able to isolate O2-evolving cores, as stated in L. 194-195. The WL C. thermalis data have now been added in Figure 3- figure supplement 1, together with the non-normalised traces of all other samples (following the suggestion by reviewer #3), and the text has been modified accordingly. The data in Figure 3- figure supplement 1 also provide evidence that the samples of WL C. thermalis do not have significant PSII activity under far-red light (although this was already clearly demonstrated in Nürnberg et al. 2018).

      We do have evidence that the PSII content per chlorophyll is lower in WL C. thermalis than in FR C. thermalis, based on fluorescence emission spectra, yield of isolated PSII and PSI from purification procedures, and O2 evolution per chlorophyll, as can be seen for example in Figure 3- figure supplement 1. The levels of PSII accumulation depend on the growth stage (among other factors) in model species such as Synechocystis. Since C. thermalis cells grow more slowly than other cyanobacteria species and their physiology has not been studied in detail yet, it is difficult to control the levels of PSII accumulation. This explains the inter-sample variability in the rates of O2 evolution per chlorophyll measured with the Clark electrode, that have now been added in Appendix 6-table 1.

      4) The authors used an indirect method, based on the chemical trap histidine and oxygen consumption, for measuring the production of singlet oxygen from different types of PSII. I have several concerns about this approach.

      i. Why not use probes that react directly with singlet oxygen, like SOSG or EPR probes, to unambiguously confirm the production of singlet oxygen? The difficulties with SOSG mentioned in Rehman et al (SI Ref#22) should no longer be a problem when isolated membranes are used. The advantage would be a validation of the results and perhaps increased sensitivity.

      Although SOSG or EPR probes could also be used to detect singlet oxygen production, these other methods seem to be significantly less sensitive than histidine trapping. For example, Fufezan et al. 2007 (https://doi.org/10.1074/jbc.M610951200) used the EPR spin trap TEMPO and needed 30 minutes of illumination. Extended illumination (up to 1 hour) has also been used to detect singlet oxygen using SOSG (Flors et al 2006, https://doi.org/10.1093/jxb/erj181). With the histidine trapping method used here, less than 2 minutes of illumination were required to measure the singlet oxygen production rates. This allowed potential problems of prolonged illumination (e.g. a loss of intact PSII centres due to photodamage) to be minimised, and allowed us to confirm the results obtained in isolated membranes with those obtained in intact cells.

      As shown in now Figure 6- figure supplement 1E, the histidine-dependent oxygen consumption was suppressed by the singlet oxygen quencher sodium azide, as also shown in Rehman et al. 2013 (https://doi.org/10.1016/j.bbabio.2013.02.016). We also independently confirmed that the singlet oxygen generated by illumination of the dye Rose Bengal can be efficiently detected with the histidine trapping method and suppressed by the addition of sodium azide (Figure 6- figure supplement 1F). For these reasons, we are confident that what we measure with the histidine trapping method is singlet oxygen production.

      ii. In Rehman et al (SI Ref#22), wild-type Synechocystis cells showed significant production of singlet oxygen in the presence of DCMU and His (Figure 3A in SI Ref#22), however, the amount of singlet oxygen measured from the membranes in this study seemed to be less (Fig S10E). Could the authors provide some explanations?

      Fig. 3A in Rehman et al. showed that the production of singlet oxygen was about 10% with respect to the oxygen evolution activity in absence of additions (open squares). The light saturation curves in Fig. 4B of the same paper also show that at saturating light intensity the singlet oxygen production rate is about 10% compared to the O2 evolution rate. The traces we show in Figure 6-figure supplement 1 are only representative. The comparison should be made between the results in Rehman et al. and the averages of biological replicates that we show in Fig. 6 (membranes) and Appendix 6-figure 4A (cells). For WL and FR C. thermalis, we measure singlet oxygen production rates that are about 20% of the O2 evolution rates, slightly higher than those measured in Synechocystis in Rehman et al. Considering the variability between biological replicates, we consider our values in line with those in Rehman et al.

      iii. Can the presented results distinguish the production of singlet oxygen from recombination or other sources (e.g., antenna, free chlorophyll)? Some key controls are needed to strengthen the authors' claims.

      This is difficult to demonstrate unequivocally, but we have different lines of evidence that support the conclusion that the increase in singlet oxygen production in A. marina originates from differences in PSII charge recombination with respect to the other samples:

      i) The high levels of singlet oxygen production are observed in intact cells as well as in membranes. In neither of these samples do we expect to have significant amounts of damaged PSII or free chlorophyll, so these seem highly unlikely as the main sources of the singlet oxygen in our measurements. This is now stated more explicitly in L. 305 and Appendix 6.

      ii) According to the data in Appendix 6-figure 1B, singlet oxygen production in A. marina membranes shows a similar light saturation to that of maximal O2 evolution. This suggests that the singlet oxygen production we measure is related to PSII photochemistry. We have now stated this explicitly in L. 288-290.

      iii) Our thermoluminescence and delayed luminescence results indicate that in Chl-d-PSII the energy gap between Phe and QA is smaller than in Chl-a-PSII, as already suggested in the literature, and Chl-f-PSII. Therefore, this indicates more charge recombination going via repopulation of Phe- in Chl-d-PSII, with a consequent increase of singlet oxygen production.

      The antenna chlorophylls could form triplets under high light, by inter-system crossing, but in intact antennas the chlorophyll triplets are expected to be mostly quenched by nearby carotenoids (see https://www.jstor.org/stable/24030848 for a review on the subject). The generation of antenna triplet states in non-photoinhibitory conditions has been demonstrated in plant and algal thylakoids (Santabarbara et al 2002, 2007 doi: 10.1021/bi0201163, doi: 10.1016/j.bbabio.2006.10.007). Yet, these signals, which are attributed to a small population of damaged antennas, are small compared to those of triplets generated by charge recombination. Due to its apparently stochastic nature, the generation of antenna triplets by inter-system crossing is not expected to be significantly different between the different PSII complexes investigated in this study.

      On the other hand, it is generally recognised that in the PSII reaction centre, the carotenoid on the D1 side is not close enough to ChlD1 to directly quench its triplet state, when formed (see Telfer et al. 1994, https://doi.org/10.1016/S0021-9258(17)36825-4). The singlet oxygen produced in the reaction centre could disrupt the coupling between chlorophylls and carotenoids in the antenna, resulting in singlet oxygen production also from the antenna, in a cascade effect. This can happen with prolonged strong illumination (Fufezan et al. 2002, https://doi.org/10.1016/S0014-5793(02)03724-9).

      iv. I could not fully understand the singlet oxygen production experiments with tris-washed samples. In my opinion, the Mn-cluster depleted PSII should have accelerated charge recombination (100 ms between the YZ/QA, vs ~ 5 sec between the S2/QA), which should lead to an increase in singlet oxygen production. Correct me if I'm wrong about this, but if my reasoning is correct then how do the authors explain the discrepancy?

      Our rationale for performing the tris-washing experiment was indeed to see if this would lead to an increase in singlet oxygen production, thus implying that the high production in the A. marina samples could arise from a higher fraction of PSII centres without the Mn-cluster, as explained both in the main text and in Appendix 6. The fact that the treatment did not increase the singlet oxygen production suggests that this does not specifically arise from PSII lacking the Mn-cluster.

      The lack of singlet oxygen increase following tris-washing is not necessarily controversial, as the fact that TyrZ•QA- recombination is faster than S2QA- recombination does not necessarily imply that more of it occurs via backward electron transfer from QA- to Phe. The removal of the Mn-cluster could decrease the production of singlet oxygen by charge recombination, since it causes an increase in the redox potential of QA and, therefore, of the energy gap between Phe and QA, thus decreasing the probability of charge recombination going via the repopulation of Phe-. This is proposed to be a mechanism to protect PSII during photoactivation of the Mn-cluster (see Johnson et al 1995, https://doi.org/10.1016/0005-2728(95)00003-2).

      Our data show that the singlet oxygen production in A. marina is not specifically related to PSII lacking the Mn-cluster and are not in conflict with what is expected based on our knowledge of PSII energetics.

      v. The y-axes in Figure S10 should either contain "delta" (Δµmol O2 ml-1) or use the measured absolute oxygen concentration. I'd suggest the latter, since the reaction is oxygen consuming, it's good to show that all the samples started with similar amounts of dissolved oxygen. Low O2 levels could decrease 1O2 production, though this would be more of an issue with cells than membranes.

      The y-axis labels in the figures (now Figure 6-figure supplement 1 and Appendix 6-figures 1D and E, 2, 3 and 4A) have been changed to Δµmol O2 ml-1. We prefer to show the traces after subtraction of the baseline recorded in the dark (now explicitly indicated in the corresponding figure legends) for a better visual comparison. All samples were left to equilibrate with air (stirred) before starting the measurements, so all started with similar levels of dissolved oxygen. This is especially important when measuring PSI-dependent oxygen consumption (Appendix 6-figure 3), because the addition of ascorbate and TMPD leads to a transient drop in oxygen concentration in the sample, which leads to artefacts in absence of the equilibration step. This information has been added to the corresponding Materials and Methods section (4.5). Additionally, when using Rose Bengal to generate singlet oxygen, the histidine-dependent oxygen consumption was about 10 times higher than in any of the measurements done with biological samples, and still we did not observe saturation of the signal in the illumination time used (added panel F in Figure 6-figure supplement 1). Therefore, we are confident that the singlet oxygen measurements in membranes and cells were not skewed by limiting oxygen concentrations in the measuring chamber.

      The y-axis labels of what is now Appendix 6-figure 1B and C have also been corrected (as ml-1 was used instead of h-1).

      Reviewer #3 (Public Review):

      In this manuscript, Viola and co-authors address the question of how far-red-light-adapted (FRL) Photosystem II (PSII) is able to bypass the "red limit", or the minimum photon energy/frequency for charge separation to proceed effectively. They attempt to do so primarily by measuring the consequence of failure to overcome the red limit: charge recombination. From this work they have concluded that FRL PSIIs are able to achieve similar efficiency of flash-induced water-oxidizing complex turnover to those adapted to standard visible light. However, they conclude that FRL PSII which uses chlorophyll-d is significantly more susceptible to charge recombination and singlet oxygen formation, leading to increased sensitivity to high-light conditions. FRL PSII which uses chlorophyll-f, however, is adapted to be more resistant to photodamage. These strategies are differentiated by the number and type of far-red chlorophyll used and tuning of redox potentials of cofactors in PSII.

      The methods employed are well-chosen to present complementary evidence to address the questions posed. The authors have supported themselves using polarography, fluorescence decay, absorption, luminescence and thermoluminescence, and spectrometry, all of which are employed in a manner well-established in the quantification of processes in standard PSII preparations. The results, however, have some loss of data such as total yields which would be useful in interpretation as the authors have chosen to extensively normalize data for ease of visual comparison of certain features.

      Overall, the authors have adequately achieved their aims and their conclusions are well-supported. The authors also clearly state their own expectations of the impact of their work at the end of the Discussion; thanks to these results, we can better understand the ecological niche of each type of FRL-PSII and how these significantly disparate systems may be used in future agricultural research and development.

      We thank the reviewer for the positive evaluation of our work.

      Following the reviewer’s suggestions, the total yields (on a chlorophyll basis) of the flash-dependent oxygen evolution have been provided in Figure 3-figure supplement 1. These include the flash-dependent oxygen evolution data measured in WL C. thermalis membranes, which were previously omitted because of their unsatisfactory quality, and are still omitted from Figure 3 (normalised data and fits) for the same reason. The S-state distributions calculated from the fits of the flash-dependent oxygen evolution have been added in Table 2.
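
As an illustration of how S-state distributions relate to flash-dependent oxygen yields, the following is a minimal sketch of a simple Kok-type cycle. It is not the authors' fitting procedure, which is not described in this response; the function name `kok_oxygen_yields`, the starting S-state populations, and the miss and double-hit probabilities are all hypothetical values chosen for illustration.

```python
def kok_oxygen_yields(n_flashes, s_init=(0.0, 1.0, 0.0, 0.0),
                      miss=0.1, double_hit=0.03):
    """Simulate the relative O2 yield per flash with a simple Kok cycle.

    s_init: assumed starting populations of S0..S3 (should sum to 1).
    miss, double_hit: assumed per-flash probabilities (illustrative only).
    """
    single = 1.0 - miss - double_hit  # probability of a normal one-step advance
    s = list(s_init)
    yields = []
    for _ in range(n_flashes):
        # O2 is released whenever a centre passes through S3 -> S0:
        # a single hit from S3, or a double hit from S2 or S3.
        yields.append(single * s[3] + double_hit * (s[2] + s[3]))
        new = [0.0] * 4
        for i, pop in enumerate(s):
            new[i] += miss * pop                     # miss: state unchanged
            new[(i + 1) % 4] += single * pop         # normal advance
            new[(i + 2) % 4] += double_hit * pop     # double hit
        s = new
    return yields


if __name__ == "__main__":
    for flash, y in enumerate(kok_oxygen_yields(12), start=1):
        print(f"flash {flash:2d}: {y:.3f}")
```

In a fit, the starting populations and the miss and double-hit probabilities would be adjusted until the simulated period-four pattern matches the measured one; the sketch only runs the forward simulation.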

      Additionally, the non-normalised oxygen evolution and consumption rates used for Figure 6A and Appendix 6-figure 4 are now provided in Appendix 6-table 1.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Answers to reviewers’ comments

      (Reviewers' comments are in italics. Text modifications in the manuscript file are in blue.)

      Overall, we acknowledge the referees' careful reading of the paper and their comments, which we think have helped to further improve the manuscript.

      On the attached pages are our detailed point-by-point responses to the referees’ comments, along with a description of how the manuscript was modified accordingly.

      New data included:

      In response to the comments and suggestions of both reviewers 1 and 3, we conducted new experiments to test genetic interactions between different actors of the BMP and activin pathways. These new results confirm and complement the analyses described in the original manuscript. Furthermore, as suggested by reviewer 2, we have further studied the phenotypes of hiPSC-CM, by analyzing gene expression profiles and by analyzing the morphological changes induced as a result of PAX9 knockdown.

      NB: The title has been slightly modified, to highlight the conserved features of the genetic architecture of cardiac performance revealed in the study

      Former title: Genetic architecture of natural variation of cardiac performance in flies.

      Novel title: Genetic architecture of natural variation of cardiac performance: From flies to humans.

      Reviewer 1

      1. The authors utilized the RNAi-mediated knockdown approach in their functional validation studies. It is not clear how each genetic variation (SNP) affects its associated genes. Could some of the SNPs activate the candidate gene expression? For the 4 candidate genes that failed to show cardiac defects, could the overexpression of these 4 genes alter cardiac performance?

      Answer 1- Of course, we cannot predict the direction of the effect of the variants on the function of the genes. In this context, loss-of-function experiments are subject to a risk of false negatives. It is indeed possible that, where loss of function has no effect, a gain of function could reveal one. But gain-of-function experiments are difficult to control and are often subject to non-specific effects, because it is complicated to control the level of over-expression relative to endogenous expression. This did not seem suitable for an extensive analysis of a large number of genes. We therefore chose to test only for loss of function.

      In addition, our approach to testing heart-specific RNAi aims to assess the quality of the association results by comparing RNAi for genes identified by GWAS to randomly selected genes. It is not intended to describe precisely the involvement of each gene individually.

      (See also the answer to reviewer 2 comment n°2 and the modifications made to the manuscript to address these criticisms.)

      2. babo is the type I activin receptor, not type 2.

      Answer 2- Thank you, we have corrected this error.

      3. The authors show BMP and activin pathway genetically interacts to affect cardiac performance. But it is interesting to find that these interactions are in a trait-dependent manner. For example, it seems that babo and dpp epistatically interact to regulate FS, while they additively regulate HP and DI. The authors need to discuss the complex genetic interaction further.

      Answer 3- See reply to reviewer 3, comment N°2 below.

      4. Both snoo and sog are identified from GWAS. How about babo and dpp? Are there any identified SNPs associated with babo and dpp?

      Answer 4- Considering GWAS for mean phenotypes, there are no variants in dpp among the 100 best-ranked SNPs or among the variants identified using fast epistasis. But given the size of the DGRP population we are far from being exhaustive, as we do not reach saturation. It is therefore difficult to comment on these ‘negative’ results. However, we do identify one variant in babo using fast epistasis (see figure 2B and Table S3).

      5. It is unclear why the mad KD behaves oppositely to dpp mutant, although both proteins are involved in BMP pathway. In Figure S5, the mad KD shows reduced FS and HP, but dpp LOF mutant shows increased FS and HP (Figure S4). Can the authors perform RNAi to knockdown dpp specific in the heart to reexamine the role of dpp in the regulation of cardiac function. The whole body LOF mutant dpp-d14 might not target cardiac tissue directly to control heart performance like mad KD.

      Answer 5- (see also answer to reviewer 3 comment n°2) We did perform heart specific dpp RNAi experiments together with other tests for interactions using new allelic combinations of activin and BMP pathways and therefore can compare heart specific knock down to heterozygotes for amorphic mutations for both dpp and mad.

      Regarding dpp, congruent effects on HP, DI, SI, ESD and EDD were observed between mutant and RNAi, while RNAi had the opposite effect on FS compared to heterozygous dppd14 mutants (decreased and increased FS compared to control, respectively). In the case of mad, heterozygous mutants showed no effect on FS, EDD and ESD but, similarly to dpp mutants, showed increased SI, DI and HP. mad RNAi uniquely decreased HP, DI and SI and increased AI. However, similarly to dpp RNAi, it induced a decrease of FS.

      Thus, systemic versus heart specific knockdown of genes induce specific effects, suggesting cardiac non-autonomous interactions. This complex picture of TGFb involvement is now discussed in the result section (see below, Reviewer 3, major comment 2).

      6. The authors selected two novel genes to study the conserved regulation in both flies and human iPSC cells. Besides testing these novel genes, the authors should also verify whether the conserved pathways, like TGF-beta, regulate heart performance in human iPSC cells similar to the flies.

      Answer 6- We focused on poxm/Pax9 and sr/Egr2 because neither of these TFs was known to have a cardiac function in flies or in mammals. Our parallel analyses in flies and hiPSC-CM illustrate how the description of the genetic architecture of cardiac traits in flies can accelerate discovery in mammals.

      There is extensive literature describing the involvement of the TGFβ/BMP and Activin pathways in heart development and diseases in humans, hence the choice not to focus on these pathways in iPS-CM.

      Reviewer 2:

      1. It will be interesting to compare this fly GWAS to human heart disease GWAS data (for example, cardiomyopathy, arrhythmia, heart failure) from patients. Such cross comparison could make the data set more valuable.

      Answer 1- We actually did make this comparison (Table 2, Table S11) and we agree it significantly validates our approach. This identified a set of orthologous genes associated with cardiac traits both in Drosophila and humans, supporting the conservation of the genetic architecture of cardiac performance traits, from arthropods to mammals.

      2. RNAi is the only experimental approach in this manuscript to validate the functional significance from data analyses. Authors may consider using genetic mutations such as deficiency lines or P-element lines to offer an alternative approach. This is simply a suggestion to improve the rigor and reproducibility, not absolutely required.

      Answer 2- In an attempt to provide a consistent analysis of loss of gene function, our strategy was to concentrate our analysis on the effects of heart specific knock down. This allows us to compare -in a global way- the effects of the knock down of genes identified by GWAS to those of randomly selected genes.

      Our objective was to provide a global view of the heart-specific effects of the identified genes, and not to characterize precisely the involvement of each of them using a combination of mutant alleles, RNAi and gain of function. Given the experimental burden of analyzing cardiac function, such a strategy would indeed have required us to concentrate on only a very small number of genes.

      We however recognize that this strategy has limitations:

      • Some variants may lead to gain-of-function effects of genes, and our strategy is not able to test for these effects.

      • Some variants may come from non-cell-autonomous effects, which would not be replicated by our targeted RNAi strategy in the heart.

      Therefore, the false negative rate of our experiments is difficult to estimate.

      We have tried to put this into perspective and to highlight the limitations of our analysis in the results section describing RNAi validation of GWAS results.

      “To assess in an extensive way whether mutations in genes harboring SNPs associated with variation in cardiac traits contributed to these phenotypes ….. (…)

      …… These results therefore supported our association results. It is important to emphasize that our approach is limited to testing the effect of tissue-specific gene knock down. Since some of the variants may lead to increased gene function and/or expression, this can lead to a false negative rate that is difficult to estimate. In addition, some of the associated variants may influence heart function by non cell-autonomous mechanisms, which would not be replicated by cardiac specific RNAi knock down.”

      3. In order to validate the roles of predicted TF binding sites, the best approach would be introducing point mutations using CRISPR/Cas9 within the binding motif then testing out molecular and physiological outcomes. Rather authors chose to test indirectly to knock down those TFs. If so, authors need to at least acknowledge the potential caveats of such approach and the limitation in related data interpretation.

      Answer 3- The reviewer is right: definitive proof of the involvement of a potential TF binding site in the regulation of a gene located in cis requires mutating the binding site and analyzing the effect on the expression of the corresponding gene. But this may not be sufficient to demonstrate that the potential TF is indeed a regulator of that gene (the binding motif may be the target of yet another TF): definitive proof may require swaps of motifs or of TF DNA-binding domains. This would have been out of the scope of the present study. In addition, the effects on heart performance of mutating one TFBS at a time (among several dozen) may be too weak to allow their characterization with available tools and approaches.

      We acknowledge, however, that our approach provides only an indirect validation of the transcription factor binding site predictions. This was, in our opinion, the most efficient way to evaluate the potential effect of predicted transcription factors.

      We clarify this in the result section:

      “We did not test individually the effects on cardiac performance of mutations in predicted TFBSs located near the SNPs because any individual effect would probably be too small to be detectable by the available methods. Rather, we tested the potential involvement of their cognate TFs by cardiac specific RNAi mediated KD”

      4. hiPSC-CM data is somewhat limited by only showing the HR and AP duration data. It is recommended to include some immunocytochemistry data to show the morphology, sarcomere structure of these hiPSC-CMs. Gene expression data generated by qPCR or RNA-seq in particular focusing CM structure and function genes would be helpful too.

      Answer 4- As suggested by referee 2, we have now performed gene expression analysis and immunostaining of PAX9 KD, which gave the strongest phenotype in iPSC-CM (Figure 4 J-M). This revealed increased expression of Na+ and K+ channels, which is in line with the APD shortening phenotype, as well as downregulation of CASQ2, consistent with calcium transient shortening. Expression analysis also revealed increased expression of sarcomeric genes and NPPA/B, which was consistent with increased CM size as quantified by the area of TNNT2 staining per nucleus.

      These new data are described at the end of the result section:

      “APD shortening for PAX9 KD was coincident with increased expression of Na+ and K+ ion channels (SCN5A, KCNH2 and KCNQ1) (Figure 4J), supporting the APD shortening phenotype. In this context, the AP kinetics also correlated with shorter calcium transient duration (Figure S8A-D and H-K), including faster upstroke and downstroke calcium kinetics and increased beat rate (peak frequency) (Figure S8E-G and L, M), consistent with decreased expression of Calsequestrin 2 isoform (CASQ2) associated with PAX9 KD (Figure 4J). Finally, assessment of the PAX9 KD effect on sarcomeric content revealed an increase in sarcomeric gene expression (Figure 4K), and an upregulation of genes associated with a hypertrophic response (NPPA, NPPB and NPR1; Battistoni et al., Circulating biomarkers with preventive, diagnostic and prognostic implications in cardiovascular diseases, Int J Cardiol, 2012, vol. 157), which was coincident with increased CM size as quantified by the area of TNNT2 staining per cardiac nucleus (Figure 4 L, M).

      Collectively, these data illustrate conserved functions for poxm/PAX9 and sr/EGR2 in setting the cardiac rhythm and identify PAX9 as a novel and key regulator of cardiac performance at the cellular level, via the integrated regulation of expression of genes controlling electrophysiology, calcium handling and sarcomeric functions in hiPSC-CMs.”

      Reviewer 3

      Major Comments:

      1- There is an assumption in the use of RNAi knockdown to validate the genes identified in the quantitative analysis, and that is that natural variants are themselves hypomorphic. It is possible that among the variants identified some are hypermorphic, or among the transcription factor binding sites that variants lead to increased factor binding. While RNAi knockdown is an excellent choice to begin validation, I do not think the authors can rule out that a gene not functionally validated by their RNAi tests does not have a role in cardiac function.

      Answer 1. Please see our answers to reviewer 1 comment n°1 and reviewer 2 comment n°2.

      2- After performing RNAi knockdown to validate genes identified by GWAS the authors focus on the TGFbeta signaling pathway for downstream analysis. To do so they examine heterozygotes for sog, a repressor of BMP signaling, and snoo, an activator of the Activin pathway. The data from the snoo/sog heterozygote is compelling in its disruption of heart phenotypes, and the authors conclude a "coordinated action of activin and BMP." snoo, however, also works as a transcriptional repressor in the BMP pathway, so it's possible that the effects the authors are seeing here could be confined to an increase in BMP signaling. Unlike snoo and sog, mutations in babo and dpp are both expected to have negative effects on Activin and BMP signaling, respectively. The babo/dpp interaction is not as quantitatively convincing as the snoo/sog data, despite the integral roles both babo and dpp play in their respective pathways. If both pathways are connected, why do snoo/sog heterozygotes affect SI phenotypes, while babo/dpp heterozygotes affect fractional shortening? I think the authors' data suggest an interesting potential interaction between these pathways, which could be confirmed by examining further mutant combinations, knockdowns or increased expression transgenes, but falls short of a "confirmed synergistic genetic interaction." It does, however, underscore the value of the data in the paper for opening up new avenues for future study.

      Answer 2 (and reviewer 1 comments 3 and 5).

      These comments led us to reconsider the analysis of the phenotypes associated with loss of function of the TGFb pathway, and to analyze other pathway components combinations.

      We acknowledge reviewer 3's criticisms of the snoo/sog experiments, which are difficult to interpret given the broad action snoo may have on both BMP and activin pathways. We have addressed this in the Results section.

      We have also analyzed other allelic combinations of BMP and activin pathway components, which strengthen the analysis performed on dpp/babo. Indeed, we tested babo/tkv heterozygotes (specific activin and BMP receptors, respectively) and found significant genetic interactions for ESD and EDD. Although not significant, babo/tkv double heterozygotes display a tendency to non-additive effects on FS (p = 0.054). mad/smox heterozygotes (specific downstream TFs of the BMP and activin pathways, respectively) display interactions (non-additive effects) on HP, SI, DI, ESD and EDD. These new results (Supplemental Figure 4) thus support the hypothesis of genetic interactions between the pathways, but also reveal, as suggested by reviewer 3, a complex relationship between the two pathways, since interactions are revealed for specific traits in each of the mutant combinations analyzed.

      The phenotypes related to the individual loss of function of each of the actors of these pathways (dpp, tkv and mad for BMP; babo and smox for activin) are however very similar. When they have an effect, heterozygous amorphic alleles of these genes display increased phenotypes related to rhythmicity (HP, DI, SI, AI) and FS, but decreased cardiac diameters (ESD and EDD).

      Finally, as pointed out by reviewer 1, the picture is certainly even more complex since the phenotypes of RNAi-mediated heart-specific loss of function are not always similar to those of systemic loss of function. Indeed, mad RNAi causes a reduction of HP, DI, SI and FS (Figure S5) whereas heterozygotes for mad12 have either no or opposite effect on these phenotypes, and mad RNAi causes a significant increase in AI whereas mad12 has no effect (Figure S4). The discrepancy between tissue-specific RNAi and heterozygous background was also found in the case of dpp, but specifically for FS. Indeed, as suggested by reviewer 1, we have analyzed the loss of function of dpp by heart-specific RNAi. dpp RNAi results in a reduction of FS (like mad RNAi) whereas whole-body loss of function results in an increase of FS.

      We therefore re-wrote the whole corresponding section of the results and modified Figure S4 to include babo/tkv; smox/mad and dppRNAi data.

      “We further focused on the TGFb pathway, since members of both BMP and activin pathways were identified in our analyses. We tested different members of the TGFb pathway for cardiac phenotypes using cardiac specific RNAi knockdown (Figure 2C), and confirmed the involvement of the activin agonist snoo (Ski orthologue) and the BMP antagonist sog (chordin orthologue). Notably, Activin and BMP pathways are usually antagonistic (Figure 2D). Their joint identification in our GWAS suggests that they act in a coordinated fashion to regulate heart function. Alternatively, it may simply reflect their involvement in different aspects of cardiac development and/or functional maturation. In order to discriminate between these two hypotheses, we tested if different components of these pathways interacted genetically. Single heterozygotes for loss of function alleles show dosage-dependent effects of snoo and sog on several phenotypes, providing an independent confirmation of their involvement in several cardiac traits (Figure S4). Importantly, compared to each single heterozygote, snooBSC234/ sogU2 double heterozygous flies showed non-additive SI phenotypes (two-way ANOVA p = 2.1 × 10^-7), suggesting a genetic interaction (Figure 2E and Figure S4A). It is worth noting however that snoo is also a transcriptional repressor of the BMP pathway (PMID: 16951053). The effect observed in snooBSC234/ sogU2 double heterozygotes can therefore alternatively arise as a consequence of an increased BMP signaling without affecting the activin pathway. We thus tested other allelic combinations for loss of function alleles of BMP and activin pathways. babo/tkv heterozygotes (respectively activin and BMP type 1 receptors) displayed non-additive ESD and EDD phenotypes (Figure S4C). Synergistic interaction of BMP and activin pathways was also suggested by the analysis of fractional shortening in loss of function mutants for babo and dpp, the BMP ligand (Figure S4B).
Of note, babo/tkv double heterozygotes also displayed a tendency to non-additive effects on FS, albeit non-significant (two-way ANOVA p = 0.054). In addition, mad/smox heterozygotes (specific downstream TFs of BMP and activin pathways) displayed non-additive effects on several traits, including phenotypes related to rhythmicity (HP, SI, DI) and contractility (ESD and EDD) (Figure S4D). Altogether, cardiac performance in response to allelic combinations of activin and BMP supported a coordinated action of both pathways in the establishment and/or maintenance of cardiac activity. This was further supported by the observation that simple heterozygotes for the tested loss of function alleles displayed similar trends with respect to cardiac performance, irrespective of the pathway considered (dpp, tkv and mad for BMP; babo and smox for activin). Indeed, they displayed either no effect or increased fractional shortening and rhythmicity phenotypes (HP, DI, SI, AI), and decreased cardiac diameters (ESD and EDD). This suggests coordinated activity of both pathways. Importantly, the genetic interactions were tested using amorphic alleles that lead to systemic loss of function. The observed phenotypes may thus not unravel cardiac specific effects of the pathways. In support of this, mad cardiac specific RNAi knock down was tested (see below, Figure S5) and led to a decreased HP, DI, SI and FS whereas heterozygotes for mad12 have either no (FS) or opposite (HP, DI, SI) effect on these phenotypes (Figure S4D). Inversely, mad RNAi caused a significant increase in AI whereas mad12 had no effect. However, heart specific dpp RNAi knock down (Figure S4E) led to similar phenotypic trends compared to dppd14 (increased HP, DI, SI, decreased EDD and ESD) with the notable exception of FS which was reduced following cardiac specific KD (Figure S4E), but increased in dppd14 heterozygotes (Figure S4B).
Taken together, these data point to a complex picture of TGFb pathway activity in regulating cardiac performance, involving both the activin and the BMP pathways as well as gene specific effects with both systemic and tissue-specific contributions.”
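
To make the notion of a "non-additive" effect in the passage above concrete, here is a minimal sketch of the interaction contrast in a 2×2 genetic design (control, two single heterozygotes, one double heterozygote). It only computes the point estimate of the interaction term; the significance testing reported in the response uses a two-way ANOVA, which this sketch does not reproduce. The function name and all numbers are hypothetical and are not data from the study.

```python
from statistics import mean


def interaction_contrast(control, het_a, het_b, double_het):
    """Deviation of the double heterozygote from additivity.

    Returns (observed double-het effect) minus (sum of the two
    single-het effects), each measured relative to the control mean.
    A value of zero means the two alleles act purely additively.
    """
    base = mean(control)
    effect_a = mean(het_a) - base
    effect_b = mean(het_b) - base
    effect_ab = mean(double_het) - base
    return effect_ab - (effect_a + effect_b)


if __name__ == "__main__":
    # Made-up trait values (arbitrary units) for illustration only.
    control = [0.20, 0.22, 0.21]
    het_a = [0.23, 0.24, 0.22]
    het_b = [0.22, 0.23, 0.24]
    double_het = [0.33, 0.35, 0.34]
    print(f"interaction contrast: {interaction_contrast(control, het_a, het_b, double_het):+.3f}")
```

A contrast far from zero, relative to the within-group variability, is what a significant interaction term in a two-way ANOVA reflects.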

      Minor Comments:

      There is an enormous amount of data in this paper, but there are places where things are summarized a little too briefly. For example, there are no definitions given at the beginning of the Results section for traits like "Heart Period" or "Systolic Interval," which would make this work significantly more accessible for other Drosophila researchers. (They do touch on this when they explain later in the paper that certain variants are "associated with quantitative traits linked to heart size and contractility" but more background earlier would be helpful.) When we consider heart performance traits, what is the baseline from known mutants? In other words, where is the line between variation and defect?

      Answers:

      • We have detailed the description of the traits analyzed at the beginning of the Results section. We hope this improves the ease of reading in the direction suggested by the reviewer. “7 cardiac traits were analyzed across the whole population (Dataset S1 and Table 1). As illustrated in Figure 1A, we analyzed phenotypes related to the rhythmicity of cardiac function: the systolic interval (SI) is the time elapsed between the beginning and the end of one contraction, the diastolic interval (DI) is the time elapsed between two contractions and the heart period (HP) is the duration of a total cycle (contraction + relaxation (DI+SI)). The arrhythmia index (AI, std-dev(HP)/mean(HP)) is used to evaluate the variability of the cardiac rhythm. In addition, 3 traits related to contractility were measured: the diameters of the heart in diastole (End Diastolic Diameter, EDD) and in systole (End Systolic Diameter, ESD), and the Fractional Shortening (FS), which measures the contraction efficacy ((EDD-ESD)/EDD).”

      • With respect to the baseline of cardiac performance, there is no simple answer. The baseline is influenced by the genetic background and the experimental conditions. This is the reason why any analysis of mutants or RNAi is conducted in comparison with its own control, analyzed at the same time. Concerning the DGRP lines, no baseline can be defined, since the objective is to measure the diversity of cardiac performance traits within a natural population.
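
The trait definitions quoted in the first answer above can be written out as a short calculation. This is a sketch under stated assumptions: the function name and input values are hypothetical, fractional shortening is read as (EDD − ESD)/EDD, and the sample standard deviation is used for the arrhythmia index because the text does not say which estimator was applied.

```python
from statistics import mean, stdev


def cardiac_traits(systolic_intervals, diastolic_intervals, edd, esd):
    """Derive HP, AI and FS from per-beat intervals and heart diameters.

    systolic_intervals, diastolic_intervals: per-beat SI and DI (seconds).
    edd, esd: end-diastolic and end-systolic diameters (same length unit).
    """
    # Heart period of each beat is one contraction plus one relaxation.
    heart_periods = [si + di for si, di in zip(systolic_intervals, diastolic_intervals)]
    hp = mean(heart_periods)
    return {
        "SI": mean(systolic_intervals),
        "DI": mean(diastolic_intervals),
        "HP": hp,
        "AI": stdev(heart_periods) / hp,   # assumed: sample standard deviation
        "FS": (edd - esd) / edd,           # fraction by which the diameter shortens
    }


if __name__ == "__main__":
    # Made-up values for illustration only.
    traits = cardiac_traits([0.20, 0.21, 0.19], [0.30, 0.35, 0.28], edd=80.0, esd=50.0)
    for name, value in traits.items():
        print(f"{name}: {value:.3f}")
```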

    1. For what purpose? So that the process of what Becker calls “self-transcendence” may begin. And he describes the process of self-transcendence this way: Man breaks through the bounds of merely cultural heroism; he destroys the character lie that had him perform as a hero in the everyday social scheme of things; and by doing so he opens himself up to infinity, to the possibility of cosmic heroism …. He links his secret inner self, his authentic talent, his deepest feelings of uniqueness … to the very ground of creation. Out of the ruins of the broken cultural self there remains the mystery of the private, invisible, inner self which yearned for ultimate significance. …This invisible mystery at the heart of [the] creature now attains cosmic significance by affirming its connection with the invisible mystery at the heart of creation. “This,” he concludes, “is the meaning of faith.” Faith is the belief that despite one’s “insignificance, weakness, death, one’s existence has meaning in some ultimate sense because it exists within an eternal and infinite scheme of things brought about and maintained to some kind of design by some creative force (90, 91).” This, then, is what we might call good faith, not a flight into some immortality system. And clearly, some Christians, some Buddhists–at least the Zen Buddhists Becker himself mentions!–have faith in this sense, a faith that Becker characterizes as growing out of tasting one’s own death, embracing one’s own nothingness, and affirming–not a known ultimate meaningful–but an “invisible mystery” of ultimate meaning.

      Embrace the mystery, the sacred - accepting that one will be gone forevermore is a mighty task as our culture teaches us to seek recognition. The last thing we want to be is unrecognized, a nobody. And yet, when we are dead and dissipated back into the rest of the world, that is exactly what we will become.

      But we have to accept that reality before we can build and think beyond it to a deeper possibility of meaning. Reality brought us forth to begin with. Every moment is already sacred.

    1. Author Response

      Reviewer #2 (Public Review):

      1) “…it was important that the output response was intimately linked to the bound state of the receptor, in this case the TCR, with ligand unbinding rapidly reversing all proofreading steps. This means that dissociation of a single TCR should disrupt signaling, and implicitly assumes a direct physical connection between the bound receptor and the KP modifications. However, this mechanism becomes much harder to argue when the KP steps are physically uncoupled from bound TCR, such as in LAT microclusters or DAG production.”

      We agree that signaling events in the kinetic proofreading chain must be linked to ligand unbinding. We have added discussion to the paragraphs beginning on page 20 line 440 of recent work from Yi et al. 2019 and Lo et al. 2018 suggesting a physical link between bound TCRs and LAT clusters. The full paragraphs are reproduced below.

      “The kinetic proofreading model requires all intermediate steps to reset upon unbinding of the ligand (Fig. 1A). This means that information about the receptor’s binding state must be communicated to all proofreading steps. If kinetic proofreading steps exist beyond the T cell receptor, how is unbinding information conveyed to these effectors? Importantly, there is evidence of physical proximity of LAT with the receptor. While TCR/Zap-70 and LAT/PLCγ microclusters form spatially segregated domains, these domains remain adjacent to one another (Yi et al., 2019). Lo et al. demonstrated that the protein Lck binds Zap-70 with its SH2 domain and LAT with its SH3 domain, potentially bridging the two signaling domains together and propagating binding information (Lo et al., 2018).

      An attractive reset mechanism is the segregation of CD45 away from bound receptors, creating spatial regions in which TCR and LAT associated activating events can occur (S. J. Davis & van der Merwe, 2006). Super-resolution microscopy by Razvag et al. measured TCR/CD45 segregated regions within seconds of antigen contact at the tips of T cell microvilli (Razvag et al., 2018). Upon unbinding, these regions of phosphatase exclusion collapse, allowing CD45 to dephosphorylate receptor ITAMs and LAT clusters. However, the rate of dephosphorylation for LAT and receptor ITAMs could differ. LAT clusters exclude CD45 in reconstituted bilayer systems, potentially limiting the dephosphorylation to LAT molecules at the edges of the cluster thus slowing reset (Su et al., 2016). The kinetics of multivalent protein-protein interactions within TCR and LAT clusters can also influence dephosphorylation and dissociation rates (Goyette et al., 2022).

      A CD45-mediated reset mechanism would restrict proofreading to membrane-bound signaling events occurring within a CD45-depleted region. Downstream events that dissociate away from the membrane or diffuse out of the segregated region could not directly participate in the proofreading chain, as the collapse of a CD45 segregated region could not reset signaling entities released into the cytosol (e.g. release of IP3 in the cleavage of PIP2 to DAG).”

      2) …The data clearly demonstrate a time delay between receptor binding and the measured outputs, but it is not so surprising that this lag would exist in propagating the signal through the intracellular network.

      We apologize for this point of confusion in our methodology. We are unable to measure the time lag between receptor binding and signal propagation through the network because our system is terminated by blue light. Binding is stochastically initiated much like native ligand/receptor interactions. The time values reported in our dataset are the average ligand binding half-lives of the LOV2 ligand under various intensities of constant blue-light illumination, as measured by separate in vitro kinetic washout experiments. Our model is fit to the steady-state signaling output achieved after a 3 minute exposure of cells to LOV2 ligands of an average ligand binding half-life enforced by constant blue light illumination. We clarify this point by including the following paragraphs beginning on page 8 line 170.

      “We are unable to control when binding events start since our optogenetic system is inhibited by blue-light, as opposed to being activated by blue-light. The initiation of binding after blue-light inhibition is a function of both the stochastic relaxation of inhibited LOV2 back into the binding-state as well as the diffusion of binding-state LOV2 from outside the previously illuminated area. Without temporal control over the start of binding, it is difficult to measure the time delay between ligand binding and a downstream signaling event (Yi et al., 2019). Such studies typically require careful single-molecule imaging of numerous stochastic binding events (Lin et al., 2019).

      To overcome this technical limitation of our system, we chose instead to measure the steady-state output of the antigen signaling cascade achieved several minutes after ligand binding. Kinetic proofreading systems behave differently than non-proofreading systems at steady-state. A non-proofreading system’s steady-state output is set by the number of ligand-bound receptors and not the binding half-lives of those ligands (Fig. 3D, left). In contrast, a kinetic proofreading system can produce different steady-state outputs in response to ligands of different binding half-lives, even when ligand densities are adjusted to achieve equivalent occupancy (Daniels et al., 2006) (Fig. 3D, right). Signaling events take varying amounts of time to occur after ligand binding (Lin et al., 2019; Yi et al., 2019). However, the temporal delays between steps are on the order of tens of seconds. By imaging the cells after minutes of constant exposure to a set ligand binding half-life, we measure the steady state output achieved at a signaling event in the cascade on a longer timescale than these delays (Tischer & Weiner, 2019).”
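
      The contrast drawn above between proofreading and non-proofreading systems at steady state can be sketched numerically. This is a minimal illustration that assumes a classic McKeithan-style chain, in which a bound receptor advances through each step at rate kp and resets on unbinding; it is not the fitting equation used in the manuscript. The function names and the default kp of 0.1 per second are illustrative assumptions.

```python
import math

def proofread_fraction(half_life_s, n_steps, kp=0.1):
    """Steady-state fraction of bound receptors that have completed n_steps.

    Assumes a McKeithan-style chain (an assumption, not the authors' model):
    each step advances at rate kp and every step resets on ligand unbinding.
    """
    k_off = math.log(2) / half_life_s  # unbinding rate from the binding half-life
    return (kp / (kp + k_off)) ** n_steps

def steady_state_output(occupancy, half_life_s, n_steps, kp=0.1):
    """Signaling output, taken here as occupancy times the proofread fraction."""
    return occupancy * proofread_fraction(half_life_s, n_steps, kp)

# Same occupancy, two different binding half-lives (values are hypothetical).
for n in (0, 3):
    short = steady_state_output(0.5, 2.0, n)
    long_ = steady_state_output(0.5, 10.0, n)
    print(f"n={n}: short-lived={short:.4f}, long-lived={long_:.4f}")
```

      With zero steps the output depends only on occupancy, whereas with three steps the longer-lived ligand gives the larger output at equal occupancy, which is the qualitative behavior the steady-state measurement relies on.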

      3) The authors use a simple equation for KP to fit their datasets in Figure 4, equivalently to their previous work. However, no goodness-of-fit metric is provided for these fits, and by manual inspection it is hard to see the defining curves of their KP model in the datasets, especially not for LAT and DAG, where the datasets look much more like vertical bars. The estimated values of steps (n) may well be the best fit to the data, but they are not necessarily a 'good' fit.

      To assist readers in assessing how well our models fit our datasets, we have included heatmaps of the residuals from each model fit (Fig 4S3) on page 52, along with discussion (reproduced below) of the residual plots of regions where our models imperfectly capture our dataset on page 13 line 283.

      “To assess our model fits, we evaluated the residuals of each model subtracted from their respective dataset. For Zap70 recruitment, our model underestimates the degree of activation at moderate binding half-lives and receptor occupancies, as indicated by the positive region in the center of the heatmap. It is possible that Zap70 recruitment reaches saturation at shorter ligand binding half-lives than our model predicts (Fig. 4S3 A). For both LAT clustering and DAG generation, our models performed poorest in the region of lowest occupancy and shortest half-life (Fig. 4S3 B&C). In this region of our dataset, the fluorescent signal from bound LOV2 above the background fluorescence of unbound LOV2 is smallest. To compensate for fluorescence of unbound LOV2, we subtract off the local background fluorescence of unbound LOV2 around each cell. In doing so we may be underestimating the amount of LOV2 bound to each cell, leading to an underestimation of signaling output by the models. Future studies at LOV2 densities approaching single molecule would better capture this regime of receptor occupancy, but cell-to-cell variation in activation would be too high to be compatible with our current steady-state analysis (Lin et al., 2019).”

      4) The values of n are also very high, which would imply that the kp rate constant might be very fast to compensate; no estimates of this value are presented. Recent data from the Dushek lab (Pettmann et al, eLife 2021) measured n to be ~3, which seems much more physically realistic. Furthermore, in their previous published work, Tischer & Weiner measured n to be 2.7 for DAG production but in the present study it is now n=11.3, using the same equation.

      We are unable to estimate the kp rate constant, as our datasets are at steady state and do not provide temporal information. To assess the plausibility of our higher n value fits, we explored the steady-state model presented in Ganti et al. PNAS 2020, which defines a kp rate of 0.1 s⁻¹. This model predicts the minimum number of signaling steps required to achieve a defined Hopfield error rate at defined cognate-ligand/self-ligand concentration and half-life ratios. Our exploration of this model is presented in Fig. 4S4 on page 53 and discussed on page 14 line 299.

      “In our previous work our model fit fewer (N=2.7) steps to DAG generation. We now fit a higher number of steps (N=11.3) to DAG generation. This change could be due to the incorporation of ICAM into our current study, which has been shown to potentiate ligand discrimination (Pettmann et al., 2021). Furthermore, our previous antibody-based adhesion may have short-circuited some proofreading steps by irreversibly holding the cell membrane close to the supported lipid bilayer. To evaluate if our higher value fits are indeed the best fit values for our datasets, we fit our model to each dataset while holding the value of N constant in the range of zero to fourteen steps, and evaluated the average residual value for each model fit (Fig 4S3 D). For all signaling steps, the fit value of N was near the minima of average residual and had a lower average residual value than a model with 3 proofreading steps.

      To assess the plausibility of a larger number of proofreading steps, we implemented the steady state kinetic proofreading model from Ganti et al. (Ganti et al., 2020). The model estimates the minimum number of proofreading steps required to discriminate between cognate-ligands and self-ligands with different binding half-lives present at a given concentration ratio at a given Hopfield error-rate (Hopfield, 1974). First, we evaluated what combinations of ligand half-lives and concentration ratios an 11-step kinetic proofreading network could discriminate at an error rate less than 10⁻³ (Fig 4S4 A). We chose the error rate of 10⁻³, as it is an order of magnitude less stringent than the theorized 10⁻⁴ upper limit error rate of the native TCR (Ganti et al., 2020). At moderate half-life ratios, an 11-step network can discriminate cognate peptides present in small concentrations (e.g. 1 cognate-ligand per 1000 self-ligands at a half-life ratio of 6).

      In our optogenetic system, the ratio of the average ligand binding half-life between the longest suppressive half-life and the shortest fully activated half-life is about 2. However, an 11-step network is insufficient to discriminate between ligands with a half-life ratio of 2, even at the high ligand ratio of 1 (equal concentrations of cognate- and self-ligand). This suggests our cells are unlikely to be detecting the average ligand binding half-life of each blue-light condition, but are more likely detecting longer-lived binding events from the underlying distribution of half-lives. Another possibility is that our in vitro washout measurements, which measure average ligand binding half-lives of soluble ligands diffusing in three dimensions, differ from the half-lives of ligand-receptor interactions between the cell’s plasma membrane and the supported lipid bilayer diffusing in two dimensions (J. Huang et al., 2010).

      To better explore the kinetic proofreading model space, we generated heatmaps reporting the required number of steps to discriminate combinations of ligand and half-life ratios at an error rate of 10⁻³ (Fig 4S4 B). To discriminate between ligands with a half-life ratio of two, at least 14 steps are needed when the ligands are at equal concentrations, and more than 25 steps are needed if cognate-ligands are 1 per 1000 self-ligands. The required number of proofreading steps decreases rapidly as the half-life ratio increases, reaching a minimum of 8 steps needed for a concentration ratio of 1/1000 and a half-life ratio of 10, which is more in line with physiological half-life ratios between agonist and non-agonist peptides (M. M. Davis et al., 1998).

      After comparing our results with the Ganti model, this analysis suggests that our number of fit proofreading steps may be somewhat inflated as a function of our use of the average ligand binding half-lives of three dimensional washout experiments in place of the two dimensional single molecule information T cells use to make activation decisions. However, the higher fit N values are more consistent with the required number of steps to discriminate ligands under more physiological conditions than our previous measurements of ~3 steps, which would not be expected to discriminate ligands with half-life ratio of 10 even at a ligand ratio of 1 (Fig 4S4 B, right).”

      5) If the fitted value of n provides no realistic insight into the KP mechanism, it should not be discussed as though it does.

      The many assumptions of our simplistic model likely result in error in determining the absolute number of fit proofreading steps. We feel the strength of our model lies in capturing the relative increase in the strength of proofreading as signal propagates through the cascade, and not in determining the absolute number of proofreading steps, though it is comforting that our values are broadly consistent with the expectations of Ganti et al. To highlight the point that relative values are the most important feature of our experiments, we are open to normalizing our n fit values by the fit n of Zap70 for all discussion of our results and the proofreading strength increase shown in Fig 4D if the reviewers think this will better highlight the relative increase in proofreading strength.

      6) While it is good to confirm it, the result that downstream signaling complexes reset more slowly than distal ones is surely to be expected, given the increased number of steps over which ligand unbinding must traverse, as in their Erlang distribution. You would not expect ERK phosphorylation to decrease at the same rate as LAT cluster dissociation for this same reason. However, the fact that the lifetime of LAT clustering (14.2s) or ZAP70 (9.6s) is so different to LOV2 (3.3s) provides good evidence that it is not proofreading, as by definition the measured outputs should rapidly return to the 'unbound' state in line with ligand unbinding. At least for LAT, there must be a 'memory' from previous signalling lasting several seconds, which means the system has not reset, as required for true KP.

      Slower resetting of downstream signaling events in a kinetic proofreading cascade is not a given, as it could be the case that all events reset at the same rate. One requirement for kinetic proofreading is that events in the chain be irreversible on the timescale of the ligand binding half-life. The steps are reset through an orthogonal pathway, as opposed to traversing back down a chain of reversible reactions. Both the TCR and LAT are dephosphorylated by the phosphatase CD45, and it would be possible for CD45 to dephosphorylate both proteins at the same rate (or even dephosphorylate LAT faster than the TCR). To clarify this point, we have expanded the discussion of possible reset mechanisms on page 21 line 451, as reproduced below.

      “An attractive reset mechanism is the segregation of CD45 away from bound receptors, creating spatial regions in which TCR and LAT associated activating events can occur (S. J. Davis & van der Merwe, 2006). Super-resolution microscopy by Razvag et al. measured TCR/CD45 segregated regions within seconds of antigen contact at the tips of T cell microvilli (Razvag et al., 2018). Upon unbinding these regions of phosphatase exclusion collapse, allowing CD45 to dephosphorylate receptor ITAMs and LAT clusters. However, the rate of dephosphorylation for LAT and receptor ITAMs could differ. LAT clusters exclude CD45 in reconstituted bilayer systems, potentially limiting the dephosphorylation to LAT molecules at the edges of the cluster thus slowing reset (Su et al., 2016). The kinetics of multivalent protein-protein interactions within TCR and LAT clusters can also influence dephosphorylation and dissociation rates (Goyette et al., 2022).

      A CD45-mediated reset mechanism would restrict proofreading to membrane-bound signaling events occurring within a CD45-depleted region. Downstream events that dissociate away from the membrane or diffuse out of the segregated region could not directly participate in the proofreading chain, as the collapse of a CD45 segregated region could not reset signaling entities released into the cytosol (e.g. release of IP3 in the cleavage of PIP2 to DAG).”

      We also added discussion of recent work from Harris et al. quantifying the slower timescale of Ca++ and ERK reset upon TCR signal termination on Page 23 line 498 as reproduced below.

      “Recently Harris et al. quantified the reset rate of the downstream signaling events Ca++ release and ERK phosphorylation upon signal inhibition to be 29 seconds and 3 minutes respectively (Harris et al., 2021). They showed both Ca++ and ERK levels can persist across short inhibitions of signaling. What makes LAT clusters different than these persistent downstream events? The dissolution of LAT clusters is directly triggered by the unbinding of ligand from the TCR, and both the TCR and LAT are de-phosphorylated by CD45. To our knowledge, the rates of ERK dephosphorylation and cytosolic Ca++ depletion are not accelerated by TCR unbinding, and both are turned over through constant rather than agonist-gated degradation. A useful future line of inquiry would be to quantify the reset rate for signaling steps throughout the cascade upon ligand unbinding versus orthogonal signal inhibition (e.g. kinase inhibition).”

    1. Author Response

      Reviewer #1 (Public Review):

      The paper presents a Bayesian model framework for estimating individual perceptual uncertainty from continuous tracking data, taking into account motor variability, action cost, and possible misestimation of the generative dynamics. While the contribution is mostly technical, the analyses are well done and clearly explained. The paper provides therefore a didactic resource for students wishing to implement similar models on continuous action data.

      First off, the paper is lucidly written - which made it a very pleasant read, especially compared to many other modeling papers, and the authors are to be congratulated for this. As such, the paper provides a valuable resource for didactic purposes alone. While the employed methods are not necessarily individually novel, the assembly of various parts into a coherent framework appears nonetheless valuable.

      Thank you for the positive evaluation!

      I have two major concerns, though:

      1). My main comment regards the model comparison using WAIC (Figure 4E) or cross-validation (Figure S4a): If we translate these numbers into Bayes factors, they are extraordinarily high. I assume that the p(x_i|\theta_s) in equation 7 are calculated assuming that the motor noise on u_{i,t} is independent? This would assume that motor processes act i.i.d with a timeframe of 60ms, which is probably not a very realistic assumption- given that much of the motor variability (as stated by the authors) comes likely from a central (i.e. planning) origin. Would the delta-WAIC not be much smaller if motor noise was assumed to be correlated across time points? Would this assumption change the \sigma estimates?

      Thank you for posing this question. First, sequential models tend to have much larger differences in the likelihood of parameters given data because of the large number of individual data points within a single sequence. Thus, it is not uncommon for model comparison to show much more extreme differences between models for sequential data, as is the case in the present manuscript.

      Second, since our computational framework is based on LQG control, the model indeed assumes that motor noise is independent across time steps. We agree that this assumption might not be realistic for time steps of 16ms duration. While this assumption is certainly a simplification, the assumption of independent noise across time steps is very common both in perceptual models as well as in models of motor control, and there is to our knowledge no computationally straightforward way around it in the LQG framework. It thus applies to all of the models considered in this paper, as they all assume temporally uncorrelated noise, both in perception and action. Therefore, the ranking between the models in the model comparison should hopefully not be affected in a systematic way favoring individual models disproportionately more than others, although the magnitudes of differences in WAIC might be smaller. Since the differences in WAIC are currently in the range of 1e4, we think that they will still be significant, even when accounting for correlated noise.

      Third, we think that the simplifying assumption of independent noise does not invalidate the calculation of the WAIC, which assumes independence across trials. The p(x_i | theta_s) in equation (8) are the likelihoods of whole trials. To compute them, we assume independence of the motor noise across time steps.
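
      The structure of this argument, with time steps summed within a trial and trials treated as the independent units entering the WAIC, can be sketched as follows. This is a generic illustration of the standard WAIC estimator computed from posterior samples, not the authors' implementation. The function names, array shapes, and the deviance-scale factor of -2 are assumptions made for the example.

```python
import numpy as np

def trial_log_likelihood(step_log_liks):
    """Log-likelihood of one whole trial.

    Under the independence assumption across time steps, the per-step
    log-likelihoods simply add up.
    """
    return float(np.sum(step_log_liks))

def waic(log_lik):
    """WAIC on the deviance scale from a (samples, trials) matrix.

    log_lik[s, i] is log p(x_i | theta_s) for posterior sample s and trial i.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    # Numerically stable log of the posterior-mean likelihood per trial.
    col_max = log_lik.max(axis=0)
    lppd_i = col_max + np.log(np.mean(np.exp(log_lik - col_max), axis=0))
    # Effective number of parameters: posterior variance of the log-likelihood.
    p_waic_i = np.var(log_lik, axis=0, ddof=1)
    return float(-2.0 * (lppd_i.sum() - p_waic_i.sum()))
```

      Because each trial's log-likelihood is a sum over many time steps, differences between models accumulate with sequence length, which is one reason WAIC differences for sequential data can be very large.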

      We have added a short passage in the subsection ‘model comparison’:

      “Note that the assumption of independent noise across time steps might lead to WAIC values that are larger than those obtained under a more realistic noise model involving correlations across time. However, this should not necessarily affect the ranking between models in a systematic way, i.e. favoring individual models disproportionately more than others.”

      and a passage in the discussion that points out that modeling the noise as being independent across time points is a simplifying assumption:

      “Finally, assuming independent noise across time steps at the experimental sampling rate (60 Hz) is certainly a simplifying assumption. Nevertheless, the assumption of independent noise across time steps is very common both in models of perceptual inference as well as in models of motor control, and there is to our knowledge no computationally straightforward way around it in the LQG framework.”

      2). While the results in Figure 4a are interesting, the deviation of the \sigma estimates from the standard psychophysical estimates for the most difficult condition remains unexplained. What are the limits of this method in estimating perceptual acuity near the perceptual threshold? Is there a problem that subjects just "give up" and the motor cost becomes overwhelming? Would this not invalidate the method for threshold detection?

      We fully agree that for the most difficult conditions at the lowest contrasts all sequential models we considered are biased with respect to the uncertainties obtained with the 2AFC experiment, which is supposed to be equivalent. Interestingly, when considering synthetic data, we did not see such a discrepancy. Thus, the observed bias points towards an additional mechanism such as a computational cost or computational uncertainty, that is not captured by the current models at very low contrast.

      For the results in Fig. 4, we assumed a constant behavioral cost across all conditions. The assumption that the cost is independent of perceptual uncertainty might not hold in reality, exactly in line with your hypothesis that subjects might just "give up". There are other possible explanations, though, that could potentially be relevant here. For example, the visual system is known to integrate visual signals over longer times, when contrast is lower. This may introduce additional non-linearities in the integration, which could affect the sensitivity, as already pointed out in the study by Bonnen et al. (2015).

      We have added the following passage in the discussion section:

      “In the lowest contrast conditions, all models we considered show a large and systematic deviation in the estimated perceptual uncertainty compared to the equivalent 2AFC task. Note that when considering synthetic data, we did not see such a discrepancy. Thus, the observed bias points towards additional mechanisms such as a computational cost or computational uncertainty, that are not captured by the current models at very low contrast. One reason for this could be that the assumption of constant behavioral costs across different contrast conditions might not hold at very low contrasts, because subjects might simply give up tracking the target although they can still perceive its location. Another possible explanation is that the visual system is known to integrate visual signals over longer times at lower contrasts [Dean & Tolhurst, 1986; Bair & Movshon, 2004], which could affect not only sensitivity in a nonlinear fashion but could also lead to nonlinear control actions extending across a longer time horizon. Further research will be required to isolate the specific reasons.”

      Reviewer #2 (Public Review):

      This manuscript develops and describes a framework for the analysis of data from so-called continuous psychophysics experiments, a relatively recent approach that leverages continuous behavioral tracking in response to dynamic stimuli (e.g. targets following a position random walk). Continuous psychophysics has the potential to dramatically improve the pace of data collection without sacrificing the ability to accurately estimate parameters of psychophysical interest. The manuscript applies ideas from optimal control theory to enrich the analysis of such data. They develop a nested set of data-analytic models: Model 1: the Kalman filter (KF), Model 2: the optimal actor (which is a special case of a linear quadratic regulator appropriate for linear dynamics and Gaussian variability), Model 3: the bounded actor w. behavioral costs, and Model 4: the bounded actor w. behavioral costs and subjective beliefs. Each successive model incorporates parameters that the previous model did not. Each parameter is of potential importance in any serious attempt to model human visuomotor behavior. They advertise that their methods improve the accuracy of the inferred values of certain parameters relative to previous methods. And they advertise that their methods enable the estimation of certain parameters that previous analyses did not.

      What were the parameters? In this context, the Kalman filter model has one free parameter: perceptual uncertainty of target position (\sigma). The optimal actor (Model 2) incorporates perceptual uncertainty of cursor position (\sigma_p) and motor variability (\sigma_m), in addition to perceptual uncertainty of target position (\sigma) that is included in the Kalman filter (Model 1). The bounded actor with behavioral costs (Model 3) incorporates a control cost parameter (c) that penalizes effort ('movement energy'). And the bounded actor with behavioral costs and subjective beliefs (Model 4) further incorporates the human observer's possibly mistaken 'beliefs' about target dynamics (i.e. how the human's internal model of target motion differs from the true generative model). The model allows for the true target dynamics (position-random-walk with drift = \sigma_rw) to be mistakenly believed to be governed by a position-random-walk with drift = \sigma_s plus a velocity-random-walk with drift = \sigma_v.
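
      To make the role of the single free parameter in Model 1 concrete, the following is a minimal sketch of a scalar Kalman filter tracking a position random walk. It assumes Gaussian observation noise with standard deviation sigma and random-walk steps with standard deviation sigma_rw; the function name, the initial values x0 and p0, and the example numbers are hypothetical and are not taken from the manuscript.

```python
def kalman_track(observations, sigma, sigma_rw, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a target following a position random walk.

    sigma    : perceptual (observation) noise standard deviation
    sigma_rw : standard deviation of the random-walk steps
    Returns the position estimates and the Kalman gain at each time step.
    """
    x_hat, p = x0, p0
    estimates, gains = [], []
    for y in observations:
        p_pred = p + sigma_rw ** 2           # predict: uncertainty grows by one step
        k = p_pred / (p_pred + sigma ** 2)   # gain: weight given to the new observation
        x_hat = x_hat + k * (y - x_hat)      # update the position estimate
        p = (1.0 - k) * p_pred               # update the posterior variance
        estimates.append(x_hat)
        gains.append(k)
    return estimates, gains

obs = [0.2, 0.5, 0.1, 0.9, 1.3]
_, sharp_gains = kalman_track(obs, sigma=0.1, sigma_rw=1.0)
_, blurry_gains = kalman_track(obs, sigma=10.0, sigma_rw=1.0)
print(sharp_gains[-1], blurry_gains[-1])
```

      A larger sigma lowers the gain, so the estimate follows new observations more sluggishly; this lag in tracking is the signature from which perceptual uncertainty is inferred.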

      The authors develop each of these models, show on simulated data that true model parameters can be accurately inferred, and then analyze previously collected data from three papers that helped to introduce the continuous psychophysics approach (Bonnen et al. 2015, 2017 & Knoll et al. 2018). They report that, of the considered models, the most sophisticated model (Model 4) provides the best accounting of previously collected data. This model more faithfully approximates the cross-correlograms relating target and human tracking velocities than the Kalman filter model, and is favored by the widely applicable information criterion (WAIC).

      The manuscript makes clear and timely contributions. Methods that are capable of accurately estimating the parameters described above from continuous psychophysics experiments have obvious value to the community. The manuscript tackles a difficult problem and seems to have made important progress. Some topics of central importance were not discussed with sufficient detail to satisfy an interested reader, so I believe that additional discussion and/or analyses are required. But the work appears to be well-executed and poised to make a nice contribution to the field.

      The manuscript, however, was an uneven read. Parts of it were very nicely written, and clearly explained the issues of interest. Other parts seemed organized around debatable logic, making inappropriate comparisons to--and misleading characterizations of--previous work. Other parts still were weakened by poor editing, typos, and grammatical mistakes.

      Overall, it is a nice piece of work. But the authors should provide substantially more discussion so that readers will develop a better intuition for how and why the inference routines enable accurate estimation, and how the values of certain parameters trade off with one another. Most especially, the authors should be very careful to accurately describe and appropriately use the previous literature.

      Thanks for the generous overall assessment and the thorough review! We hope that we can address the points you raised in our revised manuscript with the answers to your specific comments below.

      To summarize, we have substantially revised the discussion section to clarify our reasoning and avoid potential misinterpretations of parts of our manuscript as a misrepresentation of previous work. We have also extended the introduction and the exposition of our models in the results section to help readers develop an intuition about the models and inference routines.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper describes a systematic biochemical analysis of UBX proteins in facilitating protein unfolding by the p97-UFD1-NPL4 (referred to here as the p97 complex). The p97 complex binds Ub and unfolds it to allow the ubiquitylated protein to be translocated into the p97 ATPase pore for unfolding. This paper demonstrates that UBX proteins are able to reduce the necessary ubiquitin chain length in order to support unfolding by p97. They explore this using ubiquitylated CMG helicase as a substrate. Removal of CMG helicase from replicated DNA is required for completion of DNA synthesis.

      First the authors demonstrate that the p97 complex only unfolds CMG with very long Ub chains. They then show that the high threshold for Ub is reduced when UBXN7, FAF1 or FAF2 are added. These proteins bind to both the p97 complex and Ub in substrates. This is then followed up in cells by demonstrating that removal of UBXN7 and FAF1 reduces CMG disassembly and is synthetic with reduced CMG ubiquitin ligase activity.

      The conclusion that human p97 requires UBX proteins to support unfolding/segregase activity when Ub chains are short would be strengthened by more precise characterization of the length of ubiquitin chains being studied, as the methods do not precisely determine the chain lengths and how this is overlapping with the number and location of primary ubiquitylation sites on Mcm7.

      Please see our reply above to essential revision point 2 (data in Figure 1-figure supplement 1 and Figure 2-figure supplement 3).

      The in cellulo results, while consistent with a contributing role for FAF1 and UBXN7 in disassembly of the CMG by p97, indicate that either other factors are required in cells or that p97 can disassemble CMG with relatively short chains in cells without the need for the UBX proteins. This needs to be reconciled with the proposed model.

      We now discuss on lines 444-450 that CMG disassembly in the absence of UBXN7 and FAF1 might be promoted by additional UBX proteins not characterised in this study, or else be due to extensive CMG-MCM7 ubiquitylation that bypasses the requirement for UBX proteins (as predicted by our data in Figure 1). Note that short ubiquitin chains on CMG-MCM7 in cells treated with p97 inhibitor need to be interpreted with caution, as it is likely that p97 inhibition lowers the pool of free ubiquitin in cells. This point is discussed on lines 444-445 of the revised manuscript.

      Reviewer #3 (Public Review):

      The ATPase p97 (Cdc48 in yeast) unfolds ubiquitinated substrates with the help of its heterodimeric cofactor UFD1-NPL4 (U-N). Using the previously established CMG helicase complex as model substrate in a fully reconstituted biochemical assay, Fujisawa and Labib show that p97-U-N can efficiently disassemble the helicase complex only when it is modified with multiple, long ubiquitin chains. This is in contrast to the yeast Cdc48-U-N complex, which disassembles helicase complexes carrying long or short (6-10 ubiquitin moieties) chains with similar efficiency. The authors demonstrate that the requirement of p97-U-N for long chains can be overcome by the presence of p97 cofactors of the UBA-UBX type, including UBXN7, FAF1, FAF2 and (much less so) UBXN1. They show that this reduction in the 'ubiquitin threshold' of p97-U-N by UBXN7, FAF1 and FAF2 requires their UBX domain mediating p97 binding. They further show that the UBA and UIM domains of UBXN7 contribute to its activity in the assay, whereas the UBA domain of FAF1 and FAF2 is dispensable. Instead, a coiled-coil domain preceding the UBX domain of FAF1 and FAF2 is required for their activity, and both the coiled-coil-UBX domain organization and its activity are conserved in the worm homologue UBXN-3. Using UBXN7 and FAF1 knockout cells, Fujisawa and Labib then demonstrate that UBXN7 is required for efficient CMG helicase disassembly during S phase, with a minor contribution of FAF1, whereas both cofactors possess redundant roles in mitotic CMG helicase disassembly. Finally, the authors show that UBXN7 and FAF1 double knockout cells are hypersensitive to the NEDDylation inhibitor MLN4924 and suggest that this reflects their importance for p97-U-N unfoldase activity under conditions of restricted ubiquitination activity.

      This manuscript describes the intriguing observation that the yeast and mammalian Cdc48/p97-U-N complexes have distinct requirements, at least in the in vitro assay used, with respect to the substrate's ubiquitination state and to the presence of additional cofactors. While the concept of UBA-UBX cofactors assisting/stimulating Cdc48/p97-U-N activity is well-established, their link to ubiquitin chain length is novel and unexpected. The experiments are performed to a high technical standard, and the conclusions are mostly supported by the data. However, a shortcoming of the paper is that it remains entirely descriptive regarding the effect of the UBX proteins on the ubiquitin threshold, without providing mechanistic insights into their function or the molecular basis underlying the distinct thresholds.

      1) It remains unclear if the failure of p97-U-N to disassemble the helicase complex carrying short ubiquitin chains reflects impaired binding, priming or translocation of the substrate. It should be straightforward to test if the UBA-UBX cofactors simply stabilize the p97-U-N-substrate complex.

      As shown in previous studies, human UFD1-NPL4 bind stably to p97 in the absence of UBX proteins (our new data in Figure 3-figure supplement 2D illustrate this).

      The distinct domain requirements for UBXN7 (UBA, UIM, UBX) and FAF1/FAF2 (coiled-coil-UBX) suggest different mechanisms of stimulation, which should be discussed in more detail.

      We discuss further the roles of UBXN7 and FAF1/FAF2 on lines 533-548.

      The additive defects of the UBXN7 and FAF1 double knockout cells could indicate either redundant functions (as the authors propose) or synergistic function of both cofactors. To that end, the authors could test if UBXN7 and FAF1 can bind simultaneously to the same p97-U-N-substrate complex and if they act synergistically in helicase disassembly, e.g. at limiting cofactor concentrations.

      Previous studies have found that UBXN7 binds to p97 and UFD1-NPL4 with a 1:6:1 ratio and the same is true for FAF1, without any evidence of both UBXN7 and FAF1 binding to the same p97-UFD1-NPL4 complexes (Hanzelmann et al., 2011). Correspondingly, we did not observe any synergistic effect of FAF1 with UBXN7 upon the disassembly of ubiquitylated CMG by p97-UFD1-NPL4, when comparing reactions with a single UBX protein or reactions with both (our unpublished data).

      2) Having all purified proteins at hand, the authors should test which component of the system causes the elevated ubiquitin threshold of mammalian p97-U-N, by combining yeast Cdc48 with mammalian U-N and vice versa, etc.

      We thank the reviewer for this very interesting suggestion. The data are presented in Figure 3, showing that human UFD1-NPL4 and yeast Ufd1-Npl4 set the ubiquitin threshold for their cognate unfoldase enzymes.

      Can yeast Ubx5, which is a clear homologue of UBXN7, substitute for the mammalian UBA-UBX cofactors?

      This was also an interesting suggestion. We tested Ubx5 and did not see any stimulation, but we did not include the data because we lack a positive control for Ubx5 activity.

      3) The authors emphasize that mammalian p97-U-N in the absence of UBA-UBX cofactors requires long ubiquitin chains for activity. However, they should consider the possibility that the critical property is chain topology, rather than chain length. There is evidence that p97-U-N prefers substrates with branched chains (see PMIDs 28512218, 29033132), and multiple ubiquitin chains on the helicase substrate may mimic those.

      We thank the reviewer for raising this important point and we now cite the two papers mentioned above, on lines 171 and 177.

      In the revised version of the manuscript, we carefully characterise the ubiquitin chains that are formed under the various conditions used (Figure 1-figure supplement 1). Importantly, we also show that human p97-UFD1-NPL4 can disassemble highly ubiquitylated CMG, regardless of whether several ubiquitin chains or just one are attached to CMG-Mcm7 (Figure 1-figure supplement A+C; Figure 2-figure supplement 3A).

      Moreover, we also show that human p97-UFD1-NPL4 is comparable to yeast Cdc48-Ufd1-Npl4 in being able to disassemble CMG that is highly ubiquitylated with ‘K48-only’ ubiquitin that cannot form mixed chain linkages (Figure 2-figure supplement 3B).

      These data indicate that p97-UFD1-NPL4 can disassemble heavily ubiquitylated CMG complexes with long K48-linked ubiquitin chains on CMG-Mcm7, regardless of the number of chains and regardless of the presence of other chain linkages (in addition to K48-linked chains).

      It appears that worm CDC48-U-N in the absence of UBXN-3 cannot efficiently disassemble substrate carrying even long chains (Fig. 3 - supplement 2). The authors should discuss this finding in the context of their ubiquitin threshold model.

      This is an interesting point, suggesting that the threshold of C. elegans CDC-48_UFD-1_NPL-4 is even higher than that of human p97-UFD1-NPL4 in the absence of UBX proteins. However, we think that this issue is beyond the scope of our manuscript and likely requires structural biology to provide a definitive explanation. Our manuscript uses the C. elegans enzymes just to make one simple and clear point, namely that the essential role of the coiled-coil domain of human FAF1 is conserved in its worm orthologue UBXN-3.

    1. Author Response

      Reviewer #1 (Public Review):

      This work by Wei-Jia Luo and colleagues elegantly employs in vitro and in vivo models to demonstrate that within the mouse liver, macrophages respond to lipopolysaccharide (LPS) by releasing active IL-12 (IL-12p70), which is a heterodimer of IL-12p35 and IL-12p40. They observed that the availability of "free" IL-12p35 to this heterodimerization process is governed by the molecular chaperone HLJ1. In response to LPS, HLJ1 separates homodimerized IL-12p35 into monomers, which then can heterodimerize with IL-12p40 to form active IL-12p70. This active IL-12 is released from macrophages in the liver, which then act on neighboring natural killer T cells to release interferon gamma. This interferon gamma circulates systemically and is responsible for mortality in a mouse model of endotoxic shock.

      Overall, this work is mechanistically compelling and demonstrates a novel multicellular inflammatory pathway that contributes to death in a murine model of endotoxic shock. However, it is unclear if the observed pathway is limited to this highly reductionist model, or if it applies to models that better approximate the complexity of human sepsis. Indeed, the long-standing concept of "cytokine storm" as the major mediator of sepsis has largely failed to yield benefits in clinical trials. These numerous and repeated translational failures cast doubt on the translational validity of reductionist in vivo animal models of sepsis.

      We thank the reviewer for the affirmation. One of the major aims of our work is to identify a novel multicellular inflammatory pathway mediated by HLJ1 that contributes to endotoxic shock. We agree that, although our understanding of cytokine storm as the major mediator of sepsis has made dramatic progress over the past decade, these findings have not yet translated into effective treatments. As the reviewer mentioned, almost all clinical trials targeting cytokine effects have failed, especially in the context of sepsis. Among several explanations, the appropriateness of in vivo animal models is a concern (Chousterman et al., 2017). Some approaches to treating cytokine storm aimed to target the direct tissue consequences of the inflammatory cascade, such as those on the blood vessels (London et al., 2010). Another possible strategy is to target signaling that promotes cytokine synthesis and secretion (Maceyka et al., 2012). It may be feasible to quell the cytokine storm after infection by targeting upstream signaling, and reducing cytokine synthesis and secretion is a valid alternative to direct cytokine antagonism (Chousterman et al., 2017). Furthermore, in this study we found that Hlj1−/− mice showed reduced IFN-γ and improved survival when treated with daily systemic antibiotics after CLP surgery (Figure 6), indicating that targeting cytokine storm in combination with antibiotics provides a promising therapeutic strategy for sepsis. Taken together, we think an HLJ1-targeting strategy might be a potential therapy for cytokine storm-associated sepsis. We emphasize and discuss this concept in the Discussion of our revised manuscript (Page 19, lines 441-453).

      We greatly appreciate that reviewer #1 and the other reviewers raised the same issue. We have worked carefully to respond to the comments point-by-point below.

      This raises several specific concerns with regard to the model used by the investigators:

      (1) The authors use a massive dose of LPS that rapidly leads to the death of mice in 24 hours. This massive and rapid mortality is not consistent with human sepsis, which is a more crescendo course with a mortality of ~30%. Indeed, when the authors used a more clinically-relevant model of mild endotoxemia, HLJ1 appeared to have no impact on mortality (Figure 1A).

      Thank you for the comment. Indeed, because we observed that HLJ1 knockout mice could survive a high dose of LPS, we used 20 mg/kg LPS for the subsequent experiments based on this clear and significant phenomenon. We also recognize the importance of administering lower doses of LPS. To address this issue, we performed additional experiments and made the following revisions.

      i. Because 4 mg/kg is a common non-lethal dosage to induce TLR4 and IFN-γ signaling (Kunze et al., 2019; Malgorzata-Miller et al., 2016), we performed an additional experiment with 4 mg/kg LPS according to the editor’s suggestion. Hlj1−/− mice showed lower serum levels of BUN, creatinine and ALT, and thus less severe organ damage, than Hlj1+/+ mice after 4 mg/kg LPS treatment. The data are shown in Figure 1C and D of the revised Figure 1.

      ii. We also performed an ELISA and found that serum levels of IFN-γ were lower in Hlj1−/− mice than in Hlj1+/+ mice after 4 mg/kg LPS injection. The result is shown in Figure 2C of the revised Figure 2.

      iii. Taken together, these results indicate that the effect of HLJ1 deletion on reducing IFN-γ and alleviating organ injury is also seen during moderate endotoxemia. We describe and discuss the result in the revised manuscript (Page 6, lines 134-141; Page 18, lines 423-437).

      (2) LPS is a model of endotoxemia, not a model of sepsis. Accordingly, it is unclear if the protective benefit of blocking IL-12 will similarly be seen as a live-infection model of sepsis, in which inflammatory signaling may be necessary for pathogen clearance.

      We thank the reviewer for raising these critical issues and providing valuable suggestions. This issue was also mentioned by other reviewers. Although LPS-induced endotoxemia is a simple model with higher reproducibility and reliability than other sepsis models, it indeed cannot represent actual sepsis, and it is based on the notion that the host’s response to bacteria, not the pathogen itself, leads to mortality and organ failure (Deitch, 2005). Therefore, following the reviewers’ suggestion, we additionally used a live-infection model of sepsis, cecal ligation and puncture (CLP), which resembles clinical disease and septic shock (Deitch, 2005), to confirm the importance of HLJ1 in sepsis. We found that IFN-γ expression was lower in the liver and spleen of Hlj1−/− mice than in Hlj1+/+ mice (Figure 6A and B). We analyzed serum markers of organ dysfunction, and Hlj1−/− mice showed lower serum levels of BUN, creatinine and AST (Figure 6C). H&E staining showed kidney injury at the histological level after CLP surgery, and Hlj1−/− mice showed less severe kidney injury than Hlj1+/+ mice (Figure 6D). We further found that Hlj1−/− mice showed significantly improved survival compared to Hlj1+/+ mice when mice were treated with systemic antibiotics (Figure 6E). Taken together, we demonstrated that HLJ1 deletion attenuates CLP-induced sepsis with down-regulated IFN-γ, and concluded that the benefit of blocking IL-12 and HLJ1 can similarly be seen in a live-infection model of sepsis. The result is shown in the revised Figure 6. The corresponding result was also added to the revised manuscript (Page 11-12, lines 268-286). Please also see the responses to the other reviewers.

      Page 11-12, line 268-286 "HLJ1 deletion protect mice from CLP-induced organ dysfunction and septic death To address the question whether HLJ1 also regulates IFN-γ-dependent septic shock in live infection model, we performed CLP (cecal ligation and puncture) surgery which more resembles clinical disease and human sepsis. CLP significantly induced transcriptional levels of IFN-γ in the liver of Hlj1+/+ mice comparing to mice receiving sham surgery while Hlj1−/− mice showed significantly lower IFN-γ mRNA than Hlj1+/+ mice (Figure 6A). This phenomenon was not restricted to the liver since lower expression of splenic IFN-γ was also found in Hlj1−/− mice (Figure 6B). The CLP surgery resulted in serious renal and liver damage while Hlj1−/− mice showed alleviated organ dysfunction with significantly lower serum levels of BUN, creatinine and AST (Figure 6C). H&E staining showed kidney injury at the histology level after CLP, while Hlj1−/− mice showed less severe kidney injury than Hlj1+/+ mice (Figure 6D). However, there was no significant difference in survival when comparing Hlj1+/+ and Hlj1−/− mice (Figure 6E). We hypothesized that severe bacteremia contributed to mortality in mice that did not receive any treatment, so we treat mice with systemic antibiotics. As a result, Hlj1−/− mice displayed significantly improved survival compared with Hlj1+/+ mice when mice received daily systemic antibiotics after CLP (Figure 6E). These results implied the agent responsible for bacteria clearance can be combined with immune modulation such as HLJ1 targeting to improve the outcome of sepsis."

      (3) Finally, it is unclear if the findings are only relevant to mice, or if they also have relevance to humans.

      We acknowledge that human studies are important, but there are practical difficulties to overcome, for example cohort identification, individual variation, and clinical considerations. This is a limitation of our study, since our findings are based only on animal models and human cell lines. We further performed CLP experiments, which are more relevant to human sepsis, although this is not a true human study. These have been added as Figure 6 of our revised manuscript. Based on the present results, we plan to initiate specific clinical studies. For example, we plan to collect blood monocytes from critically ill ICU patients to see whether HLJ1 expression levels in monocytes are higher in patients with sepsis than in patients without sepsis. We also want to know whether HLJ1 expression levels in monocytes or in serum correlate with inflammatory markers such as C-reactive protein, procalcitonin, and lactate in sepsis patients, because we found that serum levels of HLJ1 correlated with IL-12 in mice. In our unpublished preliminary results, HLJ1 can be detected in the serum of patients with sepsis. This inspires us to investigate whether HLJ1 can serve as a diagnostic or prognostic marker in the future. We anticipate reporting these results in future publications. Thank you very much for your understanding.

      Reviewer #2 (Public Review):

      The authors show that HLJ1 converts misfolded IL-12p35 homodimers to monomers, which maintains bioactive IL-12p70 heterodimerization and secretion. In turn, this contributes to increased IL-12 activity, leading to enhanced IFN-gamma production and lethality in mice challenged with LPS to model sepsis.

      Strengths:

      • Huge and diverse dataset (e.g. in vivo, in vitro, single cell RNAseq, adoptive transfer etc.) with interesting findings that could be of relevance to the field.

      We deeply thank the reviewer for the affirmation. We hope our comprehensive dataset can provide novel insight of relevance to the field. With this information, we continue to investigate the underlying molecular alterations resulting from endotoxin-induced immune responses. We fully agree with the weaknesses raised by the reviewer, take them very seriously, and have revised the manuscript point-by-point. Thank you very much.

      Weaknesses:

      • The flow/narrative of the paper is very hard to follow. This may result from the fact that the order of presented results is a bit puzzling. Normally, one would add-in the cytokine results (now figure 3), after the survival curves in Figure 1. Furthermore, the flow cytometry data presented in Figure 4 is more or less a validation of the scRNAseq data presented in Figure 2 in another organ. Likewise, Figure 5 is sort of a validation of Figure 3 in another organ. The authors seem to jump from organ to organ, from in vivo to in vitro and vice-versa all the time which makes the paper extremely difficult to follow.

      We thank the reviewer for the valuable suggestion. We were also hesitant about this arrangement in our first submission. We have rearranged our results so that the flow/narrative of the paper is easier to follow:

      1. We moved the results of Figure 3 to become Figure 2 so that the cytokine array results come after the survival curve results.

      2. The flow cytometry result presented in Figure 4 was moved to Figure 5 so that it comes after the sc-RNA sequencing result.

      3. The qPCR result for pro-inflammatory cytokines presented in Figure 5 was moved to Figure 2-figure supplement 1 so that it serves as a validation of the cytokine array in another organ.

      In addition, along with other suggestions from reviewers, we have rewritten the Introduction and Discussion sections and reorganized the whole manuscript so that we can focus more on the important issues. All modifications and rearrangements can be checked in the revised manuscript with changes tracked. Thank you for your kind suggestions.

      • Use of extremely high dosages of LPS.

      Thank you for the comment. This issue was raised by several reviewers and the editor. Indeed, because we observed that HLJ1 knockout mice could survive a high dose of LPS, we used 20 mg/kg LPS for the subsequent experiments based on this clear and significant phenomenon. We also recognize the importance of administering lower doses of LPS. To address this issue, we performed additional experiments and made the following revisions.

      i. Because 4 mg/kg is a common non-lethal dosage to induce TLR4 and IFN-γ signaling (Kunze et al., 2019; Malgorzata-Miller et al., 2016), we performed an additional experiment with 4 mg/kg LPS according to the editor’s suggestion. Hlj1−/− mice showed lower serum levels of BUN, creatinine and ALT, and thus less severe organ damage, than Hlj1+/+ mice after 4 mg/kg LPS injection (Figure 1C). H&E staining showed kidney injury at the histological level after LPS treatment, and Hlj1−/− mice showed less severe kidney injury than Hlj1+/+ mice (Figure 1D). The data are shown in Figure 1C and D of the revised Figure 1.

      ii. We also performed an ELISA and found that serum levels of IFN-γ were lower in Hlj1−/− mice than in Hlj1+/+ mice after 4 mg/kg LPS injection. The result is shown in Figure 2C of the revised Figure 2.

      iii. Taken together, these results indicate that the effect of HLJ1 deletion on reducing IFN-γ and alleviating organ injury is also seen during moderate endotoxemia. We describe and discuss the result in the revised manuscript (Page 6, lines 134-141; Page 18, lines 423-437).

      • Much of the presented data is replication of previous work. For instance, neutralization of IFN-γ (e.g. Billiau et al., Eur. J. Immunol. 1987; Car et al. J. Exp. Med. 1994) and anti-IL-12 (e.g. Zisman et al., Shock 1997) has been shown to lower mortality in LPS models in mice.

      We thank the reviewer for the reminder, and we apologize for our unclear description, which led to misunderstanding. To carefully identify the novel role of HLJ1 in sepsis, we built our investigation on several well-established findings. Indeed, the roles of IFN-γ and IL-12 have been recognized in previous studies, and their neutralization has been reported to attenuate LPS-induced endotoxic shock. However, our study focused on the effect of HLJ1 deletion on the IL-12/IFN-γ axis and septic death. First, we observed that IFN-γ and IL-12 decreased after HLJ1 deletion during sepsis. We then used IL-12/IFN-γ neutralization and found that it improved survival in wild-type mice but not in Hlj1 knockout mice, suggesting the importance of HLJ1 in IL-12/IFN-γ-mediated mortality. Moreover, if the difference in mortality rate between genotypes disappears after IL-12 or IFN-γ neutralization, we can infer that HLJ1 contributes to mortality mainly through IL-12 and IFN-γ signaling. These ideas came from a previous study published in Cell (Ponzetta et al., 2019). The authors elegantly demonstrated the role of Csf3r in the IL-12/IFN-γ axis and subsequent tumor incidence by showing that IFN-γ neutralization alters the phenotype in wild-type mice but not in knockout mice. This rationale inspired and prompted us to perform a similar neutralization experiment to understand the precise role of HLJ1 in sepsis.

      • No true sepsis model is used, only LPS. This is important, as for instance neutralization of IFN-γ and IL-12 has been shown to improve outcome in endotoxemia before (see above), but had no effect on survival in more relevant sepsis models such as cecal ligation and puncture (e.g. see Romero et al., Journal of Leukocyte Biology 2010; Zisman et al., Shock 1997). Furthermore, IFN-γ is even proposed (and used on a small scale) as therapy in sepsis patients to reverse immunosuppression.

      We thank the reviewer for raising these critical issues and providing valuable suggestions. They were also mentioned by other reviewers. Although LPS-induced endotoxemia is a simple model with higher reproducibility and reliability than other sepsis models, it indeed cannot represent actual sepsis, and it is based on the notion that the host’s response to bacteria, not the pathogen itself, leads to mortality and organ failure (Deitch, 2005). Therefore, we additionally used the cecal ligation and puncture (CLP) model, which resembles clinical disease and septic shock (Deitch, 2005), to confirm the importance of HLJ1 to human sepsis. Please see our revised Figure 6 and the responses to other reviewers above.

      In accordance with the previous result from Romero et al. showing that IFN-γ neutralization did not improve survival rate, we observed similar survival rates between Hlj1+/+ and Hlj1−/− mice after CLP. However, when they treated mice with systemic antibiotics, IFN-γ knockout mice survived significantly better than wild-type mice (Romero et al., 2010). In the CLP model, it is possible that severe bacteremia contributed, in an IFN-γ-independent manner, to mortality in mice that did not receive antibiotics, so we treated mice with systemic antibiotics immediately after CLP. We found that Hlj1−/− mice showed significantly improved survival compared to Hlj1+/+ mice when treated with systemic antibiotics after CLP surgery (Figure 6E), indicating that targeting cytokine storm in combination with antibiotics provides a promising therapeutic strategy for sepsis. The result is shown in Figure 6E of the revised Figure 6. This suggests that an HLJ1-targeting strategy could be combined with antibiotics as a combination therapy for future clinical applications. We emphasize and discuss this concept in the Discussion of the revised manuscript (Page 18-19, lines 441-453).

    1. Author Response

      Reviewer #1 (Public Review):

      In their manuscript "CompoundRay: An open-source tool for high-speed and high-fidelity rendering of compound eyes", the authors describe their software package to simulate vision in 3D environments as perceived through a compound eye of arbitrary geometry. The software uses hardware accelerated ray casting using NVIDIA Optix to generate simulations at very high framerates of ~5000 FPS on recent NVIDIA graphics hardware. The software is released under the permissive MIT license, publicly available at https://github.com/ManganLab/eye-renderer, and well documented. CompoundRay can be extraordinarily useful for computational neuroscience experiments exploring insect vision and robotics with insect like vision devices.

      The manuscript describes the target of the work, realistically simulating vision as perceived by compound eyes in arthropods, and thoroughly reviews the state of the art. The software CompoundRay is then presented to address the shortcomings of existing solutions, which either oversimplify the geometry of compound eyes (e.g. assuming shared focal points), use an unrealistic rendering model (e.g. local geometry projection), or are slower than real-time.

      The manuscript then details implementation choices and the conceptual design and components of the software. The effect of compound eye geometries is discussed using some examples. The speed of the simulator depending on SNR is assessed and shown for three physiological compound eye geometries.

      I find the described open source compound eye vision simulation software extraordinarily useful and important. The manuscript reviews the state of the art well. The figures are well made and easy to understand. The description of the method and software, in my opinion, needs work to make it more succinct and easier to understand (details below). In general, I found relevant concepts and ideas buried in overly complicated meandering descriptions, and important details missing. Some editorial work could help a lot here.

      Thank you for the very positive feedback.

      Major:

      1) The transfer of the scene seen by an arbitrary geometry compound eye into a display image lacks information and discussion about the focal center/ choice of projection. I believe that only the orientation of ommatidia is used to generate this projection which leads to the overlap/ non-coverage in Fig. 5c. Correct? It would be great if, for such scenarios, a semi-orthogonal+cylindrical projection could be added? Also, please explain better.

      For clarification, CompoundRay allows for a number of projection modes from any 3D sampling surface to visualised 2D projections. This has now been made clearer with an updated Methods section “From single ommatidia to full compound eye” (lines 171-188), and a clearer explanation of the display pipeline within the “CompoundRay Software Pipeline” section (lines 245-247).

      We note that Fig 5 is simply intended as an example of the extreme differences in information that can be provided by nodal (the current state of the art) and non-nodal imagers (as in biological systems). A user could indeed produce custom projections (as now noted in the future work section of the Discussion), such as semi-orthogonal+cylindrical projections, by modifying the projection shaders, but we do not feel that this adds substantially to the desired message of Fig 5, as currently all view images are generated using the same projection method, allowing them to be compared. Further to this, a semi-orthogonal+cylindrical projection would only serve to display these types of eyes and would not be of significant use outside of this category of design. Rather, the utility of CompoundRay for research is now demonstrated by the inclusion of an entirely new example experiment (lines 394-467) (Fig 10), which compares artificial and realistic compound eye models in a visual tracking task.

      In addition, we note that specific references to the “orientation-wise spherical mapping” of images have been added to the appropriate image captions (Fig 5 & 6).

      Finally, we have attempted to be more explicit about the way that 2D projection systems work within CompoundRay (lines 182-185).

      2) It is clear that CompoundRay is fast and addresses complex compound eye geometries. It remains unclear why global illumination models are discussed while the implementation uses ray casting to sample textures without illumination, which is equivalent to projection rendering, which runs fast on much simpler hardware. If the argument is speed and simplicity of writing the code, that's great, write it so. If it is an intrinsic advantage of the ray-casting method, then comparison with the 'many-cameras' approach sketched below should be done:

      In your model, each ommatidium is an independent pin-hole camera. Instead of sampling this camera by ray-casting, you could use projection rendering to generate a small image per ommatidium-camera, then average over the intensities with an appropriate foveation function (Gaussian in your scenario, but could be other kernels). The resolution of the per-camera image defines the number of samples for anti-aliasing, randomizing will be harder than with ray-casting ;). What else is better when using ray-casting? Fewer samples? Hardware support? Possible to increase recursion depth and do more global things than local illumination and shadows? Easier to parallelize on specific hardware and with specific software libraries? Don't you think it would make sense to explain the entire procedure like that? That would make the choice to use ray-casting much easier to understand for naive readers like me.

      Thanks for this feedback; we can see that it was misleading to include this in our previous Methods section. We have now shortened the discussion of global illumination models and moved it to the future work section at the end of the Discussion. We have also added a clarification to the end of this document that summarises this point, as it was raised by multiple reviewers (see Changes Relating to Colour and Light Sampling).
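
      The per-ommatidium sampling discussed in this exchange (rays cast around an ommatidial axis and averaged under a Gaussian acceptance function) can be illustrated with a minimal Monte Carlo sketch. This is not CompoundRay code: the function names, the toy scene and the small-angle treatment of the acceptance cone are all assumptions made purely for illustration.

```python
import math
import random


def normalise(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def scene_radiance(direction):
    """Toy scene (assumption): bright sky above the horizon, dark ground below."""
    return 1.0 if direction[2] > 0.0 else 0.1


def sample_ommatidium(axis, acceptance_sigma, n_samples, rng):
    """Average radiance over rays with Gaussian angular offsets (radians) from axis."""
    axis = normalise(axis)
    # Build two unit vectors perpendicular to the axis to span the offset plane.
    helper = (1.0, 0.0, 0.0) if abs(axis[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = normalise(cross(axis, helper))
    v = cross(axis, u)
    total = 0.0
    for _ in range(n_samples):
        dx = rng.gauss(0.0, acceptance_sigma)
        dy = rng.gauss(0.0, acceptance_sigma)
        # Small-angle approximation: offset the axis in its tangent plane.
        ray = normalise(tuple(axis[i] + dx * u[i] + dy * v[i] for i in range(3)))
        total += scene_radiance(ray)
    return total / n_samples


if __name__ == "__main__":
    rng = random.Random(0)
    # An ommatidium looking at the horizon sees a blend of sky and ground.
    print(sample_ommatidium((1.0, 0.0, 0.0), 0.05, 1000, rng))
```

      In this sketch, increasing `n_samples` reduces the sampling noise of each ommatidium's estimate at the cost of more rays, which is the kind of trade-off a per-ommatidium sample count controls.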

      3) CompoundRay, as far as I understand, currently renders RGB images at 8-bit precision. This may not be sufficient to simulate the vision of arthropod eyes that are sensitive to other wavelengths and at variable sensitivity.

      Thanks for pointing out this easy-to-miss implementation detail. Indeed, you are correct that the native output is at the 8-bit level, as is standard to match display equipment. However, we note that the underlying on-GPU implementation operates at a 32-bit depth, so exposing this to the higher-level Python API should be possible, which could then be used as you suggest. We view adding enhanced lighting properties, including shadows, illumination and higher bit depths, so as to better support increased-bandwidth visual sensor simulation, as future updates, which we have now outlined in the Discussion (lines 549-553).
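
      As a small illustration of the bit-depth concern, the sketch below shows two dim intensities that remain distinct as 32-bit floating-point values but collapse to the same 8-bit code. It assumes that "32-bit depth" means 32-bit floating-point intensities normalised to [0, 1], which the response does not specify; the helper names are hypothetical and not part of any CompoundRay API.

```python
import struct


def to_float32(value):
    """Round a Python float to the nearest 32-bit float (assumed internal precision)."""
    return struct.unpack("f", struct.pack("f", value))[0]


def quantise_8bit(value):
    """Map an intensity in [0, 1] to an 8-bit code (0-255), clamping out-of-range input."""
    clamped = min(max(value, 0.0), 1.0)
    return int(round(clamped * 255))


dim_a, dim_b = 0.0100, 0.0110  # two nearby low-light intensities (illustrative values)

# True: both intensities map to the same 8-bit code, so the difference is lost.
print(quantise_8bit(dim_a) == quantise_8bit(dim_b))
# False: the two values are still distinguishable at 32-bit float precision.
print(to_float32(dim_a) == to_float32(dim_b))
```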

      Reviewer #2 (Public Review):

      In this paper, the authors describe a new software tool which simulates the spatial geometry of insect compound eyes. This new tool improves on existing tools by taking advantage of recent advances in computer graphics hardware which supports high performance real-time ray tracing to enable simulation of insect eyes with greater fidelity than previously. For example, this tool allows the simulation of eyes in which the optical axes of the ommatidia do not converge to a single point and takes advantage of ray tracing as a rendering modality to directly sample the scene with simulated light rays. The paper states these aims clearly and convincingly demonstrates that the software meets these aims. I think the availability of a high-quality, open-source software tool to simulate the geometry of compound eyes will be generally useful to researchers studying vision and visual behavior in insects and roboticists working on bio-inspired visual systems, and I am optimistic that the described tool could fill that role well.

      Thank you for the positive feedback.

      As far as weaknesses of the paper, the most major issue for me is that I could not find any example of why the additional modeling fidelity or speed is useful in understanding a biological phenomenon. While the work is technically impressive, I think such a demonstration would increase its impact substantially.

      An example experiment has been added as requested.

      I can identify a few more, relatively minor, weaknesses: the software tool is not particularly easy to install but I think this is due primarily to the usage of advanced graphics hardware and software libraries and hence not something the authors can easily correct. In fact, the authors provide substantial documentation to help with installation.

      Indeed, we have tried to ease installation as much as possible by providing detailed documentation. This has been updated since initial submission and has proven sufficient for multiple users. We have looked into dockerising the code but, as correctly identified by the reviewer, there are significant challenges due to proprietary hardware and its drivers.

      Another weakness of the tool, which the authors might like to address in the paper, is that there are some aspects of insect vision and optics which are not directly addressed. For example, the wavelength and polarization properties of light rays are hardly addressed despite extensive research into the sensation of these properties. Furthermore, the optical model employed here is purely ray based and would not allow investigating the wave nature of light which is important for propagation from the corneal surface to the photoreceptors in many species.

      Indeed, it is correct that the current implementation does not allow such advanced light modelling features, but as our initial aim was to allow arbitrary surface shapes, this was considered beyond the scope of this work. However, we have added a short description of extensions that the method would allow without significant architectural changes, which include many of those listed by the reviewer. As the renderer simulates light as it reaches the lens surface, it is hoped that further works will be able to use this natural boundary between the eye surface and its internals to build further computational models that use the data generated in CompoundRay as a starting point to then simulate inside-eye light transport.

    1. [00:42:10 to 01:00:30] But before we do that, let me talk about something that's even more fundamental and helps us to understand the progression of thinking through those four schools to what's usually considered the most sophisticated, the Madhyamaka school. That is the distinction, which is really important, between existence and intrinsic existence, and the distinction between no existence and no intrinsic existence. If one doesn't have at least some idea of the Madhyamaka system, one is usually not able to make these distinctions, so let's talk about them for a moment. When we talk about existence, we talk about our ordinary understanding of what's real: that things are objects. They may be in relationship, but what's in relationship are two different, distinct objects or entities, and that's kind of our normal understanding of existence. So lacking inherent or intrinsic existence raises the issue of understanding what intrinsic existence is. That's the object of negation for the Buddha, for Nagarjuna, and for all those following in this tradition of Nagarjuna, the Madhyamaka school. It's not so easy to wrap our heads around what intrinsic existence is. In a way it's so close that we miss it. It's a little bit like staying in a new hotel room in a new city, waking up and looking for your glasses, and you can't find them, and then realizing that they're already on your face. Intrinsic existence is things existing independently, not existing through relationship; that is, not dependently.
And so if we look at dependence, we can look at it at several levels. The more obvious level, which you've mentioned, Carlo, is cause and effect, causality. But there are also more subtle levels of dependence that the Buddha and Nagarjuna talk about and that are really central to the philosophy. The second level is the relationship between whole and parts, and parts to whole; it goes both ways. That's another level of dependence, particularly highlighted by Nagarjuna. Then the third level, which is the subtlest, is really what we have to start to understand, because the opposite of that is this independent or intrinsic existence. This third level we call dependence through designation, sometimes called dependent designation. It's a type of naming or labeling. For example, Barry: my parents gave the name Barry based on a body, maybe a little tiny infant body at that time, and also on some kind of behaviors, or on how they thought the emotional structure was for this little baby: he's very calm, or he acts out a lot, he's very active, all those things. Upon all that a name is placed, in this case Barry. That relationship of dependence through designation is really what Nagarjuna is talking about when we talk about dependence, so that's very important to understand. Coming back to understanding this inherent or intrinsic existence, which is the opposite of that: there are many words in English we use synonymously for the Tibetan term (transcribed as "ranging"): not existing intrinsically, or inherently, or independently, or from its own side. Those are all synonyms for the Tibetan terminology that I just mentioned.
There were two comparisons; the second comparison is between non-existence and not inherently existent. When Nagarjuna says "no inherent existence", what people often interpret is no existence at all, and they fall into a nihilism, that nothing exists at all. They haven't fully appreciated this notion of intrinsic existence, so they're throwing the baby out with the bathwater. When we're throwing out, or negating, intrinsic existence, they don't quite understand what that really means; they think it's all of existence, and therefore they think that nothing exists.
Can I interject something before you go ahead? You promised us the four schools, but can I make a comment here?
Of course. This is free flow.
We gave the title "What is real?" to this, and it seems to me that's exactly the distinction you made between existence and intrinsic, or inherent, existence. It's an idea that I found central and essentially useful for me, for the following reason. First of all, the notion of reality and the notion of existence here are close: what exists is what is real. I want to say a couple of things. One is that we make a distinction between illusory and real in our everyday life, which is well founded. If I see a chair, and there's a mirror there and I see a chair on the other side of the mirror, there's a precise sense in which the chair on the other side of the mirror is not real, while this chair is real. This distinction has a meaning because I can sit on this chair and touch it, but I cannot sit on that one or touch it. But then we realize that some aspects of what is illusory in the chair in the mirror are also shared by the chair which I just called real, which is also illusory in some other sense. For instance, the fact of being a chair...
You cut out and came back on, so I missed you up until now. Could you please repeat it?
Oh. From where?
When you were saying that this distinction between existence and inherent existence, and non-existence and non-inherent existence, is very helpful. After that I lost you.
Yeah, I wanted to make a couple of points. One is that we use a distinction between illusory and real in everyday life, for instance with a chair. But then, I was saying, through science we realized that there are illusory aspects in the chair which I just called real as well. Then one is tempted to say: all right, there are many illusory aspects of that chair, but there is a more fundamental level at which there is a description of what is going on which is the real one. Eddington made this very vivid in a well-known distinction between the scientific table and the everyday table, when he says: look, I have two tables there. There's the table on which I eat, which is solid, and then there's the table which I view with my scientific eyes, which is made of atoms and is not solid; there's a lot of emptiness. Not emptiness in Nagarjuna's sense, a completely different sense.
I've heard that that emptiness is 99.9 to the 12th power of the space in the atom, is that right?
Yes, yes, but that's of course not Nagarjuna's emptiness; that's just the lack of presence of atoms. Eddington says, and people use that by saying, that the chair which I see, its solidity, is illusory; the real chair is the atoms. This way of using the notion of real and the notion of existence, so that what exists is the atoms, is dangerously misleading, because it pushes us to try to resolve the relational and illusory aspect of the reality that we see in terms of some basic fundamental physical reality from which to derive it; or, in Western subjective idealism, in terms of some sort of fundamental mind or fundamental subject which is a really existing entity: the Cartesian mind that is certain of existing itself, or the Kantian subject, or even the fundamentality of perception itself in Husserl and in phenomenology. So there is this Western need to anchor what we mean by real on something final: to realize that there is dependence, but then that there is some basic ground on which everything builds up, on which to sit. This is where I take Nagarjuna's notion of emptiness to be useful: to get rid of this urge of finding, beyond the illusory aspect of the world, a basic level which is real not in the sense in which this chair is real compared to the chair in the mirror, but real in a fundamental way; the bottom line of the story, the solid terrain on which to anchor, the end point of the line of dependence. The line of dependence ends at some point, and that's what is real. And what I take from Nagarjuna is that that's the wrong question. It's not only that the chair, the table, is empty because I can understand it as something else; it's also that that something else is empty because I can understand it as something else, up to the point at which emptiness itself is empty, because we shouldn't take it as a fundamental sort of metaphysical principle on which to ground all the rest.
Just putting this in slightly different terminology: emptiness is what allows functionality. Emptiness is the lack of any kind of essence, even on an atomic level. I agree with what you said; I think that's very true. When we look at the chair versus the reflection of the chair in the mirror, it gets a little more complicated, because both of them of course lack any independent existence. They're both empty in terms of shunyata. Having said that, the Buddha gave about ten different metaphors for something being illusory, and one of the important ones that he used was reflection: the reflection of the moon, the full moon, in still water. It looks like the moon, but in fact of course it's not; it's a reflection. He used such things as water in a mirage, the sound of an echo, and things like that to illustrate.
Now let me mention two experiments, if I may, and you correct me where I'm wrong; I'm a pop physicist from the New York Times. One is the thought experiment of Erwin Schrödinger, the so-called Schrödinger's cat paradox. You have a double steel box in which you have a cat; there are no doors, no windows. And you have a vial of very powerful acid that's connected to a radioisotope. The half-life of the isotope is the same as the duration of your thought experiment, so there is a 50% chance that the radioactive material decays, somehow pulls a lever, and the acid spills, killing the cat. If the radioisotope does not decay, there's no spillage of the acid and the cat remains alive. Quantum physicists call this superposition, where the cat is both alive and dead. When you crack open this steel box, you observe what's inside, and then the cat is either dead, if the radioisotope decayed and knocked over the acid, or alive, if it didn't. It's either/or, whereas when you can't observe it, it's both; it's superposition. The second is the double slit. You shoot electrons or photons through two slits in a metal thing, you have a screen behind, and you look at the pattern. If you have a little camera, an observation device, at the level of the slits, you find a pattern on the screen that suggests what passed through the slits were particles, whereas if you remove the observation device you have an interference pattern, suggesting what went through the slits were waves. So these two experiments, at least in my very superficial understanding, tell us that observer dependence is very important in terms of reality: whether or not there is an observer, or what type of observer presence there is, very much influences and determines what's real.
And so that then jumps into the four Buddhist schools of philosophy. If we go from the so-called least sophisticated up, the third one would be the one you alluded to, which is somewhat similar to Bishop Berkeley in the West and other idealists, who say that everything is consciousness, everything is mind, and things that seem to be solid out there in an external reality are nothing more than projections of our mind. That's actually a very sophisticated philosophy. One of the things it starts to do is break down this notion of a solid external reality. But the critique of it, as you also mentioned, is that it takes the mind to be somehow absolute or ultimately existing. So then the highest, if you will most sophisticated, school, Madhyamaka, says: what the Chittamatra, the Mind-Only school, says is correct up to a point, but the criticism is that there's no absoluteness about the mind either. So you end up accepting an external reality and accepting a mind, but both, that is every existent thing, exist in relationship, without having any independence or objectivity. That's very roughly at least the last two of the three Buddhist schools. The third one is divided again into Prasangika Madhyamaka and Svatantrika Madhyamaka, using Tibetan terms that are borrowed from the Sanskrit. The Prasangika Madhyamaka is considered the most sophisticated, where nothing at all has intrinsic existence, whereas the Svatantrika Madhyamaka say that some conventional reality does exist from its own side, having some essence. So there's a little bit of a distinction in the debate there. I just wanted to mention those things; I'd like you to comment.

      Kerzin differentiates between existence and intrinsic existence. Intrinsic existence is what the Buddha and Nagarjuna are trying to negate.

      Rovelli makes a good point about a prevalent attitude that science offers a truer perspective than common sense, while Nagarjuna is pointing out that even the scientific explanation is not the final one. For one thing, it implicitly depends on the existence of a reified self who is the ultimate solidified existing agent and final authority, which Nagarjuna negates with his tetralemma.

    1. The lawyer has at his touch the associated opinions and decisions of his whole experience, and of the experience of friends and authorities. The patent attorney has on call the millions of issued patents, with familiar trails to every point of his client's interest. The physician, puzzled by a patient's reactions, strikes the trail established in studying an earlier similar case, and runs rapidly through analogous case histories, with side references to the classics for the pertinent anatomy and histology. The chemist, struggling with the synthesis of an organic compound, has all the chemical literature before him in his laboratory, with trails following the analogies of compounds, and side trails to their physical and chemical behavior.

      Very interesting, the way it describes different professions. Although there have been some approaches to the memex, none of them has been this universally usable.

    2. So he sets a reproducer in action, photographs the whole trail out, and passes it to his friend for insertion in his own memex, there to be linked into the more general trail.

      Even with current Zettelkasten technology like Logseq, there is no way to create a trail and send a particular trail off to a friend. I wonder what the copyright laws would look like when it comes to sharing excerpts as part of annotated trails like this. Would it be covered under Fair Use? What would a file format or a renderer for this look like?

    3. He can add marginal notes and comments, taking advantage of one possible type of dry photography, and it could even be arranged so that he can do this by a stylus scheme, such as is now employed in the telautograph seen in railroad waiting rooms, just as though he had the physical page before him.

      We have gotten away from written annotations for digital work and I'm not entirely sure it's a good thing. I want to think through the trade-offs of this.

    1. Author Response

      Reviewer #2 (Public Review):

      In this manuscript, Villalta, Schmitt, Estrozi and colleagues report their results on genome compaction in one of the most complex known viruses, the Mimivirus. This work will be of interest to a broad readership, and particularly to virologists and structural biologists. The authors describe a novel mechanism used by mimivirus to compact and package its 1.2 Mb dsDNA genome. In particular, the mimivirus genome is shown to be packed into magnificent cylinder-like assemblies composed of GMC-type oxidoreductases, presenting yet another remarkable case of enzyme exaptation. By using cryo-electron microscopy (cryo-EM) and cryo-electron tomography (cryo-ET), the authors determined the structures of such fibers in several relaxation states, which presumably represent different stages of nucleoprotein unpacking upon delivery into host cytoplasm. The authors also suggest (although do not directly visualize) that the lumen of the genomic fibers contains several viral enzymes, most notably, DNA-dependent RNA polymerase, which is necessary for cytoplasmic replication of the mimivirus. Overall, this is an important discovery, which further expands our appreciation of the "inventiveness" of viruses.

      We thank this reviewer for the positive and constructive comments. We now provide some additional data from unpublished follow-up studies, which we hope will help all reviewers assess the quality and reliability of our work.

      I am not an expert on helical reconstructions and cannot evaluate the validity of the models. Thus, my specific comments will focus on aspects of the work with which I am more familiar.

      1) In light of the presented results, it is reasonable to assume that GMC-type oxidoreductases of the mimivirus are very important for the formation of functional virions. However, in a previous study (PMID: 21646533), it has been shown that the genes encoding GMC-type oxidoreductases can be deleted from the virus genome (M4 mutant) without the loss of infectivity. The M4 virions were devoid of the external fibers decorating the icosahedral capsid, but the genome was still packaged. How do the authors reconcile these results with those presented in the present manuscript? This should be addressed in the Discussion section.

      In fact, like the reviewers, we initially assumed that the GMC-oxidoreductases were essential. Now, we believe it might be premature to assume that GMC-type oxidoreductases are the only type of proteins that can be involved in the scaffolding of the Mimiviridae genomic fibers. We managed to extract the genomic fiber of M4 (the isolate without GMC oxidoreductases). The fiber also has a rod-shaped structure but protein composition analysis of the purified fiber shows that different proteins are involved in its assembly.

      We hope the reviewers will accept that we reserve these findings for a subsequent publication.

      2) The authors state that mimivirus encodes two GMC-type oxidoreductases (qu_946 and qu_143) and that both could be fitted into the electron densities. However, I could not understand whether the authors think that the fibers are heteroassemblies of both oxidoreductases or different fibers are composed of different proteins, or only one is used for fiber formation. Please clarify. In case you are not able to distinguish between the two homologs (e.g., due to limited resolution), state so explicitly.

      We cannot discriminate between the two GMC-oxidoreductases because of their high sequence identity (69% identity, 81% similarity) and the resolution of the map. Yet we think that in most cells the qu_946 GMC-oxidoreductase is the more abundant at the time of genome packaging (between 2 and 9 times more abundant, according to our proteomic study). In some cells, however, the second GMC-oxidoreductase could become the more abundant and, in that case, the genomic fiber is built using qu_143.

      3) I am slightly puzzled by the observed "ball of yarn". It is hard for me to imagine that a cylindrical container/fiber containing a continuous dsDNA genome could be bent or fragmented into bundles because this would break the protein-protein interactions holding the fiber together. In Figures 1C and S1, are these parts of the same fiber or multiple fibers coming out of one capsid? Related to this question - is there evidence (e.g., from qPCR) that Mimivirus carries a single copy of genomic dsDNA per capsid?

      We believe this reviewer should think in terms of packaging. The folded genome is packaged through two lipid membranes (the one lining the capsid interior and the one in the nucleoid) concomitantly with its wrapping by the protein shell ribbon. Thus, there is plenty of space in the nucleoid at the beginning of the packaging and the genomic fiber is gently folded inside. But as more genome needs to be packaged, this compresses the flexible fiber into the nucleoid until it is totally encased in the nucleoid. That also defines the size of the nucleoid in the icosahedral capsid. This tight packaging is exemplified in Fig 1A for instance or the AFM images of the nucleoid enclosed in P3 of this file.

      We provide a more general answer in the answers requested by the editor.

      We think that the entire genome can only be packaged in the capsid through its assembly within the protein shell. We also think the genomic fiber is progressively built on the genomic DNA while it progresses into the capsid, most likely by an energy driven packaging machinery. This process can be compared to bacterial pili assembly, except that pili are built on the surface of the cell, while the genomic fiber is built into a compartment, the nucleoid, forcing it to fold in this compartment, which is only possible due to the high flexibility of the genomic fiber. Thus, the entire genome corresponds to ~40 µm of genomic fiber, which when folded as a ball can entirely fit into the nucleoid. The organization of the genome in a large “tubular structure” and its folding inside the nucleoid compartment has been previously reported by AFM studies of the mimivirus particles (Kuznetsov, Y. G. et al. Virology 2010; Kuznetsov YG et al. J. Virol. 2013, Fig 15), which the authors refer to as “highly condensed nucleoprotein masses about 350 nm in diameter within the inner membrane sacs of virions”, with the presence of tubular structure they refer to as “thick cables of the nucleic acid” (image P3 herein).
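
      As a rough plausibility check on the ~40 µm figure, the sketch below compares it with the contour length of the naked genome. It assumes the textbook B-form DNA rise of about 0.34 nm per base pair and the 1.2 Mb genome size quoted by the reviewers; the function names are illustrative, and the resulting ratio is a back-of-the-envelope estimate, not a value reported by the authors.

```python
# Assumption: canonical B-form DNA rise per base pair, in nanometres.
RISE_NM_PER_BP = 0.34


def linear_length_um(genome_bp: int, rise_nm: float = RISE_NM_PER_BP) -> float:
    """Contour length of fully extended dsDNA, in micrometres."""
    return genome_bp * rise_nm / 1000.0


def compaction_ratio(genome_bp: int, fiber_length_um: float) -> float:
    """How many times shorter the fiber is than the extended DNA."""
    return linear_length_um(genome_bp) / fiber_length_um


genome_bp = 1_200_000   # 1.2 Mb genome, as quoted in the reviews
fiber_um = 40.0         # approximate fiber length given in the response

print(f"extended DNA: {linear_length_um(genome_bp):.0f} um")
print(f"compaction in fiber: {compaction_ratio(genome_bp, fiber_um):.1f}x")
```

      Under these assumptions the extended DNA is roughly ten times longer than the fiber, so the DNA must be folded or otherwise condensed inside the protein shell before the fiber itself is folded into the nucleoid.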

      4) The authors describe the interactions between the monomers in the dimer of qu_946 as well as between qu_946 and DNA. I would also like to see a brief description of protein-protein interactions between subunits within the same helical strand as well as between helical strands, which hold the whole assembly together (i.e., what are the contacts between green subunits as well as between green and yellow subunits shown in Fig 2C). The authors suggest that the shell "would guide the folding of the dsDNA strands into the structure" (L310). To support this statement, the authors could show the lumen of the fiber rendered by electrostatic potential.

      We thank this reviewer for these suggestions. An additional supplementary table (Table S4) is now provided, listing the various contacting residues in each genomic fiber map and for each GMC-oxidoreductase. The number of contacts obviously decreases in the relaxed structure, but even in the compact forms we noticed that there are relatively few intra- and inter-strand contacts, which may also explain the flexibility of the structure. We now provide a new Figure 3 in which the lumen of the fiber is rendered by electrostatic potential for the Cl1a map and each of the two GMC-oxidoreductases.

      5) Please provide some background information on the distribution of GMC-type oxidoreductases in other families of giant viruses, so that it is clearer whether the described packaging mechanism is specific to mimiviruses or is more widespread.

      This is a central point, also linked to the question about M4. In fact, like the reviewers, we initially assumed that the GMC-oxidoreductases were essential. Now, we believe it might be premature to assume that GMC-type oxidoreductases are the only type of proteins that can be involved in the scaffolding of the Mimiviridae genomic fibers.

      If this reviewer still thinks this is essential to this manuscript we can provide a multiple alignment of the GMC-oxidoreductases of members of each clade upon request.

      Reviewer #3 (Public Review):

      Since it was presented to the scientific community as a viral entity, mimivirus has shown an unlimited capacity to cause surprise and admiration. In this manuscript, Villalta, Schmitt, Estrozi, et al. and Abergel present how the gigantic mimivirus genome is organized in the virion. The authors succeeded in developing a protocol to trigger virus genome uncoating, followed by purification of genome-associated proteins. The presented data indicate that a helical shield composed of two GMC-type oxidoreductases is associated with the mimivirus genome, a structure named the genomic fiber. Using cryo-EM and cryo-tomography, different forms and stages of the genomic fiber were described in detail, indicating the dynamics of the fiber's conformational changes, likely related to genome packing and uncoating during the virus replication cycle. In-depth analysis of a substantial number of individual virus fibers revealed that the mimivirus genome is folded and organized inside the aforementioned helical shield, which seems to be novel among giant icosahedral viruses. Proteomics in association with image analysis indicates that the packed mimivirus genome forms a channel, which accommodates key enzymes related to early phases of the replication cycle, especially RNA polymerase subunits.

      I must disclose that I am not an expert on structural virology and proteomic analysis. Therefore, I don't feel I can contribute to the improvement of this kind of analysis. That said, I congratulate the authors for their efforts to make the manuscript story understandable to nonexperts.

      We are grateful to this reviewer for these positive comments.

      I have a few suggestions and comments:

      1) Please consider the "nucleocapsid" concept during genomic fiber presentation. I believe it fits in;

      We fully agree and this was why we referred to APBV-1. Obviously, it was not clear and we now explicitly use the word “nucleocapsid” in the text.

      2) The "ball of yarn" analogy is nice, but fig 1C shows several fibers unconnected (free) in one of their ends. I am wondering if it means that the genomic fiber is not a long-single structure covering the whole genome, but a bunch of several independent helical structures covering the whole genome and attached in such "ball of yarn". Like several threads connected. Could the authors clarify that please?

      In the “ball of yarn” structures, there are clearly breaks that give the impression of multiple fibers. Yet, these breaks are due to the multiple steps of the extraction, enrichment and purification treatment. The genomic fiber is built as a long (~40 µm) single structure folded in the nucleoid while it is loaded. As a result, it is tightly packed into the nucleoid and broken into fragments upon release due to the fragilizing treatment. As exemplified in the CryoEM image provided above (P9) on freshly opened capsids, these breaks appear to depend on the treatment. This reviewer could also look at the answer we provided to Reviewer 2 point 3 as this could help clarify how it is possible to package the genomic fiber and subsequently fold it into the nucleoid to the point where it is tightly packed and under pressure.

      3) Considering previously published data on proteomics of viral factories and transcriptomics of mimivirus: is there any temporal association between GMC-type oxidoreductases' peak of expression and genome replication during the viral cycle? what about RNA pol subunits? Are all those proteins highly expressed during the late cycle? or do they reach the peak concomitantly with genome replication? This information can support the discussion on the genome-fibers assembly during the cycle.

      We thank this reviewer for these suggestions. We have now added the time of expression of the proteins involved in the genomic fiber composition throughout the manuscript. We added explicit sentences in the main text for both the GMC-oxidoreductases and the RNA polymerase subunits. The RNA polymerase as well as proteins involved in mRNA maturation are in the virion (Table S2 B), and studies by others demonstrate that early transcription takes place in the nucleoid once it has been transferred into the host cytoplasm (Reference 24). We also provide the reviewers with a link to the expression data for the different mimivirus genes: http://www.igs.cnrs-mrs.fr/mimivirus/

      4) Taken together, data seem convincing to demonstrate that the virus genome is located inside the helical shield. However, I believe that the authors could better explain why we only see 20 kb fragments in the gel, including in the control (in Fig S2).

      We hope our answers to this comment will convince this reviewer.

      Fig S2 corresponds to a regular 1% agarose gel and not to a PFGE gel. This gel was simply meant to show that there is DNA associated with the genomic fiber, not to show the size of the DNA: the genomic fiber has been broken into pieces, so we do not expect very high molecular weight DNA. I must point out that when the DNA is extracted from Mimivirus capsids using standard kits and pipetting, it also migrates at the top of the gel (Lane 1 in Fig. S2), while it would likely appear as a smear above 20 kb on a PFGE. By contrast, when the viral particles are put into plugs prior to lysis, the genomic DNA migrates at the proper size, as shown in the publication from Boyer et al. 2011 (reference 31), showing the genome of Mimivirus is a linear genome migrating around 1.37 Mb (Fig 1, Panel B, Lane M1). In P9 of this letter, an image of a long (> 6 µm) and flexible fiber is presented.

      Reviewer #4 (Public Review):

      In the manuscript "The giant Mimivirus 1.2 Mb genome is elegantly organized into a 30 nm helical protein shield", the authors show that, when subjected to low pH stress, the Mimivirus particle releases 30nm-diameter filamentous assemblies. These filaments consist of a protein shell that envelopes the Mimivirus genomic DNA. The protein shell is composed of two GMC-oxidoreductases, the same protein that forms the long fibers emanating from the capsid of the Mimivirus.

      Overall, despite being interested in the subject, this scientist was left confused about several aspects of the paper described below. The presentation of the material is also confusing.

      We hope the answers and images we provide to all Reviewers on pages 2 to 12 herein will clarify the various points raised by this reviewer.

      1) The presented data do not allow the estimation of the amount of mimivirus genome organized into 30 nm diameter filaments. Hence, the title of the paper is misleading.

      The entire genome should be packaged in the genomic fiber. This was already observed by others, and we now provide a previously published AFM image of the nucleoid, extracted from Kuznetsov et al. J. Virol. 2013. See p9 of this letter.

      2) The filamentous structures are a result of extremely harsh treatment of the virus particle, which starts with a 1.5 hour-long incubation at pH 2. Do the filaments actually exist inside the virus particle as the title of the paper implies?

      The 1 h incubation at 30°C and pH 2 was only applied to recover the nucleoids presented in Fig S1A (see Materials and Methods section “Nucleoid extraction”). Acidic treatment was never applied to produce the genomic fiber, as we noticed it is sensitive to both temperature and acidic treatment. All steps of the extraction protocol were performed at pH 7.5 (section: “Extraction and purification of the mimivirus genomic fiber”). We must emphasize that the release of the genomic fiber can be seen at the very first step of the extraction protocol (protease treatment). The sample was also checked at each step of the protocol by negative staining TEM to assess the status of the genomic fiber. We had to optimize the protocol: a proteolytic treatment that was too mild opened too few particles, although most of the genomic fiber released was compact, whereas a treatment that was too harsh opened all particles but left the genomic fiber mostly in the ribbon state. We had to compromise to obtain enough compact and relaxing structures to perform the present work. We would like to stress that we could reproducibly obtain the genomic fiber from many preparations and that we could observe it with different virions (including M4), even using different protocols (only the one with the best yield is reported in the manuscript).

      In Figure 1B, the genomic fiber can be seen inside a virion, still encased in the membrane compartment. These structures were not reported in previous cryo-EM analyses of the virions. As said above, they were only reported by AFM studies of the mimivirus particles (Kuznetsov, Y. G. et al. Virology 2010; Kuznetsov YG et al. J. Virol. 2013, Fig 15). See p9.

      Or [might] these filaments [form during] host take over?

      Or [perhaps] these filaments [result from a harsh in vitro treatment] and have nothing to do with either?

      The first two questions can be answered with the help of cryoFIB tomography, which might be beyond the scope of a "paper revision". However, the properties of the two GMC-oxidoreductases in the presence and in the absence of genomic DNA must be examined in greater detail. Can these proteins, by themselves, form similar hollow filaments (or any filaments) when subjected to the same treatment as the virus?

      I personally find it difficult to imagine that such a complex structure could be an artefact of the treatment, for several reasons:

      • It is unlikely that simply putting the GMC-oxidoreductases together with DNA would result in a helical structure where the DNA is folded 5 times and internally lines the protein shell (extended data video1 of one tomogram). It would be like crystallizing the proteins (in a heterogeneous sample) onto the folded DNA to form a helix with a hollow lumen. The crystallographic studies by others on the mimivirus GMC-oxidoreductase did not produce tubular structures either, although they reported 3 crystal forms. They overexpressed the proteins in E. coli and did not report such structures bound to DNA either.

      • Given the presence of compact and relaxed forms, a relaxed helix cannot passively return to a compact state by simply rewinding, which suggests that the relaxed forms result from the decompaction of a constrained structure. This is also supported by the loss of DNA in the relaxed state Cl3. The last steps of unfolding correspond to the loss of one ribbon strand after the other.

      • The contacts between chains, both within and between strands, are also scarce, supporting an active assembly of the structure. We now provide an additional supplementary Table S4 listing the contacts for the different states of the genomic fiber.

      3) Although the assignment of the qu_946 oxidoreductase to the corresponding cryo-EM density is correct (as the resolution is high enough), I am confused about the other oxidoreductase (qu_143). Where does it fit to? Which structure does it form?

      We cannot discriminate between the two GMC-oxidoreductases due to their close identity (69% identity, 81% similarity) and the resolution of the map. Yet we think that in most cells the qu_946 GMC-oxidoreductase is the more abundant of the two at the time of genome packaging (between 2 and 9 times more abundant, according to our proteomic study). In some cells, however, the second GMC-oxidoreductase could become the more abundant, and in that case the genomic fiber is built using qu_143.

      Equally important, what is going on with the N-terminal 50-residue domain of qu_946? Is there a space for it in the cryoEM map? Is it disordered?

      The N-terminal domain is only present in the fibrils decorating the capsids. As illustrated in Fig S12, when analyzed by MS-based proteomics, the peptide coverage of the GMC-oxidoreductases differs depending on whether they compose the fibrils or the genomic fiber. The N-terminal domain is clearly covered when the fibrils (data not shown) or intact virions are analyzed, and not covered when the analysis is performed on the genomic fiber. That is why we propose that this N-terminal domain could be an addressing signal (see main text) and that a protease could cleave it in the case of genomic fiber assembly.

      Main text: The proteomic analyses provided different sequence coverages for the GMC-oxidoreductases depending on whether samples were virions or the purified genomic fiber preparations, with substantial under-representation of the N-terminal domain in the genomic fiber (Fig. S12). Accordingly, the maturation of the GMC-oxidoreductases involved in genome packaging must be mediated by one of the many proteases encoded by the virus or the host cell.

      Indeed, there is no space to accommodate this domain, as it would prevent the interaction between the protein shell and the DNA and/or increase the diameter of the genomic fiber to the point where it could no longer be accommodated in the nucleoid.

      4) The bubblegram analysis is not very convincing. The bubbles appear to correlate with the length or thickness of the structure - the long or overlapped structures form bubbles. The bubbles may not be due to the presence of DNA.

      The point is, as demonstrated by our structural studies, that the relaxed structure has lost the DNA. This is why bubbles cannot be seen in the relaxed broken fibers. On long fibers still in compact form, the DNA is visible in the structure and bubbles can be seen. Yet the evidence for the presence of DNA in the structure is also provided by the agarose gel of the purified genomic fiber and the cryo-EM structures. Bubblegrams are just one additional analysis that we provided.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      We are sincerely grateful to the reviewers for several key comments that led us to correct some mistakes and better appreciate how to put our findings in the context of recently published data. These changes undoubtedly improved the manuscript.

      Many other reviewer comments seem to equate chaperone binding with a functional chaperone role in de novo folding. These are not the same. Cytosolic chaperones presumably “sample” nearly every protein that is synthesized by cytoplasmic ribosomes. This does not mean that every such protein would misfold if even one of those chaperones failed to bind it. If we want to understand what chaperone mutations might cause human disease due to septin misfolding, for example, it will not be enough to catalog all the chaperones that bind septins. We have already done that. What will help is to understand which chaperones make functional contributions to septin folding and complex assembly. Our study is the first to experimentally address chaperone roles in de novo septin folding, period. We take responsibility for not being sufficiently clear about the goals of our work, and, to emphasize these points, we added one sentence to the Introduction and revised another.

      Another consistent criticism was that the use of the E. coli system, both in vivo and in vitro, limited our ability to gain insight into the folding of septins in eukaryotic cells and led to a “tessellated view”. For example, reviewers claimed that our model about translation elongation rates for Cdc12 was “based mainly on the E. coli system and bioinformatics analysis”. We disagree with this interpretation. Key evidence in support of our model comes from published data in yeast, specifically the much higher density of ribosomes on Cdc12 and the accumulation of ribosomes on the Pro-rich cluster near the Cdc12 N terminus. These are precisely the kinds of “more stringent analysis” in “authentic yeast” (to use Reviewers’ language) that we would have wanted to do to test our model, had they not already been done by others. Without specific suggestions, we struggle to imagine what other kinds of experiments the Reviewers have in mind, apart from a eukaryotic version of a reconstituted cell-free translation system, which Reviewer #1 admits “would be substantially difficult” and “time consuming”. While we are intrigued by the reconstituted eukaryotic cell-free translation system that was published last year (which we mentioned on lines 994-995) and look forward to exploring it in future studies, it is not commercially available and we agree that the amount of effort required to prepare it ourselves is unrealistic for the current study. Most importantly, we do not find in the critiques provided any specific reason why our E. coli-based experiments are intrinsically less “stringent” or “rigorous”.

      Accordingly, we think that, together with the results of multiple new experiments (detailed below), the extensive re-writing and re-ordering that we have done in the revised manuscript will be enough to better emphasize the importance and rigor of our findings and thus to address all of the Reviewers’ specific concerns.

      Reviewer 1 thought that our manuscript “does not even provide new information, since the involvement of CCT and the Hsp70 system is not novel” and thought that “the key finding of this manuscript is how chaperones are involved in the de novo folding of septins, which is not conceptually new because of previous findings, including those of the authors”. Reviewer #3 also stated that “the function of Tric/CCT in septin folding and assembly is well documented”.

      We were quite surprised at this reaction, since we dedicated a significant portion of the original manuscript (lines 68-76 and 319-322) to explicitly discussing the only other paper in the literature that specifically addresses the question of whether or not CCT is required for de novo septin folding. As a reminder, that paper explicitly stated that “it is unlikely that CCT is required to fold septins de novo” and “septins probably do not need CCT for biogenesis or folding”. With regard to involvement of the Hsp70 system, the only existing evidence in the literature on this subject is the aggregation of some septins in ssb1∆ ssb2∆ cells. Like the CCT study, that study did not distinguish whether this was a result of problems during septin synthesis and before septin complex assembly, or, alternatively, whether pre-folded and assembled septins were subject to disassembly, misfolding, and aggregation. Our experiments specifically test the fate of newly-synthesized septins prior to assembly in living cells. Our previous findings documented physical interactions between wild-type septins and multiple chaperones but did not address whether these interactions had any functional relevance. We previously reported functional effects of interactions between chaperones and MUTANT septins but, again, these studies did not address functional chaperone requirements for WILD-TYPE septins. While we did our best to highlight these points in the original document without devoting excessive amounts of text, we accept responsibility for not making these points sufficiently clear and to address this issue we added additional text, including the text quoted above, to the Introduction.

      While Reviewer #3 commented that the manuscript “is overall well presented”, Reviewer 1 thought that the manuscript was “complicated to read” with “no logical connections, just a list of many results” and mentioned that part of the difficulty was “that it contains many negative results”.

      In addition to reorganizing the manuscript, as suggested by the reviewers, we added more text at the beginning and end of nearly every section to even more explicitly state the logical connections between results. In our opinion, negative results of properly controlled experiments are valuable to the research community, and we do not understand what it is about negative results that makes them difficult to read about. Many of the extra experiments we performed were in anticipation of being asked to perform them by reviewers, some of which generated negative results. We are reluctant to remove negative results unless there is a more compelling reason. For example, to address another reviewer concern, we did remove the negative results with the Ydj1–Ssa2 compensatory mutants.

      Reviewer #2: “4) Figure 2: The labeling on the protein structure makes it seem like the exact region for Ydj1 and Hsp70 was experimentally identified, when it hasn’t.”

      We acknowledge that the first sentence of the figure legend (“the colored ribbon follows the color scheme in the sequences at right for overlapping β-aggregation, Ydj1 and Hsp70-binding sites”) could be misinterpreted, since only in the second sentence does it say “Sequence alignments show predicted binding sites”. We corrected this mistake, and added the text “Predicted chaperone binding sites” as the first words in the legend to this figure.

      Reviewer #2: “8) The authors confusingly jump back and forth between different Septins and different chaperone (Ssa1-4, Ydj1, Sis1, Hsp104). We would ask the authors to re-arrange the manuscript, collating all the yeast work in one section and bacterial work in another.”

      We re-arranged the manuscript and put all the yeast work in one section and all the bacterial work in another, with the exception of the studies of individually purified Cdc3 and Cdc12, which we put in between the yeast studies of the kinetics of de novo assembly and the yeast studies of post-translational assembly. Our reasoning is that the studies with the purified proteins demonstrate challenges with maintaining native conformations in the absence of chaperones and other septins, which flows naturally into the yeast studies asking about the ability of “excess” septins to maintain oligomerization-competent conformations in the absence of other septins and when we experimentally eliminate specific chaperones. All of the work actually manipulating E. coli genes/proteins is now together.

      Reviewer #3: “1. The co-translational binding of CCT to nascent polypeptide chains has been studied (Stein et al., Mol Cell 2019). While the authors indicate that septin subunits are engaged co-translationally, they do not comment which ones are interacting with CCT and at which state of translation. This information is crucial and should also be mentioned in the discussion section.”

      We are grateful to the Reviewer for bringing up this point, which we had overlooked. We hadn’t noticed that, in the end, only Cdc3 met the CCT confidence threshold to be included in the supplemental data of the Stein et al. paper. All septins co-purified with CCT in an earlier Dekker et al. proteomic study, so we strongly suspect that the failure of the other septins to meet the confidence threshold in the Stein et al. paper reflects the sensitivity of that assay, rather than a significant difference in how septin GTPase domains interact with CCT. We also hadn’t appreciated that, according to that study, the main sites in the Cdc3 GTPase domain bound by CCT and Ssb are the same. Hence our statement that Ssb bound to septins “earlier” during translation, and CCT bound “later”, was wrong. Instead, the overlapping Ssb and CCT site in Cdc3 turns out to be remarkably consistent with a conclusion from the Stein et al. paper, that CCT binds Rossmann-fold proteins like septins at sites where “early” beta strands have been translated and expose a chaperone-binding surface that later becomes buried by an alpha helix. We corrected our mistake in the text and in our model figure and added: (1) a new supplemental figure with predicted septin structures and a sequence alignment indicating where CCT and Ssb bound; and (2) text discussing the confidence thresholds for “calling” septin-CCT interaction, the Rossmann-fold binding, and how we interpret Ssb and CCT binding to the same site.

      Reviewer #3: “3. Figure 3: It is recommended to also follow Cdc10-GFP and Cdc12-GFP fluorescence. This will on the one hand generalize the presented findings and provide a direct link to other parts of the study (e.g. crosslinking analysis of Cdc10).”

      We carried out the requested experiment for Cdc12, using Cdc12-mCherry rather than Cdc12-GFP because of the formation of non-native foci that we observed with Cdc12-GFP. We also attempted to analyze Cdc10, using an existing GAL1/10-promoter-driven Cdc10-mCherry plasmid that we’d made a few years ago, but it did not behave as expected, with high expression even in the absence of galactose (not shown), which prevented us from performing the requested experiment. We have a Cdc10-GFP plasmid with the inducible MET15 promoter, but this promoter does not provide sufficiently low levels of expression in repressive conditions, so there would be too much expression at the beginning of the experiment for us to accurately follow accumulation thereafter. Instead, we tried the only other plasmid we had with the GAL1/10-promoter controlling a tagged septin: Cdc11-GFP. Above a certain threshold of expression, Cdc11-GFP formed unexpected cortical foci, but we were still able to perform the analysis and found a clear delay in septin ring signal in cct4 cells, providing the requested generalization to other septins, if not Cdc10.

      Reviewer #3 “5. Figure 4C: The finding that only ssb1 but not ssb2 knockouts have an effect on joining of free Cdc12-mCherry subunits into septin rings is puzzling. Similarly, Ssb1 largely acts co-translationally, while in this assay post-translational septin ring assembly is monitored. The authors need to comment on these two points.”

      We did not examine ssb2 knockouts, so we do not know to what the Reviewer is referring in the first point. If the Reviewer means that they are puzzled by the fact that we saw a phenotype in cells in which only SSB1 was deleted and SSB2 remained, we offer two explanations. As can be seen in the Saccharomyces Genome Database entry for SSB1 (https://yeastgenome.org/locus/S000002388/phenotype), there are at least a dozen known phenotypes associated with deletion of SSB1 in cells with wild-type SSB2. We even showed a very clear septin misfolding/mislocalization phenotype in Supplemental Figure 4D. Thus while our findings are new and provide novel insights into Ssb function, they are not unprecedented. The Reviewer is correct that most Ssb is ribosome-bound and thus Ssb1 “largely acts co-translationally” but ~25% of Ssb is not ribosome-associated (PMID: 1394434). Furthermore, the lack of a strong phenotype for ssb1∆ cells in our new kinetics-of-folding experiment (see below), plus the realization that Ssb and CCT both bind the same site in Cdc3, leads us to a new model: Ssb acts both co- and post-translationally in septin folding, but only the post-translational function is associated with a phenotype in ssb1∆ cells, because in that assay we drastically overexpress a tagged septin and thereby exceed the Ssb chaperone capacity that remains when we delete SSB1. This logic also explains the first ssb1∆ phenotype we saw, when overexpressing Cdc10(D182N)-GFP. In the kinetics-of-folding assay, on the other hand, tagged septin expression is much lower and reducing the amount of total Ssb by ~50% (via SSB1 deletion) likely does not compromise Ssb function in folding the tagged septin. We therefore removed our statement that “Ssb dysfunction leaves nascent septins in non-native conformations that are aggregation-prone and unrecognizable to CCT”, revised our model figure accordingly, and added new text and citations to explain our new model.

      Reviewer #3 “Additionally, they should test whether the appearance of septin ring fluorescence is slowed down in ssb1 mutants (as shown for cct4-1 mutant cells in Figure 3B).”

      We agree that slower septin folding in ssb1∆ cells is a prediction of our model, and we performed the requested experiment and include the results in our revised manuscript. The new data show that the appearance of septin ring fluorescence is not delayed in ssb1∆ mutants, which is easily explained by the ability of Ssb2 to chaperone the folding of the low levels of tagged septin that we express in these kinds of experiments (see above).

      Reviewer #3: “7. Figure 5G: The data is not convincing. This reviewer cannot detect a specific Cdc12 band accumulating in presence of GroEL/ES.”

      We re-ran the reactions with fresh reagents and this time ran the gel longer to reduce excess signal from free fluorescent puromycin and the bright Cdc10 bands. We now see a very clear band for full-length Cdc12 in the reaction with added GroEL/ES, fully consistent with our mass spectrometry results. We updated the figure with the new results.

      Reviewer #3: “Furthermore, the activity tests done for the chaperonin system are confusing (Supplemental Figure 7). The ATPase rate (slope!) of GroEL/GroES seems higher as compared to GroEL but according to the authors it should be opposite.”

      In our assays, the ATPase activity is so fast that for our “time 0” timepoint, much of it has already occurred by the time the reaction can be physically stopped and measured. In other words, the handling time is such that we can’t visualize what happened in the earliest stages of the reaction, where the rates could accurately be estimated as slopes. This is obvious from the fact that at time 0, the absorbance for the “GroEL alone” reaction is already more than twice the absorbance for GroEL+ES. We added clarifying text to the figure legend.

      Reviewer #3: “The refolding assay using Rhodanese as substrate is also confusing: What is the activity of native Rhodanese? The aggregated Rhodanese sample seems to have substantial activity that is not too different from a GroEL/ES-treated one. From the presented data it is not clear to the reviewer to which extend GroEL/ES prevents aggregation and supports folding of denatured Rhodanese.”

      We thank the Reviewer for bringing this to our attention, because we had mistakenly left out the values for native Rhodanese with the reporter. With regard to the aggregated Rhodanese, we failed to note that this sample contains urea. When the urea absorbance is subtracted, it is clear that the GroEL/ES-treated sample has higher activity. Furthermore, some native enzyme is likely still active within the aggregated sample, explaining the “substantial activity” that the Reviewer correctly notes. We corrected the figure and added clarifying text to the figure legend.

      Reviewer #3: “the study goes astray following aspects that does not seem relevant to this reviewer (e.g. the role of N-terminal proline residues for Cdc12 translation, Fig. 5E/F).”

      We acknowledge that we did a poor job of introducing the N-terminal Pro-rich cluster in Cdc12 in relation to our model of slow Cdc12 translation. We have now revised and reorganized the manuscript to set up these experiments as a direct test of our model: if ribosome collisions on the body of the ORF drive mRNA decay, then decreasing the spacing of those ribosomes should exacerbate the problem, and eliminating the Pro-rich cluster (where published yeast data already show ribosomes accumulate) is the most logical way to test the prediction. Far from being irrelevant, the results fit the prediction perfectly and thus support the model. We expect that this change will highlight the importance of these experiments for the reader.

      Reviewer #2: “1) Fig. 1 Is the folding of Cdc3 being measured in cells lacking chaperones mentioned towards the end of the paper or are the authors referring to the lack of yeast proteins?”

      We are unclear as to what the Reviewer is asking here. The title of Figure 1 states that these are “purified yeast septins” and the figure legend further emphasizes this fact. Additionally, the Coomassie-stained gel in Figure 1A shows a single band, corresponding to purified 6xHis-Cdc3. The proteins were purified from wild-type E. coli cells, so all E. coli chaperones were present when Cdc3 initially folded, but chaperones and all other proteins were removed during the purification and prior to the analysis. We do not know what change to make.

      Reviewer #2 asked “How do the authors account for the septin defect in Ssa4 delete cells in unstressed conditions where Ssa4 would be very low already? According to the authors previous work, Ssa2 and 3 should be able to compensate.”

      We explicitly addressed this point in the original manuscript (lines 893-898). Again, we think here the Reviewer is equating chaperone binding with chaperone function. According to our previous work, Ssa2 and Ssa3 are able to bind septins, but this does not mean that they can fold septins the same way as Ssa4. We cite several papers that discuss the distinct functional roles for the different Ssa proteins. We do not think that additional clarification of this point would strengthen the manuscript.

      Reviewer #3: “6. Figure 5B: It is unclear why Cdc3 is observed in the pulldown of His-tagged Cdc12 (37˚C), although no Cdc12 was isolated under these conditions. How is that possible?”

      That is not possible. As we indicate in the figure legend and with the red asterisk, the only band appearing in that lane is a non-specific band that cross-reacts with the anti-Cdc3 and/or anti-Cdc11 antibodies. This is why it is also present in the “No septins” control lanes. We made the asterisk larger to help accentuate this point.

      Reviewer #3: “Furthermore, the authors observe a specific effect on Cdc12-Cdc11 assembly in the E. coli groEL mutant. How do they rationalize this specific effect as Cdc12-Cdc3 assembly remained unchanged? This observation also seems in conflict with the suggestion of the authors that Cdc12 preferentially recruits Cdc11 before interacting with Cdc3 (page 45, lane 1024).”

      Cdc11 was not expressed in the groEL mutants because no Cdc11 gene was present in those cells, as explained in the body text and indicated in the labeling above the lanes in Figure 5A. The band near the size of Cdc11 is a non-septin protein that bound to the beads in the groEL-mutant cells, as is shown in the immunoblot using anti-Cdc11 antibodies in Figure 5B. Thus there is no conflict to rationalize.

      Reviewer #1: “The only evidence that CCT binds to septin is the list of LC-MS/MS. Western blotting would provide more solid data.” and “2) The cross-linking experiments appears not to have been successful. Why are the Ssas, Ydjs etc not detected here?”

      First, CCT subunits are relatively low-abundance, expressed at 5- to 50-fold lower levels than other chaperone families in the yeast cytosol (see PMID: 23420633). To the Reviewer’s second point, we did in fact detect other chaperones in our crosslinking mass spectrometry experiments, including Ydj1, multiple Ssa and Ssb chaperones, Hsp104, etc., as can be seen in Table S1. However, they were also detected in negative control experiments. This is not surprising, given that these chaperones are among the most common “contaminants” of affinity-based purification schemes (see the CRAPome database at https://reprint-apms.org/). It was for this reason we had to perform so many negative control experiments, which likely produced some false negative results, as some “real” interactions were likely discarded when the same chaperone showed up in our controls. We added a figure panel with a Venn diagram of overlap between experimental and control samples, and text pointing out this caveat of our approach.

      Second, in this experiment we attempted to identify proteins that transiently interact with a specific region of Cdc10 that will later become buried in a septin-septin oligomerization interface. Due to the transient nature of the interaction, we do not expect to detect high levels of crosslinked chaperones. Mass spectrometry is significantly more sensitive than immunoblotting, so there is no guarantee that we would be able to detect a band even if the crosslinking works as desired. Indeed, the crosslinked bands we saw by immunoblot for GroEL were quite faint (see Figure 2F), despite the fact that GroEL and the T7-promoter-driven Cdc10 were among the most abundant proteins in those E. coli cells.

      Third, there is no commercially available, verified antibody recognizing yeast Cct3 with which to perform the requested immunoblot experiment. Since both the N and C termini of CCT subunits project into the folding chamber, it is unwise to use a standard epitope tagging approach, as the tags may compromise function. Indeed, for purification purposes others inserted an affinity tag in an internal loop in Cct3 (PMID: 16762366). We have a yeast strain with Cct6 tagged in an analogous way, but to perform the requested immunoblot experiment with Cct3 would require creating or obtaining the Cct3-tagged strain, deleting NAM1/UPF1, and introducing our Bpa tRNA/synthetase and GST-6xHis-Cdc10 plasmids. Given the sensitivity of detection concerns stated above, we doubt this would help.

      In summary, we prefer not to attempt the requested immunoblot experiments.

      Reviewer #1: “-Fig. 3B ant related Figures: The experiment to see if GFP-tagged septin accumulates in the bud neck is important, but only the graphs after the analysis are shown. The authors should provide the readers with representative examples from imaging data.”

      We are confused, because the images at the bottom of Figure 3A already show what the Reviewer requests. As stated in the figure legend, these are representative examples of the imaging data from a middle timepoint of one of the experiments. It would be nearly impossible (for space reasons) to provide representative images for all of the timepoints for all of the genotypes for all of the experiments. Since in our new experiments we introduce new tagged septins (Cdc11-GFP and Cdc12-mCherry), we also now include representative images of cells expressing these proteins, as well.

      Reviewer #2: “3) If the authors had evidence of chaperone interaction from their previous study, why did they not simply do IPs with fragments of the septins/chaperones?”

      We are unclear why the Reviewer is suggesting IPs after referring to our previous study. IPs are a poor choice for transient interactions, which is why we mostly avoided them in previous studies, and instead used a novel approach (BiFC) to “trap” chaperone–septin interactions. Moreover, we seek to identify chaperones that bind wild-type septins at future septin-septin interfaces on the path towards the native conformation. Fragments of septin proteins would likely misfold and would therefore likely attract chaperones that wouldn’t normally bind the full-length septin. Indeed, our previous studies demonstrated that even a single non-conservative amino acid substitution was sufficient to alter chaperone-septin binding. Thus IPs with fragments of septins or chaperones would be highly unlikely to yield informative results for the questions we seek to answer. We strongly prefer not to attempt these suggested experiments.

      Reviewer #2: “5) While differences between Ssa paralogs are highly interesting, using deletions of Ssas is not useful, given that yeast compensate by overexpressing other paralogs. The yeast GFP Septin assays should be repeated in yeast lacking all Ssas and expressing one paralog on a constitutive promoter (See numerous papers by Sharma and Masison).”

      We disagree that ssa deletions are “not useful”, since if the overexpressed paralogs cannot fulfill the same function as the deleted SSA, then we will see a phenotype, which we do. Furthermore, we had already obtained and thoroughly tested a strain like the ones mentioned by the reviewer (ECY487, a.k.a. JN516, from Betty Craig’s lab, with ssa2∆ ssa3∆ ssa4∆ and SSA1, which is constitutively expressed, PMID: 8754838), but we found that, as published, it divides slightly more slowly even under the most permissive of conditions. The requested strain cannot be analyzed using our method, because slow accumulation of ring fluorescence could be attributed to other defects unrelated to septin folding. Thus we strongly prefer not to attempt the suggested experiments.

      Reviewer #2: “7) The authors need to clarify the experiment with the Ydj1 D36N and Ssa2 R169H. In Reidy et al, they never fully biochemically test this system and it was never examined for Ssa2-Ydj1. The authors would need to do some fundamental experiments to demonstrate the validity and functionality of this double mutant in yeast.”

      Given that this experiment was unable to generate meaningful data, since the mutations affected the kinetics of induction of the GAL1/10 promoter, we do not think the requested biochemical experiments would add any value to the study. Instead, we removed these studies from the manuscript.

      Reviewer #3: “4. Figure 3B: The difference between wt and cct4-1 cells in appearance of septin ring fluorescence is observed at one timepoint. Since this experiment is considered highly relevant, the authors are asked to include another timepoint to bolster the conclusion that Cdc3-GFP folding and thus septin ring assembly is delayed in the CCT mutant.”

      We carried out new experiments with cct4-1 cells using Cdc12-mCherry and Cdc11-GFP with more timepoints than in our original cct4-1 experiments with Cdc3-GFP. Since these experiments provide the same kinds of results, but at multiple timepoints, we do not see the value in repeating the Cdc3-GFP experiment.

      Reviewer #3: “If Ssb1 functions to maintain Cdc12 in an assembly competent state preventing misfolding, one would expect either enhanced degradation or aggregation of Cdc12-mCherry in ssb1 mutant cells. Did the authors check for such scenario? Septin aggregation has been shown in a ssb1 ssb2 double deletion strain (Willmund et al., 2013), yet the data shown here predict that aggregation might already occur in single ssb1 mutants.”

      We already examined septin aggregation in single ssb1 mutants and showed these data (Supplementary Figure 4D). Indeed, this phenotype was the rationale for testing post-translational septin assembly in ssb1 single mutants. We have seen no evidence of septin degradation in any context (as we mentioned on line 889), so we would not expect it here. While we added new text and a very recent citation showing that many “misfolded” conformations of wild-type E. coli proteins avoid aggregation and degradation, we do not think that the suggested experiments would add enough value to the current study to justify the effort, time and expense.

      Reviewer #3: “Fig. 3C: The figure showing septin ring fluorescence does not include error bars. This is crucial, also because the difference between wt and ssa4 mutant cells is not large.”

      There are, in fact, error bars included in the figure, as can be most clearly seen for the final timepoint for the ssa4∆ cells. For most of the other timepoints the error bars are smaller than the data point symbols (the circles and squares). We do not think that adjusting the size or opacity of the symbols to better show the error bars will be sufficiently valuable to justify the effort.

    1. At the same time, like Harold, I’ve realised that it is important to do things, to keep blogging and writing in this space. Not because of its sheer brilliance, but because most of it will be crap, and brilliance will only occur once in a while. You need to produce lots of stuff to increase the likelihood of hitting on something worthwile. Of course that very much feeds the imposter cycle, but it’s the only way. Getting back into a more intensive blogging habit 18 months ago, has helped me explore more and better. Because most of what I blog here isn’t very meaningful, but needs to be gotten out of the way, or helps build towards, scaffolding towards something with more meaning.

      Many people treat their blogging practice as an experimental thought space. They try out new ideas, explore a small space, attempt to come to understanding, connect new ideas to their existing ideas.


      Ton Zylstra coins/uses the phrase "metablogging" to think about his blogging practice as an evolving thought space.


      How can we better distill down these sorts of longer ideas and use them to create more collisions between ideas to create new and innovative ideas? What forms might this take?

      The personal zettelkasten is a more concentrated form of this, and blogging is certainly within the space, as are the somewhat more nascent digital gardens. What would some intermediary "idea crucible" between these forms look like in public, with a simple but compelling interface? How much storytelling and contextualization is needed or not needed to make such points?

      Is there a better space for progressive summarization here so that an idea can be more fully laid out and explored? Then once the actual structure is built, the scaffolding can be pulled down and only the idea remains.

      Reminiscences of scaffolding can be helpful for creating context.

      Consider the pyramids of Giza and the need to reverse engineer how they were built. Once the scaffolding has been taken down and history forgets the methods, it's not always obvious what the original context for objects was, how they were made, or what they were used for. Progressive summarization may potentially fall prey to these effects as well.

      How might we create a "contextual medium" which is more permanently attached to ideas or objects to help prevent context collapse?

      How would this be applied in reverse to better understand sites like Stonehenge or the hundreds of other stone circles, wood circles, and standing stones we see throughout history?

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      General response to the reviewers

      We thank all reviewers for their constructive comments on our manuscript. We were very pleased to see that the reviewers found our study ‘…represent new insight in the field’ (rev#1) and ‘…contains important and exciting novel findings’ (rev#2), and ‘…gives a more detailed perspective on how Src proteins (Src42A in Drosophila) control epithelial stability and the contraction of specific surfaces of epithelial cells’ (rev#3). The reviewers raised a number of specific points, some of which we have already addressed in a preliminary revision of the manuscript. Other points will require additional experiments that we will incorporate in a fully revised version of the manuscript.

      Reviewer #1

      (Evidence, reproducibility and clarity (Required)): Highest priority: 1) The Src42A knockdown and germline clone experiments both cause defects in cellularization (Fig. 2B and 9A), which could result in differences in the state of the blastoderm epithelium (cell size, cell number, structural integrity, organization, etc.) between the experimental and control conditions. In addition, Src42A knockdown appears to affect the size and shape of the egg (Fig. 9A and 9C). The manuscript would be strengthened if the authors included data to demonstrate that the initial structure of the epithelium is mostly normal (quantifications of cell size, number, etc.) in the Src42A RNAi condition, as this would bolster the argument that germband extension, rather than due to indirect effects resulting from the cellularization defects. The authors may have relevant data to do this on-hand, for example using data associated with figures 1, 3, 6, and 9.

      Response:

      The cellularization phenotype of src42A knockdown embryos has a penetrance of about 50% and exhibits variable expressivity. We attempted to characterize this phenotype in detail, but failed to identify any dramatic differences in cellularization of the src42A knockdown embryos compared to wild type. The localization of E-cadherin, in turn, is not affected, but occasionally nuclei drop out of the blastoderm before cellularization is complete. This can result in patches of irregular cellularization, but the blastoderm epithelium in stage 6 embryos did not display major defects in overall structure. We will present additional data on the cellularization phenotypes in the fully revised manuscript. As the referee suggested, we will analyze our data to determine potential effects on the cell size, cell number and overall organization of the blastoderm before germband extension. We plan to present these data as an additional Suppl. Mat. Figure in the full revision.

      Lower priority:

      5) Figure 8 - in my opinion, using a FRAP or photoconversion approach would be a more convincing demonstration of differences in E-cadherin residency times / turnover rate than time-lapse imaging of E-cadherin:GFP alone. Authors should decide whether this improvement is worth the investment.

      Response:

      We thank the reviewer for this comment. While we believe that the data presented in Fig. 8 demonstrate a significant difference in the E-cadherin residence time based on E-cadherin-GFP fluorescence intensity, we agree with the referee that FRAP analyses would provide additional evidence to support our conclusion. For the full revision, we will therefore attempt to perform FRAP experiments on src42A knockdown embryos expressing E-cadherin-GFP and compare the recovery time to the wild type.

      Reviewer #1 (Significance (Required)):

      The manuscript by Backer et al. examines the function of Src42A in germband extension during Drosophila gastrulation. Prior studies in the field have shown that Src family kinases play an important role in the early embryo, including cellularization (Thomas and Wieschaus 2004), anterior midgut differentiation (Desprat et al. 2008), and germband extension (Sun et al. 2017; Tamada et al. 2021). In this study, the authors showed that Src42A was enriched at adherens junctions and was moderately enriched along junctions with myosin-II. They then showed that maternal Src42A depletion exhibits phenotypes, starting with cellularization and including a defect in germband extension. The authors focus on defects in germband extension and found that Src42A was required for timely rearrangement of junctions and that the Src42A RNAi phenotype is enhanced by Abl RNAi. Finally the authors show that E-cadherin turnover is affect by Src42A depletion.

      Overall, this study provided a higher resolution description of how Src42A regulates the behavior of junctions during germband extension. I thought the authors conclusions were well supported by the data and represent new insight in the field.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: Chandran et al. investigate the role of Src42A in axis elongation during Drosophila gastrulation. Using maternal RNAi and CRISPR/Cas9-induced germline mosaics, they revealed that Src42A is required to contract junctions at anterior/posterior cell interfaces during cell intercalations. Using time-lapse imaging and image analysis, they further revealed the role of Src42A in E-Cad dynamics at cell junctions during this process.

      By analyzing double knockdown embryos for Src42A and Abl, they further showed that Src42A might act in parallel to Abl kinase in regulating cell intercalations. The authors proposed that Src42A is involved in two processes, one affecting tension generated by myosin II and the other acting as a signaling factor at tricellular junctions in controlling E-Cad residence time. Overall, the data are clear and nicely quantified. However, some data do not convincingly support the conclusion, and statistical analyses are missing for an experiment or two. Methods for several quantifications also need improvement in writing. Also, several figures (Figures 6-8) do not match the citation in the text and need to be corrected.

      Page and line numbers were not indicated in the manuscript. For my comments, I numbered pages starting from the title page (Title, page 1; Abstract, page 2, Introduction, pages 3-6; Results, pages 7-14; Discussion, pages 15-18; M&M, 19-23; Figure legends, 28-30) and restarted line numbers for each page. For Figures 6-8 that do not match the citation in the text, I still managed to look at the potentially right panels. All the figure numbers I mention here are as cited in the text. My detailed comments are listed below.

      Response:

      We apologize for the lack of organization of the manuscript and the figure numbering. In the revised version we have added page numbers, line numbers and we corrected the figure numbers.

      Major comments: 1. b-Cat/E-Cad signals at the D/V and A/P junctions in Src42Ai (Figs. 5-6). These data are critical for their major conclusion and should be demonstrated more convincingly.

      In Fig. 5A, the authors said, "When the AP border was cut, the detached tAJs moved slower in Src42Ai embryos compared to control (Fig. 5A)". However, even control tAJs do not seem to move that much in the top panels, and I found the images not very convincing.

      Response:

      We thank the referee for commenting on the lack of clarity in the presentation of the data. The overall movement within the first 10 seconds after the laser cut (determined by the movement of adjacent D/V tAJs away from each other) was about 2 µm in the wildtype, while in the mutant it was 1 µm. Despite this 50% difference, the effect may be difficult to appreciate from Fig. 5A in our original submission. The yellow lines in Fig. 5A only showed the region of the cut, but did not indicate the movement of the tAJs away from each other, which may have distracted from the actual movement. We will change the annotation and the marks within the figure to visualize the movement much more clearly in the full revision. In the fully revised manuscript, we will also add movies from the experiments, including marks of the tricellular junctions to follow the displacement, as part of the Supplemental Material.

      Based on the genetic interaction between Src42A and Abl using RNAi (Fig. 7), the authors argue that Src42A and Abl may act in parallel. However, the efficiency of Abl RNAi has not been tested. It can be done by RT-PCR or Abl antibody staining. Also, the effect of Abl RNAi alone on germband extension should be tested and compared with Src42A & Abl double RNAi embryos. I expect the experiments can be done within a few weeks without difficulty.

      Response:

      We agree with the referee that it is important to determine the level of depletion in Abl RNAi embryos in order to interpret the genetic relationship between Abl and Src42A. In the full revision of the manuscript, we will follow the advice of the referee and analyze the knockdown, preferably by antibody labeling with an anti-Abl antibody. We will also generate single knockdowns of abl in embryos and determine their effect on germband extension compared to wildtype and src42A/abl double knockdown embryos.

      Minor comments:

      Fig. 2 - Fig. 2B: Higher magnification images of the defective cytoplasm can be shown as insets.

      Response:

      We will add some higher magnification images of the cellularization phenotype in the full revision of the manuscript. In addition, as mentioned in the response to reviewer #1, we will provide a more detailed analysis of the cellularization in src42Ai embryos in the fully revised manuscript.

      • Fig. 2E: A simple quantification of the penetrance of cuticle defects in Src42A mutants and RNAi will be helpful, as shown in Fig. S3.

      Response:

      In the full revision, we will add the quantification of the occurrence of the different classes of cuticle phenotypes.

      Fig. 9 - Fig. 9A: Magnified views of the cytoplasmic clearing can be added as insets.

      Response: As described in our response to the comments made by referee #1, we will add a more detailed analysis of the cellularization phenotype in the full revision.

      Page 14, lines 9-10: More explicit description of the phenotype rather than just "stronger compared to Src42Ai" will be helpful.

      Response:

      In the full revision, we will add a more detailed description of the phenotype and re-analyze and present data on the hatching rate, stage of lethality and cuticle phenotypes.

      Reviewer #2 (Significance (Required)): This work revealed the role of Src42A in regulating germband extension. A previous study suggested the roles of Src42A and Src64 in this developmental process using a partial loss of both proteins (Tamada et al., 2021). Using different approaches, the authors demonstrated a role of Src42A in regulating E-Cad dynamic at cell junctions during Drosophila axis elongation. Most of the analyses were done with maternal knockdown using RNAi, but they successfully generated germline clones for the first time and confirmed the RNAi phenotypes. Overall, this work contains important and exciting novel findings. This work will be of general interest to cell and developmental biologists, particularly researchers studying epithelial morphogenesis and junctional dynamics. I have expertise in Drosophila genetics, epithelial morphogenesis, imaging, and quantitative image analysis.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Chandran et al. report on the function of Src42A during cell intercalation in the early Drosophila gastrula. They create a Src42A-specific antibody (there are two Src genes in the fly genome) and examine the localization of Src42A and observe a planar-polarized distribution at cell interfaces. They then measure cell-contractile dynamics and show that T1 contraction is slower after Src42A disruption. The authors then argue that Src42A functions in a parallel pathway to the Abl protein, and that E-cadherin dynamics (turnover) is altered in Src42A disrupted embryos. Src function at these stages has been studied previously (though not to the degree that this study does), and in some respects the manuscript feels a little preliminary (please label figures with figure number!), but after editing this should be a polished study that merits publication in a developmentally-focused journal.

      1) Does the argument that Src42A has two functions fully make sense? Myosin II function is known to affect E-cadherin stability (and vice versa), so it seems that Src42A could affect both MyoII and Ecad by either decreasing Myosin II function/engagement at junctions or by destabilizing Ecad.

      Response:

      We thank the referee for raising an important point that we may not have discussed appropriately in our initial submission. We agree that the reciprocal relationship between actomyosin and E-cadherin might not be reflected unequivocally in our manuscript. As the referee points out, Src42A could affect both MyoII planar localization and E-cadherin dynamics through the same pathway. Previous studies showed that Src is involved in translating the planar polarized distribution of the Toll-2 receptor by recruiting Pi3-Kinase activity to the Toll-2 receptor complex, resulting in planar polarized distribution of MyoII at the A/P interfaces. These data, however, do not address the possibility that a well-known Src target, the E-cadherin/ß-Catenin complex, which is extensively remodeled in germband extension, contributes to the delay in germband extension. The observed defects in both studies can be attributed to both abnormal planar polarization of MyoII and abnormal dynamics of the E-cadherin/ß-catenin complex. In either of these cases, we suggest that Src42A phosphorylates distinct substrates: the Toll-2 intracellular domain in the MyoII planar polarity pathway and the E-cad/ß-Cat complex controlling E-cad dynamics. Given the relationship between MyoII and E-cadherin, however, it is not possible to decide whether these two effects are independent functions of Src42A or are consequences of each other. Since we cannot resolve a possible epistatic relationship between these two potential activities of Src42A, we decided to extend the discussion on this topic by taking both possible scenarios into account and discussing them appropriately. We will add this discussion in the full revision of the manuscript.

      2) One obvious question that arises is the nature of cleavage defects that are mentioned that happen previously to intercalation. For example, is E-cad normal prior to intercalation initiating? How specific are the observed defects to GBE?

      Response:

      Please see our response to referee #1.

      3) Pg. 10, "the shrinking junction along the AP axis strongly reduces its length with an average of 1.25 minute" - what is this measurement? How much is "strongly"?

      Response:

      We thank the referee for pointing out our inappropriate qualitative statement of the experimental data, which was indeed misleading. The measurement of the shrinking junction was based upon the time it takes for the AP interface junction between two adjacent vertices on the DV axis to shrink into a single 4-cell vertex. The time for this contraction was on average 1 minute 25 seconds. The data in Fig.4 A’,C show that after 2 minutes in the control embryo 100% of the observed AP junctions have collapsed and the extension of the new DV junction along AP axis has begun. At the same timepoint of 2 minutes in the src42A knockdown, we show in Fig. 4B’,D that the shrinking of the AP junction interface has still not been completed in 60% of the cases.

      In the full revision, we will remove the qualitative statement and replace it with a correct description of the measurements taken and will refer to the data described in Fig. 4 A-D.

      4) Also pg. 10, "the AP junction was not markedly reduced after 1 minute" - what is the criteria for this statement? X%? 1 minute is very specific, it feels like how much of a reduction/non-reduction should also be specific.

      Response:

      Please see our response to point 3.

      Reviewer #3 (Significance (Required)):

      This study gives a more detailed perspective on how Src proteins (Src42A in Drosophila) control epithelial stability and the contraction of specific surfaces of epithelial cells.

      Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewers #2 and #3 noted that the manuscript was somewhat unorganized, lacking page, line and figure numbering. We also noted that in the submission process the figures were not presented in the correct order. In the preliminary revision of the manuscript, we fixed these problems to facilitate the evaluation of our transferred manuscript by editorial boards.

      In addition, we also addressed issues that the referees mentioned by editing the text according to their comments. We also addressed problems regarding the presentation of the figures and statistical analyses of the data. The following changes were made:

      1. We added page numbers and line numbers.
      2. We added figure numbers to the figure panels.
      3. We corrected ordering of figures in the transferred manuscript.
      4. We addressed the following comments by statistical analyses, editing the text and the figures:

        Regarding comments from Reviewer #1:

      Highest Priority:

      2) There is a discrepancy in the staging of embryos used between some of the analyses, which make it hard to interpret some of the data. For example, characterization of the knockdowns in Fig. 1A and B are based on stages 10 and 15, whereas the majority of the paper is focused on earlier stages 6 - 8 during germband extension (e.g., Fig. 1D). The analysis for Fig. 1B would be more meaningful if it was done on the same stages used for subsequent phenotypic analysis so they can be directly compared.

      Response:

      We thank the referee for pointing out an apparent misunderstanding caused by the description of Fig. 1A,B. The data presented in Fig.1A and 1B do not show RNAi knockdown experiments, but show a comparison between embryos that are heterozygous or homozygous for the loss-of-function allele src42A26-1. These data were intended to demonstrate that zygotic mutants still maintain levels of maternal Src42A protein up until late stages of development. Data for embryos at an earlier stage (stage 5) were shown in the Supplementary Fig. S1E, where no difference in protein levels of Src42A can be observed between heterozygous and homozygous zygotic src42A26-1 embryos.

      At the beginning of the results sections 1 and 2 of the preliminary revised manuscript, we added a sentence to address the referee’s concern, stating that earlier stages exhibit no difference in protein levels and referring to Fig. S1E. We also spelled out more explicitly that the experiment (referring to Fig. 1A,B and S1) was intended to look at zygotic mutants and to demonstrate that our novel Src42A antibody was able to detect the reduction of maternal Src42A protein in mid- to late-stage homozygous zygotic embryos.

      3) There is incongruence between figures in terms of which junctional pools (bAJs vs. tAJs) of beta-catenin and E-cadherin are quantified that makes it difficult to draw comparisons between analyses. For example, pTyr levels are examined for both bAJs and tAJs in Figure 3, however, only tAJs are considered in Fig. 8. Similarly, in some cases planar cell polarity is considered (e.g., comparison of levels at AP vs DV bAJs in Fig. 6 and 9), and in other cases (e.g. Fig. 8) it is not.

      Response:

      We thank the referee for commenting on the different readouts for different pools of cell junctions in our experiments. In our study we considered the effects of src42A RNAi knockdown on both bAJs and tAJs. We decided to present the data for bAJs and tAJs in separate figures for clarity and structure. For example, the data for the effect of src42A knockdown on the planar polarized distribution of E-cadherin at bAJs were presented in Fig. 6, while the effect on E-cadherin residence time in tAJs was presented in Fig. 8. The analysis of pTyr levels considered both pools in order to determine whether src42A knockdown leads to an overall reduction of pTyr levels or to a reduction in a specific junctional pool. From our data we conclude that pTyr levels show a similar reduction in both the bAJs and the tAJs.

      In order to address the reviewer’s comment, we have linked the figures more stringently with the results text of the preliminary revision. We only referred to the reduction in pTyr levels in Fig. 3 to point out that both junctional pools are affected by reduced pTyr in src42Ai embryos. Furthermore, we referred to the individual figure panels when addressing junctional pools and explained in detail the rationale for focusing on particular pools (bAJs or tAJs) in the experiments. For Fig. 6 we point out in the preliminary revised manuscript that we focus the analyses on the known planar polarized distribution of beta-catenin and E-cadherin.

      Lower priority: 1) Introduction, 2nd paragraph - The modes of cell behaviors described to drive cell intercalation leaves out another clear example in the literature - Sun et al., 2017 - which describes a basolateral cell protrusion-based mechanism. While the authors cite this paper later, leaving it out when summarizing the state of the field misrepresents the current knowledge of the range of mechanisms responsible.

      Response:

      We thank the referee for this remark. In the preliminary revision, we have added to the introduction that the cell behaviors associated with germband elongation include apical and basolateral rearrangements of the cells indicating that basolateral protrusions also contribute to the set of mechanisms that drive germ band elongation.

      2) 'defective cytoplasm' - this term is confusing, and could perhaps be replaced with 'cellularization defect', or something similar.

      Response:

      We agree that the term we applied for the cellularization defect may be misleading. The observation we intended to describe with this term was a defect in the cytoplasmic clearing that occurs during the last syncytial division and the beginning of the cell formation process. We changed the description of this observation accordingly and now refer to the defect in the preliminary revised manuscript as ‘cytoplasmic clearing defect’.

      3) Tests of statistical significance are not uniformly applied across the figures. For instance, Figures 3G + H indicate statistical significance, but Fig. 3D + E do not. Performing statistical tests throughout the paper, or clearly articulating a rationale when they are not used, would strengthen the manuscript. Specifically, the authors should consider this for Fig. 3D + E, and Fig. 7D + E, to support their arguments that rates of germband extension are different between conditions.

      Response:

      We agree with the reviewer and have provided statistical analysis for the data displayed in Fig. 3D,E and Fig. 7D,E in the preliminary revision of the manuscript.

      4) Page 12 - "We found that Src42A showed a distinct localization at the tAJs (Fig. 1B)": Figure 1B shows a quantification of levels at bAJs, not tAJs.

      Response:

      In the preliminary version of the revised manuscript, we added a quantification of the localization of Src42A at the tAJs as a part of Suppl. Fig. S4. In Fig. S4A-C we show that Src42A is enriched at the tAJs in comparison to the bAJs.

      Regarding comments from reviewer #2:

      Major Comments:

      In Fig. 6A, b-Cat signals look fuzzier and dispersed and have more background signals in the control, compared to the Src42Ai background. Also, b-Cat signals in the control image do not seem to show enrichment at the D/V border, as shown in Tamada et al., 2012.

      Response:

      We agree with the referee that the image in Fig. 6A for the control is fuzzier and looks dispersed. This is due to the fixation method that we used. In this experiment we did not apply heat fixation, but used formaldehyde fixation, in which b-catenin protein, in addition to the junctional pool, is also maintained in the cytoplasm, creating the fuzzy cytoplasmic staining. We chose to do this in order to be able to co-immunolabel the embryos with b-catenin and E-cadherin antibodies; the latter staining does not work with the heat fixation applied in the Tamada et al. 2012 paper. Despite the slightly lower quality of the staining, the quantification of the data clearly indicated an effect of src42A knockdown on the planar polarized distribution of the E-cad/b-cat complex and does show an enrichment. In the preliminary revision, we added a note to the figure legend to indicate that the fixation procedure was not optimized for b-catenin junctional staining. In the preliminary revision we also added a quantification of live imaging data recording E-cadherin-GFP in wild-type and src42Ai embryos. We present these additional data in Fig. S5 in the preliminary revision of the manuscript. These data are consistent with the results in Fig. 6 from the immunolabeling and support our conclusion that the E-cadherin AP/DV ratio is increased in Src42A knockdown embryos.

      In Fig. 6B, C, it is not clear how the intensity was measured and how normalization was done. Was the same method used for these quantifications as "Protein levels at bicellular and tricellular AJs" on pages 21-22? Methods should be written more explicitly with enough details.

      Response:

      We thank the referee for pointing out the lack of detail in explaining how the quantification was done. In the preliminary revision of the manuscript, we extended a paragraph entitled ‘Protein levels at bicellular and tricellular junctions’ in the methods section that will serve this purpose and describe the methods that were applied for each quantification and the method as to how the data were normalized.

      Does each sample (experimental repeat) for the D/V border in Fig. 6B match the one right below for the A/P border in Fig. 6C? It should be clearly mentioned in the figure legend. The ratio of the DV intensity to AP intensity will better show the compromised planar polarity of the b-Cat/E-Cad complex.

      Response:

      We thank the reviewer for pointing out a lack of clarity in our presentation. The experimental repeats for each measurement do indeed match, i.e. the measurement of the DV border matches the same adjacent 4-cell pair in the same embryo, and in total 5 distinct embryos were analyzed for each experiment. In the preliminary revision of the manuscript, we explain this detail of the experimental design in the figure legend. In the preliminary revision, we also determined the ratios of DV/AP cell interfaces for b-Cat and E-Cad and added this quantification as panels 6C and 6E for a clearer presentation of the data.

      Minor notes: Page 4, missing comma after "For example"

      Response: The text was edited accordingly.

      Page 4, "inevitable" does not make sense in this context

      Response: We eliminated ‘inevitable’ and replaced it with ‘critical’ to better indicate the importance of Canoe protein for germband elongation.

      Page 7, lines 6-7 - The localization of Src42A in control should be described in more detail and more clearly here.

      Response: In the preliminary revised manuscript, we described the distribution of Src42A in more detail, pointing out its dynamics and differential distribution at distinct plasma membrane domains.

      Supplemental Fig S1 - Fig. S1D: Based on the head structure and the segmental grooves, the embryo shown here is close to late stage 13/early stage 14, not stage 15. - Fig S1E: It will be helpful if the predicted protein band and non-specific bands are indicated by arrows/arrowheads in the figure.

      Response:

      We thank the referee for their careful observation of the embryonic stage. We agree that the embryo was actually at a younger stage. In the preliminary revision, we replaced the images with an example of an older stage. We will also add arrows to clearly mark the specific protein bands in Fig. S1E.

      Page 7, lines 21-22 - "Src42A was slightly enriched at the AP interface" - To argue that, quantification should be provided.

      Response:

      We thank the referee for pointing out a qualitative statement that we made with regard to the distribution of Src42A at the AP cell interfaces. In the preliminary revision of the manuscript, we present an additional quantification of the imaging data of Src42A immunolabeling. In Figure S4A-C, we now present a quantification of the enrichment of Src42A at the tricellular junctions. In addition, the new Fig. S4D,E shows a quantification of the planar polarized distribution of Src42A at the AP cell interfaces.

      Figure 1 - Fig. 1B: Src42A levels should be compared between control (Src42A/+) and Src42A/Src42A for each stage. It currently shows a comparison between Src42A/Src42A of stages 10 and 15.

      Response:

      We thank the referee for the comment. As indicated in our response to referee #1, the point of this analysis was (1) to provide evidence for the specificity of our new anti-Src42A antibody and (2) to demonstrate the presence of a substantial maternal contribution of Src42A protein in zygotic mutants. We do not see the advantage of providing a detailed developmental Western blot analysis, but we provide data in Suppl. Fig. S1E showing that the level of Src42A is unimpaired in stage 6 zygotic src42A[26-1] homozygous mutant embryos.

      • Fig. 1B: The figure legend says, "dotted line represents mean value and error bars," but there are no dotted lines shown in the figure. Also, what p-value is for ****? It should be mentioned in the figure legend. It also says Src42A levels were normalized against E-Cad intensity here (stages 10 and 15). They have shown that E-Cad levels are affected in Src42A RNAi during gastrulation (Fig. 6). Is E-Cad not affected in Src42A26-1 zygotic mutants at stages 10 and 15?

      Response:

      We thank the referee for pointing out inaccuracies in the presentation and the description of Fig. 1B. In the preliminary revision, we emphasized the marks on the graph and provided p-values throughout. Regarding the E-Cadherin levels: E-cadherin levels were altered in src42A RNAi knockdown embryos, but not in zygotic mutants, even at later developmental stages.

      Page 8, line 14 - "Embryos expressing TRiP04138 showed reduced hatching rates with variable penetrance and expressivity depending on the maternal Gal4 driver used (Fig. 2B)" - Fig. 2B doesn't seem to be a right citation for this sentence.

      Response:

      We agree with the referee and in the preliminary revised manuscript we corrected the reference to the conclusion drawn from Figure 2A’, which does show the relationship of hatching rate to the various maternal Gal4 drivers.

      • Fig. 2C: It will be helpful to indicate two other non-specific bands in the figure with arrows/arrowheads with a description in the figure legend.

      Response:

      In the preliminary revision, we added an arrow to mark the band specific for Src42A and asterisks to mark unspecific bands in Fig 2C.

      Page 9, line 9 - This is the first time that the fast and the slow phases of germband extension are mentioned. As these two phases are used to compare the Src42A and Src42A Abl double RNAi phenotypes, they should be introduced and explained better earlier, perhaps in Introduction.

      Response:

      We thank the referee for pointing out that the two phases of germband extension were not introduced. We added a sentence to introduce and define the distinct phases of extension movements in the preliminary revision.

      Fig. 3 - Fig. 3A: It will be helpful to mark the starting and the ending points of germband elongation with different markers (arrows vs. arrowheads or filled vs. empty arrowheads).

      Response:

      In the preliminary revision, we added distinct markers to indicate the start and endpoints of germband elongation to make this figure easier to read.

      • Fig. 3C figure legend: R2 is wrongly mentioned in Fig. 3D, E. Also, R2 (coefficient of determination) needs to be defined either in the figure legend or Materials & Methods.

      Response:

      We thank the referee for pointing out this misleading reference. In the preliminary revision we corrected the reference to R2 in Fig. 3D,E and will describe the definition of R2 in the figure legend.

      • Fig. 3D, E: statistical analysis is missing.

      Response:

      In the preliminary revision, we included a statistical analysis of the data (see ref #1). We changed the figure to indicate the data sets that were analyzed and added the p-values to the figure legend.

      • Fig. 3G and H should be cited in the text.

      Response:

      In the preliminary revision, we added references to Fig. 3G,H in the results section alongside the annotation of Fig. 3F.

      • Fig. 3F: It should be mentioned that the heat map is shown for pY20 signals in the figure legend, with an intensity scale bar in the figure.

      Response:

      In the preliminary revision, we added an intensity scale bar to the figure panel and mentioned the relationship to the PY20 signal.

      Fig. 7A: Arrows can be added to mark the delayed germband extension.

      Response:

      In the preliminary revision, we added arrows to mark the anterior and posterior extent of the germband.

      Fig. 8A: It should be mentioned that the heat map is shown for E-Cad signals in the figure legend, with an intensity scale bar in the figure.

      Response:

      In the preliminary revision, we added an intensity scale to the heat map and mention the relationship to the E-cadherin signal in the figure legend.

      Fig. S3G: An arrowhead can be added to the gel image to indicate the band described in the legend.

      Response:

      In the preliminary revision, we added an arrow to help annotate the Src42A-specific bands on the Western blot.

      • Fig. 9B: Arrow/arrowheads can be added to show the absence of the signals in the nurse cells.

      Response:

      In the preliminary revision, we added markers to help recognize the reduced signal in the nurse cells and the oocyte.

      • Fig. 9C: Indicate the ending point of the germband extension by arrows.

      Response: In the preliminary revision, we added arrows to mark the anterior and posterior extent of the germband.

      Regarding comments from reviewer #3:

      Minor notes: Page 4, missing comma after "For example"

      Response: The text was edited accordingly.

      Page 4, "inevitable" does not make sense in this context

      Response:

      In the preliminary revision, we eliminated ‘inevitable’ and replaced it with ‘critical’ to better indicate the importance of Canoe protein for germband elongation.

      Description of analyses that authors prefer not to carry out

      Referee #1 point2 and Referee#2 minor comment figure 1. Both referees suggest that figure 1 AB should include earlier developmental stages according to the stages looked at in the RNAi knockdown experiment.

      Response:

      The referees’ comments are likely based on a misunderstanding. The data that the reviewers are referring to present analyses of the zygotic phenotype of embryos homozygous for the src42A26-1 loss-of-function allele. They are not related to the maternal RNAi knockdown experiments, but were meant to demonstrate the existence and extent of a maternal pool of Src42A protein that persists even to late stages in development. The maternal knockdown mutants are analyzed in detail at the appropriate stages in Fig. 2.

      As described in our response above, we don’t feel that a detailed developmental stage Western analysis of wildtype and src42A26-1 embryos would provide significant additional insights. As mentioned in our response above, data for an earlier developmental stage (before germband elongation, as requested by the referees) were provided in Suppl. Fig. S1E.

      Referee #1 Point 6) Figure 8E - showing images of multiple tAJs, rather than z-slices of a single vertex, would better support the claim here, as the assertion is that Src42a levels are different between control and sdk RNAi conditions, and not that it varies in the z-dimension.

      Response:

      The image series of Fig. 8E shows one representative example of multiple tAJs that have been imaged for this experiment (n=6 for wild type and n=10 for sdk RNAi). We think that the presentation of Z-slices for this experiment is important as the protein distribution needs to be considered for a larger area along the apical-lateral cell interface. In addition the quantification of the data for multiple tAJs was presented in Fig. 8F,G as a graph. We would therefore rather not change this figure in the revised manuscript.

      Referee #3 suggests that anti MyoII staining should accompany the analysis of tension measurements in the germband.

      As this analysis has already been performed by Tamada et al. 2021, we decided not to reproduce these data, but rather extend the analysis towards tension measurements, which support the findings by Tamada et al. 2021 on a functional level. We do not see the added value of adding MyoII labeling.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Manuscript number: RC-2021-01016

      Corresponding author(s): Dennis Klug

      1. General Statements [optional]

      Dear editor, dear reviewers,

      Thank you very much for the quick review of our manuscript as well as for the constructive criticism and the interesting discussion of our results. Reading the comments, we realized that we may have put too much emphasis on the in vivo microscopy of sporozoites and their interaction with the salivary gland. We believe that the generated mosquito lines can be used to address different scientific questions, the in vivo microscopy of host-pathogen interactions being only one of them. Because of this imbalance, and to address some of the reviewers' comments, we have partially rewritten the manuscript (particularly the introduction). At the same time, we have added data on the inducibility of the promoters used, as well as on the functionality of hGrx1-roGFP2 in the salivary glands. Furthermore, we created an additional figure to better present the expression patterns of trio and saglin promoters within the median lobe, and we expanded the section on in vivo microscopy of sporozoites. We hope that these results further highlight the significance of our study. Accordingly, we have also changed the title of the manuscript to “A toolbox of engineered mosquito lines to study salivary gland biology and malaria transmission” to indicate the broad applicability of the generated mosquito lines, and we have included an additional co-author, Raquel Mela-Lopez, who conducted the redox analysis. We hope that these changes will adequately answer the questions of the reviewers and address any concerns they may have had. We look forward to hearing from you.

      With our kind regards,

      Dennis Klug

      Katharina Arnold

      Raquel Mela-Lopez

      Eric Marois

      Stéphanie Blandin

      2. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      **Summary**

      This manuscript reports the generation and characterization of transgenic lines in the African malaria mosquito Anopheles coluzzii that express fluorescent proteins in the salivary glands, and their potential use for in vivo imaging of Plasmodium sporozoites. The authors tested three salivary gland-specific promoters from the genes encoding anopheline antiplatelet protein (AAPP), the triple functional domain protein (TRIO) and saglin (SAG), to drive expression of DsRed and roGFP2 fluorescent reporters. The authors also generated a SAG knockout line where SAG open reading frame was replaced by GFP. The reporter expression pattern revealed lobe-specific activity of the promoters within the salivary glands, restricted either to the distal lobes (aapp) or the middle lobe (trio and sag). One of the lines, expressing hGrx1-roGFP2 under control of aapp promoter, displayed abnormal morphology of the salivary glands, while other lines looked normal. The data show that expression of fluorescent reporters does not impair Plasmodium berghei development in the mosquito, with oocyst densities and salivary gland sporozoite numbers not different from wild type mosquitoes. Salivary gland reporter lines were crossed with a pigmentation deficient yellow(-) mosquito line to provide proof of concept of in vivo imaging of GFP-expressing P. berghei sporozoites in live infected mosquitoes.

      **Major comments**

      Overall the manuscript is very well written with a clear narrative. The data are very well presented. The generation of the transgenic mosquito lines is elegant and state-of-the art, and the new reporter lines are thoroughly characterized.

      This is a nice piece of work that is suitable for publication, although the in vivo imaging of sporozoites is somewhat preliminary and would benefit from additional experiments to increase the study impact.

      We would like to thank the reviewer for his/her appreciation of our manuscript. In the revised version, we have included additional experiments on in vivo imaging of sporozoites, which allowed us to quantify moving and non-moving sporozoites imaged under the cuticle of live mosquitoes. Although this is still a proof of concept, we believe that these new data are novel and interesting and will better illustrate potential applications.

      The reporter mosquito lines express fluorescent salivary gland lobes, yet the authors only provide imaging of parasites outside the glands. It would be relevant to provide images of the parasite inside the fluorescent glands.

      We have now included images showing sporozoites inside the salivary glands in vivo in Figure 8C and discuss possible ways to further improve resolution and efficiency of the imaging procedure in lines 563-586.

      The advantage of the pigmentation-deficient line over simple reporter lines is not clear, essentially due to the background GFP fluorescent in figure 5C. Imaging of GFP-expressing parasites should be performed in mosquitoes after excision of the GFP cassette under control of the 3xP3 promoter. This would probably allow to document the value of the reporter lines more convincingly.

      Indeed, by incorporating two Lox sites in the transgenesis cassette, we designed the yellow(-)KI line to permit removal of the fluorescent cassette and completely exclude expression of the transgenesis reporter EGFP. Still, EGFP expression in the yellow(-)KI adults is restricted to the eye and ovary, as we show now in Figure 7 supplement 1D. In contrast, no EGFP fluorescence was observed in the thorax area (Figure 7 supplement 1D). Therefore, we believe that the benefit of removing the fluorescence cassette for this study is limited. Moreover, the generation of such a line would take at least 3-4 months before experiments could be performed. Nevertheless, we agree with the reviewer that removal of the fluorescence cassette would be instrumental for follow-up studies. To draw the reader's attention to this issue, we now discuss background fluorescence in lines 378-387.

      Along the same line, it is unclear if the DsRed spillover signal in the GFP channel is inherent to the high expression level or to a non-optimal microscope setting. This is a limitation for the use of the reporter lines to image GFP-expressing parasites.

      We have discussed this problem with the head of the imaging platform at our institute, and we believe that it is not a problem that occurs due to incorrect settings. Rather, it seems to be due to the significant expression differences of the two fluorescence reporters used. We agree with the reviewer that this is a limitation and discuss the problem now in lines 416-412 and 565-567.

      The authors should fully exploit the SAG(-) line, which is knockout for saglin and provides a unique opportunity to determine the role of this protein during invasion of the salivary glands. This would considerably augment the impact of the study. In this regard, line 131 and Fig S3E: why is there persistence of a PCR band for non-excised in the sag(-)EX DNA?

      We definitely share the reviewer's enthusiasm about saglin and its role in parasite development in mosquitoes. We have thoroughly characterized the phenotype of sag(-) lines with respect to fitness and Plasmodium infection. These results are described in a separate manuscript currently in peer review and available as a preprint on bioRxiv (https://doi.org/10.1101/2022.04.25.489337). Furthermore, in the revised manuscript, we have included additional data on the transcriptional activity of the saglin promoter with respect to the onset of expression and blood meal inducibility (Figure 2). In addition, we have included a completely new Figure 3 to highlight the spatial differences in transcriptional activity of the saglin promoter compared with the trio promoter. These new data are commented on in lines 206-276.

      There might be a misunderstanding in the interpretation of the genotyping PCR. The PCR shown in Figure 1 – figure supplement 3 displays PCR products for different genomic DNAs (sag(-)EX, sag(-)KI and wild type) using the same primer pair. “Excised” refers to sag(-)EX, “non excised” refers to sag(-)KI, and “control” refers to wild type. Primers were chosen so as to yield a PCR product as long as the transgene has integrated; only the shift in size between “excised” and “non excised” indicates the loss of the 3xP3-lox fragment. We have now changed the labeling of the respective gel in Figure 1 – figure supplement 3 to make this clearer.

      Did the authors search for alternative integration of the construct to explain the trioDsRed variability?

      We validated trio-DsRed cassette insertion in the X1 locus by PCR. The only way to rule out an additional integration of the transgene would be whole genome sequencing, which we did not perform. Still, we believe that the observed expression patterns are due to locus-specific effects of the X1 locus. Indeed, several lines of evidence point in this direction: (1) transgenesis was realized using the phage ΦC31 integrase, which promotes site-specific integration (attP is 38 bp long and very unlikely to occur as such in the mosquito genome) and for which we never detected insertion in other sites in the genome for other constructs inserted in X1 and other docking lines; (2) additional unlinked insertions would have been easily detected during the first backcrosses to WT mosquitoes that we perform in order to isolate the transgenic line and homozygotise it; (3) we have often observed variegated expression patterns for other transgenes located in the X1 locus in the past, leading us to believe that this locus is subject to variegation influencing the expression of the inserted promoters. Usually, the variation we observe is simpler (e.g. strong and weak expression of the fluorescent reporter placed under the control of the 3xP3 promoter in the same tissues where it is normally expressed), but some promoters are more sensitive to the nearby genomic environment than others, which we believe is the case for trio. Finally, should there be additional insertions of the transgenesis cassette in the genome, they should all be linked to the X1 locus, as we would otherwise have detected them in the first crosses as mentioned above, which is unlikely. Thus, although very unlikely, we cannot exclude a single additional and linked insertion possibly explaining the high/low DsRed patterns, but variegation would still be required to explain other patterns. We have mentioned this alternative explanation in the manuscript in lines 522-524.

      Line 254-255. Does the abnormal morphology of SG from aapp-hGrx1-roGFP2 result in reduced sporozoite transmission?

      This is an interesting question. For future experiments, it could indeed be important to test whether the transmission of sporozoites by the generated salivary gland reporter lines is impaired. However, the quantification of the number of sporozoites in aapp-hGrx1-roGFP2 expressing salivary glands did not reveal any significant differences from the wild type (Figure 5 – figure supplement 1B), and these numbers would definitely be sufficient to infect mice. As we have no evidence for reduced invasion of sporozoites in the salivary glands of aapp-hGrx1-roGFP2 and of the DsRed reporter lines and no good reason to believe that the expression of fluorescent proteins would interfere with parasite transmission, and as we produced these lines as tools to follow sporozoite interaction with salivary glands, we have not performed transmission experiments.

      Of note, we have now included images of highly infected salivary glands of all reporter lines in Figure 5 – figure supplement 1D to confirm that expression of the respective fluorescence reporter does not interfere with sporozoite invasion. Also, we have not observed that sporozoites avoid salivary gland areas displaying high levels of hGrx1-roGFP2.

      **Minor comments**

      -Line 51: sporogony rather than schizogony

      Schizogony was replaced with sporogony.

      -Line 56: sporozoites are not really deformable as they keep their shape during motility

      This sentence was removed.

      -In the result section, it is not clearly explained where constructs were integrated.

      We have now included the sentence “...with an attP site on chromosome 2L...” (line 173) and the respective reference (PMID: 25869647) to give more information about the integration site.

      Line 106 and 434-435: for the non-expert reader, it is not clear what X1 refers to, strain or locus for integration?

      X1 refers to both the locus and the docking line. We have rephrased the beginning of the results section (previously line 106) to give more information about the integration site, as mentioned above.

      -Line 112-115: the rational of integrating GFP instead of SAG is not clearly explained here, but become clearer in the discussion (line

      We have slightly rephrased the sentence to better explain the reasoning for this procedure (lines 182-184).

      -Line 140: FigS2A instead of S3A

      This mistake was corrected in the revised manuscript.

      -Perhaps mention that GFP reporters (SG) might be useful to image RFP-expressing parasites.

      We have now included an image of the aapp-hGrx1-roGFP2 line infected with a mCherry expressing P. berghei strain in Fig. 7D.

      -Line 236: the authors cannot exclude integration of an additional copy (as mentioned in the discussion line 367-368).

      As discussed above, we removed “...as a single copy...” and introduced the possibility of an additional integration linked to X1 (lines 522-524).

      -Line 257-258. The title of this section should be modified as SG invasion was not captured.

      The title was rephrased. It now reads “Salivary gland reporter lines as a tool to investigate sporozoite interactions with salivary glands” (line 356-357).

      -Line 287: remove "considerable number" since there is no quantification.

      This was removed. In addition, we included new data in this section of the manuscript and rephrased the results accordingly (lines 406-427).

      -Line 400-402: Klug and Frischknecht have shown that motility precedes egress from oocysts (PMID 28115054), so the statement should be modified.

      Thank you for this suggestion. The passage was modified accordingly.

      -Line 404: remove "significant number" since there is no quantification.

      This section was rephrased and the phrase "significant number" was removed (lines 406-427).

      -Line 497: typo "transgenesis"

      The typo was corrected in the revised manuscript.

      -FigS1: add sag-DsRed in the title

      Thank you for spotting this inconsistency, we corrected this mistake (line 1134).

      -Stats: Mann Whitney is adequate for analysis in fig 2C but not 2B, where ANOVA should be used (more than 2 groups).

      We have now performed a one-way ANOVA test and adapted the figure and figure legend accordingly.

      Reviewer #1 (Significance (Required)):

      This work describes a technical advance that will mainly benefit researchers interested in vector-Plasmodium interactions. Invasion of salivary glands by Plasmodium sporozoites is an essential step for transmission of the malaria parasite, yet remains poorly understood as it is not easily accessible to experimentation. The development of transgenic mosquitoes expressing fluorescent salivary glands and with decreased pigmentation provides novel tools to allow for the first time in vivo imaging in live mosquitos of the interactions between sporozoites and salivary glands.

      Reviewer's expertise: malaria, Plasmodium berghei, genetic manipulation, host-parasite interactions

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The first achievements of the Klug et al. study are the (i) genetical engineering of the Anopheles coluzzii mosquitoes reared in insectarium, that stably express distinct fluorescent reporters (DsRed and hGrx1-roGFP2 and EGFP) under the putative "promoters" of genes reported to encode proteins expressed differentially in the pluri-lobal salivary glands (Sg) of anthropophilic blood-feeding adult females, (ii) the analysis of the promoter activity - based on the selected fluorescent reporter - with a primary focus on the salivary gland/Sg (including at the Sg lobe level) of the adult female but also considering the preimaginal developmental time with larvae and pupa samples. Of note, some data confirm the already reported time-dependent and blood meal-dependent promoter activity for the related Anopheles species. The last part presents preliminary dataset on live imaging of Plasmodium berghei sporozoites with the aim of highlighting the usefulness of these A. coluzzii transgenic lines to better understand how the rodent Plasmodium sporozoites first colonize and then settle as packed cells in Sg acinar host cells.

      **Major comments**

      The first two objectives presented by the authors have been convincingly achieved with (i) the challenging production of four different lines expressing different single or double reporters chosen by the authors (and appropriately presented in the result text and figure sections), (ii) the careful analysis of the spatiotemporal expression of the DsRed reporter under the two "promoters" studied and with regard to the blood feeding event parameter. However, if the reason why the authors have put so much effort in the production of their transgenic mosquitoes is (as mentioned) to provide a significantly improved setting enabling the behavioral analysis of sporozoites upon colonization and survival in the Sg, this part seems rather limited. Likely related to this perception is the fact that I found the introductory section often confusing and not direct enough: in particular in distinguishing the rationale from the necessity to produce appropriate models, and in clarifying what is/are the added value(s) offered by these new transgenic line models when compared to what exists (in Anopheles stephensi), with specific evidence that argues for this knowledge gain. At this stage, it is unfortunately not clear to me what the bonus is of imaging the fluorescent Plasmodium sporozoites in hosts with fluorescent salivary gland lobes if one cannot monitor key events of the Sg-sporozoite interaction that were not reachable without the fluorescent mosquito lines. Furthermore, it should be better explained why the rodent Plasmodium species has been chosen rather than Plasmodium falciparum (or another human species), for which A. coluzzii is a natural host; perhaps simply mentioning that this study serves as a proof of concept while bringing real biological insights would be fine.

      We would like to thank the reviewer for his/her evaluation of our manuscript, which has helped us clarify our manuscript on several points. Our goal here was a proof of concept demonstrating potential applications for the fluorescent salivary gland reporter lines and for the low pigmented yellow(-) line we generated. In vivo imaging of sporozoites in salivary glands is one possible application that we intended to use as proof-of-concept, but we tailored the manuscript too restrictively with this aim in mind and neglected other applications as well as characterization of the biology of salivary glands in general. To improve this, we have included further data on the blood inducibility of the promoters tested (Figure 2), the functionality of roGFP2 in the salivary glands (Figure 5), and the use of the generated lines in the examination and definition of expression patterns of salivary gland proteins in vivo (Figure 6). Accordingly, we have adjusted the entire manuscript to adequately describe all the results presented. We have also rephrased major parts of the abstract and the introduction to better describe the impact of salivary gland biology on the transmission of pathogens, and to explain the anatomy of salivary glands in more detail.

      We agree with the reviewer that it would be desirable to show direct salivary gland-sporozoite interactions in vivo. Still, we believe that having mosquito lines expressing a fluorescent marker in the salivary gland, as well as weakly pigmented mosquitoes, is a first step towards making this visualization possible, although we cannot yet provide much quantitative data about this interaction.

      1- The three genes and gene products selected by the authors should definitively be more systematically explained, which means for example the authors need to introduce the different mosquito species and the parasite-mosquito host pairs they are then referring to for the promoter/encoded proteins of their interest. In the same vein, I did not find any information as to the choice of the mosquito species (A. Coluzzii) for the current work. I was curious to know what is the advantage since better knowledge was available with Anopheles stephensi with respect to (i) Saglin and its promotor activity, (ii) aap driven dsRed expression (lines already existing) and (iii) sporozoite-gland interaction.

      We have largely reworded the introduction to clarify the rationale for selecting these three promoters while providing a better understanding of salivary gland biology in general.

      The choice of the mosquito species depends, in our opinion, strongly on the perspective and on the experiments to be performed. We agree with the reviewer that the malaria mosquito A. stephensi is a widely used model, based on its robustness in breeding and its high susceptibility to P. berghei and P. falciparum infections. However, in these cases, both vector-parasite pairs are to some extent artificial. Indeed, although it is also a vector of P. falciparum in some regions, A. stephensi mostly transmits P. vivax, which cannot be cultured in vitro. Thus research efforts on this vector-parasite pair are limited. Also, due to the growing number of observed differences between Anopheles species in their susceptibility to Plasmodium infection and transmission, more research has recently been conducted on African mosquito species. This trend is reinforced by the fact that P. falciparum causes more deaths than any other Plasmodium species infecting humans, making control strategies for species from the A. gambiae complex such as A. coluzzii particularly important. As a result, the number of available genetic tools in A. coluzzii/gambiae has outpaced that in A. stephensi. These include mosquito lines with germline-specific expression of Cas9 for site-directed transgenesis, lines expressing Cre for lox-mediated recombination, and several docking lines. Such tools are, as far as we know, not available in A. stephensi and were essential in reaching our objectives. Docking lines are of particular interest because they allow reliable integration into a characterized locus, which is an advantage over random transposon-mediated integration. Random insertion sites have generally not been characterized in the past, which can cause problems since integrations regularly occur in coding sequences. Docking lines also enable comparison of different transgenes, as they are all integrated in the same genetic environment, although this does not exclude some expression variation, as illustrated in our manuscript. For all these reasons, we have chosen to work with A. coluzzii.

      Concerning the use of the murine malaria parasite P. berghei instead of the human parasite P. falciparum, two reasons motivated our choice. (1) For in vivo imaging of sporozoites, we needed a parasite line that is strongly fluorescent at this stage, and no such line exists for P. falciparum. In fact, no fluorescent P. falciparum line able to efficiently infect A. coluzzii has been reported thus far, as reporter genes have all been inserted in the Pfs47 locus, which is required by P. falciparum for A. coluzzii colonization. (2) Imaging P. falciparum infected mosquitoes, especially with sporozoites in their salivary glands, requires access to a confocal microscope in a biosafety level 3 laboratory. Hence our objective here was indeed to provide a proof of principle of in vivo imaging of sporozoites in the vicinity of or inside salivary glands using our engineered mosquitoes, and to provide a first analysis of this process using P. berghei as a model of infection. Nevertheless, we agree with the reviewer that the goal should be to work as close as possible to the human pathogen.

      Despite the wide range of topics that this study touches on, we want to try and keep the manuscript as concise as possible. Therefore, we have not discussed the advantages and disadvantages of the different vector-parasite pairs and ask the reviewer to indulge us in this.

      2- To help clarifying the added value of the present study, introducing the species names of the mosquito and the Plasmodium that serve as a model would be appreciated.

      We have now included the name of the Plasmodium species used in line 361, where we also give more details about the transgene this line carries. We now mention the mosquito species used, A. coluzzii, at several positions in the manuscript (e.g. lines 52, 162 and 177).

      3- Since a focus is the salivary gland of the blood feeding female Anopheles sp., a rapid description of the glands with different lobes and subdomains the results and figure 1 nicely refer to, would help in the introduction.

      We now explain the anatomy of female and male mosquito salivary glands in the introduction (lines 119-123). The different lobes are now also indicated in the salivary gland images shown in several figures, including Figure 1.

      4- That description could logically introduce the few proteins actually identified with lobe specific or cell domain specific expression (apical versus basal side, intracellular or surface expose, vacuole, duct...) profiles. The context with regards to sporozoite biology would then easily validate the "promoter choice". As a minor remark, I miss the reason why the authors wrote " the astonishing degree of order of the structures (referring to the packing of sporozoites within the Sg acinars) raise the question whether sporozoite can recognize each other". Please clarify since packing/accumulation can be passive due to cell mechanical constraints and explain what this point has to see with the question and experimental work proposed here?)

      We thank you for this suggestion. We have reworded key parts of the introduction to make the reasons for using the three selected promoters clearer. We also mention now other proteins expressed in the salivary glands which have been characterized in more detail because of their effect on blood homeostasis (e.g. anticoagulants) (lines 136-139).

      The mention of stack formation of salivary gland sporozoites served only to clarify that almost nothing is known about the behavior of sporozoites within the salivary glands in vivo to explain why new methods are needed to make these processes visible. We have now reworded this passage to make this clearer, and we also mention that stack formation could also occur due to mechanical constraints, as suggested by the reviewer (lines 101-102, 106-110).

      5- The selection of hGrx1-roGFP2 is quite interesting and justified but there is then no use of this reporter property in the preliminary characterization of the Sg and Sg-sporozoite interaction. Could the authors provide such characterization?

      We have now implemented data testing the functionality of hGrx1-roGFP2 in the salivary glands. We also show qualitatively that the redox state of glutathione does not change upon infection with P. berghei sporozoites (Figure 6). We now describe and discuss these new data in lines 337-354.

      6- Figure 1: it would be nice to add in the legend at what time the dissection/imaging has been made (age, blood feeding timing?). I would also omit the double mutant trio-Dsred/aapDsred in the main figure (may be supplemental) since the two single mutants Dsred separately together with the double mutant (with different fluorescence) already provide the information. I would suggest to regroup the phenotypic presentation of the transgenic line made in the KI mosquitoes (current figure 5) in the main figure 1.

      We have now added the missing information about the age of dissected mosquitoes and their feeding status in the legend of Figure 1. We also thank the reviewer for the suggestion to replace one image displaying aapp and trio promoter activity in trans-heterozygous mosquitoes with an image of the pigment deficient mutant yellow(-)KI. Still, due to the changes made to the manuscript based on the reviewers' comments in general, we have now implemented new data highlighting the functionality of the generated salivary gland reporter lines by investigating the redox state of glutathione as well as the expression pattern of the saglin and trio promoters at the single cell level (see Figure 3 and 6). It would therefore no longer seem logical to introduce the yellow(-)KI mutant in Figure 1 while further data on this mutant are provided in the last two figures of the manuscript and discussed later (Figure 7 and 8). In addition, we believe that co-expression of different transgenes (carrying fluorescent reporters) in the median and the distal lobes could be interesting for certain applications. Readers interested in combining both transgenes in a cross would likely want to see the outcome in order to evaluate its usefulness before experiments are planned and performed. This is especially true because localization as well as expression strength may differ between different fluorescence reporters while using the same promoter (e.g. the hGrx1-roGFP2 construct appears less bright and more localized to the apex of the distal-lateral lobes than DsRed, while expression of both reporters is driven by the aapp promoter in aapp-hGrx1-roGFP2 and aapp-DsRed, respectively).

      7- Figure 2:

      1. a) Is there anything known on the Sgs' size change over time? It seems that between day 1 and 2 there is an increase of size and volume as much as I can evaluate the volume (Fig S4). Could that mean that there is increase in cell number in the lobes and therefore more cells expressing the transgene which would account for the signal intensity increase rather than more transcripts per cell?

      Thank you for this interesting question. The changes in the morphology of the salivary glands in Anopheles gambiae following eclosion have been studied in detail by Wells et al., 2017 (PMID: 28377572), which we now cite in the introduction (lines 122-123). According to this reference, cell counts of the salivary gland do not change upon emergence of the adult mosquito. However, we agree with the reviewer that the glands appear smaller and differ in morphology directly after eclosion. We noted that glands of freshly emerged females are more "fragile" during dissections and lack secretory cavities, as reported by Wells et al., 2017. We believe that the increase in size occurs through the formation and filling of the secretory cavities, which has been reported to take place within the first 4 days after emergence (Wells et al., 2017). This is in accordance with our observations that the promoters of the saliva proteins AAPP and Saglin display only weak activity after hatching, or, in the case of TRIO, are not yet active directly after emergence. The timing of the formation of the secretory cavities is also in agreement with our time course experiment (Figure 2), which shows a strong increase in fluorescence intensity in dissected glands within the first 4 days after emergence.

      2. b) Why choose 24h after the blood meal to assess promoter activity in the Sgs? Do we have any information on how the blood meal impacts the Sgs' development? At this time anyway the sporozoites are far from being made. Yoshida and Watanabe 2006 mentioned a significant decrease of Sg proteins post-blood feeding. Could the authors detail their rationale based on the questions they wish to address?

      Thank you for this question. Unfortunately, the data available in the literature on this topic are very sparse, so we could only refer to a few previous publications. The decision to quantify the fluorescence signals as early as 24 hours after blood feeding was based on Yoshida et al, Insect Mol. Biol, 2006, PMID: 16907827. The authors of this study generated the first salivary gland reporter line in A. stephensi by using the aapp promoter sequence to drive DsRed expression, and showed by qRT-PCR that DsRed transcripts increase 1-2 days after blood feeding compared to controls. Consistent with this observation, and because we were concerned that putative changes in protein levels would only be visible for a short period of time, we began quantification one day after feeding. Since we observed significant changes in fluorescence intensity for the aapp-DsRed and sag(-)KI lines 24 hours after blood feeding, we retained the experimental setup and did not change it further. Nevertheless, we agree with the reviewer that different time points could help determine how long the effect lasts, and whether trio expression might also be regulated by blood feeding, but at a later time point. Still, our main objective here was to validate that the ectopic expression of DsRed driven by the aapp promoter in the aapp-DsRed line was indeed induced upon blood feeding as previously reported (PMID: 16907827). This experiment allowed us to confirm the inducibility of aapp in a different way and to show for the first time that saglin, but not trio, is induced one day after blood feeding. Our transgenic lines could be used for follow-up studies investigating the inducibility of salivary gland-specific promoters by different stimuli, or after infection with Plasmodium sporozoites. For example, for trio, transcription has been shown to increase after infection of the salivary gland by Plasmodium (PMID: 29649443).

      8- Figure 3: The figure is quite informative in terms of subcellular localization. Concerning the section "Natural variation of DsRed expression in trio-DsRed mosquitoes", I think it could be shortened because it is a bit out of the focus of the study.

      We agree with the reviewer that this part of the manuscript stands somewhat apart and is not perfectly in line with the remaining results because it does not deal with the salivary gland. Still, we would like to emphasise that in this work we particularly want to show possible applications of the generated mosquito lines to address unanswered questions in host-parasite interactions and salivary gland biology. As a result, this manuscript establishes potentially important tools. For this reason, we feel it is important to mention the natural variation in DsRed expression, as this variation can have a significant impact on crossing schemes (especially with lines inheriting other DsRed-marked transgenes) and experiments (e.g. visualizing DsRed expression by western blot in larval and pupal stages). Furthermore, it is important for the use of the line to show that the transgene is inserted only once, at the expected location, which we try to emphasize with figure 4 – figure supplement 1 and figure 4 – figure supplement 2.

      We would also like to note that transgenesis in Anopheles is a relatively young field of research and altered expression patterns of ectopically used promoters have rarely been described so far, although this could have major implications e.g. in the case of gene drives. Therefore, we hope that the data shown will bring this previously neglected observation more into focus and highlight the importance of accurate characterization of generated transgenic mosquito lines.

      9- In contrast the last section of live imaging of P. berghei sporozoites in the vicinity and within salivary gland should be expanded. The 2 sentences summarizing the data are quite frustrating "We also observed single sporozoites moving actively through tissues in a back and forth gliding manner (Fig. 6B, Movie 3) or making contact with the salivary gland although no invasion event could be monitored"

      We have now implemented new data and extended Figure 8 showing the results of the in vivo imaging in a qualitative manner. We have rephrased the result and discussion section accordingly.

      10- I am aware of the technical difficulties to perform live imaging of sporozoite on whole mosquitoes, even when the salivary gland lobe under observation is closely apposed to the cuticle but that seems to be the final aim of the authors. I looked very carefully to the three movies and I am sorry but at this stage I could not make meaningful analysis out of them, and could not agree with the conclusions: for instances, the authors specify that sporozoites were undergoing back and forth movements (movie 3) but I do not see that and do not see the Sg contours in the available movies? The authors should also add bar and time scales to their movies. Having an in-depth description with regards to the sub-domain marked by a relevant reporter would strengthen the study, even if images are not collected in the whole mosquito to get higher resolution.

      We thank the reviewer for this comment. We have to admit that parasite imaging in fluorescent salivary glands in vivo is an ambitious goal given the complex biological system we are working with. We believe that the system presented in our manuscript is a first and important step to enable the analysis of the interaction of sporozoites with salivary glands, although in-depth analysis will require further optimization and considerable time, especially to generate quantitative data. Therefore, we have now toned down the significance of our results in this respect and changed the title accordingly. Still, we also provide a more detailed analysis of the data we have already collected (Figure 8 and lines 406-427). Because we focus on the analysis of sporozoites in the thorax area in the revised manuscript, the outlines of the salivary gland are not necessarily visible in the images.

      I am not sure I understand the relevance of this quite condensed sentence in the text. Could the authors rephrase and expand if they wish to keep the issues they refer to. "The sporozoites' distinctive cell polarization and crescent shape, in combination with high motility, allows them to 'drill' through tissues". I would stress more on the main unknown in terms of sporozoite-Sg interactions and the need to get right models for applying informative approaches (i.e. here, imaging).

      We thank you for this suggestion. The sentence mentioned has been removed in its entirety. We have also adjusted the text accordingly and reworded most of the introduction to make the narrative clearer (lines 91-119).

      Of note, it could help to point that the "Sgs is a niche in which the sporozoites which egress from the oocyst could mature and be fully competent when co-deposited with the saliva into the dermis of their intermediary hosts"

      We have now implemented a similar sentence in the introduction (lines 93-98).

      Reviewer #2 (Significance (Required)):

      1- Clear technical significance with the challenging molecular genetics achieved in the mosquito A. coluzzii.

      2- More limited biological significance: fair analysis and gain of knowledge of spatio-temporal of reporter expression under the selected promoter but limited significance of the final goal analysis which concerns the Plasmodium sporozoite biology once egressed from oocysts

      As stated above, we changed the title to place the focus on the engineered mosquito lines.

      3- Previous reports cited by the authors have used the DsRed reporter and the aap promoter in another Anopheles (i.e. A. stephensi; Yoshida and Watanabe, Insect Mol Biol, 2006; Wells and Andrew, 2019), which is also a natural host and vector for human Plasmodium spp., with significantly more resolutive 3D visualization of GFP-fluorescent P. berghei but in dissected salivary glands and not in whole mosquitoes. The Wells and Andrew publication entitled "Salivary gland cellular architecture in the Asian malaria vector mosquito Anopheles stephensi" in Parasite Vectors, 2015 would deserve to be referenced and described.

      Thank you very much for this suggestion. We considered citing Wells and Andrews (PMID: 26627194). However, this reference focuses very specifically on the subcellular localization of AAPP and shows only highly magnified sections of immunostained dissected and fixed salivary glands. Working only with the AAPP promoter, we felt it important to refer to the previously observed expression pattern along the entire salivary gland, as shown in Yoshida and Watanabe (PMID: 16907827). Nevertheless, we have cited two other publications by Wells and Andrews (PMID: 31387905 and 28377572) at various points in the manuscript.

      4- Audience: I would say that this work should be of interest of mostly scientists investigating Plasmodium biology (basic and field research) or in entomology of Diptera.

      5- To describe my fields of expertise, I can refer to my extensive initial training in entomology including at one point in the genetic basis of mosquito-virus interaction. I have also been working for more than 20 years in the field of Apicomplexa biology (Plasmodium and Toxoplasma) and I have long-standing interest in live and static high-resolution imaging.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Klug et al. generated salivary gland reporter lines in the African malaria mosquito Anopheles coluzzii using salivary gland-specific promoters of three genes. Lobe-specific reporter activity from these promoters was observed within the salivary glands, restricted either to the distal lobes or the medial lobe. They characterized localization, expression strength and onset of expression in four mosquito lines. They also investigated the possibility of influences of the expressed fluorescent reporters on infection with Plasmodium berghei and salivary gland morphology. Using crosses with a pigmentation deficient mosquito line, they demonstrated that their salivary gland reporter lines represent a valuable tool to study the process of salivary gland colonization by Plasmodium parasites in live mosquitoes. SG positioning close to the cuticle in 20% of females in this strain is another key finding of this study.

      The key findings from this study are largely quite convincing. The authors have created a suite of SG reporter strains using modern genetic techniques that aid in vivo imaging of Plasmodium sporozoites.

      Vesicular staining within salivary acinar cells should be stated as "vesicle-like" staining unless a co-stain experiment in fixed SGs is conducted using antisera against the marker protein(s) and antisera against a known vesicular marker (e.g. Rab11). It may also be possible to achieve this in vivo using perfusion of a lipid dye (e.g. Nile Red), but this is not necessary. As is, in Fig. 3A, there are images in which it appears that the vesicle-like staining is located both within acinar cells' cytoplasm and in the secretory cavities (e.g. Fig. 3A: aapp-DsRed bottom and middle), and this is fine, but should be more inclusively stated. Fixed staining of the reporter strain SGs would allow for clarification of this point. In previous work, other groups have observed vesicle-like structures in both locations (e.g. PMID: 33305876).

      Thank you very much for this suggestion. Indeed, when we observed the vesicle-like localization, we had similar ideas and considered investigating the identity of the observed particles in more detail. Ultimately, however, we concluded that the localization of DsRed does not play a critical role in the use of the lines as such and believe that a more detailed investigation of the trafficking of the fluorescent protein DsRed is beyond the scope of this study.

      We have thus followed the suggestion of the reviewer and now use the phrase "vesicle-like" throughout the manuscript. In addition, we extended the discussion on the different localizations observed and presented some explanations that might have led to this observation. We also included a new reference that investigated the localization of AAPP using immunofluorescence (PMID: 28377572).

      Morphological variation is extensive among individual mosquito SGs, thought to impact infectivity, and well documented in the literature. The manuscript should be edited to make it much clearer (e.g. n = ?) exactly how many SGs, especially in microscopy experiments, were imaged before a "representative" image was selected from each data point and in any additional experiment types where this information is not already presented. Figure S8 is an example where this was done well. Figure 3A-B is an example where this was not well done. All substantial variation (e.g. "we detected a strangulation..." - line 189) across individual SGs within a data point should be noted in the Results. Because of the genetics and labor involved, acceptable sample sizes for minor conclusions may be small (5-10), but should be larger for major conclusions when possible.

      Thank you for this comment. We have improved this point by specifying precisely the number of samples and of repetitions in the respective figure legends. For example, we have now quantified the proportion of moving sporozoites and report both the number of sporozoites evaluated and the number of microscopy sessions required (see Figure 8). Regarding Figure 3, fluorescence expression and localization in salivary gland reporter lines was actually very uniform in each line. We added the following sentence in the legend of revised figures 3 and 5: “Between 54 and 71 images were acquired for each line in ≥3 independent preparation and imaging sessions. Representative images presented here were all acquired in the same session”.

      Sporozoite number within SGs has been shown to be quite variable across the infection timeline, by mosquito species, by parasite strain, in the wild vs. in the lab, and according to additional study conditions. The authors mention that the levels they observed are consistent with their prior studies and experience, but they did not utilize the reporter strains and in vivo imaging to support these conclusions, instead relying on dissected glands and a cell counter. It is important for these researchers to attempt to leverage their in vivo imaging of SG sporozoites for direct quantification, likely using the "Analyze Particles" function in Fiji. The added time investment for this additional analysis would be around two weeks for one person experienced in the use of the imaging software.

      Thank you for this interesting suggestion. Indeed, it would be beneficial to use an imaging-based approach to quantify the sporozoite load inside the salivary glands. We already used "watershed segmentation" in combination with the "Analyze Particles" function in Fiji on images of infected midguts to determine oocyst numbers. Still, we believe this analysis cannot be applied to images of infected salivary glands, mainly because of differences in shape and location between the oocyst and sporozoite stages. Sporozoites inside salivary glands form dense, often multi-layered stacks. Because of this close proximity, watershedding cannot resolve them as single particles which could subsequently be counted. Accumulations of sporozoites would thus be counted as one, likely leading to an underestimation of actual parasite numbers. Furthermore, even if the proximity issue could be resolved, e.g. by performing infections yielding lower sporozoite densities, another problem would be that infected salivary glands prepared for imaging are often slightly damaged, leading to a leak of sporozoites from the gland into the surroundings. These leaked sporozoites are likely not included in the images used for analysis, potentially leading again to an underestimation of counts. Since these issues are circumvented by the use of a cell counter, we believe that this is still the method of choice for acquiring sporozoite numbers.
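
      The undercounting of touching objects can be illustrated with a toy example. The sketch below is not Fiji's "Analyze Particles" implementation; it is a hypothetical, minimal connected-component counter on a binary mask, and the two masks are invented for illustration. It only shows that two objects which touch are counted as a single particle unless a separate splitting step succeeds.

```python
def count_particles(mask):
    """Count 4-connected foreground regions in a binary grid (rows of 0/1)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # New unvisited foreground pixel: start a flood fill
                count += 1
                seen[r][c] = True
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count

# Two hypothetical objects separated by a background gap
separated = [[1, 1, 0, 1, 1],
             [0, 0, 0, 0, 0]]
# The same two objects with the gap closed, as in a dense stack
touching = [[1, 1, 1, 1, 1],
            [0, 0, 0, 0, 0]]

print(count_particles(separated))  # 2
print(count_particles(touching))   # 1
```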

      Nevertheless, we can understand the reviewer's concern that counts performed with a hemocytometer do not reflect the variability in the sporozoite load of individual mosquitoes. To highlight that all generated reporter lines can have high sporozoite counts, we have now included images of highly infected salivary glands for each line in Figure 7D.

      This manuscript is presented thoughtfully and such that the data and methods could likely be well-replicated, if desired, by other researchers with similar expertise.

      The statistical analysis is appropriate for the experiments conducted. It is currently unclear if some experiments were adequately replicated. That information should be added to the paper throughout where it is missing.

      We do appreciate your comments on our efforts to give all required information for other laboratories to replicate our experiments. We have added the missing information about the number of independent experiments in the respective figure legends wherever appropriate.

      Studies from multiple groups should be more thoroughly referenced when the authors are describing the "vesicle-like" staining patterns observed in SGs from reporter strains (e.g. Fig. 3A). Is this similar to the SG vesicle-like structures observed previously (e.g. PMIDs: 28377572, 33305876, and others)?

      Thank you for this comment. We did not discuss this observation in detail in the first version of our manuscript because the observed localization was rather unexpected, as DsRed was not fused to the AAPP leader/signal peptide. The observed localization is therefore difficult to explain; however, we have expanded the discussion on this (lines 465-482) and now cite one of the proposed references (PMID: 28377572, lines 468-469).

      There are minor grammar issues in the manuscript text (e.g. "Up to date" should be "To date"). The figures are primarily presented very clearly and accurately. One minor suggestion: In cases such as Fig. S2A images 3 and 6, where some of the staining labels are very difficult to read, please move all labels for the figure to boxes located directly above the image.

      We are sorry for the grammatical errors we missed in the first version of our manuscript. We have now performed a grammar check of the whole manuscript. We have also increased the font size of the captions in the above figures and tried to make them more readable by moving the captions above the images.

      The data and conclusions are presented well.

      Reviewer #3 (Significance (Required)):

      This report represents a significant technical advance (improved in vivo reporter strain and sporozoite imaging), and a minor conceptual advance (active sporozoite motility), for the field.

      This work builds off of previous SG live imaging studies involving Plasmodium-infected mosquitoes (e.g. Sinnis lab, Frischknecht lab, etc.), addressing one of the major challenges from these studies (reliable in vivo imaging inside mosquito SGs).

      This work will appeal to a relatively small audience of vector biology researchers with an interest in SGs. Many in the field still see the SGs as intractable, instead choosing to focus on the midgut due to ease of manipulation. Perhaps work like this will spark new interest in tangential research areas.

      I have sufficient expertise to evaluate the entirety of this manuscript. Some descriptors of my perspective include: bioinformatics, SG molecular biology, mosquito salivary glands, microscopy, RNA interference, SG infection, and SG cell biology.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      Klug et al. generated transgenic mosquito lines expressing fluorescent reporters regulated by salivary gland-specific promoters and characterized fluorescent reporter expression levels over time, subcellular localization of the fluorescent reporters, and the impact on P. berghei oocyst and salivary gland sporozoite generation. In addition, by crossing one of the lines (aapp-DsRed) with yellow(-) KI mosquitoes, they open up the possibility of performing in vivo visualization of salivary glands and sporozoites.

      Overall the generation and characterization of these transgenic lines is well-done and will be helpful to the field. However, there are several concerns with the in vivo imaging data shown in Figure 6, which does not convincingly show fluorescent sporozoites in the lobe or secretory cavity of a fluorescent salivary gland lobe. This needs to be addressed. Points related to this concern are outlined below:

      (1) Although the authors mention that the DsRed signal was strong enough to be seen in the GFP channel, it would be more appropriate to show that the DsRed signal from salivary glands and the GFP channel image co-localize.

      We now show a merge of the GFP and DsRed signal in Figure 7 – figure supplement 2. The yellow appearance of the salivary gland in the merge likely indicates the spillover of the DsRed signal into the GFP channel. In addition, we discuss the issue in lines 416-412 and 565-567.

      (2) Mosquitoes were pre-sorted using the GFP fluorescence of the sporozoites on day 17-21. From Figure 4B, the median salivary gland sporozoite number was about 10,000 sporozoites/mosquito on day 17-18. However, in Figure 6A there are no sporozoites in the secretory cavities. They should be able to see sporozoites in the cavities at this time. Can the authors confirm that they can visualize sporozoites in secretory cavities in vivo and perhaps show a picture of this?

      This is entirely correct. We also examined mosquitoes for the presence of sporozoites in the salivary glands and wing joints prior to imaging, as shown in Figure 7B and Figure 7 – figure supplement 2A, to increase the probability that sporozoites could be observed. Nevertheless, the area of the salivary gland that comes to the surface is often small and limited to a few cells that can be imaged with good resolution. Unfortunately, these same cells were often not infected although other regions of the salivary glands must have been very well infected based on the previously observed GFP screening (Figure 7B). In addition, with the confocal microscope available to us, we struggled to achieve the necessary depth to image sporozoites in the cavities of the salivary gland cells. For this reason, we were often able to detect a strong GFP signal in the background, but not always to resolve the sporozoites sufficiently well. Still, we have now included an image showing sporozoites in salivary glands (Figure 8C). However, we believe that the method can be further improved to be more efficient and provide better resolution. We discuss possible ways to further improve the imaging in lines 563-586.

      (3) There is no mention of the number of experiments performed (reproducibility) and no quantification of the imaging data. In the results (line 287-288), the authors state that sporozoites are present in tissue close to the gland and sometimes perform active movement. How can this be? Do they believe these sporozoites are en route to entering? More relevant to this study would be a demonstration that they can see sporozoites in the secretory cavities of the salivary gland epithelial cells; this should be shown. If they have already performed a number of experiments, I would suggest quantifying the number of sporozoites observed in defined regions. The mention that sporozoites are moving is confounded by the flow of hemolymph. How do they know that the sporozoites are motile versus being carried by the hemolymph? Perhaps it's premature to jump to sporozoite motility in the mosquito when they haven't even shown sporozoite presence in the salivary glands.

      Thank you very much for this comment. We have followed the suggestions of the reviewer and have now quantified the behavior of sporozoites in the thorax area of the mosquito. For the analysis, we only considered sporozoites that could be observed for at least 5 minutes. This analysis revealed that 26% of persistent sporozoites performed active movements, which in most cases resembled patch gliding previously described in vitro. We adjusted the results section accordingly. In addition, we have changed the figure legend to accurately indicate the number of experiments performed. Likewise, we now also provide an image of sporozoites that we assume are located in the salivary gland (Figure 8C). Although we have not yet been able to image and quantify vector-sporozoite interactions extensively (further improvements would be required, as mentioned previously), we believe these results illustrate the potential of the transgenic lines.

      (4) In vivo imaging has been performed with the mosquito sideways. Was this the best orientation? Have you tried other orientations, such as from the front (Figure 5B orientation)?

      It is true that in the abdominal view as shown in Figure 7B the fluorescence in the salivary glands is very well visible. This is mainly due to the fact that in this area the cuticle is almost transparent and therefore serves as a kind of "window". Nevertheless, the salivary glands are not close to the cuticle in this position, which makes good confocal imaging impossible. Imaging always worked best where the salivary gland was very close to the cuticle, and this was always laterally. However, there were differences in the position of the salivary glands in individual mosquitoes, which also led to slight differences in the imaging angle.

      Overall, the text is easy to follow and I have only few suggestions.

      Thank you for this comment.

      In the results section, the authors describe the DsRed expression during mosquito development (line 194-236) after they describe the subcellular localization of the fluorescent reporters. I felt the flow was disrupted. Thus, this part (line 194-236) could be summarized and moved to line 135. In this way, the results section would flow according to the main figures.

      Thank you very much for this suggestion. We have considered your idea, but based on the changes we have made in response to reviewer comments and new data implemented in the form of two new figures, we believe the current order in the results section is more appropriate. The rationale was primarily to first characterize the expression of fluorescent reporters in the salivary glands of all lines before going into more detail on expression in other tissues of a single line. We then finish with potential applications like in vivo imaging of sporozoite interactions with salivary glands.

      Also, and as mentioned previously (reviewer 2, point 8), we believe it is important to describe the variability of ectopic promoter expression at a given locus with sufficient details, as this has not been characterized thus far despite its importance.

      In the results section, text line 186-190, the authors describe the morphological alteration of salivary glands in aapp-hGrx1-roGFP2. I would suggest mentioning that this observation was made in only one of the lateral lobes. (I saw that it was mentioned in the figure legend but not in the main text.)

      We believe there has been a misunderstanding. The morphological alteration in salivary glands expressing aapp-hGrx1-roGFP2 was observed in all distal-lateral lobes to varying degrees (quantification in Figure 6E). To include as many salivary glands as possible in the quantification and because in some images only one distal-lateral lobe was in focus, only the diameter of one lobe per salivary gland was measured and evaluated. We have now revised the legend to prevent further misunderstandings.

      In the discussion section, the authors discuss the localization of the fluorescent reporters (line 322-331). When I looked at the aapp-DsRed localization pattern (Figure 3A), the pattern looked similar to that in the previous publication by Wells et al 2017 (https://www.nature.com/articles/s41598-017-00672-0). This publication used an AAPP antibody and stained together with other markers (Figure 4-7). This publication could be worth referring to in the discussion section.

      Thank you for this suggestion. According to the information available through Vectorbase, we did not fuse DsRed with any coding sequence of AAPP that could potentially encode a trafficking signal. Therefore, it is rather unlikely that the observed DsRed localization in our aapp-DsRed line and the localization observed by AAPP immunofluorescence staining in WT mosquitoes match. This is further exemplified by the cytoplasmic localization of hGrx1-roGFP2 in the aapp-hGrx1-roGFP2 line, where the reporter gene was cloned under the control of the same promoter. For this reason, we had not mentioned this reference in the first version of the manuscript. In the revised manuscript, we have now included the suggested reference (lines: 475-476) and extended the discussion on possible reasons which led to the observed localization pattern.

      In the text, authors describe salivary gland lobes as distal lobes and middle lobe. It would be more accurate to refer to the lobes as the lateral and medial lobes. The lateral lobes can then be sub-divided into proximal and distal portions. I would suggest to use distal lateral lobes, proximal lateral lobes and median lobe as other references use (Wells M.B and Andrew D.J, 2019).

      Thank you for this suggestion. We have corrected the nomenclature for the description of the salivary gland anatomy as suggested throughout the manuscript.

      Overall, the figures are easy to understand and I have following suggestions and questions.

      Figure 1C) It is hard to see WT salivary gland median lobe. If authors have better image, please replace it so that it would be easier to compare WT and transgenic lines.

      We have replaced the wild-type images of salivary glands in this figure and labeled the median and distal-lateral lobes accordingly (see Figure 1).

      Figure 2) While it was interesting to observe the significant expression differences between day 3 and day 4, have you checked if this expression maintained over time or declines or increases (especially on day 17-21 when author perform in vivo imaging)?

      Thank you for this interesting question. We have not quantified fluorescence intensities in older mosquitoes. Nevertheless, we regularly observed spillover of the DsRed signal into the GFP channel during sporozoite imaging, suggesting that expression levels, at least in aapp-DsRed expressing mosquitoes, remain high even in mosquitoes >20 days of age (see Figure 8A). We also confirmed this observation by dissecting salivary glands from old mosquitoes, whose distal lateral lobes always showed a strong pink coloration even in normal transmitted light (data not shown).

      Figure 3A) There is no description of "Nuc" in figure legend. If "nuc" refers to nucleus, have you stained with nucleus staining dye (example, DAPI)?

      Thank you for spotting this missing information in the legend. Initial images shown in this figure were not stained with a nuclear dye. To test whether the observed GFP expression pattern really colocalizes with DNA, we performed further experiments in which salivary glands from both aapp-hGrx1-roGFP2 and sag(-)KI mosquitoes were stained with Hoechst. We have now included these new data in Figure 3 - figure supplement 1. It appears that GFP is concentrated around the nuclei of the acinar cells, which makes the nuclei clearly visible even without DNA staining.

      Figure 4B) The number of biological replicates in the figure and the legend do not match (In the figure, there are 3-5 data points and, in the legend, text says 3 biological replicates.)

      Thank you for spotting this inconsistency. The number of biological replicates refers to the number of mosquito generations used for experiments. The difference is due to the fact that sometimes two experiments were performed with the same generation of mosquitoes using two different infected mice. We have clarified the legend accordingly to avoid misunderstandings.

      Figure 4C) The number of data points from (B) is 5. However, in (C) only 4 data points are presented.

      We have corrected this mistake. In the previous version, the results of two technical replicates were inadvertently plotted separately in (B) instead of the mean.
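
      The correction described above (plotting one value per experiment rather than each technical replicate) can be sketched as follows. This is only an illustration: the experiment labels and counts are hypothetical placeholders, not data from the manuscript.

```python
from statistics import mean

# Hypothetical sporozoite counts per experiment; "exp1" has two technical
# replicates, the others have one. These numbers are placeholders only.
technical_counts = {
    "exp1": [9000, 11000],
    "exp2": [12000],
    "exp3": [8000],
}


def collapse_technical_replicates(counts):
    """Return one data point per experiment: the mean of its technical replicates."""
    return {experiment: mean(values) for experiment, values in counts.items()}


data_points = collapse_technical_replicates(technical_counts)
```

      Collapsing in this way yields one data point per experiment, so the number of plotted points matches across panels that draw on the same experiments.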

      Figure 5) I would suggest including a thorax image of a P. berghei-infected mosquito to show both salivary glands and parasites.

      Thank you for this suggestion. Images in Figure 7B (previously Figure 5) were replaced with an infected specimen to show salivary glands (DsRed) and sporozoites (GFP) together.

      Reviewer #4 (Significance (Required)):

      The transgenic lines that the authors created have potential for in vivo imaging of salivary gland and sporozoite interactions. Since the aapp and trio lines have distinct fluorescence expression, they could help elucidate why sporozoites are more likely to invade the distal lateral lobes compared to the median lobe.

      My areas of expertise are confocal microscope imaging, mosquito salivary gland and Plasmodium infection and sporozoite motility.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      The first achievements of the Klug et al. study are (i) the genetic engineering of insectary-reared Anopheles coluzzii mosquitoes that stably express distinct fluorescent reporters (DsRed, hGrx1-roGFP2 and EGFP) under the putative "promoters" of genes reported to encode proteins expressed differentially in the multi-lobed salivary glands (Sg) of anthropophilic blood-feeding adult females, and (ii) the analysis of the promoter activity, based on the selected fluorescent reporter, with a primary focus on the salivary gland (including at the Sg lobe level) of the adult female but also considering preimaginal development with larval and pupal samples. Of note, some data confirm the already reported time-dependent and blood meal-dependent promoter activity for the related Anopheles species. The last part presents a preliminary dataset on live imaging of Plasmodium berghei sporozoites with the aim of highlighting the usefulness of these A. coluzzii transgenic lines to better understand how the rodent Plasmodium sporozoites first colonize and then settle as packed cells in Sg acinar host cells.

      Major comments

      The first two objectives presented by the authors have been convincingly achieved with (i) the challenging production of four different lines expressing different single or double reporters chosen by the authors (and appropriately presented in the results text and figure sections), and (ii) the careful analysis of the spatiotemporal expression of the DsRed reporter under the two "promoters" studied and with regard to the blood-feeding event. However, if the reason why the authors have put so much effort into the production of their transgenic mosquitoes is (as mentioned) to provide a significantly improved setting enabling the behavioral analysis of sporozoites upon colonization and survival in the Sg, this part seems rather limited. Likely related to this perception, I found the introductory section often confusing and not direct enough: in particular, it should distinguish the rationale from the necessity to produce appropriate models, and clarify the added value(s) offered by these new transgenic lines compared to what exists (in Anopheles stephensi), with specific evidence that argues for this knowledge gain. At this stage, it is unfortunately not clear to me what the benefit is of imaging fluorescent Plasmodium sporozoites in hosts with fluorescent salivary gland lobes if one cannot monitor key events of the Sg-sporozoite interaction that were not reachable without the fluorescent mosquito lines. Furthermore, it should be better explained why the rodent Plasmodium species has been chosen rather than Plasmodium falciparum (or other human species), for which A. coluzzii is a natural host; perhaps just mentioning that this study would serve as a proof of concept, but bringing real biological insights, would be fine.

      1- The three genes and gene products selected by the authors should definitely be explained more systematically: for example, the authors need to introduce the different mosquito species and the parasite-mosquito host pairs they refer to for the promoters/encoded proteins of interest. In the same vein, I did not find any information on the choice of the mosquito species (A. coluzzii) for the current work. I was curious to know what the advantage is, since better knowledge was available for Anopheles stephensi with respect to (i) Saglin and its promoter activity, (ii) aapp-driven DsRed expression (lines already existing) and (iii) sporozoite-gland interaction.

      2- To help clarify the added value of the present study, introducing the species names of the mosquito and the Plasmodium that serve as a model would be appreciated.

      3- Since a focus is the salivary gland of the blood feeding female Anopheles sp., a rapid description of the glands with different lobes and subdomains the results and figure 1 nicely refer to, would help in the introduction.

      4- That description could logically introduce the few proteins actually identified with lobe-specific or cell domain-specific expression profiles (apical versus basal side, intracellular or surface-exposed, vacuole, duct...). The context with regard to sporozoite biology would then easily validate the "promoter choice". As a minor remark, I do not see why the authors wrote "the astonishing degree of order of the structures (referring to the packing of sporozoites within the Sg acini) raise the question whether sporozoite can recognize each other". Please clarify, since packing/accumulation can be passive due to cell mechanical constraints, and explain what this point has to do with the question and experimental work proposed here.

      5- The selection of hGrx1-roGFP2 is quite interesting and justified but there is then no use of this reporter property in the preliminary characterization of the Sg and Sg-sporozoite interaction. Could the authors provide such characterization?

      6- Figure 1: it would be nice to add to the legend the time at which the dissection/imaging was performed (age, blood feeding timing?). I would also omit the double mutant trio-DsRed/aapp-DsRed from the main figure (perhaps moving it to the supplement), since the two single DsRed mutants, together with the double mutant (with different fluorescence), already provide the information. I would suggest regrouping the phenotypic presentation of the transgenic line made in the KI mosquitoes (current Figure 5) into the main Figure 1.

      7- Figure 2:

      a) Is anything known about the change in Sg size over time? It seems that between day 1 and 2 there is an increase in size and volume, as far as I can evaluate the volume (Fig S4). Could that mean that there is an increase in cell number in the lobes, and therefore more cells expressing the transgene, which would account for the signal intensity increase rather than more transcripts per cell?

      b) Why choose 24 h after the blood meal to assess promoter activity in the Sgs? Do we have any information on how the blood meal impacts Sg development? At this time, anyway, the sporozoites are far from being made. Yoshida and Watanabe 2006 mentioned a significant decrease of Sg proteins post-blood feeding. Could the authors detail their rationale based on the questions they wish to address?

      8- Figure 3: The figure is quite informative in terms of subcellular localization. Concerning the section "Natural variation of DsRed expression in trio-DsRed mosquitoes", I think it could be shortened because it is a bit outside the focus of the study.

      9- In contrast, the last section on live imaging of P. berghei sporozoites in the vicinity of and within the salivary gland should be expanded. The two sentences summarizing the data are quite frustrating: "We also observed single sporozoites moving actively through tissues in a back and forth gliding manner (Fig. 6B, Movie 3) or making contact with the salivary gland although no invasion event could be monitored".

      10- I am aware of the technical difficulties of performing live imaging of sporozoites in whole mosquitoes, even when the salivary gland lobe under observation is closely apposed to the cuticle, but that seems to be the final aim of the authors. I looked very carefully at the three movies and I am sorry, but at this stage I could not make a meaningful analysis of them and could not agree with the conclusions: for instance, the authors specify that sporozoites were undergoing back and forth movements (Movie 3), but I do not see that, and I do not see the Sg contours in the available movies. The authors should also add scale bars and time stamps to their movies. Having an in-depth description with regard to the sub-domain marked by a relevant reporter would strengthen the study, even if images are not collected in the whole mosquito in order to get higher resolution.

      I am not sure I understand the relevance of this quite condensed sentence in the text. Could the authors rephrase and expand it if they wish to keep the issues they refer to? "The sporozoites' distinctive cell polarization and crescent shape, in combination with high motility, allows them to "drill" through tissues". I would put more stress on the main unknowns in terms of sporozoite-Sg interactions and on the need for the right models for applying informative approaches (i.e. here, imaging).

      Of note, it could help to point that the "Sgs is a niche in which the sporozoites which egress from the oocyst could mature and be fully competent when co-deposited with the saliva into the dermis of their intermediary hosts"

      Significance

      1- Clear technical significance with the challenging molecular genetics achieved in the mosquito A. coluzzii.

      2- More limited biological significance: fair analysis and gain of knowledge on the spatio-temporal reporter expression under the selected promoters, but limited significance of the final goal analysis, which concerns the biology of Plasmodium sporozoites once egressed from oocysts.

      3- Previous reports cited by the authors have used the DsRed reporter and the aapp promoter in another Anopheles species (i.e. A. stephensi, which is also a natural host and vector for human Plasmodium spp.; Yoshida and Watanabe, Insect Mol Biol, 2006; Wells and Andrew, 2019), with significantly better resolved 3D visualization of GFP-fluorescent P. berghei, but in dissected salivary glands and not in whole mosquitoes. The Wells and Andrew publication entitled "Salivary gland cellular architecture in the Asian malaria vector mosquito Anopheles stephensi" in Parasite Vectors, 2015 deserves to be referenced and described.

      4- Audience: I would say that this work should mostly be of interest to scientists investigating Plasmodium biology (basic and field research) or the entomology of Diptera.

      5- To describe my fields of expertise, I can refer to my extensive initial training in entomology including at one point in the genetic basis of mosquito-virus interaction. I have also been working for more than 20 years in the field of Apicomplexa biology (Plasmodium and Toxoplasma) and I have long-standing interest in live and static high-resolution imaging.

    1. anticipations is key to 01:08:38 everything and attention is key to everything so every organism does that plants and everything else and it doesn't require a central nervous system 01:08:51 and and you i might add to this that not only is every organism cognitive but essentially every organism organism is cooperative to those cooperation and cognition 01:09:03 go hand in hand because any intelligent organism any organism that can act to better its you know viability is going to cooperate in 01:09:17 meaningful ways with other organisms and you know other species and things like that nice point because um there's cost to communication whether it's exactly whether it's the cost of making the pheromone 01:09:30 or just the time which is super finite or attention fundamentally and so costly interactions through time the game theory are either to exploit and stabilize which is fragile 01:09:42 or to succeed together yeah exactly and and and succeeding together cooperation is is is like everywhere once you once you understand what you're looking 01:09:54 for it's in the biologic world it's like everywhere so this idea that we're you know one one one person against all or you know we're a dog eat dog universe i mean it's you 01:10:08 know in a certain sense it's true obviously tigers eat you know whatever they eat zebras or whatever i mean that happens yes of course but in the larger picture 01:10:19 over and over multiple time scales not just uh you know in five minutes but over evolutionary time scales and uh you know developmental time scales and everything the cooperation is really the rule 01:10:33 for the most part and if you need if any listener needs proof of that just think of who you think of your body i mean there's about a trillion some trillion some cells 01:10:45 that are enormously harmonious like your blood pumps every day or you know this is a this is like a miracle i don't want to use the word miracle because i want to get into 01:10:59 whatever that might imply but uh it is amazing awe-inspiring the the depth of cooperation just in our own bodies is like that's that's like 01:11:12 evolution must prefer cooperation or else there would never be such a complex uh pattern of cooperation as we see just in one human body 01:11:26 just to give one example from the bees so from a species i study it's almost like a sparring type of cooperation because when it was discovered that there were some workers with developed ovaries 01:11:38 there was a whole story about cheating and policing and about altruism and this equation says this and that equation says that and then when you take a step back it's like the colony having a distribution of over-reactivation 01:11:51 may be more ecologically resilient so um i as an evolutionary biologist never think well my interpretation of what would be lovey-dovey in this system must be how it works because that's so 01:12:05 clearly not true it's just to say that there are interesting dynamics within and between levels and in the long run cooperation and stable cooperation and like learning to adapt 01:12:17 to your niche is a winning strategy in a way that locking down just isn't but unfortunately under high um stress and 01:12:29 uh high uncertainty conditions simple strategies can become rife so that's sort of a failure mode of the population

      The human, or ANY multicellular animal or plant body is a prime example of cooperation....billions of cells in cooperation with each other to regulate the body system.

      The body of any multi-cellular organism, whether flora or fauna is an example of exquisite cellular and microbial cooperation. A multi-cellular organism is itself a superorganism in this sense. And social organisms then constitute an additional layer of superorganismic behavior.

    1. Reviewer #2 (Public Review):

      The research paper presents a modeling approach aimed at disentangling mother's genetic effects on their offspring in two components: prenatal environment and postnatal environment. Specifically, the authors use SEM on adopted and non-adopted individuals from the UK Biobank and leverage the variation in genetic similarities from different family structures. Because the UK Biobank is not created as an adoption study, they build seven different family structures to include all possible family combinations that can provide information regarding the two parameters of interest: those representing prenatal and postnatal environment respectively. The model is used on two phenotypes (birthweight and education attainment) to illustrate it.

      The results indicate an 'expected pattern of maternal genetic effect on offspring birthweight' and 'unexpectedly large prenatal (intrauterine) maternal genetic effects on offspring education attainment'. The authors mention that this result can likely be explained by adopted offspring being raised by biological relatives. They then show simulations supporting this hypothesis.

      We praise the authors for the complex analyses executed and the work done to create the model and make the scripts available to the research community. The models can be a valuable addition to the behavior genetics literature and to the researcher's toolkit. We do however have a few concerns regarding 1. the meaning of the results, 2. model building decisions and the choice of sample and 3. the way some limitations are addressed. We go into more detail on each of these points.

      1. Interest to study mothers' genetic effects as acting via the prenatal environment or the postnatal environment and the meaning of the parameters tested by the model

      I think this is an interesting question and a useful distinction for a number of phenotypes and the authors use the adoption design in an innovative way to define and estimate parameters that correspond to this distinction. However, I would suggest that the expressions of prenatal environmental effect and postnatal environmental effect (as distinct pathways for mother's gene to be expressed) seem to be an overstatement.

      The definition of mother genetic effects (effects of mother genotype on their child phenotype, over and above any genetic transmission) is citing Wolf & Wade 2009 (line 56) which mention the more general notion of 'maternal effect' that are defined as effect of genotype, phenotype (or both) on their offspring. I would argue that postnatal maternal genetic effects (as currently defined in the paper) are likely environmental effect and not only 'genetic effects'.

      These environmental effects are indeed partly influenced by mother's genes, but also strongly affected by other variables such as culture, generation, SES, education. It is not possible to disentangle these effects in the design(s) used here.

      This consideration can affect the authors definition of the covariance between an adopted individual's genotype and phenotype as a function of prenatal (but not postnatal) maternal genetic effects (line 93-94). The authors current assumption does not consider the potential for environmental modulation of the effect of adopted mothers' genes (which are not zero for several phenotypes). Postnatal maternal genetic effects are thus also likely to capture and represent environmental differences.

      2. Model building decisions specific to the UK biobank

      One of the main issues is that the method is tested on a sample that was not built as an adoption design. This forced the authors to make decisions to circumvent this problem and led to important limitations that are not inherent to their method, but to the specific sample they applied it to.

      a. Having adoptive parents partly genetically related to the child is breaking the logic of the adopted design. Thus, it brings back the genetic confound (passive gene-environment correlation) problem of usual family-based design. In their case, it alters their ability to differentiate between prenatal and postnatal environment.

      b. In section starting on line 426, the authors have included simulations to show how this issue could be addressed. However, it does not help the fact that in their model applied to the UK biobank, the information regarding the degree of genetic similarity between adopting parents and biological parents and the child is unknown.

      c. To address this problem in their analyses of the UK Biobank, the authors used (Lines 302 & 417) information regarding whether children were breastfed or not (on the basis that this knowledge would be more common if the child was raised by a biological family relative) to identify adopted singletons raised by biological relatives. However, this is, at best, a mediocre index of genetic relatedness. I can see other reasons for participants to know whether they were breastfed: because they were adopted at an older age, or because they are still (or have been) in contact with their biological mother. It is also possible, albeit rare, that adoptive parents may breastfeed a child via the use of drugs to stimulate milk production. Line 420: the fact that the prenatal maternal estimate became non-significant after removing participants that were breastfed does provide results more in line with what would be expected. But we can't use expected results as a basis to evaluate the validity of the approach. The absence of GxE and rGE are two other strong assumptions of the model whose violation could also produce this kind of unexpected result.

      d. I would suggest discussing the issue of genetic relatedness between adopting parents and offspring in terms of passive rGE which is a common problem for the estimation of parental effects in every familial design.

      e. Line 291: why use an unweighted PRS for EY3 (Lee, 2018), while the usual way of computing PRS (as a weighted sum of risk alleles) was used for birthweight?

      3. Limitations

      Assess other limitations of their method.

      a. limitation of the availability of birth father information,

      b. prenatal events uncorrelated with birthmother's genes (disease or accidents),

      c. Inferring prenatal environment effect from higher birth mother correlation compared to birthfather is subject to bias from measurement differences between the two (Loehlin, 2016).

      d. age at which the child is adopted (if the child has been partly raised by birth parents before adoption, it would bias (raise) the estimates of prenatal effects).

      e. evocative rGE not mentioned. It has been shown that parents partly react to children's behaviors. Thus, the estimate of maternal genetic postnatal effects could be biased (lowered) by evocative gene-environment correlation. In other words, the model also assumes no evocative gene-environment correlation.

      Final thoughts:

      1. I would like a better case made for why it is important to distinguish genetic effects into prenatal and postnatal effect.

      2. I would suggest the authors make a clear distinction between the limits inherent to their sample (UK Biobank) and those inherent to their methodological approach. I see that the considerable usefulness of the approach is plagued by limits inherent to the sample used. At the same time, I am not aware of the availability of a big enough sample of adopted children with genotypic information available to compute PRS.

    2. Author Response

      Reviewer #2 (Public Review):

      Summary

      The research paper presents a modeling approach aimed at disentangling mothers' genetic effects on their offspring into two components: prenatal environment and postnatal environment. Specifically, the authors use SEM on adopted and non-adopted individuals from the UK Biobank and leverage the variation in genetic similarities from different family structures. Because the UK Biobank was not created as an adoption study, they build seven different family structures to include all possible family combinations that can provide information regarding the two parameters of interest: those representing prenatal and postnatal environment respectively. The model is illustrated on two phenotypes (birthweight and educational attainment).

      The results indicate an 'expected pattern of maternal genetic effect on offspring birthweight' and 'unexpectedly large prenatal (intrauterine) maternal genetic effects on offspring education attainment'. The authors mention that this result can likely be explained by adopted offspring being raised by biological relatives. They then show simulations supporting this hypothesis.

      We praise the authors for the complex analyses executed and the work done to create the model and make the scripts available to the research community. The models can be a valuable addition to the behavior genetics literature and to the researcher's toolkit. We do however have a few concerns regarding 1. the meaning of the results, 2. model building decisions and the choice of sample and 3. the way some limitations are addressed. We go into more detail on each of these points.

      1) Interest to study mothers' genetic effects as acting via the prenatal environment or the postnatal environment and the meaning of the parameters tested by the model .

      I think this is an interesting question and a useful distinction for a number of phenotypes and the authors use the adoption design in an innovative way to define and estimate parameters that correspond to this distinction. However, I would suggest that the expressions of prenatal environmental effect and postnatal environmental effect (as distinct pathways for mother's gene to be expressed) seem to be an overstatement.

      The definition of mother genetic effects (effects of mother genotype on their child phenotype, over and above any genetic transmission) is citing Wolf & Wade 2009 (line 56) which mention the more general notion of 'maternal effect' that are defined as effect of genotype, phenotype (or both) on their offspring. I would argue that postnatal maternal genetic effects (as currently defined in the paper) are likely environmental effect and not only 'genetic effects'. These environmental effects are indeed partly influenced by mother's genes, but also strongly affected by other variables such as culture, generation, SES, education. It is not possible to disentangle these effects in the design(s) used here.

      Although we have referred to the maternal effects estimated in our manuscript as “prenatal maternal genetic effects” and “postnatal maternal genetic effects”- all of these effects on the offspring are mediated through maternal phenotypes (which as the reviewer correctly notes, will be influenced by both genes and the environment). In other words, the maternal PRS used in our study proxies some maternal phenotype/s that then forms part of the offspring’s prenatal and/or postnatal environment which then affects the offspring’s phenotype. We have referred to these effects as maternal genetic effects rather than just maternal effects to emphasize the causal link with the maternal genotype and the fact that we are only proxying that part of the maternal phenotype that is explained by the relevant genetic variation (NB. This is consistent with the Wolf & Wade 2009 definition of maternal effects i.e. “…the causal influence of maternal genotypes on offspring phenotypes…”). We agree with the reviewer that our model is not attempting to disentangle proportions of variance due to genetic and environmental factors (which is not its purpose).

      This consideration can affect the authors definition of the covariance between an adopted individual's genotype and phenotype as a function of prenatal (but not postnatal) maternal genetic effects (line 93-94). The authors current assumption does not consider the potential for environmental modulation of the effect of adopted mothers' genes (which are not zero for several phenotypes). Postnatal maternal genetic effects are thus also likely to capture and represent environmental differences.

      Assuming that adopted offspring are not biologically related to their adoptive mothers, then adopted individuals’ PRS should not be correlated with adoptive mothers’ PRS. The corollary is that adoptive mothers’ PRS should not influence the covariance between adopted individuals’ PRS and phenotype (i.e. regardless of whether there is environmental modulation of the effect of adopted mothers’ genes on offspring phenotype). It is true, however, that we do not consider genotype by environment interaction effects in our model, and that this is a limitation of our model. We allude to this important point several times in the Discussion:

      “Those assumptions explicitly encoded in Figure 1 include that the total maternal genetic effect can be decomposed into the sum of prenatal and postnatal components, that genetic effects are homogenous across biological and adoptive families, the absence of genotype x environment interaction…”

      And

      “In contrast, in our design it is more important that genetic effect sizes are homogenous across adopted and non-adopted individuals (i.e. no genotype by environment interaction)…”.

      At the request of the reviewer, we now include additional discussion of GxE and other assumptions of our model in further detail in Supplementary File 17.

      2) Model building decisions specific to the UK Biobank. One of the main issues is that the method is tested on a sample that was not built as an adoption design. This forced the authors to make decisions to circumvent this problem and led to important limitations that are not inherent to their method, but to the specific sample they applied it to.

      a) Having adoptive parents partly genetically related to the child is breaking the logic of the adopted design. Thus, it brings back the genetic confound (passive gene-environment correlation) problem of usual family-based design. In their case, it alters their ability to differentiate between prenatal and postnatal environment.

      We agree that the UK Biobank was never designed for this purpose, and that its adoption data are less than perfect. Nevertheless, we think that an important conclusion of our paper is that large-scale biobanks (which, because of their size, contain many hundreds/thousands of adopted individuals) can be used to partition maternal genetic effects into prenatal and postnatal components, provided good quality data on the adoption process and/or genetic information on the adoptive parents has been gathered.

      To help address the reviewer’s concerns we have created a Supplementary Table (Supplementary File 17) that summarizes some of the main limitations/assumptions of our model, whether they are specific to the UK Biobank dataset or intrinsic to our method, their consequences on model parameters, and possible options for addressing them.

      b) In section starting on line 426, the authors have included simulations to show how this issue could be addressed. However, it does not help the fact that in their model applied to the UK biobank, the information regarding the degree of genetic similarity between adopting parents and biological parents and the child is unknown.

      We agree- but we feel it is important to demonstrate (a) that cryptic biological relatedness between adopted individuals and their adoptive parents is a potential issue not only for our study, but for other studies attempting to utilize this information in the UK Biobank, and (b) that cryptic relatedness can be dealt with effectively through appropriate modelling in our SEM framework (i.e. even if it is not possible with the current data from UK Biobank). The corollary is that we recommend that the UK Biobank (and other large-scale biobanks) attempt to acquire information on adopted individuals and their parents through e.g. questionnaire.

      c) To address this problem in their analyses of the UK Biobank, the authors used (Lines 302 & 417) information regarding whether children were breastfed or not (on the basis that this knowledge would be more common if the child was raised by a biological family relative) to identify adopted singletons raised by biological relatives. However, this is, at best, a mediocre index of genetic relatedness. I can see other reasons for participants to know whether they were breastfed: because they were adopted at an older age, or because they are still (or have been) in contact with their biological mother. It is also possible, albeit rare, that adoptive parents may breastfeed a child via the use of drugs to stimulate milk production. Line 420: the fact that the prenatal maternal estimate became non-significant after removing participants that were breastfed does provide results more in line with what would be expected. But we can't use expected results as a basis to evaluate the validity of the approach. The absence of GxE and rGE are two other strong assumptions of the model whose violation could also produce this kind of unexpected result.

      We agree that (a) the inclusion of adopted individuals whose adoptive parents are biologically related to them is only one possible reason for unexpectedly strong prenatal maternal genetic effect estimates, (b) attempting to remove these individuals from the analysis using a proxy like breastfeeding information is less than perfect. As indicated above, we now discuss in detail alternative explanations for our results including violations of assumptions regarding the absence of GxE and rGE, and other explanations (assortative mating, stratification etc) (see new text in the Discussion and Supplementary File 17).

      d) I would suggest discussing the issue of genetic relatedness between adopting parents and offspring in terms of passive rGE which is a common problem for the estimation of parental effects in every familial design.

      We now include mention of passive rGE in the Discussion:

      “Rather we hypothesize it is possible that our model could have been misspecified in that substantial numbers of adopted individuals in the UK Biobank may have in fact been raised by their biological relatives. This can be thought of as (unintentional) reintroduction of passive gene-environment correlation into the study. In other words, adopted children are brought up by their genetic relatives, who in turn provide the environment in which they are raised. This induces a correlation between adopted individuals’ PRS and their environment.”
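      The direction of the bias described in this passage can be illustrated with a deliberately simplified calculation. The sketch below is not the authors' SEM: it assumes a toy additive model in which the direct genetic effect is already accounted for, random mating, and a hypothetical relatedness of 0.25 between an adoptee and a rearing biological relative. The function name and all parameter values are illustrative assumptions.

```python
def naive_prenatal_estimate(prenatal, postnatal, frac_raised_by_relatives,
                            r_child_rearing_relative=0.25,
                            r_child_birth_mother=0.5):
    """Prenatal effect implied in adoptees when every adoptive mother is
    (wrongly) assumed to be unrelated to the child.

    Toy model, not the published SEM: the maternal part of
    cov(child PRS, child phenotype) / var(PRS) in adoptees is
        r_birth * prenatal + frac * r_rearing * postnatal,
    where the second term is the passive rGE reintroduced by adoptees
    who were raised by biological relatives.
    """
    maternal_signal = (r_child_birth_mother * prenatal
                       + frac_raised_by_relatives
                       * r_child_rearing_relative * postnatal)
    # A model that ignores the relatedness attributes all of this
    # signal to the prenatal pathway.
    return maternal_signal / r_child_birth_mother


# No true prenatal effect, but 30% of adoptees raised by relatives:
spurious = naive_prenatal_estimate(prenatal=0.0, postnatal=0.2,
                                   frac_raised_by_relatives=0.3)
print(spurious)
```

      Under these assumptions, a purely postnatal effect shows up as an apparent prenatal effect whenever some adoptees are raised by relatives, and the apparent effect vanishes when that fraction is zero.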

      e) Line 291: why use an unweighted PRS for EY3 (Lee, 2018), while the usual way of computing PRS (as a weighted sum of risk alleles) was used for birthweight?

      We thank the reviewer for pointing out this inconsistency. We have now rerun the analyses using weighted and unweighted PRS for both birth weight and educational attainment. The reason for running both sets of analyses is that the GWAS from which the SNPs are selected (and on which the weights are based) contains UK Biobank individuals. This may inflate the overall strength of association between the PRS and outcome through winner’s curse (although not differentially between individuals from adoptive and biological families). In contrast, unweighted scores should be much more robust to this inflation, and so are a useful sanity check on the results.
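      For readers unfamiliar with the distinction, a minimal sketch of the two scoring schemes follows. It assumes genotypes are coded as effect-allele dosages (0, 1, or 2) and that the unweighted score simply counts trait-increasing alleles using the sign of each GWAS weight; the function names and example values are hypothetical and do not reproduce the authors' pipeline.

```python
def weighted_prs(dosages, weights):
    """Weighted PRS: sum of effect-allele dosages times per-SNP GWAS weights."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def unweighted_prs(dosages, weights):
    """Unweighted PRS: count of trait-increasing alleles.

    Only the sign of each weight is used, so the score does not depend on
    effect-size estimates that may be inflated by winner's curse.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    # A negative weight means the other allele is trait-increasing,
    # so its count is 2 minus the effect-allele dosage.
    return sum(d if w >= 0 else 2 - d for d, w in zip(dosages, weights))


# Hypothetical individual genotyped at four SNPs.
dosages = [0, 1, 2, 1]
weights = [0.02, -0.01, 0.05, 0.03]
print(weighted_prs(dosages, weights))
print(unweighted_prs(dosages, weights))
```

      The unweighted score discards effect-size information, which usually costs some power, but it is the property that makes it a useful robustness check here.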

      3) Limitations

      As our Discussion is already very long, we have created a Supplementary Table (Supplementary File 17) that summarizes some of the main limitations/assumptions of our model, their consequences on model parameters, and possible options for addressing them. We also discuss specific concerns raised by the referee below.

      Assess other limitations of their method.

      a) limitation of the availability of birth father information,

      Our model does not require information on adopted individual’s birth fathers (although it does require PRS on non-adopted individuals’ birth fathers- which is typically readily available). It does, however, make the assumption that fathers do not contribute prenatally to offspring traits- which we think is a reasonable assumption for the majority of offspring phenotypes. If PRS for adopted individuals’ biological fathers were available, then prenatal paternal genetic effects could be estimated as part of the model. To accommodate the reviewer’s request, we have included and discussed this limitation/assumption in more detail in Supplementary File 17.

      b) prenatal events uncorrelated with birthmother's genes (disease or accidents),

      We agree that our model assumes that maternal genotype is uncorrelated with prenatal environmental factors. We now discuss this assumption/limitation further in Supplementary File 17.

      c) Inferring prenatal environment effect from higher birth mother correlation compared to birthfather is subject to bias from measurement differences between the two (Loehlin, 2016).

      Whilst this is a limitation of adoption designs that estimate prenatal effects using the difference between maternal and paternal correlations with offspring phenotypes, this is not actually a limitation of our model. In our model we do not use (phenotypic) mother-child and father-child correlations (we use PRS-phenotype correlations). Also, in our model, information on the size of the prenatal (and postnatal) maternal genetic effects primarily comes from the difference between the PRS-phenotype covariance in adopted singletons compared to the PRS-phenotype covariance in non-adopted individuals (i.e. not from the difference between maternal and paternal correlations with offspring phenotypes). We state this in the Introduction and Methods e.g.:

      “Thus, the difference between the genotype-phenotype covariance in adopted and non-adopted singleton individuals provides important information on the likely size of postnatal genetic effects.”

      It is also worth noting, that in our model, the size of the paternal PRS-offspring association does not factor into the estimation of maternal genetic effects (nor does the difference between the maternal PRS-offspring phenotype association and the paternal PRS-offspring phenotype association). Also, our model takes into account if there are differences in the amount of (random) measurement error in adoptive and non-adoptive families.
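      The logic of comparing PRS-phenotype covariances across adopted and non-adopted individuals can be sketched with a toy calculation. This is an illustration under stated assumptions rather than the SEM in the manuscript: phenotype is modelled as a sum of a direct effect of the child's PRS, a prenatal effect of the birth mother's PRS, and a postnatal effect of the rearing mother's PRS; mating is random; adoptive mothers are unrelated to the child; and PRS variance is the same in all groups. All names and values are hypothetical.

```python
def expected_prs_phenotype_cov(direct, prenatal, postnatal, adopted,
                               var_prs=1.0, r_mother_child=0.5):
    """Expected cov(child PRS, child phenotype) under the toy model.

    For non-adopted children the birth mother also rears the child, so both
    maternal pathways contribute. For adoptees the rearing mother is assumed
    unrelated, so only the prenatal pathway correlates with the child's PRS.
    """
    maternal = prenatal if adopted else prenatal + postnatal
    return var_prs * (direct + r_mother_child * maternal)


def postnatal_from_cov_difference(cov_non_adopted, cov_adopted,
                                  var_prs=1.0, r_mother_child=0.5):
    """Recover the postnatal effect from the difference of the two covariances."""
    return (cov_non_adopted - cov_adopted) / (var_prs * r_mother_child)


cov_bio = expected_prs_phenotype_cov(0.3, 0.1, 0.2, adopted=False)
cov_adopt = expected_prs_phenotype_cov(0.3, 0.1, 0.2, adopted=True)
print(postnatal_from_cov_difference(cov_bio, cov_adopt))
```

      In this toy setting the direct and prenatal terms cancel in the difference, which is why the contrast between the two groups is informative about the postnatal component without relying on mother-father comparisons.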

      d) age at which the child is adopted (if the child has been partly raised by birth parents before adoption, it would bias (raise) the estimates of prenatal effects).

      We agree and now discuss this limitation further in Supplementary File 17.

      e) evocative rGE not mentioned. It has been shown that parents partly react to children's behaviors. Thus, the estimate of maternal genetic postnatal effects could be biased (lowered) by evocative gene-environment correlation. In other words, the model also assumes no evocative gene-environment correlation.

      We agree and now discuss this limitation in Supplementary File 17 (although we note that the effect that evocative rGE will have on the SEM parameters will depend on the direction of the gene-environment correlation).

      Final thoughts

      1) I would like a better case made for why it is important to distinguish genetic effects into prenatal and postnatal effect.

      We have included the following text in the Introduction:

      “Given the increasing number of variants identified in GWAS that exhibit robust maternal genetic effects, a natural question to ask is whether these loci exert their effects on offspring phenotypes through intrauterine mechanisms, the postnatal environment, or both. Indeed, resolving maternal effects into prenatal and postnatal sources of variation could be a valuable first step in eventually elucidating the underlying mechanisms behind these associations (Armstrong-Carter et al. 2020), directing investigators to where they should focus their attention, and in the case of disease-related phenotypes, yielding potentially important information regarding the optimal timing of interventions. For example, the demonstration of maternal prenatal effects on offspring IQ/educational attainment, suggests that if the mediating factors that were responsible could be identified, then improvements in the prenatal care of mothers and their unborn babies which target these factors, could yield useful increases in offspring IQ/educational attainment.”

      2) I would suggest the authors make a clear distinction between the limits inherent to their sample (UK Biobank) and those inherent to their methodological approach. I see that the considerable usefulness of the approach is plagued by limits inherent to the sample used. At the same time, I am not aware of the availability of a big enough sample of adopted children with genotypic information available to compute PRS.

      One of the main limitations inherent to our sample (UK Biobank) is the fact that currently we cannot be certain that adopted individuals are not biologically related to their adoptive parents. As we demonstrate, this limitation could be addressed if information were gathered regarding the relationships, which at least in principle could be done relatively easily in the UK Biobank (e.g. by questionnaire, or even better, by genotyping adoptive parents where possible). The SEMs could then be adjusted to take these relationships into account. We discuss this limitation, and many others, in Supplementary File 17, and divide the table according to whether the limitation is primarily a consequence of the dataset (UK Biobank) or the method more broadly.

      We agree with the reviewer that the size of adoption studies is currently limited (e.g. Texas Adoption Project; Colorado Adoption Study etc). Nevertheless, it is likely that the number of adopted individuals available in large-scale Biobanks will increase over time, in which case models like the one espoused in this manuscript will become increasingly useful. Importantly, our method does not require adoptive families in order to partition maternal effects, merely adopted singleton individuals, and reliable information on the biological relatedness (or lack thereof) of their adoptive parents. We feel therefore that it is important that this sort of information be gathered so that the adopted individuals within these large-scale resources can be leveraged to examine interesting questions like the ones discussed in our manuscript.

      We have added these points to the Discussion:

      “We argue that of greater consequence for the validity of our model is that any genetic relationship between adoptive and biological parents is accurately modelled and included in the SEM. Through simulation, we have shown that the consequences of model misspecification depend upon which biological and adoptive parents are related, the nature of this relationship, and the proportion of adopted individuals in the sample who have had their relationship misspecified. Our simulations also showed that correctly modelling this relationship returns asymptotically unbiased effect estimates and correct type I error rates. Clearly, knowing these cryptic relationships in the UK Biobank would allow us to properly model them and better estimate prenatal and postnatal maternal genetic effects using this resource. We emphasize that accurately modelling these relationships does not require that actual genotypes for adoptive and/or biological parents be obtained (although this would be advantageous in terms of statistical power) as our SEM allows us to model these relationships in terms of latent variables. Indeed, as large-scale resources like the UK Biobank become more common, we expect that the number of adopted individuals who have GWAS will also increase, and consequently models like the one espoused in this manuscript will become increasingly useful. High quality phenotypic information on these adopted individuals and their adoptive parents including whether they share any biological relationship will be critical to making the most of these resources.”

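    1. The MIT/Brown Vannevar Bush Symposium was held at MIT on October 12-13, 1995, to celebrate the 50th anniversary of Vannevar Bush's seminal article "As We May Think", published in The Atlantic Monthly in July 1945. The event video is in five parts; this is part five, which mainly covers:

      1. A talk by Lee Sproull: "Information Is Not Enough: Computer Support for Productive Work". Abstract: Any vision of a new technology implies a vision of human beings and their behavior. In this talk, I describe the vision of human behavior associated with the most influential technological vision of personal computing, epitomized by Vannevar Bush's Memex: the vision of the solitary thinker and problem solver. I contrast this vision with an alternative view of how productive human behavior actually occurs, namely within interdependent social relationships. I review the current state of computer support for social actors and propose an alternative view in which information processing is subordinate to relationship management.

      2. A talk by Alan Kay: "Simex: the neglected part of Bush's Vision". Abstract: Bush's vision of a hyperlinked 10,000-volume library at a desk has had a tremendous influence on the development of personal computing, and it can be realized today (and even surpassed via the Internet). However, although Bush worked on (analog) computer simulation in the 1930s, he very likely could not see any simulation role for the Memex, either from his own work or from the newly built ENIAC. What was missing from Bush's vision, and can it be invented today?

      3. Day 2 panel discussion.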

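    1. The MIT/Brown Vannevar Bush Symposium was held at MIT on October 12-13, 1995, to celebrate the 50th anniversary of Vannevar Bush's seminal article "As We May Think", published in The Atlantic Monthly in July 1945. The event video is in five parts; this is part four, which mainly covers: 1. Raj Reddy: "Bush's Intelligent Systems Revisited". Abstract: In his famous paper "As We May Think", Vannevar Bush provided a vision for creating machines capable of interpreting pictures, taking dictation, understanding language, using hyperlinks, and performing associative retrieval from digital libraries. In this talk, we will review the progress made on these predictions over the past 50 years.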

  2. Local file
    1. 'I don't think it's anything—I mean, I don't think it was ever put to any use. That's what I like about it. It's a little chunk of history that they've forgotten to alter. It's a message from a hundred years ago, if one knew how to read it.'

      Winston and Julia are examining a glass paperweight in George Orwell's 1984 without having context of what it is or what it was used for.

      This is the same sort of context collapse caused by distance in time and memory that archaeologists face when examining found objects.

      How does one pull out the meaning from such distant objects in an exegetical way? How can we more reliably rebuild or recreate lost contexts?

      Link to:
      - Stonehenge is a mnemonic device
      - mnemonic devices in archaeological contexts (Neolithic carved stone balls)


      Some forms of orality-based methods and practices can be viewed as a method of "reading" physical objects.


      Ideograms are an evolution on the spectrum from orality to literacy.


      It seems odd to be pulling these sorts of insights out of my prior experiences and reading while reading something so wholly "other". But isn't this just what "myths" in oral cultures actually accomplish? We link particular ideas to pieces of story, song, art, and dance so that they may be remembered. In this case Orwell's glass paperweight has now become a sort of "talking rock" for me. Certainly it isn't done in any sort of sense that Orwell would have expected, presumed, or even intended.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Response to the reviewers

      Manuscript number: RC-2022-01407

      Corresponding author(s): Ivana Nikić-Spiegel

      1. General Statements

      We would like to thank the reviewers for their careful reading of our manuscript and for their insightful and useful comments. We are happy to see that the reviewers find these results to be of interest and significance. As we understand the reviewers’ reports, their main concerns can be roughly divided into the following categories: 1) providing more quantitative data, 2) interpretation of the Annexin V/PI assay, and 3) additional evidence for calpain involvement. We intend to address these experimentally or by modifying the text, as outlined below.

      2. Description of the planned revisions

      Reviewer #1

      Fig1A/B: SYTO 16 staining suggests slight reshaping of the nucleus upon spermine NONOate treatment, showing less blurry puncta. From the SYTO 16 profile, this should be quantifiable.

      By looking at the shown examples and the entire dataset, it appears to us as if neuronal nuclei are shrinking upon spermine NONOate treatment resulting in their less blurry appearance. We are not sure if this is what the reviewer is referring to, but this can also be quantified by measuring changes in neuronal nuclear size. We already have this data from the measurements shown in Fig4 and we intend to show it in the revised version of the manuscript. Line profile measurements are also possible, but the nuclear size quantification might be more suitable for this purpose.

      There is a subset of neuronal nuclei that are SYTO 16 positive. Please quantify the ratio.

      We will use our existing dataset to quantify the ratio of NFL positive and SYTO16 positive nuclei.

      FigS1A • Show NeuN with Anti-NFL merged figures

      We will show merged NeuN and anti-NFL images, which might require rearrangement of the existing figures and figure panels. We will do this in the revised manuscript.

      FigS1C • Show quantification and timeline. I want to know whether there is also a plateau reached here.

      As the data shown in the FigS1C do not include NeuN staining, we will do additional experiments and perform proposed quantifications.

      FigS2A-F • Though the statements might be true, selecting one nucleus for a line profile as a statement for the whole dataset seems problematic. Average a larger number of unbiased selected nuclei profiles across multiple cultures to make a stronger statement, or a percentage of positive nuclei as in FigS1b.

      Corresponding images and line profiles are representative of the entire dataset. However, we agree with the reviewer that this is not obvious from the current manuscript version. Thus, to strengthen our findings, we intend to quantify the percentage of positive nuclei as in FigS1b. The only difference will be that instead of NeuN, we will use SYTO16 as a nuclear marker. The reason being that the existing datasets contain images of NFL and SYTO16 and not NeuN.

      FigS3 • There are no fluorescence profiles, no quantification

      As the reviewer suggests, we will quantify the ratio of NFL positive and SYTO16 positive nuclei, and include the quantifications in the revised manuscript.

      General statement: There do seem to be punctated patterns of non-nucleus accumulating NFL fragments. Can they be localized to any specific structure?

      We assume that the reviewer is referring to neuronal/axonal debris. They are present after injury but they do not colocalize with nuclear stains. We will address this in the revised manuscript.

      Fig1C-F • I find it too simplistic to categorize c+f and d+e together. There is a huge difference in the examples of nuclear localization between d and e. To not comment on their distinction (if that is consistent) is problematic. Also, since we don't see a merge with either NeuN or SYTO 16, reader quantification is difficult.

      We thank the reviewer for bringing this up. We will carefully check our entire dataset and we will update the figures and the text accordingly. We will also show the corresponding SYTO16 images, as the reviewer suggested.

      Would the microfluidic device construction allow for time to transport any axonally damaged fragments to the soma?

      Yes, the construction of the microfluidic devices allows the transport of axonal proteins back to the soma. Based on our experiments, it seems that damaged NFL from the axonal compartment could be contributing to the accumulation of NFL fragments in the nuclei. However, this contribution seems to be minimal as we cannot detect nuclear NFL upon the injury of axons alone. Alternatively, it could be that the processing of axonal NFL fragments proceeds differently if neuronal bodies are not injured and that this is the reason we don’t detect the NFL nuclear accumulation upon injury of axons alone. We will discuss this in the revised manuscript.

      Fig2C+D • The statement ".... no annexin V was detected on the cell membrane" needs to be shown more clearly

      We will modify figures to address this comment.

      • Please provide merged AnnexinV/PI images

      We will modify figures to address this comment.

      • The conclusion about 2D, that nuclear accumulated NFL overlaps with PI is not supported by the example image shown. There are plenty of PI positive spots that are not NFL positive and even several NFL positive ones that do not have a clear PI staining. Please quantify and then show a very clear result in order to be able to suggest necrosis as the underlying process.

      We are not sure if we understand the reviewer’s concern correctly. We will try to clarify it here and in the revised text. If necessary, we will tone down our conclusion, but the reason why not all of PI positive spots are NFL positive is most likely due to the fact that not all injured nuclei are NFL positive. We quantified in FigS1 that up to 60% of nuclei under injury conditions show NFL accumulations. That is why we are not surprised to see some PI positive/NFL negative nuclei. And the fact that there are some NFL positive nuclei which appear to be PI negative is most likely related to the fact that the PI binding is affected. In addition, upon closer inspection of NFL and PI panels in Fig2d it can be observed that NFL positive nuclei are also PI positive, albeit with a lower PI fluorescence intensity. We will modify the figure to show this clearly in the revised manuscript.

      FigS5 C+D • If the case is made that nitric oxide damage induces necrosis, then why is it that the AnnexinV example of Staurosporine exposure (which induces apoptosis) looks similar to that of nitric oxide damage in Fig2d and necrosis induction with Saponin looks very different?

      We thank the reviewer for bringing this up. We will try to clarify this in the revised manuscript. Regarding the specific questions, the most likely explanation why staurosporine treated neurons look similar to the ones treated with spermine NONOate is that in the late stages of apoptosis cell membrane ruptures and allows for the PI to label nuclei. This is probably the case here as illustrated by the nucleus in the middle of the image (FigS5c) that shows the fragmentation characteristic for the apoptosis. This is not happening in early apoptotic cells due to the presence of an intact plasma membrane. On the other hand, the reason why saponin treated cultures look different compared to spermine NONOate is that membranes are destroyed by saponin so that the PI can enter the cell. For that reason, there could have not been any AnnexinV binding to the membrane which would correspond to the AnnexinV signal of spermine NONOate treated neurons. As we will discuss below, we did not try to mimic spermine NONOate-induced injury with saponin treatment. Instead this was a control condition for PI labeling and imaging. We also used a rather high concentration of saponin which probably destroyed all the membranes which was not the case with spermine NONOate treatment. We intend to do additional control experiments to address this.

      • Additionally, does necrosis induction with Saponin also cause NFL fragment accumulation in the nucleus? Please show a co-staining of them. Also, the authors want to make a claim about reduced PI binding in NFL accumulated necrotic cells. In these examples, the intensity of the nuclear stain of PI with Saponin looks dimmer than with Staurosporine. Are the color scalings similar? It might be that the necrotic process itself causes reduced binding of PI and is not related to the presence of NFL.

      With regards to this question, it is important to note that Annexin V and PI imaging was done in living cells. To obtain the corresponding anti-NFL signal as shown in Fig 2c,d we had to fix the neurons, perform immunocytochemistry and identify the same field of view. We tried to do the same procedure after saponin treatment (Supplementary Figure 5d) but the correlative imaging was very difficult due to the detachment of neurons from the coverslip after the saponin treatment. For this reason, we could not identify the same field of view co-stained with NFL. However, other fields of view did not show NFL fragment accumulation. This could also be the consequence of the high saponin concentration that we used as we discuss above. We have also noticed the reduced intensity of PI binding in the nuclei of saponin-treated neurons. However, if the necrotic process itself reduces the binding of PI to the DNA, then all of the neurons treated with spermine NONOate would have an equally low PI signal. In our experiments, only the nuclei which contained NFL accumulations had a low PI signal, while the signal of NFL-negative nuclei was higher (as shown in Fig2d). We would also like to point out again that the saponin treatment was our control of the PI’s ability to penetrate cells and bind the DNA, as well as our imaging conditions, and not the control of the necrotic process itself. This is the reason why we didn’t go into details about neuronal morphology and NFL localization upon saponin treatment. We thank the reviewer for pointing this out since it prompted us to reevaluate what we wrote in the corresponding paragraph of the manuscript. 
      We realized that the confusion might stem from our explanation of the AnnexinV/PI assay controls in lines 196-198 (“Additional control experiments in which neurons were treated with 10 μM staurosporine (a positive control for induction of apoptosis) or with 0.1% saponin (a positive control for induction of necrosis) confirmed the efficiency of the annexin V/PI assay (Supplementary Fig. 5c,d).”). We will modify this portion of the text to clearly state that staurosporine and saponin treatments were controls of the AnnexinV and PI binding to their respective targets and not of the apoptosis/necrosis process. When it comes to the saponin treatment, our intention was only to permeabilize the membranes in order to allow PI penetration and DNA binding and not to induce necrosis or to mimic the effect of the spermine NONOate. We also intend to perform experiments with a lower concentration of saponin to try to address this experimentally in addition to the text modifications.

      Fig3d • Please show similarly scaled images from controls for proper comparison

      We will show similarly scaled images of the control neurons so that they can be properly compared. They were initially not scaled the same for visualization purposes, but we will modify this in the revised manuscript.

      • How do the authors scale the degree and kinetics of induced damage between application of hydrogen peroxide/CCCP and glutamate toxicity? Does glutamate toxicity take longer to affect the cell, not allowing enough time to accumulate NFL fragments in the nucleus?

      It is challenging to scale the degree and kinetics of induced damage with different stressors. That is why we did not intend to do this. Instead, we set different injury conditions based on the published literature. That is why we can only speculate when it comes to this. In this regard, it can be that the glutamate toxicity takes “longer” to affect the cells even though it is very difficult to compare them on a timescale, especially when considering different mechanisms of action. We will discuss this limitation in the revised manuscript.

      Fig4B • Some groups (like NO and NO + emricasan) have much larger numbers of close to 0 intensity, compared to the control group. Why?

      We were wondering the same when we analyzed the data. The fact that our nuclear fluorescence intensity analysis picked up NFL signal in control neurons which had no nuclear NFL accumulation made us realize that the intensity measured in the nuclei of control group comes entirely from the out of focus fluorescence – from neurofilaments in cell bodies, dendrites and axons (an example can be seen in the FigS6). That is why we presented the corresponding data with a cut-off value based on the control signal (as mentioned in lines 238-240). Since the oxidative injury causes NFL degradation (not only in neuronal soma, but also neuronal processes), the overall fluorescence intensity of the NFL immunocytochemical staining is reduced in injured neurons. We can see that in all of our images. Consequently, there is no contribution of out of focus fluorescent signal to the measured fluorescence intensity in the majority of nuclei. Due to that, the nuclei without NFL accumulation (at least 40% of injured nuclei) will appear to have a close to 0 intensity of the fluorescent signal. We will discuss and clarify this additionally in the revised manuscript.

      • Please add the ratio of above/below threshold (50/50 obviously in controls)

      We will update the figure in the revised manuscript.
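The requested above/below-threshold ratio can be illustrated with a minimal sketch. It assumes that the cut-off equals the median of the control group, which is what the reviewer's "50/50 obviously in controls" remark implies; the manuscript's own cut-off definition is not reproduced here, and all intensity values and the function name are hypothetical.

```python
from statistics import median


def fraction_above(values, cutoff):
    """Return the fraction of nuclei whose intensity lies above the cut-off."""
    return sum(v > cutoff for v in values) / len(values)


# Hypothetical nuclear intensities (arbitrary units), for illustration only.
control = [1.0, 2.0, 3.0, 4.0]
injured = [0.0, 0.1, 5.0, 6.0, 7.0]

# Assumption: the cut-off is the control median, so controls split 50/50.
cutoff = median(control)

control_fraction = fraction_above(control, cutoff)
injured_fraction = fraction_above(injured, cutoff)
```

With a median-based cut-off the control fraction is 0.5 by construction, so only the fractions for the treated groups carry information.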

      • The description of the CTCF value calculation seems a little... muddled? Several parameters are described whereas "integrated density" is not even used. Why not simply mean intensity of nuclear ROI-mean intensity of background ROI?

      We included the integrated density in the description since it is measured together with the raw integrated density and can also be used for the CTCF value calculation. However, since we didn’t use it for the CTCF calculation, we will remove it from the corresponding section of the manuscript. We calculated the CTCF value instead of calculating mean intensity of the nuclear ROI - mean intensity of the background ROI, since the CTCF value also takes into account the area of the ROI and not just the mean intensity.
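The difference between the two measures discussed here can be shown with a minimal sketch. It assumes the commonly used definition CTCF = raw integrated density - (ROI area x mean background intensity); the exact procedure used in the manuscript is not given in this response, and the function names and numbers below are hypothetical.

```python
def ctcf(raw_integrated_density, roi_area, mean_background):
    """Corrected total cell fluorescence: summed ROI signal minus the
    background expected over an area of the same size (assumed definition)."""
    return raw_integrated_density - roi_area * mean_background


def mean_difference(raw_integrated_density, roi_area, mean_background):
    """Reviewer's alternative: mean ROI intensity minus mean background."""
    return raw_integrated_density / roi_area - mean_background


# Two hypothetical nuclei with the same mean intensity but different areas.
small = dict(raw_integrated_density=5000.0, roi_area=50.0, mean_background=20.0)
large = dict(raw_integrated_density=10000.0, roi_area=100.0, mean_background=20.0)

ctcf_small, ctcf_large = ctcf(**small), ctcf(**large)
diff_small, diff_large = mean_difference(**small), mean_difference(**large)
```

In this example the mean-based measure is identical for both nuclei, whereas CTCF scales with ROI area. This matters if nuclear size changes upon injury, as raised for Fig1A/B.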

      • Also, please tell me if the areas for nuclear ROIs change, as I noted for Fig1A/B

      We will include this information in the revised manuscript.

      • To make sure that one of the 3 experimental repeats didn't skew the results, please show the median fluorescence intensity for each individual experiment to clarify that the supposed effect is repeated across experiments.

      We have already noticed that in the earliest of the three experiments overall fluorescence intensity was higher, but this was consistent across all the experimental groups and did not skew the results or affect the overall conclusion. However, we will double-check this and revise the figure.
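A per-experiment check of this kind could be sketched as follows. This is an illustration only: the record layout (experiment, group, intensity), the group labels, and all values are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import median


def median_by_experiment(records):
    """Group (experiment, group, intensity) records and return the median
    intensity for every (experiment, group) pair."""
    grouped = defaultdict(list)
    for experiment, group, value in records:
        grouped[(experiment, group)].append(value)
    return {key: median(values) for key, values in grouped.items()}


# Hypothetical measurements; experiment 1 is brighter overall.
records = [
    (1, "control", 10.0), (1, "control", 12.0),
    (1, "NO", 30.0), (1, "NO", 34.0),
    (2, "control", 5.0), (2, "control", 7.0),
    (2, "NO", 15.0), (2, "NO", 19.0),
]

medians = median_by_experiment(records)
```

If the direction of the group difference is the same within every experiment, a globally brighter repeat shifts all groups together and does not by itself create the effect.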

      • From the text "...and due to the NFL degradation during injury...": this seems to contradict the process? Either the NFL fragment accumulates in the nucleus or it is degraded during injury. And isn't the degradation through calpain what supposedly allows this fragment of NFL to go to the nucleus in the first place? I reckon that the authors are possibly trying to reconcile why there are many close-to-0 intensity nuclei in the NO and NO + emricasan groups, but I don't feel the explanation given here fits.

      As we tried to explain in our response above, we think that the overall degradation of neurofilaments in neurons affects the fluorescence intensity originating from the out of focus neurofilaments. Therefore, the nuclei without NFL accumulation in injured conditions have a close to 0 fluorescence intensity. Additionally, we think that this is not an either/or situation, but that both degradation and nuclear accumulation of NFL happen simultaneously. We also think that degradation of axonal NFL and the transport of its tail domain to the soma will at least partially contribute to the accumulation in the nucleus. In any case, degradation and nuclear accumulation seem to be differentially regulated in individual neurons, as some of them show nuclear NFL accumulation and some not. Furthermore, calpain and other mechanisms could also cause NFL degradation up to the point at which these fragments can no longer be recognized by the anti-NFL antibody leading to the loss of signal. We will try to clarify this in the revised version of the manuscript.

      Fig5 • Does the distribution of this GFP in B match any of the various antibody stainings of different NFL fragments? Perhaps this is still a valid fragment of NFL, just not picked up by any AB?

      The GFP signal in B appears rather homogenous and it does not match any of the various antibody stainings of different NFL fragments. As the reviewer points out, this could also be a valid fragment of NFL fused to GFP that none of our antibodies is recognizing. We will clarify this in the revised manuscript.

      • "... and was indistinguishable from the full-length NFL-GFP." Based on what parameters?

      We will clarify this in the revised text, but we meant in terms of overall neurofilament network and cell appearance, which is commonly used to test the effect of NFL mutations.

      • The authors claim that b is different from d, but I am not convinced. I would like to see a time dependent curve from multiple cells showing a differential change in nuclear and cytosolic GFP signal.

      As we also wrote in the manuscript, in the majority of neurons that were monitored during injury we were not able to detect an increase in the GFP fluorescence intensity in the nucleus. This is what prompted further experiments with NFL(ΔA461–D543)-FLAG. We will clarify this additionally in the revised manuscript and perform line profile intensity measurements to show the difference in nuclear and cytosolic GFP signal.

      • Secondly, the somatic GFP intensity for NFL increases for full length NFL-GFP. How is this explained, if it is only a separation of NFL and GFP? If anything, GFP should float away. And if the answer is that NFL is recruited to the nucleus, you showed that inhibition of calpain activity partially prevents that. So, if calpain activity is necessary for the transport of NFL to the nucleus, then wouldn't it also cut the GFP from NFL before it reaches the nucleus?

      We thank the reviewer for bringing this up and we apologize for the confusion. This can be explained by the fact that the images were scaled in a way that the GFP signal over time could still be seen easily (i.e. differently across different time points which we unfortunately forgot to mention in the figure legend). In the revised manuscript, we will either scale the images the same or we will alternatively show the displayed grey values in individual panels.

      Fig6 • It is recommended to overlap the transfected cells with a stain for endogenous NFL to show that despite the absence of the FLAG-tag, there is still NFL.

      We did not overlap the anti-NFL with anti-FLAG and SYTO16 staining, due to the space constraint and the intent to clearly show the overlap of FLAG and SYTO16 signals in the merged images above the graphs. However, the line profile intensity measurements were done in all three channels and show that despite the absence of FLAG, there is still NFL in the nucleus (Fig6b), or that both FLAG and NFL are present in the nucleus (Fig6d, NFL signal shown in gray). However, as this is not obvious and can easily be overlooked, we will show the endogenous NFL staining overlap in the revised version of the manuscript.

      Fig7 • "...all disrupted neurofilament assembly...": this sounds like the staining for native NFL supposedly shows a distortion due to a dominant negative effect of the expression of these constructs? Please clarify.

      Yes, we were referring to the disruption of neurofilament assembly due to a dominant negative effect of the expression of NFL domains. We will clarify this in the revised version of the manuscript.

      Discussion: • The authors show that after overexpression of the head domain only, it possibly passively diffuses into the nucleus even in the absence of oxidative injury. However, it seems to be suggested as well that the head domain would not be freely floating around if it weren't for increased calpain activity as a result of oxidative injury in the first place. Therefore, a head domain fragment localized in the nucleus would still more prominently happen upon oxidative injury and interact with DNA through prior identified putative DNA interaction sites from Wang et al. Please comment.

      That is correct. Upon injury and calpain cleavage, it is conceivable that a fragment containing the NFL head domain would also be present in the cell and could potentially diffuse to the nucleus and interact with the DNA. However, by staining injured neurons with an antibody that recognizes amino acids 6-25 of the NFL head domain, we were not able to detect an NFL signal in the nucleus (FigS2a,b). It could be that either the NFL head domain does not localize in the nuclei upon injury, or that the fragment localizing in the nucleus does not contain amino acids 6-25 of the NFL head domain. As the putative DNA-binding sites described by Wang et al involve 7 amino acids located in the first 25 residues of the NFL head domain, we would expect to detect it with the aforementioned antibody. However, as that was not the case we speculated that the interaction of NFL and DNA occurs differently in living cells, as opposed to the test tube conditions utilized by Wang et al. We will comment and clarify this in the revised version of the manuscript.

      Reviewer #2

      Major Comments:

      The initial data presented in the paper are good: the response to oxidative damage is shown with proper controls, including testing of the antibodies to NF-L, etc. (Fig. 1-Fig. 4).

      We thank the reviewer for their positive feedback.

      1. The evidence for calpain involvement in NF-L cleavage during oxidative damage is missing. Provide the evidence for full-length NF-L construct and deletion mutants transfected into cells by immunoblot for cleavage of NF-L, perform nuclear and cytoplasmic extract preparations, and show enrichment of the tagged cleaved NF-L fragment in the nuclear fraction.

      We thank the reviewer for their comments and suggestions. Since we saw in our microscopy experiments that calpain inhibition reduced the accumulation of NFL in the nucleus, and since it is known that NFL is a calpain substrate (Schlaepfer et al., 1985; Kunz et al., 2004 and others), we did not perform additional experiments to confirm the involvement of calpain in NFL degradation during injury. However, to strengthen our findings, we intend to perform the suggested experiments and include the results in the revised manuscript.

      1. Show calpain activation during oxidative damage by performing alpha-spectrin immunoblots to identify generation of the calpain-specific 150-kDa and caspase-specific 120-kDa spectrin fragments in these cells. Also, calpain activation can be measured by MAP2 level alteration and p35 to p25 conversion. Without this evidence, it is very hard to judge whether calpain activity is increased or decreased during oxidative damage and whether these markers are altered by calpain inhibitors.

      To confirm the calpain activation, we intend to perform anti-alpha spectrin and/or anti-MAP2 blots in lysates of control and injured neurons and include the results in the revised manuscript.

      1. The premise that NF proteins are absent in cell bodies and present only in axons is not correct. It has been demonstrated by multiple investigators that NFs are present in the perikaryon and dendrites of many types of neurons (Dahl, 1983, Experimental Cell Research). Dr. Ron Liem's group showed NF protein expression in cell bodies of dorsal root ganglion cells (Adebola et al., 2015, Human Mol Genetics) and also showed that N-terminal antibodies for NF-L, NF-M and NF-H stain rat cerebellar neuronal cell bodies and dendrites (Kaplan et al., 1991, Journal of Neuroscience Research) when NFs are less phosphorylated. Schlaepfer et al. (1981, Brain Research) show staining of cell bodies of cortex and dorsal root ganglion cell bodies with NF antibody Ab150, and Yuan et al. (2009) in mouse cortical neurons with GFP-tagged NF-L.

      We are not sure what the reviewer is referring to since we cannot find a corresponding section in which we claim that NF proteins are absent in cell bodies. We wrote the following “Anti-NFL antibody staining of neurons treated with the control compound showed the expected neurofilament morphology, that is, a strong fluorescence intensity in axons and lower intensity in cell bodies and dendrites (Fig. 1a)” in our results section (lines 119-121), but the claim we were trying to make there was that NF proteins are particularly abundant in axons. We will clarify this in the revised manuscript.

      1. Quantifying NF-L signal or tagged NF-L fragment signals in the cell body by ICC has many problems, as does drawing conclusions from it. It is extremely difficult to control the levels of proteins in transfected overexpression models and to compare two or three different constructs with each other by ICC. Not every transfected cell expresses the same level of protein, and quantifying it by ICC is again a major problem. This can be addressed if there are stable lines that express equal levels of protein in all cells, so that comparisons can be made. Under these circumstances, the hypothesis presented in the study lacks strong direct evidence that calpain is activated and that the NF-L fragment translocates to the nucleus.

      We agree that the results from overexpression-based experiments should be interpreted with caution as levels of expression vary between the cells. We intend to discuss this in the revised manuscript. However, we find it difficult to experimentally address this comment since we are not sure which specific experiments the reviewer is referring to. With regards to this, we would like to emphasize that most of the initial experiments in which we observed NFL accumulation in the nuclei of injured neurons were based on the ICC labeling of endogenous NFL and didn’t involve its overexpression. This includes labeling of endogenous NFL in various types of neurons, comparing the effects of different types of oxidative injury, as well as testing the effects of calpain inhibition on the observed nuclear accumulation (Figures 1-4; Supplementary Figures 1-6). We later resorted to the overexpression experiments in primary neurons (Figures 5-7; Supplementary Figure 7, 10) to gain more information about the identity of NFL fragment which was detected in the nucleus. Due to the low transfection efficiency of primary neurons, we performed an additional set of overexpression experiments in neuroblastoma ND7/23 cells (Figure 8; Supplementary Figures 8,9) and obtained similar results in a higher number of cells. We agree that having stable cell lines which e.g. express same levels of NFL domains would be a more elegant approach and we intend to make them for our follow-up studies, however the generation of said stable cell lines might be beyond the scope of this revision. Furthermore, looking at our data with overexpression of NFL domains in ND7/23 cells (Supplementary Figure 8,9), it appears to us as if different domains are rather homogenously expressed in different cells. While the expression levels might vary, it seems that they all show the same trend when it comes to their localization (which was the main point of those experiments).

      1. The interpretation that NF-L prevents DNA labeling in cells is a misinterpretation. NFs have a very long half-life compared to other proteins. Due to oxidative damage, DNA is degraded in the cells, but NFs, which have a very long half-life, are seen as NF rings in the dead cells. So, NFs do not prevent DNA labeling; rather, DNA or chromatin is degraded in dead cells.

      We thank the reviewer for their useful insight. DNA degradation could certainly be the reason why we observe a lower fluorescence intensity of the propidium iodide fluorescence in the nuclei of injured neurons. We intend to discuss this in the revised manuscript. However, if the DNA degradation is the only reason for the lower PI fluorescence intensity, then the PI fluorescence intensity would be the same in all injured nuclei. In our experiments, we saw the reduced PI fluorescence intensity in nuclei that contained NFL accumulations and not in other nuclei. Additionally, we observed a reduction of SYTO16 fluorescent labeling of nuclei which contained accumulations of the NFL tail domain, even in the absence of oxidative injury. Due to these reasons we speculated that NFL accumulation in the nucleus might hinder nuclear dyes from interacting with the DNA. But this is only a speculation and we will try to clarify this further in the revised manuscript including alternative explanations.

      Minor comments: 1. In the introduction on page 4 reference is missing for NF transport, aggregation and perikaryal accumulation (on line 93).

      We will add a reference to the revised manuscript.

      1. The statement in discussion on page 14 line 454 for Zhu et al., 1997 study is not accurate. It should be modified to sciatic nerve crush not spinal cord injury.

      We will correct this mistake in the revised manuscript.

      1. What is the size of the calpain cleaved NF-L tail domain? If you perform immunoblots on cell extracts treated with oxidative agents one would know it.

      We will perform immunoblots on cell lysates and incorporate the corresponding results in the revised manuscript.

      1. Authors could make their conclusions clear. This is particularly true for the experiments in Figure 4 panels c and d. It is very difficult to understand the conclusions of the experiments. First state the expectation and then describe whether the expectation is true or different.

      We will do as the reviewer suggested in the revised manuscript.

      1. The ICC images are at extremely low magnification. They should be shown at 100x or 120x so that details of the cell body and the nucleus can be seen.

      Our intention was to show larger fields of view and wherever appropriate insets, but we will try to improve this in the revised manuscript by either zooming in, cropping or adding additional insets with individual cell bodies and nuclei. In general, images were taken with an optimal resolution/pixel size in mind for any of the used objectives (60x/1.4 NA or 100x/1.49 NA) and we can easily modify our figure panels to show more details.

      1. Oxidative damage leads to beaded accumulation of NF-L in neurites and axons. Authors should address this issue.

      We will discuss this in the revised manuscript.

      1. The combination treatment of the inhibitors (last 3 sets of the Fig. 4 b) has no statistical significance and should be removed.

      Actually, these differences were statistically significant (Supplementary Table 1). For clarity and as described in the figure legend (line 516: “The most relevant significant differences are indicated with an asterisk”) we showed only a subset of them on the graph, but we will change this in the revised manuscript.

      1. Why do only two antibodies recognize cleaved NF-L? If the antibodies are directed at the tail region, they should recognize it unless phosphorylation of the tail at Ser473 inhibits antibody binding. In that case, an NF-L Ser473 specific antibody (EMD Millipore: MABN2431) may be used to test this idea.

      This is a very good point that we also wonder about. Even if all antibodies are directed at tail region, exact epitopes are not described for all of them. That makes it also difficult for us to understand and speculate on this. However, we have already ordered the new antibody as suggested by the reviewer and we will experimentally test it.

      **Referees cross-commenting**

      I agree with the reviewer#1 about presenting the quantification data for the indicated figures to make conclusions strong and see how much of variation is there among sampled cells.

      As discussed in our response to reviewer #1, we will provide additional quantifications.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      4. Description of analyses that authors prefer not to carry out

      Reviewer #2, major comment 7. Authors could do chromatin immunoprecipitation (chip) analysis to identify NF-L binding sites on chromatin and perform gel shift assays to show NF-L tail domain binding to specific consensus DNA sequences.

      We thank the reviewer for their suggestion. We are very interested in performing additional experiments and identifying the NFL binding sites on the DNA (either by chromatin immunoprecipitation or DamID-seq) and we intend to perform these experiments as soon as possible. Unfortunately, at the moment we do not have the expertise to perform such experiments in our lab. Instead, this type of follow-up project requires establishing a collaboration which is beyond the scope of this revision.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper examines EEG responses time-locked to (or "entrained" by) musical features and how these depend on tempo and feature identity. Results revealed stronger entrainment to "spectral flux" than to other, more commonly tested features such as amplitude envelope. Entrainment was also strongest for lowest rates tested (1-2 Hz).

      The paper is well written, its structure is easy to follow and the research topic is explained in a way that makes it accessible to readers outside of the field. Results will advance the scientific field and give us further insights into neural processes underlying auditory and music perception. Nevertheless, there are a few points that I believe need to be clarified or discussed to rule out alternative explanations or to better understand the acquired data.

      We thank the Reviewer for taking the time to evaluate our manuscript and for the positive response. We have now conducted further analyses to strengthen our conclusion that neural synchronization was strongest at slower musical tempi and to rule out an alternative explanation that neural synchronization was strongest for music presented near its own original or “natural” tempo. We also added some points to the Discussion in response to your comments; revised text is reproduced as part of our point-by-point responses below for your convenience. The page and line numbers correspond to the manuscript file without track changes.

      1) Results reveal spectral flux as the musical feature producing strongest entrainment. However, entrainment can only be compared across features in an unbiased way if these features are all equally present in the stimulus. I wonder whether entrainment to spectral flux is only most pronounced because the latter is the most prominent feature in music. Can the authors rule out such an explanation?

      Respectfully, it is not fully clear to us based on the literature that entrainment can only be compared across features fairly when those features are equally present in the stimulus. Previous work in the speech domain has compared entrainment to amplitude envelope vs. spectrogram, vs. a symbolic representation of the time of occurrence of different phonemes (Di Liberto et al., 2015). Work in the music domain has compared entrainment to amplitude envelope (and its derivative) vs. features quantifying melodic expectation (surprise and entropy, quantified using a hidden Markov-model trained on a corpus of Western music; Di Liberto et al., 2020). In these papers, there was no quantification of the degree to which each feature was present in the stimulus material, and when comparing such qualitatively different features, it is not clear to us how one would do so. Nonetheless, these studies used the resulting TRF-based dependent measures to evaluate which feature best predicted the neural response. Here, although we do not know what acoustic feature might be most present / strongest in music, we believe that we can investigate the degree to which each feature predicts the neural response. In fact, we might argue roughly the reverse of the logic in your comment – that the TRF results actually tell us which feature is perceptually or psychologically the most important in terms of driving brain responses, which may not be fully predictable from the acoustics of those features.

      From a data analysis perspective, we have independently normalized (z-scored) each feature as well as the neural data, as prescribed in Crosse et al., 2021, to try to level the playing field for the musical features we are comparing. Moreover, we made changes in the discussion to acknowledge your concern. The text is reproduced here for your convenience.

      p. 26, l. 489-497: “One hurdle to performing any analysis of the coupling between neural activity and a stimulus time course is knowing ahead of time the feature or set of features that will well characterize the stimulus on a particular time scale given the nature of the research question. Indeed, there is no necessity that the feature that best drives neural synchronization will be the most obvious or prominent stimulus feature. Here, we treated feature comparison as an empirical question (Di Liberto et al., 2015), and found that spectral flux is a better predictor of neural activity than the amplitude envelope of music. Beyond this comparison though, the issue of feature selection also has important implications for comparisons of neural synchronization across, for example, different modalities.”

      2) Spectral analyses of neural data often yield the strongest power at lowest frequencies. Measures of entrainment can be biased by the amount of power present, where entrainment increases with power. Can the authors rule out that the advantage for lower frequencies is a reflection of such an effect?

      Thank you for this insightful comment. In response to your comment and the comments of Reviewer 3, we normalized the TRF correlations, stimulus–response correlations, and stimulus–response coherences by surrogate distributions that were calculated separately for each musical feature and – importantly – for every tempo condition. Following Zuk et al., 2021, we formed surrogate distributions by shifting the relevant neural data time course relative to the stimulus-feature time courses by a random amount. We did this 50 times, and for each shift re-calculated all dependent measures. We then normalized our dependent measures calculated from the intact time series relative to these surrogate distributions by subtracting the mean and dividing by the standard deviation of the surrogate distribution (“z-scoring”). Since the approach of shifting the neural data leaves the neural time series intact, the power spectrum of the data is preserved, but only its relationship to the stimulus is destroyed. After normalization, the plots obviously look a little different, but the main results – a higher level of neural synchronization to slower stimulation tempi and in response to the spectral flux – remain.

      The changes can be found throughout the manuscript, but especially on p. 11, l. 210-218, Figures 2-3 and a more detailed explanation in the Methods section.

      p. 39, l. 821-829: “In order to control for any frequency-specific differences in the overall power of the neural data that could have led to artificially inflated observed neural synchronization at lower frequencies, the SRCorr and SRCoh values were z-scored based on a surrogate distribution (Zuk et al., 2021). Each surrogate distribution was generated by shifting the neural time course by a random amount relative to the musical feature time courses, keeping the time courses of the neural data and musical features intact. For each of 50 iterations, a surrogate distribution was created for each stimulation subgroup and tempo condition. The z-scoring was calculated by subtracting the mean and dividing by the standard deviation of the surrogate distribution.”
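      As an illustration of the surrogate normalization described above, the following is a minimal sketch and not the authors' analysis code. The function names `pearson` and `surrogate_zscore` are hypothetical, a plain Pearson correlation stands in for the dependent measure, and the shift is assumed to be circular so that the neural time course (and hence its power spectrum) stays intact.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def surrogate_zscore(stimulus, neural, n_iter=50, seed=0):
    """z-score the observed correlation against time-shifted surrogates.

    Assumption: `neural` is a list and is shifted circularly, so only its
    alignment to the stimulus is destroyed, not its own time structure.
    """
    rng = random.Random(seed)
    observed = pearson(stimulus, neural)
    n = len(neural)
    null = []
    for _ in range(n_iter):
        shift = rng.randrange(1, n)  # random non-zero offset in samples
        shifted = neural[shift:] + neural[:shift]
        null.append(pearson(stimulus, shifted))
    # Subtract the surrogate mean and divide by the surrogate standard deviation.
    return (observed - statistics.fmean(null)) / statistics.stdev(null)
```

      In this sketch the normalization would be repeated separately for each musical feature and tempo condition, so that each condition is compared against its own null distribution.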

      A related point, what was the dominant rate of spectral flux in the original set of stimuli, before tempo was manipulated? Could it be that the slow tempo was preferred because in this case participants listened to a most "natural" stimulus?

      This is a good point, thank you. We did two things to attempt to address this (see also comment Reviewer 3). First, the original tempo for each song can be found in Supplementary Table 1. To make the table more readable and more comparable with the main manuscript, we have updated the table and now state the original tempi in BPM and Hz. Second, we added histograms of the original tempi across all songs as well as the maximum amount by which all songs were tempo-shifted (i.e., the maximum tempo difference between the slowest (or fastest) version of each song segment compared to the original tempo). These histograms have been added to Figure 1 – figure supplement 2, and are paraphrased here for your convenience (p. 13 l. 265-273): The original tempo of the set of musical stimuli ranges between 1-2.75 Hz. This indeed overlaps with the tempo range that revealed strongest neural synchronization. When songs were tempo-shifted to be played at a slower tempo than the original, they were shifted by ~0.25-1.25 Hz. In contrast, shifting a song to have a faster tempo typically involved a larger shift of ~1-2.25 Hz. Thus, it is definitely possible that tempo, degree of tempo shift, and proximity to “natural” tempo were not completely independent values.

      For that reason, to investigate the effects of the amount of tempo manipulation on neural synchronization, we conducted an additional analysis. We compared TRF correlations for a) songs that were shifted very little relative to their original tempi to b) songs that were shifted a lot relative to their original tempi. We did not have enough song stimuli to do this for every stimulation tempo, but we were able to do the TRF correlation comparison for two illustrative stimulation tempo conditions (at 2.25 Hz and 1.5 Hz). In those tempo conditions, we took the TRF correlations for up to three trials per participant when the original tempo was around the manipulated tempo (1.25-1.6 Hz for 1.5 Hz or 2.01-2.35 Hz for 2.25 Hz) and compared them to those trials where the original tempo was around 0.75–1 Hz faster or slower than the manipulated tempo at which the participants heard the songs (Figure 3 – figure supplement 2). This analysis revealed that there was no significant effect of the original music tempi on the neural response (please see Material and Methods, p. 40, l. 855-861 and Results p. 13, l. 265-273). In response to your comments and those of Reviewer 3, we also added this additional point to the discussion.

      p. 23-24 l. 427-436: “The tempo range within which we observed strongest synchronization partially coincides with the original tempo range of the music stimuli (Figure 1 – figure supplement 2). A control analysis revealed that the amount of tempo manipulation (difference between original music tempo and tempo the music segment was presented to the participant) did not affect TRF correlations. Thus, we interpret our data as reflecting a neural preference for specific musical tempi rather than an effect of naturalness or the amount that we had to tempo shift the stimuli. However, since our experiment was not designed to answer this question, we were only able to conduct this analysis for two tempi, 2.25 Hz and 1.5 Hz (Figure 3 – figure supplement 3), and thus are not able to rule out the influence of the magnitude of tempo manipulation on other tempo conditions.”

      3) The authors have a clear hypothesis about the frequency of the entrained EEG response: The one that corresponds to the musical tempo (or harmonics). It seemed to me that analyses do not sufficiently take that hypothesis into account and often include all possible frequencies. Restricting the analysis pipeline to frequencies that are expected to be involved might reduce the number of comparisons needed and therefore increase statistical power.

      Although we manipulated tempo, and so had an a priori hypothesis about the frequency at which the beat would be felt, natural music is a complex stimulus composed of different instruments playing different lines at different time scales, many or most of which are nonisochronous. Thus, we analyzed the data in two different ways – 1) based on TRFs and 2) based on stimulus–response correlation and coherence. Stimulus–response coherence is a frequency-domain measure, and so it was possible to do exactly as you suggest here and consider coherence only at the stimulation tempo and first harmonic, which we did (Figure 2E-J). However, for the TRF analyses, we followed previous literature (e.g., Ding et al., 2014; Di Liberto et al., 2020; Teng et al., 2021), and considered broader-band EEG activity (bandpass filtered at 0.5-30 Hz). Previous work has shown that the beat in music evokes a neural response at harmonics up to at least 4 times the beat rate (Kaneshiro et al., 2020), so we wanted to leave a broad frequency range intact in the neural data. Despite being based on differently filtered data, we found that the dependent measures from the two analysis approaches were correlated, which suggests to us that neural tracking at the stimulation tempo itself was probably the largest contributor to the results we observed here.
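      To make the frequency-restricted coherence analysis concrete, here is a minimal sketch of magnitude-squared coherence evaluated at a single frequency. It is an illustration only and not the pipeline used in the manuscript: the helper names `dft_bin` and `coherence_at` are hypothetical, segments are non-overlapping and untapered, and the sampling rate and segment length are free parameters.

```python
import cmath
import math


def dft_bin(segment, freq_hz, fs):
    """Complex Fourier coefficient of one segment at a single frequency."""
    return sum(v * cmath.exp(-2j * math.pi * freq_hz * n / fs)
               for n, v in enumerate(segment))


def coherence_at(x, y, freq_hz, fs, seg_len):
    """Magnitude-squared coherence at one frequency, averaged over segments."""
    sxy, sxx, syy = 0j, 0.0, 0.0
    for start in range(0, len(x) - seg_len + 1, seg_len):
        fx = dft_bin(x[start:start + seg_len], freq_hz, fs)
        fy = dft_bin(y[start:start + seg_len], freq_hz, fs)
        sxy += fx * fy.conjugate()  # cross-spectrum estimate
        sxx += abs(fx) ** 2         # auto-spectrum of x
        syy += abs(fy) ** 2         # auto-spectrum of y
    return abs(sxy) ** 2 / (sxx * syy)
```

      With a stimulation tempo `f`, the two values of interest in this sketch would be `coherence_at(stim, eeg, f, fs, seg_len)` and `coherence_at(stim, eeg, 2 * f, fs, seg_len)` for the first harmonic.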

      Related to your comment, we added two points to our discussion, which we reproduce here for your convenience.

      p. 24-25, l. 453-461: “Regardless of the reason, since frequency-domain analyses separate the neural response into individual frequency-specific peaks, it is easy to interpret neural synchronization (SRCoh) or stimulus spectral amplitude at the beat rate and the note rate – or at the beat rate and its harmonics – as independent (Keitel et al., 2021). However, music is characterized by a nested, hierarchical rhythmic structure, and it is unlikely that neural synchronization at different metrical levels goes on independently and in parallel. One potential advantage of TRF-based analyses is that they operate on relatively wide-band data compared to Fourier-based approaches, and as such are more likely to preserve nested neural activity and perhaps less likely to lead to over- or misinterpretation of frequency-specific effects.”

      p. 29 l. 564-577: “Despite their differences, we found strong correspondence between the dependent variables from the two types of analyses. Specifically, TRF correlations were strongly correlated with stimulation-tempo SRCoh, and this correlation was higher than for SRCoh at the first harmonic of the stimulation tempo for the amplitude envelope, derivative and beat onsets (Figure 4 - figure supplement 1). Thus, despite being computed on a relatively broad range of frequencies, the TRF seems to be correlated with frequency-specific measures at the stimulation tempo. The strong correspondence between the two analysis approaches has implications for how users interpret their results. Although certainly not universally true, we have noticed a tendency for TRF users to interpret their results in terms of a convolution of an impulse response with a stimulus, whereas users of stimulus–response correlation or coherence tend to speak of entrainment of ongoing neural oscillations. The current results demonstrate that the two approaches produce similar results, even though the logic behind the techniques differs. Thus, whatever the underlying neural mechanism, using one or the other does not necessarily allow us privileged access to a specific mechanism.”

      Reviewer #2 (Public Review):

      Kristin Weineck and coauthors investigated the neural entrainment to different features of music, specifically the amplitude envelope, its derivative, the beats and the spectral flux (which describes how fast the spectral changes are), and its dependence on the tempo of the music and self-reports of enjoyment, familiarity and ease of beat perception.

      They use and compare analysis approaches typically used when working with naturalistic stimuli: temporal response functions (TRFs) or reliable components analysis (RCA) to correlate the stimulus with its neural response (in this case, the EEG). The spectral flux seems the best music descriptor among the tested ones with both analyses. They find a stronger neural response to stimuli with slower beat rates and predictable stimuli, namely familiar music with an easy-to-perceive beat. Interestingly, the analysis does not show a statistically significant difference between musicians and non-musicians.

      The authors provide an extensive analysis of the data, but some aspects need to be clarified and extended.

      We thank the Reviewer for taking the time to evaluate and summarize our manuscript and for the great comments. We addressed the concerns and made changes throughout the manuscript, but especially in the introduction and discussion sections about the terminology (neural entrainment and neural measures), musical features of the stimuli, and musical experience of the participants. Below you can find the alterations described in more detail. The page and line numbers correspond to the manuscript file without track changes.

      1) It would be helpful to clarify better the concepts of neural entrainment, synchronization and neural tracking and their meaning in this specific context. Those terms are often used interchangeably, and it can be hard for the reader to follow the rest of the paper if they are not explicitly defined and related to each other in the introduction. Note that this is fundamental to understanding the primary goal of the paper. The authors clarify this point only at the end of the discussion (lines 570-576). I suggest moving this part to the introduction. Still, it is unclear why the authors use the TRF model and then say they want to be agnostic about the physiological mechanisms underlying entrainment. The choice of the TRF (as well as the stimulus representation) automatically implies a hypothesis about a physiological mechanism, i.e., the EEG reflects convolution of the stimulus properties with an impulse response. Please could you clarify this point? I might have missed it.

      Thank you for this valuable comment. We agree that it is fundamental to define and uniformly use terminology, and have made changes throughout the manuscript along these lines. First of all, we have changed all instances of “neural entrainment” or “neural tracking” to “neural synchronization”, as we think this term avoids evoking a specific theoretical background or strong mechanistic assumptions. Second, we have moved the Discussion paragraph you mention to the Introduction and expanded it. Specifically, we take the opportunity to address the association between specific analysis approaches (TRFs vs. stimulus–response correlation or coherence) and specific mechanistic assumptions (convolution of stimulus properties with an impulse response vs. entrainment of an ongoing oscillation, respectively). This allowed us to clarify what we mean when we say we prefer to stay agnostic to specific mechanistic interpretations. We are happy to have had the chance to strengthen this discussion, and think it benefits the manuscript a lot.

      We reproduce the new Introduction paragraph here for your convenience.

      p. 5-6, l. 101-123: “The current study investigated neural synchronization to natural music by using two different analysis approaches: Reliable Components Analysis (RCA) (Kaneshiro et al., 2020) and temporal response functions (TRFs) (Di Liberto et al., 2020). A theoretically important distinction here is whether neural synchronization observed using these techniques reflects phase-locked, unidirectional coupling between a stimulus rhythm and activity generated by a neural oscillator (Lakatos et al., 2019) versus the convolution of a stimulus with the neural activity evoked by that stimulus (Zuk et al., 2021). TRF analyses involve modeling neural activity as a linear convolution between a stimulus and relatively broad-band neural activity (e.g., 1–15 Hz or 1–30 Hz; Crosse et al., 2016, Crosse et al., 2021); as such, there is a natural tendency for papers applying TRFs to interpret neural synchronization through the lens of convolution (though there are plenty of exceptions to this, e.g., Crosse et al., 2015, Di Liberto et al., 2015). RCA-based analyses usually calculate correlation or coherence between a stimulus and relatively narrow-band activity, and in turn interpret neural synchronization as reflecting entrainment of a narrow-band neural oscillation to a stimulus rhythm (Doelling and Poeppel, 2015, Assaneo et al., 2019). Ultimately, understanding under what circumstances and using what techniques the neural synchronization we observe arises from either of these physiological mechanisms is an important scientific question (Doelling et al., 2019, Doelling and Assaneo, 2021, van Bree et al., 2022). However, doing so is not within the scope of the present study, and we prefer to remain agnostic to the potential generator of synchronized neural activity. Here, we refer to and discuss “entrainment in the broad sense” (Obleser and Kayser, 2019) without making assumptions about how neural synchronization arises, and we will moreover show that these two classes of analysis techniques strongly agree with each other.”

      2) Interestingly, the neural response to music seems stronger for familiar music. Can the authors clarify how this is not in contrast with previous works that show that violated expectations evoke stronger neural responses ([Di Liberto et al., 2020] using TRFs and [Kaneshiro et al., 2020] using RCA)? [Di Liberto et al., 2020] showed that the neural response of musicians is stronger than that of non-musicians as they have a stronger expectation (see point 2). However, in the present manuscript, the analysis does not show a statistically significant difference between musicians and non-musicians. The authors state that they had different degrees of musical training in their dataset, and therefore it is hard to see a clear difference. Still, in the "Materials and Methods" section, they divided the participants into these two groups, confusing the reader.

      Our findings are consistent with previous studies showing stronger inter-subject correlation in response to music in a familiar style vs. music in an unfamiliar style (Madsen et al., 2019) and stronger phase coherence in response to familiar relative to unfamiliar sung utterances (Vanden Bosch der Nederlanden et al., 2022). We actually don’t think our results (stronger neural synchronization for familiar music) or these previous results are incompatible with work showing that violations of expectations evoke stronger neural responses. This work either manipulated music so it violated expectations (Kaneshiro et al., 2020) or explicitly modeled “surprisal” as a feature (Di Liberto et al., 2020). Thus, we could think of those stronger neural responses to expectancy violations as reflecting something like “prediction error”. Our music stimuli did not contain any violations, and we were unable to model responses to surprisal given the nature of our music stimuli, as we better explain below (p. 27 l. 514-529). Thus, neural synchronization was stronger to familiar music, and we would argue that listeners were able to form stronger expectations about music they already knew. We would predict that expectancy violations in familiar music would evoke stronger neural responses than those in unfamiliar music, though we did not test that here. We now include a paragraph in the Discussion reconciling our findings with the papers you have cited.

      p. 27 l. 514-529: “We found that the strength of neural synchronization depended on the familiarity of music and the ease with which a beat could be perceived (Figure 5). This is in line with previous studies showing stronger neural synchronization to familiar music (Madsen et al., 2019) and familiar sung utterances (Vanden Bosch der Nederlanden et al., 2022). Moreover, stronger synchronization for musicians than for nonmusicians has been interpreted as reflecting musicians’ stronger expectations about musical structure. On the surface, these findings might appear to contradict work showing stronger responses to music that violated expectations in some way (Kaneshiro et al., 2020, Di Liberto et al., 2020). However, we believe these findings are compatible: familiar music would give rise to stronger expectations and stronger neural synchronization, and stronger expectations would give rise to stronger “prediction error” when violated. In the current study, the musical stimuli never contained violations of any expectations, and so we observed stronger neural synchronization to familiar compared to unfamiliar music. There was also higher neural synchronization to music with subjectively “easy-to-tap-to” beats. Overall, we interpret our results as indicating that stronger neural synchronization is evoked in response to music that is more predictable: familiar music and with easy-to-track beat structure.”

      Your other question was why we did not see effects of musical training / sophistication on neural synchronization to music, when other studies have. There are a few possible reasons for this. One is that previous studies aiming to explicitly test the effects of musical training recruited either professional musicians or individuals with a high degree of musical training for their “musician” sample. In contrast, we did not target individuals with any degree of musical training, but attempted this analysis in a post-hoc way. For this reason, our musicians and nonmusicians were not as different from each other in terms of musical training as in previous work. Given this, we have opted to remove the artificial split into musician and nonmusician groups, and now only include a correlation with musical sophistication (as you suggest in your next comment), which was also nonsignificant (Figure 5 – figure supplement 2).

      3) Musical expertise was also assessed using the Goldsmith Music Sophistication Index, which could be an alternative to the two-group comparison between musicians and non-musicians. Does this mean that in Figure 5, we should see a regression line (the higher the Gold-MSI, the higher should be the TRF correlation)? Since we do not see any significant effect, might this be due to the choice of the audio descriptor? The spectral flux is not a high-level descriptor; maybe it is worth testing some high-level descriptors such as entropy and surprise. The choice of the stimulus features defines linear models such as the TRF as they determine the hierarchical level of auditory processing, and for testing the musical expertise, we might need more than acoustic features. The authors should elaborate more on this point.

      It is true that the Goldsmith Music Sophistication Index serves as an alternative way of investigating the effects of musical expertise on neural synchronization to natural music, and we now include this approach exclusively instead of dividing our sample (see response to the previous comment). Indeed, if musical sophistication would have an effect on the TRF correlations in this study, we would see a regression line in Figure 5 – figure supplement 2. Based on our experiment it is difficult to assess whether the lack of a correlation between neural measures and musical expertise is based on our choice of stimulus features. That is because our experiment was designed to investigate the effects of fundamental acoustic features of music, and it was not possible to calculate high-level descriptors, such as the entropy or surprisal, for the music stimuli we chose to work with – the stimuli were polyphonic, and moreover were purchased in a .wav format, so we do not have access to the individual MIDI versions or sheet music of each song that would have been necessary to apply, for example, the IDyOM (Information Dynamics of Music) model. As we cannot rule out that the (lack of) effects of varying levels of musical expertise on TRF correlations is due to our choice of stimulus features, we added this to the discussion.

      p. 28 l. 541-546: “Another potential reason for the lack of difference between musicians and non-musicians in the current study could originate from the choice of utilizing pure acoustic audio-descriptors as opposed to “higher order” musical features. However, “higher order” features such as surprise or entropy that have been shown to be influenced by musical expertise (Di Liberto et al., 2020), are difficult to compute for natural, polyphonic music.”

      4) Regarding the stimulus representation, I have a few points. The authors say that the amplitude envelope is a too limited representation for music stimuli. However, before testing the spectral flux, why not test the spectrogram as in previous studies? Moreover, the authors tested the TRF on combining all features, but it was not clear how they combined the features.

      One of the main reasons that we did not use the spectrogram as a feature was that it wouldn’t be possible to use a two-dimensional representation for the RCA-based measures, SRCorr and SRCoh, so we would not have been able to compare across analysis approaches. However, spectral flux is calculated directly from the spectrogram, and so is a useful one-dimensional measure that captures the spectro-temporal fluctuations present in the spectrogram (https://musicinformationretrieval.com/novelty_functions.html). Thank you for making this important point, we added this explanation to the Materials and Methods section (p. 35 l. 726-727).
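      For readers unfamiliar with the measure, the following minimal sketch shows one common way to reduce a magnitude spectrogram to a one-dimensional spectral-flux time course. It is an assumption-laden illustration, not the authors' implementation: the function name `spectral_flux` is hypothetical, and the half-wave-rectified frame difference used here is only one of several definitions in use.

```python
def spectral_flux(spectrogram):
    """Frame-to-frame spectral change of a magnitude spectrogram.

    spectrogram: list of frames, each a list of non-negative magnitudes
    (one value per frequency bin).
    Returns one value per frame; the first frame has no predecessor,
    so its flux is set to 0.0.
    """
    flux = [0.0]
    for prev, curr in zip(spectrogram, spectrogram[1:]):
        # Half-wave rectification keeps only increases in energy,
        # which emphasizes onset-like events.
        flux.append(sum(max(c - p, 0.0) for p, c in zip(prev, curr)))
    return flux
```

      Because the output has one value per time frame, it can be treated like the amplitude envelope in correlation- or coherence-based measures that expect a one-dimensional stimulus feature.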

      We apologize for not explaining the multivariate TRF approach more clearly. Instead of using only one stimulus feature, e.g., the amplitude envelope, several stimulus features can be concatenated into a matrix (with the dimensions: time T × 4 musical features M at different time lags), which is then used as input for the mTRFcrossval, mTRFtrain and mTRFpredict functions of the mTRF Matlab Toolbox (Crosse et al., 2016); this is exactly how a 2D feature like the spectrogram would be handled. The multivariate TRF is calculated by extending the stimulus lag matrix (time course of one musical feature at different time lags, T × τwindow) by an additional dimension (time courses of several musical features at different time lags, T × M × τwindow). We added an explanation to the Methods section of the manuscript and hope that it is now easier to understand:

      p. 39 l. 840-842: “For the multivariate TRF approach, the stimulus features were combined by replacing the single time-lag vector by several time-lag vectors for every musical feature (Time x 4 musical features at different time lags).”
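      The construction of the multivariate lag matrix can be sketched as follows. This is a simplified illustration under stated assumptions, not the mTRF Toolbox code: the function name `lagged_design_matrix` is hypothetical, lags are given in samples, edges are zero-padded, and the toolbox may order columns or handle edges differently.

```python
def lagged_design_matrix(features, lags):
    """Stack time-lagged copies of every stimulus feature column-wise.

    features: list of T rows, each holding M feature values (e.g. M = 4).
    lags: lags in samples; a positive lag pairs the response at time t
          with the stimulus at time t - lag.
    Returns T rows of M * len(lags) values, zero-padded wherever
    t - lag falls outside the recording.
    """
    n_samples = len(features)
    n_features = len(features[0])
    design = []
    for t in range(n_samples):
        row = []
        for m in range(n_features):
            for lag in lags:
                src = t - lag
                row.append(features[src][m] if 0 <= src < n_samples else 0.0)
        design.append(row)
    return design
```

      With a single feature (M = 1) this reduces to the univariate lag matrix of size T × τwindow; with four features each row simply holds four such lag vectors side by side, and a regularized regression of the EEG on this matrix would then yield one TRF per feature.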

      Reviewer #3 (Public Review):

      Subjects listened to various excerpts from music recordings that were designed to cover musical tempi ranging from 1-4 Hz, and EEG was recorded as subjects listened to these excerpts. The main and novel findings of the study were: 1) spectral flux, measuring sudden changes in frequency, was tracked better in the EEG than other measures of fluctuations in amplitude, 2) neural tracking seemed to be best for the slowest tempi, 3) measures of neural tracking were higher when subjects rated an excerpt as high for ease-of-tapping and familiarity, and 4) their measure of the mapping between stimulus feature and response could predict whether a subject tapped at the expected tempo or at 2x the expected tempo after listening to the musical excerpt.

      One of the key strengths of this study is the use of novel methodologies. The authors in this study used natural and digitally manipulated music covering a wide range of tempi, which is unique to studies of musical beat tracking. They also included both measures of stimulus-response correlation and phase coherence along with a method of linear modeling (the temporal response function, or TRF) in order to quantify the strength of tracking, showing that they produce correlated results. Lastly, and perhaps most importantly, they also had subjects tap along with the music after listening to the full excerpt. While having a measure of tapping rate itself is not new, combined with their other measures they were able to demonstrate that neural data predicted the hierarchical level of tapping rate, opening up opportunities to study the relationship between neural tracking, musical features, and a subject's inferred metrical level of the musical beat.

      Additionally, the finding that spectral flux produced the best correlations with the EEG data is an important one. Many studies have focused primarily on the envelope (amplitude fluctuations) when quantifying neural tracking of continuous sounds, but this study shows that, for music at least, spectral flux may add information that is tracked by the EEG. However, given that it is also highly correlated with the envelope, what additional features spectral flux contributes to measuring EEG tracking is not clear from the current results and worth further study.

      All four of their main findings are important for research into the neural coding of musical rhythm. I have some concerns, however, that two of these findings could be a consequence of the methods used, and one could be explained by related correlations to acoustic features:

      We thank the Reviewer for the very helpful review, the summary, and the great suggestions. We addressed the comments and performed additional analysis. We made changes throughout the manuscript, but especially 1) concerning the potential advantage of the neural response to slower music, 2) the effects of the amount of tempo manipulation on neural synchronization, 3) the SVM-related analysis and 4) the relation between stimulus features and behavioral ratings. The implemented modifications can be found below in more detail. The page and line numbers correspond to the manuscript file without track changes.

      The authors found that their measures of neural tracking were highest for the lowest musical tempos. This is interesting, but it is also possible that this is a consequence of lower frequencies producing a large spread of correlations. Imagine two signals that are fluctuating in time with a similar pattern of fluctuation. When they are correctly-aligned they are correlated with each other, but if you shift one of the signals in time those fluctuations are mismatched and you can end up with zero or negative correlations. Now imagine making those fluctuations much slower. If you use the same time shifts as before, the signals will still be fairly correlated, because the rates of signal change are much longer. As a result, the span of null correlations also increases. This can be corrected by normalizing the true correlations and prediction accuracies with a null distribution at each tempo. But with this in mind, it is hard to conclude if the greater correlations found for lower musical tempos in their current form are a true effect.

      Thank you for this great suggestion. We followed your lead (Zuk et al., 2021), and normalized all measures of neural synchronization (TRF correlation, SRCorr, SRCoh) relative to a surrogate distribution. The surrogate distribution was calculated by randomly and circularly shifting the neural data relative to the musical features for each of 50 iterations. This was done separately for every musical feature and stimulation tempo condition (Figures 2 and 3). After normalization, the results look qualitatively similar and the main results – spectral flux and slow stimulation tempi resulting in highest levels of neural synchronization – persist.

      The changes in the manuscript based on your comment (and the comment of Reviewer 1) can be found throughout the manuscript, but especially on p. 11, l. 210-218, Figures 2-3 and a more detailed explanation in the Methods section:

      p. 39, l. 821-829: “In order to control for any frequency-specific differences in the overall power of the neural data that could have led to artificially inflated observed neural synchronization at lower frequencies, the SRCorr and SRCoh values were z-scored based on a surrogate distribution (Zuk et al., 2021). Each surrogate distribution was generated by shifting the neural time course by a random amount relative to the musical feature time courses, keeping the time courses of the neural data and musical features intact. For each of 50 iterations, a surrogate distribution was created for each stimulation subgroup and tempo condition. The z-scoring was calculated by subtracting the mean and dividing by the standard deviation of the surrogate distribution.”
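
      The surrogate normalization quoted above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a single neural time course and a single musical-feature time course of equal length, uses the Pearson correlation as the synchronization measure, and the function name `surrogate_zscore` and its parameters are hypothetical.

```python
import numpy as np


def surrogate_zscore(neural, feature, n_iter=50, min_shift=1, seed=None):
    """Z-score an observed correlation against circular-shift surrogates.

    Assumption: Pearson correlation stands in for the synchronization
    measure; the manuscript applies the same idea to SRCorr and SRCoh.
    """
    rng = np.random.default_rng(seed)
    neural = np.asarray(neural, dtype=float)
    feature = np.asarray(feature, dtype=float)

    # Observed correlation with the two time courses correctly aligned.
    observed = np.corrcoef(neural, feature)[0, 1]

    # Circularly shift the neural data by a random lag on each iteration.
    # Both time courses stay intact; only their alignment is broken.
    n = neural.size
    shifts = rng.integers(min_shift, n - min_shift + 1, size=n_iter)
    surrogate = np.array(
        [np.corrcoef(np.roll(neural, s), feature)[0, 1] for s in shifts]
    )

    # Subtract the surrogate mean and divide by the surrogate standard deviation.
    z = (observed - surrogate.mean()) / surrogate.std(ddof=1)
    return z, observed, surrogate
```

      Because slowly fluctuating signals produce a wider spread of surrogate correlations, the same raw correlation maps to a smaller z-score at slow tempi, which is the correction the reviewer asked for.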

      If the strength of neural tracking at low tempos is a true effect, it is worth noting that the original tempi for the music clips span 1 - 2.5 Hz (Supplementary Table 1), roughly the range of tempi exhibiting the largest prediction accuracies and correlations. All tempos above this range are produced by digitally manipulating the music. It is possible that the neural tracking measures are higher for music without any digital manipulations rather than reflecting the strength of tracking at various tempi. This could also be related to the authors' finding that neural tracking was better for more familiar excerpts. This alternative interpretation should be acknowledged and mentioned in the discussion.

      Thank you for these important suggestions (see also comment #2 (part 2) from Reviewer 1). First up, it is important to say that all music stimuli were tempo manipulated: even if the tempo of an original music segment was e. g. 2 Hz and the same song was presented at 2 Hz, it was still converted via the MAX patch to 2 Hz again (to make it comparable to the other musical stimuli). Second, it is true that we cannot fully exclude the possibility that the amount of tempo manipulation could have an effect on neural synchronization to music – meaning that less tempo manipulated music segments (so a stimulation tempo close to the original tempo) could result in higher neural synchronization. However, we have now conducted an additional analysis to address this as best we could.

      We compared TRF correlations for a) songs that were shifted very little relative to their original tempi to b) songs that were shifted a lot relative to their original tempi. We did not have enough song stimuli to do this for every stimulation tempo, but we were able to do the TRF correlation comparison for two illustrative stimulation tempo conditions (at 2.25 Hz and 1.5 Hz). In those tempo conditions, we took the TRF correlations for up to three trials per participant when the original tempo was around the manipulation tempo (1.25-1.6 Hz for 1.5 Hz or 2.01-2.35 Hz for 2.25 Hz) and compared it to those trials where the original tempo was around 0.75–1 Hz faster or slower than the manipulated tempo at which the participants heard the songs (Figure 3 – figure supplement 2). This analysis revealed that there was no significant effect of the original music tempi on the neural response (please see Material and Methods, p. 40, l. 855-861 and Results p. 13, l. 265-273). In response to your and Reviewer 1's comments, we also added it to the discussion.

      p. 23-24 l. 427-436: “The tempo range within which we observed strongest synchronization partially coincides with the original tempi of the music stimuli (Figure 1 – figure supplement 2). A control analysis revealed that the amount of tempo manipulation (difference between original music tempo and tempo the music segment was presented to the participant) did not affect TRF correlations. Thus, we interpret our data as reflecting a neural preference for specific musical tempi rather than an effect of naturalness or the amount that we had to tempo shift the stimuli. However, since our experiment was not designed to answer this question, we were only able to conduct this analysis for two tempi, 2.25 Hz and 1.5 Hz (Figure 3 – figure supplement 3), and thus are not able to rule out the influence of tempo manipulation on other tempo conditions.”

      We also provide more information to the reader about the amount of tempo shift that each stimulus underwent. We added two plots to the manuscript that show 1) the distribution of original tempi of the music stimuli and 2) the distribution of the amount of tempo manipulation across all stimuli (Figure 1 – figure supplement 2).

      Their last finding regarding predicting tapping rates is novel and important, and the model they use to make those predictions does well. But I am concerned by how well it performs (Figure 6), since it is not clear what features of the TRF are being used to produce this discrimination. Are the effects producing discriminable tapping rates and stimulation tempi apparent in the TRF? I noticed, though, that these results came from two stages of modeling: TRFs were first fit to groups of excerpts with different tapping rates or stimulation tempo separately, then a support vector machine (SVM) was used to discriminate between the two groups. So, another way to think about this pipeline is that two response models (TRFs) were generated for the separate groups, and the SVM finds a way of differentiating between them. There is no indication about what features of the TRFs the SVM is using, and it is possible this is overfitting. Firstly, I think it needs to be clearer how the TRFs are being computed from individual trials. Secondly, the authors construct surrogate data by shuffling labels (before training) but it is not clear at which training stage this is performed. They can correct for possible issues of overfitting by comparing to surrogate data where shuffling happens before the TRF computation, if this wasn't done already.

      Thank you for noticing this important point. You are absolutely right – when re-analyzing that part of the results based on your comment, we noticed that we had an error in our understanding of the analysis pipeline. Indeed, we first calculated two TRF models for the separate groups (e. g. stimulation tempo = tapping tempo vs. stimulation tempo = 2* tapping tempo) based on all trials of each group apart from the left-out-trial. Next, the resulting TRFs were fed into the SVM which was used to predict the group. The shuffling of the surrogate data occurred at the SVM training step.

      Based on your comment, we tried several approaches to solve this problem. First, we calculated TRFs on a single-trial basis (instead of using the two-group TRFs as before, only one trial was used to calculate the TRFs) and submitted the resulting TRFs to the SVM. The resulting SVM accuracy was compared to a “surrogate SVM accuracy” which was calculated based on shuffling the labels when training the SVM classifier. Second, we shuffled, as you suggest, the labels not at the SVM training step, but instead prior to the TRF calculation. This way we could compare our “original” SVM accuracies (based on the two-group TRFs) to a fairer surrogate dataset. However, in both cases the resulting SVM accuracies did not perform better than the surrogate data. Therefore, we felt that it is the fairest to remove this part from the manuscript. We are aware that this was one of the main results of the paper and we are sorry that we had to remove it. However, we feel that our paper is still strong and offers a variety of different results that are important for the auditory neuroscience community.
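
      The surrogate comparison described above, in which labels are shuffled before any label-dependent step rather than only at classifier training, can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: a leave-one-out nearest-centroid classifier stands in for the TRF-plus-SVM stages, labels are assumed to be coded 0 and 1, and the names `loo_accuracy` and `permutation_p` are hypothetical.

```python
import numpy as np


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Assumption: this simple classifier is a stand-in for the whole
    label-dependent pipeline (group-level model fit plus classifier).
    """
    n = len(y)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i  # hold out trial i
        c0 = X[train & (y == 0)].mean(axis=0)
        c1 = X[train & (y == 1)].mean(axis=0)
        pred = int(np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0))
        correct += int(pred == y[i])
    return correct / n


def permutation_p(X, y, n_perm=200, seed=0):
    """Compare observed accuracy with a null built by shuffling the labels
    before the entire pipeline is rerun."""
    rng = np.random.default_rng(seed)
    observed = loo_accuracy(X, y)
    null = np.array(
        [loo_accuracy(X, rng.permutation(y)) for _ in range(n_perm)]
    )
    # One-sided p-value with the usual +1 correction.
    p = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, null, p
```

      Rerunning every label-dependent stage on the shuffled labels gives the null distribution the same opportunity to overfit as the real analysis, so an accuracy that survives this comparison is less likely to be an artifact of the pipeline.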

      Lastly, they show that their measures of neural tracking are larger for music with high familiarity and high ease-of-tapping. I expect these qualitative ratings could be a consequence of acoustic features that produce better EEG correlations and prediction accuracies, especially ease-of-tapping. For example, music with acoustically-salient events are probably easier to tap to and would produce better EEG correlations and prediction accuracies, hence why ease-of-tapping is correlated with the measures of neural tracking. To understand this better, it would be useful to see how the stimulus features correlate with each of these behavioral ratings.

      We agree that our rating-based results could be influenced by acoustic stimulus features (at least for ease of tapping, it’s actually not clear to us why familiarity would be related to acoustics). As it is difficult to correlate stimulus features (time-domain, and one time course per song) with behavioral ratings (one single value per song per participant), we conducted frequency-domain analysis on the musical features to arrive at a single value quantifying the strength of spectral flux at the stimulation frequency and its first harmonic. We calculated single-trial FFTs on the spectral flux (which was used for the main Figure 5) for the 15 highest- and 15 lowest-rated trials per behavioral category (enjoyment, familiarity, ease to tap the beat) and participant. We compared the z-scored FFT peaks at the stimulation tempo and first harmonic for the top- and bottom-rated stimuli. We did observe significant acoustic differences between top- and bottom-rated stimuli in each category, but the differences were not in the direction that would be expected based on acoustically more salient events leading to better TRF correlations, with the exception of ease of tapping. Easy-to-tap music did indeed have stronger spectral flux than difficult-to-tap music, which is intuitive. However, spectral flux was stronger for more enjoyed music (we did not see any significant differences between TRF correlations of more vs. less enjoyed music; Figure 5C) and for less familiar music (this is the opposite of what we saw for the TRF measures). Overall, given the inconsistent relationship between acoustics, behavioral ratings, and TRF measures, we would argue that acoustic features alone cannot solely explain our results (Figure 5 – figure supplement 1, p. 21 l. 381 – 387).
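
      The frequency-domain summary described above can be sketched as follows. This is a minimal illustration and not the authors' code: the response does not say how the FFT peaks were z-scored, so z-scoring the amplitude spectrum across all frequency bins is an assumption, and the name `flux_peak_strength` and its parameters are hypothetical.

```python
import numpy as np


def flux_peak_strength(flux, fs, stim_hz):
    """Return z-scored FFT amplitude at the stimulation tempo and its first harmonic.

    flux    : spectral-flux time course for one trial
    fs      : sampling rate of the time course in Hz
    stim_hz : stimulation tempo in Hz
    """
    flux = np.asarray(flux, dtype=float)
    flux = flux - flux.mean()  # remove the DC offset

    amp = np.abs(np.fft.rfft(flux))
    freqs = np.fft.rfftfreq(flux.size, d=1.0 / fs)

    # Assumption: z-score across all frequency bins of the trial.
    z = (amp - amp.mean()) / amp.std()

    # Pick the bins closest to the stimulation tempo and twice that tempo.
    idx_f0 = np.argmin(np.abs(freqs - stim_hz))
    idx_f1 = np.argmin(np.abs(freqs - 2 * stim_hz))
    return z[idx_f0], z[idx_f1]
```

      The two values per trial can then be compared between the top- and bottom-rated stimuli in each behavioral category.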

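    1. The MIT/Brown Vannevar Bush Symposium was held at MIT on October 12-13, 1995, to celebrate the 50th anniversary of Vannevar Bush's seminal article "As We May Think," published in The Atlantic Monthly in July 1945. The event video is in five parts; this is Part 1, which mainly covers:

      1. Opening remarks by Paul Penfield.

      2. Andy van Dam introduces Vannevar Bush and his career.

      3. Paul Kahn (Memex expert): Visual tour of Bush's work.

      4. Douglas Engelbart: The Strategic Pursuit of Collective IQ. Abstract: For me, the legacy that Bush left in "As We May Think" bears directly on the very real and important potential for raising the collective IQ of the social organisms represented by human organizations. The companies, institutions, and indeed nations that pursue this potential most seriously and effectively will clearly have a strong success/survival advantage. Beyond that, whether humanity as a whole can survive in a healthy and "humane" social, political, economic, and ecological environment may well depend on how soon and how effectively we explicitly pursue this potential.

      A serious pursuit will involve many changes in the way we think, coordinated with many concurrent changes in "the way we work": how we collaborate, share, take on new roles, exercise new and different sets of skills and methods, and so on. In short, it will involve fundamentally new ways of combining basic human sensory, motor, mental, and learning capabilities with the task of collectively developing, integrating, and applying knowledge.

      An effective pursuit will require a strategic approach, whose acceptance will certainly involve key shifts in some prevailing paradigms. I would like to describe them, and their relative roles in a candidate "bootstrapping" strategy for pursuing significant, large-scale improvement in collective IQ.

      Technology is only one important element of that strategy; within it, the key is to accelerate the development of an open hyperdocument system (OHS), with appropriate goals for general functionality, application domains, interoperability, and scalability. The exciting emergence of WWW/HTML provides an extremely important boost; I would like to describe some candidates for the next stage of evolution toward the OHS goals.

      5. Ted Nelson (Theodor Holm Nelson): Where the Trail Leads. Abstract: Like any concise prophetic work, "As We May Think" supports many interpretations and raises questions of extrapolation. We gather today to pay tribute, and to argue over whose ideas most faithfully express what was originally said.

      Bush foresaw a publicly accessible, rapidly accessible connective literature that would allow people to publish connections among already existing materials. But the structure he foresaw, which he called "trails," is quite different from today's spaghetti hypertext; Bush's structure is based on transclusion rather than links. It deserves detailed study.

      Properly extrapolated and polished, I believe this idea leads to cross-parallel media (connected objects seen together with their connections), and to the design of a broad copyright arrangement for unrestricted re-use.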

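    1. The MIT/Brown Vannevar Bush Symposium was held at MIT on October 12-13, 1995, to celebrate the 50th anniversary of Vannevar Bush's seminal article "As We May Think," published in The Atlantic Monthly in July 1945. The event video is in five parts; this is Part 3, which mainly covers:

      1. A talk by Michael Lesk: "The Seven Ages of Information Retrieval." Abstract: In his 1945 article, Vannevar Bush set the goal of rapid access to the contents of the world's libraries, which looks as if it will be achieved by 2010, 65 years later. Its history is thus comparable to that of a person. Information retrieval had its schoolboy phase of research in the 1950s and early 1960s; it then struggled for adoption in the 1970s, but in the 1980s and 1990s, with the routine use of free-text retrieval systems, it has become accepted. For example, my company no longer prints its corporate telephone directory on paper. Now it is moving ahead with sound and image retrieval projects, while making most of what is now in libraries available electronically. We can look forward to Bush's dream being completed within a single lifetime.

      2. Day 1 panel discussion.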

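    1. The MIT/Brown Vannevar Bush Symposium was held at MIT on October 12-13, 1995, to celebrate the 50th anniversary of Vannevar Bush's seminal article "As We May Think," published in The Atlantic Monthly in July 1945. The event video is in five parts; this is Part 2, which mainly covers:

      1. A talk by Robert Kahn: "Augmenting Bush's Vision with Digital Technology." Abstract: Although Vannevar Bush described the importance of information sharing in his classic paper "As We May Think," his vision was necessarily limited by the technology of his day. In particular, the digital computing and communication technologies we now take for granted had not even entered his frame of reference. This talk will explore the possible evolution of computer and communications infrastructure, and the roles of architecture, technology, and intelligence in that system. Connectivity, together with nearly unlimited digital objects, general-purpose services, and applications, will stimulate the sharing of ideas, joint activities of all kinds, virtual entities, and teamwork across the network. The role of software agents inside and outside the network will be considered in the context of distributed task execution. Finally, the prospects for intelligent distributed systems will be explored.

      2. A talk by Tim Berners-Lee: "Hypertext and Our Collective Destiny." Abstract: Bush considered the plight of researchers swamped by inaccessible information. He proposed the MEMEX, a machine that could be accessed rapidly and would allow random links between pieces of information. Since then, networks and computers have taken us beyond this visionary proposal in speed and convenience. Yet we have not seen great progress in our ability to solve political problems, manage large organizations, or amplify the intuition of our groups. We must do more than empower individuals. We must enable people and machines interacting together to act as a group in new ways. Now that we can make trails through our information, we must create a substrate in which those trails will grow into an increasingly meaningful whole rather than a tangled mass. We and our documents are able to operate together as one large machine, but not as one large mind. Groups of all sizes must acquire the gifts of intuition, association, and invention, gifts we normally associate with people rather than machines, before we can meet Bush's challenge to humanity to "grow in the wisdom of race experience" rather than "perish in conflict."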

  3. Jun 2022
    1. Author Response

      Reviewer #1 (Public Review):

      The authors set out to consider more the role of the predator in predator-prey interactions, particularly from a collective locomotion aspect. This is an aspect which at times has been overlooked, with many theories, experiments and models focusing largely on the prey response, independent of how the predator behaves. The major strengths are the (1) excellent writing, (2) quality of the figures, (3) quantity of data, and (4) question tackled. The major weaknesses are (1) the volume of information (as a reader, it is quite hard to distil key points from the sheer volume of what has been presented), (2) the confined captive environment making it difficult to draw comparisons with a wild-type scenario, and (3) lack of clarity about the wider implications of the work outside of the immediate field.

      We thank the reviewer for their thoughtful review and positive comments. To address the weaknesses highlighted by the reviewer, we have revised our manuscript throughout.

      Reviewer #2 (Public Review):

      The manuscript describes a laboratory-based predator-prey experiment in which pike hunt shiner fish as a way to gain insight into the selective pressures driving the evolution of collective behavior. Unlike the predictions of classical theoretical work in which prey on the edge of social groups are considered to be at highest risk of predation, the fish in the center of the school were primarily targeted by the pike. This is because the pike uses a hunting behavior in which it slowly moves to the center of the school, seemingly undetected, until it rapidly attacks prey directly in front of its snout. This study also differs from previous studies in that both the predator and prey motion are examined, and the success of predation attempts was precisely determined. While the study demonstrates why shiners would be under selective pressure to avoid the center of a school, I am not convinced that the results explain why shiners evolved to have schooling behavior.

      The reviewer indeed highlights one of the main findings of our study, that fish closer to the group center are more at risk of being attacked by pike. They also give a proper account of its possible explanation, and highlight some of the main ways in which our study differs from previous work. The reviewer states that our results do not explain why shiners evolved to school. We agree and note that we also don’t claim this anywhere in the manuscript. Rather, we state our study provides important new insights about differential predation risk in groups of prey and highlight the important role of predator attack strategy and decision-making and prey response, with potential repercussions for the costs and benefits of grouping.

      We have considerably revised our introduction to better explain the importance of understanding differential predation risk in animal groups (lines 36-50): A key challenge in the life of most animals is to avoid being eaten. Via effects such as enhanced predator detection (Lima, 1995; Magurran et al., 1985), predator confusion (Landeau and Terborgh, 1986), and risk dilution effects (Foster and Treherne, 1981; Turner and Pitcher, 1986), individuals living and moving in groups can reduce their risk of predation (Ioannou et al., 2012; Krause and Ruxton, 2002; Pitcher and Parrish, 1993; Ward and Webster, 2016). This helps explain why strong predation pressure is known to drive the formation of larger and more cohesive groups (Beauchamp, 2004; Krause and Ruxton, 2002; B. Seghers, 1974). However, the costs and benefits of grouping are not shared equally among individuals within groups, and besides differential food intake and costs of locomotion, group members themselves may experience widely varying risks of predation (Handegard et al., 2012; Krause, 1994; Krause and Ruxton, 2002). Where and who predators attack within groups not only has major implications for the selection of individual phenotypes, and thereby the emergence of collective behaviour and the functioning of animal groups (Farine et al., 2015; Jolles et al., 2020; Ward and Webster, 2016), but also shapes the social behaviour of prey and the properties and structure of prey groups. Hence, a better understanding of the factors that influence predation risk within animal groups is of fundamental importance.

      And in the discussion we now better explain the potential evolutionary consequences of the findings of our work (lines 456-466): Predation is seen as one of the main factors to shape the collective properties of animal groups (Herbert-Read et al., 2017) and has so far generally been seen to drive the formation of larger, more cohesive groups that exhibit collective, coordinated motion (see e.g. Beauchamp, 2004; Ioannou et al., 2012; B. H. Seghers, 1974). Our finding that central individuals are more at risk of being predated could actually have the opposite effect, with schooling having a selective disadvantage and over time resulting in weaker collective behaviour and less cohesive schools. However, we do not deem this likely as selection is likely to be group-size dependent, as discussed above. Furthermore, our multi-model inference approach revealed that, despite more central individuals experiencing higher predation risk, being close to others inside the school was still associated with a lower risk of being targeted. As most prey experience many types of predators, including sit-and-wait predators and active predators that hunt for prey, the extent and direction of such selection effects will depend on the broader predation landscape in which prey find themselves.

      Major strengths of the paper include the precise recording of the location and orientation of all fish at all times during the experiments. This indeed provides a rich dataset that can be used to search for the factors that predict the likelihood of attack and escape with higher statistical power.

      The major concern I have about the manuscript is that the results somewhat contradict the aim of the paper as expressed in the introduction and discussion: that predator-prey interactions explain the emergent evolution of collective behavior. Figure 2C shows that fish in smaller clusters or those that were totally isolated experienced lower rates of predation and were not included in any subsequent analyses. This would suggest that shiners experiencing predation from pike would be under strong selection to avoid schooling behavior altogether. Can you compare the likelihood of predation for individuals in non-central school locations compared to individuals outside of schools altogether? It might be helpful to investigate whether other predators of shiners use predation strategies that target prey on the edge of the school to help explain why schooling could be useful. Did the likelihood of schooling decrease throughout the trials?

      The reviewer makes a good point regarding the observation that pike tended to mainly attack individuals in the main school, questioning if this would result in a selective disadvantage for schooling. We would like to point out that this result is regarding the likelihood to attack an individual, not the likelihood for a successful attack. If we look at the latter we find 5 out of 8 attacks away from the main school were successful, a ratio that is actually similar to that of the main school. More importantly, when wanting to understand how predation risk is linked to group size one needs to look at the per capita risk. If we do that for the group size we used in our study, despite a moderately elevated risk of being predated in a large group, the shiners in the main school still had considerably lower individual risk to be killed than those that occurred in small sub-groups or were alone. We would like to note that in our study the shiners did not really show proper fission-fusion behaviour and by far the majority of the time the shiners were in one large cohesive school. Therefore, we feel our dataset is not suitable for a proper investigation about the role of group size in predation risk.

      We now clarify these points in the discussion (lines 467-471): While the finding that pike were more likely to attack the main school may also appear to indicate a selective disadvantage to school, calculating the per-capita-risk for each individual would actually reveal it is still safest to be part of the main school. Nevertheless, as the shiners in our study rarely exhibited fission-fusion dynamics we feel our dataset is not appropriate to make proper inferences about how predation risk is linked to group size.

      We have also slightly extended the relevant sentences in the results to further clarify the clustering results (lines 144-150): We found that, by and large, the shiners were organised in one large, cohesive school at the time of attack and rarely showed fission-fusion behaviour (merging and splitting of schools) during the trials. Only occasionally there were one or two singletons besides the main school (25 attacks) or multiple clusters of more than two fish (12 attacks; Figure 2C), which tended to exist relatively briefly (mean school size: 36.5 ± 0.8). In more than 80% of these cases, pike still targeted an individual in the main cluster (Figure 2C).

      We now also provide more discussion about other predator types being likely to attack central prey (lines 343-354): That predators may actually enter groups and strike at central individuals is not often considered (Hirsch and Morrell, 2011), possibly because it contrasts with the long-standing idea that predation risk is higher on the edge of animal groups (Duffield and Ioannou, 2017; Krause, 1994; Krause and Ruxton, 2002; Stankowich, 2003). However, our finding is in line with the predictions of theoretical work that suggests that the extent of marginal predation may depend on attack strategy and declines with the distance from which the predator attacks (Hirsch and Morrell, 2011). Furthermore, increased risk of individuals near the centre of groups may be more widespread than currently thought. Predators not only exhibit stealthy behavioural tactics that enable them to approach and attack central individuals, as we show here, but may also do so by attacking groups from above (Brunton, 1997) or below (Clua and Grosvalet, 2001; Hobson, 1963; but see Romey et al., 2008), and by rushing into the main body of the group (Handegard et al., 2012; Hobson, 1963; Parrish et al., 1989).

      We furthermore discuss the potential role of group size on the observed effects (lines 441-455): In particular, while group size is not expected to affect much whether ambush predators are likely to attack internal individuals, the specific risk of central individuals could both be hypothesized to decrease with group size, such as if the predator is more likely to attack when surrounded by prey, or to not be affected by it, such as if the predator actively targets central individuals. Whatever the process, the observed findings are likely for prey that move in groups of somewhat intermediate size; for very large groups, such as the huge schools encountered in the pelagic, ambush predators may simply not be able to attack the group centre due to spatial constraints. More generally, the tendency for predators to attack the centre of moving groups may depend on the medium in which the predator-prey interactions occur. As in the air there is potential for (fatal) collisions, and on land it is physically difficult for predators to enter groups and predators’ size advantage tends to be more limited, predators may be less likely to go for the group centre as compared to in aquatic or mixed (e.g. aerial predator hunting aquatic prey) systems. Hence, the important interplay we highlight between predator attack strategy and prey response may have different implications across different predator-prey systems and warrants concerted further research effort.

      Finally, in response to the reviewer’s question if the likelihood to school decreased through the trials, we did not see a change in packing fraction (median nearest-neighbour distance) with repeated exposure to the pike, but shiners increasingly avoided the area directly in front of the pike’s head (lines 182-186): While the shiners did not show a change in their packing fraction (median nearest-neighbour distance) with repeated exposure to the pike (F1,52 = 1.81, p = 0.185), they increasingly avoided the area directly in front of the pike’s head (Appendix 2 – Figure 1A) resulting in the pike attacking from increasingly further away (target distance: F1,52 = 45.52, p < 0.001, see Appendix 2 – Figure 1B,C). See also further Appendix 2.

      I am also curious whether tank size affects the behavior of the fish, both of the shiners and the pike. The pike seem to be approximately 1/3 the shortest length of the tank, and 6 inches of depth have constrained the movement to be mostly in the 2D plane. A lack of open space might limit the pike's ability to hunt in any way other than this stealthy strategy. Has this stealthy hunting strategy been described in other experiments in larger or more naturalistic conditions? Does open space affect the shiners' propensity to school? Although the manuscript describes that shiners tend to school near the surface of water, does the shallow depth affect the pike's behavior? The manuscript states that some pike never attacked -- were these the largest in the study?

      While the tank is small relative to the real world, we actually decided on this size of ~2 m² based on previous experimental work on predator-prey dynamics. As we stated in the methods of the original manuscript (lines 543-545) we expect that if a much larger space would have been used, pike would actually still show the same approach and attack behaviour linked to their stealthy attack strategy. The stealthy hunting behaviour of pike and similar predators and their ability to thereby get very close to their prey has been described elsewhere (see e.g. references on lines 332-344 of the original manuscript).

      We now better explain the potential limitation of the arena size in the discussion (lines 472-480): Laboratory studies on predator-prey dynamics like ours do, of course, have their limitations. Although the size of the arena we used (~2 m²) is in line with behavioural studies with large schools of fish (e.g. Sosna et al., 2019; Strandburg-Peshkin et al., 2013) and experiments with live predators attacking schooling prey (Bumann et al., 1997; Magurran and Pitcher, 1987; Neill and Cullen, 1974; Romenskyy et al., 2020; Theodorakis, 1989), compared to conditions in the wild the prey and predator had limited space to move. However, as pike are ambush predators they tend to move relatively little to search for prey and rather rely on prey movement for encounters (Nilsson and Eklöv, 2008). Increasing tank size would have made effective tracking extremely difficult, or impossible, and while a much larger tank is expected to considerably increase latency to attack, we expect it to have relatively little effect on the observed findings.

      We agree that the shallow depth of the tank is a limitation of our study and may have somewhat restricted the pikes’ natural behaviour, although pilot experiments showed that the pike exhibited normal movements and attack behaviours. Fish were tested in very shallow water to be able to acquire detailed individual-based tracking of the schools as well as compute features related to the visual field of the fish. We would also like to note that both shiners and pike can often be found in the littoral zone and come in very shallow water of only a few 10s of cm (see e.g. Krause et al., 2000b; Pierce et al., 2013; Skov et al., 2018), with some experimental work furthermore showing that pike may actually prefer shallow water (Hawkins et al., 2005). We don’t think that increasing the depth of the tank would have considerably changed the predatory behaviour of the pike, as the pike would be expected to still use their stealthy approach to get close to their prey even if the prey school would be more three-dimensional.

      We now provide a much more extensive discussion of the limited depth used in the discussion (lines 480-494): In terms of water depth, fish were tested in very shallow water. This was primarily done to be able to keep track of individual identities and compute features related to the visual field of the fish. Shiners naturally school in very shallow water conditions as well as near the surface in deeper water in the wild (Hall et al., 1979; Krause et al., 2000b; Stone et al., 2016) and pike also primarily occur in the shallow littoral zone, sometimes only a few tens of cm deep (Pierce et al., 2013; Skov et al., 2018). Furthermore, pilot experiments showed the pike did exhibit normal swimming and attack behaviour with attack speeds and acceleration comparable to previous work (Domenici and Blake, 1997; Walker et al., 2005). Other recent work on predator-prey dynamics did not find a considerable impact of adding the third dimension to their analyses (Romenskyy et al., 2020). Still, the water depth used is a limiting factor of our study and in the future this type of work should be extended to deeper water while still keeping track of individual identities over time. We expect that adding the third dimension would not change the stealthy attack behaviour of the pike and therefore still put more central individuals most at risk, but possibly attack success would be reduced because of increased predator visibility and prey escape potential in the vertical plane, which remains to be tested.

      We did not observe a relationship between pike size and tendency to attack.

      Reviewer #3 (Public Review):

      While it has long been clear that animals in groups (e.g., fish schools) benefit in terms of safety in numbers, there has also been a keen interest in which animals in the group are at higher versus lower risk (e.g., those in front, or along the edges) and how that might depend on the predator's attack strategy. This study addresses these important predator-prey details using a common predatory fish (northern Pike) attacking schools of prey fish (golden shiners). A strength of the study is that it uses cutting-edge video tracking and computational/statistical methods that allow it to quantify and follow each fish's (1 predator and 40 prey in a group) spatial position, relative spacing, orientation and even each individual's visual field and movement throughout each of 125 attacks. Most (70%) of these attacks were successful, but many were not. The variation in attack success allowed the investigators to do statistical analyses to identify key predator and prey behaviors that are associated with successful vs. unsuccessful attacks.

      The study yielded numerous interesting insights. While conventional wisdom pictures predators initiating an attack from outside of the group thus putting individuals at the group's edge at greatest risk, this study found that pike typically approached the school of prey head-on both in terms of the group's orientation and direction of movement, and often stealthily moved within the group before initiating an attack. To understand which prey individual was targeted by the predator, the highly quantitative video analyses examined 11 measures of each individual prey's position and orientation at the time that the pike initiated its attack. Of course, pike showed a strong tendency to target one of the 3 closest prey, particularly prey that were more or less directly in front of the pike. However, contrary to conventional wisdom, the analysis showed that targeted prey were closer to the center than the edge, and that an individual's position and orientation relative to other nearby prey also played an important role in whether it might be targeted by the predator. Not surprisingly, analyses showed that targeted prey were more likely to escape if they were further from the predator's head and if they exhibited higher maximum acceleration. Interestingly, during the actual strike, on average, the predator accelerated to a speed about 50% faster than the velocity of the targeted prey.

      A limitation of the study (that the authors describe and discuss) is that it was conducted in a tank with no spatial refuges whereas in nature, pike are often found in areas with vegetation, and schools of prey can often potentially respond to the presence of a predator by moving towards refuge (e.g., vegetation). Also, the study was done in very shallow water (6 cm) -- likely shallower than many, if not most, natural predator-prey interactions for these species. In deeper water, the predator-prey interaction might be better analyzed in three dimensions (i.e., also accounting for variation in vertical height in the water), though the authors argue that this conventional idea is not necessarily true.

      Overall, this study provides an impressive example of how the use of modern technology and statistical analyses allows us to better describe and understand the fine-scale behaviors that affect an interaction of high importance for ecology and evolution.

      We thank the reviewer for the care and attention put in their review and their detailed objective assessment of our study.

      Regarding refuge use, it is true that in the wild pike are often found in areas with vegetation, but it is actually predominantly younger pike seeking refuge among vegetation from predators themselves, including from cannibalism by larger pike (see Skov & Lucas, 2018 Chapter 5). Vegetation is also used by pike as background camouflage rather than a refuge per se, but due to their elongated body and narrow frontal profile pike are able to approach and ambush prey when no vegetation is available, as we show in our study. During pilot experiments we did provide pike with refuges, but as they never used them, and as refuges would have provided hiding places for the fish, which would have considerably impacted our ability to investigate predation risk within the schools, no refuges were provided during the experiment.

      We have now added an explanation about not using refuges in the discussion (lines 495-502): For our experiments we used a testing arena without any internal structures such as refuges. This was a strategic decision as providing a more complex environment would have impacted the ability of the shiners to school in large groups and would have led fish to hide under cover. Although studying predator-prey dynamics in more complex environments would be interesting in its own right, it would not have allowed us to study the questions we are interested in about the predation risk of free-schooling prey. Furthermore, pilot experiments indicated that the pike never used refuges (consistent with previous work, see Turesson and Brönmark, 2004), so they were not further provided during the actual experiment.

      Regarding the shallow depth of the tank, we now better acknowledge this limitation and explain our reasoning (lines 480-482): In terms of water depth, fish were tested in very shallow water. This was primarily done to be able to keep track of individual identities and compute features related to the visual field of the fish. We would also like to note that both shiners and pike spend much of their lives in the littoral zone and occur in very shallow water of only a few tens of cm (see e.g. Krause et al., 2000b; Pierce et al., 2013; Skov et al., 2018). Although the limited vertical space may have restricted the pikes’ natural behaviour to some extent, they did exhibit normal swimming and attack behaviour with attack speeds and acceleration comparable to previous work (Domenici and Blake, 1997; Walker et al., 2005). We now better discuss the limitation of the shallow depth used in the discussion on lines 477-494 (see also our responses above).

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, He and collaborators analyse eight samples from six patients with acral melanoma through single-cell RNA sequencing. They describe the tumour microenvironment in these tumours, including descriptions of interactions among distinct cell types and potential biomarkers. I believe the work is thoroughly done, but I have identified a few concerns in their depiction and interpretation of their results.

      Strengths:

      1) One of the few available single-cell studies of acral melanoma, including a non-European cohort of patients.

      2) Data will be very useful to study the immune landscape of these rare tumours.

      3) Data include adjacent tissue, primary tumours and a metastatic sample, covering all disease stages.

      4) Analyses seem to be carefully done.

      Things to improve:

      1) Figures need much more description to be understandable, in particular, axes should be clearly labeled and the colour code should be specified

      Thank you for your generous comments and suggestions. We have improved the integrity of some figures and added some figure legends. We believe this will further improve the quality of our manuscript.

      2) In some places, I would recommend the authors soften their interpretation of their analyses (for example, when they suggest targeting TNFRSF9+ T cells as a novel therapy), as these are nearly all bioinformatic in a small number of samples

      As for the conclusions of TNFRSF9, we indeed provided a possibility that TNFRSF9 may serve as a novel therapy. We made some changes to soften the statement. In addition, we have added instructions and explanations in the Discussion section.

      3) I don't think the experiments add much to the literature, as these test already known oncogenes on a common, non-acral melanoma cell line.

      Thanks for your comments regarding the experiments included in our study. We have pointed out this deficiency in the Discussion section, and made some experimental changes. For example, we have removed the TWIST1-related experiments from the main Results section and shown them only as non-focus work in the Supplementary Figure.

      It is difficult for us to obtain AM cell lines. No commercial AM cell lines can be purchased from ATCC or ECACC. AM cell lines are more difficult to establish and there are few reports on methods for establishing primary acral melanoma cell cultures (PMID: 22578220, PMID: 17488338). Some Japanese and Chinese researchers have isolated primary AM cells (e.g., PMID: 17488338, PMID: 22578220, PMID: 34097822), but due to the customs policy and the COVID-19 epidemic, we could not receive them within a short period. Moreover, these studies also stated their limitations; namely, that the stability during serial passaging had not been evaluated. Therefore, it may be very time-consuming to obtain operable AM cell lines for functional assays. However, our research group would like to have the opportunity to separate and culture primary cells in subsequent studies, and improve relevant experiments according to your valuable suggestions. Many thanks again for your comments.

      Reviewer #2 (Public Review):

      The study presented by Zan He et al. dissects the main interactions between malignant and stromal cells present in acral melanoma samples and in adjacent tissues using single-cell RNA sequencing. The study describes factors that allow communication between the different cell types, with a special focus on macrophages, lymphocytes and fibroblasts, along with malignant cells. Factors playing a role in cell-cell communication are identified and suggested to be relevant prognostic markers and/or attractive therapeutic targets.

      Historically, the study of acral melanomas has been neglected due to the low incidence among people of European descent, and this created an important knowledge gap in the field and hindered the development of effective therapies to control the disease. Therefore, studies that address this unmet need in melanoma research are very important and should be encouraged. This includes single-cell sequencing studies that allow one to study the complexity of tumours, including microenvironment features that influence the development and effectiveness of certain types of treatment. The present study contributes information on how cells interact in the acral melanoma microenvironment and this could be a first step toward better understanding how these interactions influence acral melanoma development, progression, and therapy response.

      However, there are a few points that should be carefully considered. The authors use 3 adjacent tissues (which in theory are composed of normal skin next to a cancer lesion), 4 primary tumor samples, and one lymph node metastasis as a model to study tumor progression. Adjacent tissue is not considered a stage of tumour progression and the sample size is too small to rule out sample-dependent effects. The study is descriptive in nature and could better contextualize the findings regarding what is known for other subtypes of melanomas or other tumours. This is especially important to help readers understand why it would be relevant to study cutaneous melanomas located in acral skin. It would be helpful to explain how different it is from non-acral cutaneous melanoma, and what this study adds compared to other single-cell studies from cutaneous acral and non-acral melanomas.

      Thank you for your generous comments. It is not accurate to represent the adjacent tissue samples as ‘tumour progression’, and our study did not want to focus on the tumour developmental process. We have revised related description in the text. Tumour adjacent tissues (ATs) have always been the focus of research on TMEs. Some studies believe that there are a lot of mutations and clone amplification in normal tissues adjacent to cancer, which may be in a pre-cancerous state (PMID: 33004515), and many single-cell studies of tumours have also sampled and paired para-cancer tissues (e.g., PMID: 29988129; PMID: 35303421).

      The problem of sample size limits the generality of the results, as we pointed out in the Discussion section. Most acral melanoma (AM) patients opt for surgical resection at an early stage to avoid the possibility of metastasis. Hence, we rarely encounter patients with lymph gland (LG) metastases. We collected only one metastatic sample, because such cases are very rare in the clinic. However, the sample is of high quality, with a high cell activity in the single-cell suspension after dissociation (95.30%) and abundant tumour cells and other stromal cells. Therefore, we added its sequencing data into the overall analyses, hoping to contribute to the comprehensiveness of resources and research.

      It is important to link this study with the findings regarding what is known for other subtypes of melanomas. We have already supplied the comparison of AMs with non-acral skin cutaneous melanomas (CMs), using the published data. Your comments and advice are entirely helpful to us, and we believe that the current manuscript is more comprehensive and complete.

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper, the authors estimate growth curves ('nomograms') for hippocampal volume (HV) using Gaussian process regression applied to UK Biobank data and evaluate the influence of polygenic scores for HV on the estimated centile curves. By taking this into account, the centile scores are shifted up or down accordingly. The authors then apply this to the ADNI cohort and show that subjects with dementia mostly lie in the lower centiles, but this does not improve the prediction of transition from mild cognitive impairment to dementia.

      This paper is reasonably well written and the finding that centile curves for different phenotypes are sensitive to genetic features will be of interest to many in the field, albeit perhaps somewhat unsurprising given the polygenic score evaluated here is for the same phenotype under investigation (i.e. HV). I think using centiles derived from nomograms/normative models for precisely assessing both current staging and progression of neurological disorders is a highly promising direction. Regarding this manuscript, I have a few comments about the methodology and interpretation of results, which I will outline below.

      • My most significant concern is that it appears that the assumption of Gaussian residuals is violated by the HV phenotypes that the authors fit their GP to. For example, in figure 2, the distribution is clearly skewed, and the lower centiles, in particular, are poorly fit to the data. First, please provide additional metrics to assess the fit and calibration of these models quantitatively (the latter can be done e.g. via Q-Q plots).

      Thanks for pointing this out. We are sorry for causing this confusion. The skew in the figure appears because the scatter plot overlaid with the GP-generated nomogram is showing ADNI samples of all diagnoses – not the UKB training data used for the GP. The lower centiles are mainly occupied by the participants with AD or MCI (see the new plots in Figure 5). In addition, the healthy subjects from ADNI do indeed fit the model reasonably well. We have added a supplementary figure to show just the healthy subjects and have made the following edits in the text to address the confusion:

      Lines 143-149: “Nomograms of healthy subjects generated using the SWA and GPR method displayed similar trends (Figure 2; Supplementary Figure S8). … This extension allowed 86% of all diagnostic groups from the ADNI to be evaluated versus 56% in the SWA Nomograms (Figure 2; Figure 2 – Figure Supplement 2).”

      Lines 159-170 (description of figure 2): “Figure 2: Comparing Nomogram Generation Methods. Nomograms produced from healthy UKB subjects using the sliding window approach (SWA) (red lines) and gaussian process regression (GPR) method (grey lines) … The benefits of this extension can be seen with scatter plots of ADNI subjects of all diagnoses overlayed (E, F… A similar figure with only the Cognitively Normal ADNI subjects can be found in Figure 2 – Figure Supplement 2.”

      Second, I think if the authors wish to make precise inferences about the centile distribution for the reference model, then the deviation from Gaussianity ought to be accommodated in some manner. There are several options for this, including different noise models (e.g. Gamma, inverse Gamma, SHASH, etc), variable transformation, or quantile regression. One option that could be useful in the context of Gaussian process regression is the use of likelihood warping (see e.g. Fraza et al 2021 Neuroimage and references therein) which was originally developed for GP models. I would recommend the authors pursue one of these routes and provide metrics to properly gauge the fit.

      This is an excellent point. However, we believe that given that the training data indeed follows a Gaussian distribution (see new Figure 4 – Figure Supplement 3; reproduced below) across the relevant strata (sex, PGS) and across age groups, such modifications are not required.
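
      As a rough illustration of why the Gaussian assumption matters in this exchange: if the predictive distribution at a given age is taken to be Gaussian, a subject's centile follows directly from the standardised residual. The sketch below is not the authors' implementation; the function name and the example volumes are hypothetical, and only the Gaussian assumption itself comes from the discussion above.

```python
from statistics import NormalDist


def gaussian_centile(observed: float, predicted_mean: float, predicted_sd: float) -> float:
    """Return the centile (0-100) of an observation under a Gaussian predictive distribution.

    Hypothetical helper: assumes the normative model yields a mean and a
    standard deviation for the subject's age, sex and PGS stratum.
    """
    if predicted_sd <= 0:
        raise ValueError("predicted_sd must be positive")
    z = (observed - predicted_mean) / predicted_sd  # standardised residual
    return 100.0 * NormalDist().cdf(z)              # Gaussian CDF mapped to a percentage


# Illustrative numbers only (not taken from the study): a volume equal to
# the predicted mean sits at the median of the reference distribution.
median_case = gaussian_centile(4000.0, 4000.0, 400.0)
low_case = gaussian_centile(3342.0, 4000.0, 400.0)
```

      If the residuals were in fact skewed, centiles computed this way would be miscalibrated in the tails, which is the reviewer's concern.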

      • Related to the above, it is likely that the selection of subjects with high/low polygenic scores for HV changes the shape of the distribution. It is currently impossible to assess this because no data points are shown in these cases. Please also add this information, along with comparable quantitative metrics to those for the models above.

      Thank you for bringing this up. We have now added a new supplementary figure with the shape of these distributions along with the Shapiro-Wilk test results for each of them. As can be seen, the Shapiro-Wilk test detects mild deviation from normality in some cases. However, given the size of the strata (N>2000), this is not surprising. Moreover, if multiple-testing correction were applied across the 48 comparisons, none of the tests would be significant at the corrected threshold (P<0.001).
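
      The corrected threshold quoted here is consistent with a Bonferroni correction at a family-wise level of 0.05 over 48 tests. The response does not name the correction method, so both the method and the 0.05 level are assumptions in the minimal sketch below; only the count of 48 comparisons comes from the text.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_correction(p_values: list[float], alpha: float = 0.05) -> list[float]:
    """Return the p-values that remain significant after Bonferroni correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p for p in p_values if p < threshold]


# 48 comparisons are stated in the response; alpha = 0.05 is an assumption.
threshold = bonferroni_threshold(0.05, 48)
```

      Under these assumptions the per-test threshold is 0.05 / 48, which rounds to the P<0.001 reported in the response.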

      • How did the authors handle site effects? There appears to be no adjustment for the fact that the ADNI data are acquired from different sites that were not used during the estimation of the normative models. I would expect to see this dealt with properly (e.g. via fixed or random effects included in the modelling) or at the very least a convincing demonstration that site effects are not clearly biasing the results.

      We agree that site effects are a major issue; we have rerun the application experiments after adjusting the ADNI volumes with NeuroCombat. The results did not change significantly, but we have changed all the reported results with the updated results. In addition, we noted this in the methods section:

      Lines 442-445: Finally, we used NeuroCombat 1 to adjust across ADNI sites and harmonize the volumes with the UKB Dataset. To do this we modelled 58 batches (UKB data as one batch and 57 ADNI sites as separate batches) and added ICV, sex, and diagnosis (assigning all UKB as Healthy and using the diagnosis columns in ADNI) to retain biological variation.

      • How do the authors interpret the finding that the relationship between the polygenic scores and HV is different in the cohorts they consider (i.e. bimodal in UKB and unimodal in ADNI)? Does this call into question the appropriateness of the subsampled model for the clinical cohort?

      While we do see a bimodal distribution in UKB the effect is not very strong as the other reviewers commented. Therefore, we have de-emphasized this aspect. One reason may be that we detect the slightly bimodal aspect in UKB because of greater statistical power due to the large sample size (one order of magnitude). One further aspect is the used SNP data, i.e., differences in genotyping platform and imputation. This is also the reason why integrating PGS directly into the predictive model comes with additional challenges. We have addressed this topic briefly in our discussion: Lines 390-392: “Lastly, a recent study of PGS uncertainty revealed large variance in PGS estimates63, which may undermine PGS based stratification; hence a more sophisticated method of building PGS or stratification may improve results further.”

      • Perhaps the authors can comment on (or better, evaluate) how this genetic shift could be accommodated in normative models (e.g. the possibility of including polygenic risk scores as predictor variables in the normative model). This would remove the need for post hoc adjustment and would allow more precise control over the adjustment than just taking the upper/lower xxx % of the PGS distribution as is done in the current manuscript.

      We agree that integration of the genetics directly into the normative models is a great idea. And this will be the direction we will be exploring in future work. However, PGS themselves are prone to show ‘site’ effects that depend on the genotyping method that was used as well as of the quality of genotyping and imputation. As a consequence, using the ‘raw’ PGS scores in predictive models brings its own challenges. Therefore, we feel that the current framework is simpler at this point and illustrates the potential of PGS when combined with normative models.

      • Related to my point above, it is perhaps unsurprising that the polygenic score for the HV phenotype influences the centile distribution. I think the paper would benefit considerably by also evaluating other polygenic scores (e.g., APOE4 as in some of the prior cited references). It would be interesting to compare the magnitude and shape differences for these adjustments. The authors can consider this an optional suggestion.

      Our rationale for focusing on HV PGS was that we sought to improve the accuracy of the normative model. The genetics influences HV and this is a first attempt to adjust for this in the normative modeling framework. Indeed, APOE-e4 has a sizable effect on HV. However, this is most likely mediated by nascent accelerated neurodegeneration, i.e., Alzheimer’s disease. Thus, in our view focusing on APOE-e4 would mean to focus on a disease effect. We address this issue briefly in the discussion (Lines 326-334). For sensitivity analysis, we did indeed test other PGS, such as AD and Whole-Brain-Volume, and found that these do not affect the normative models for HV.

      Reviewer #3 (Public Review):

      Given the large variation in and high heritability of hippocampus volume in the population, taking out known variation in the healthy population is a nice way of reducing heterogeneity, and a step forward towards using normative models in clinical practice. The dataset the nomograms are based on is large enough to do so even when stratified by polygenic scores for hippocampal volume, and these provide interesting information on the role of genetics in hippocampus volume.

      There are however several concerns regarding the applicability of the models to the ADNI dataset. First, the lack of overlap in the age range between the dataset the model is trained on and the application to subjects that are outside that age range is questionable. The authors prefer Gaussian process regression (GPR) over a sliding window-based approach using the argument that the former allows for predictions in a larger age range but extrapolation beyond the reach of the data is usually not valid. The claim that Supplementary Figure 6 shows accurate extension beyond these limits is in my opinion not justified. If anything, we can be rather certain that the extensive growth of the hippocampus up to age 48 is not realistic (see e.g. Dima et al., 2022).

      As mentioned already in response to reviewer #1, this was a miscommunication on our side. We only used the ADNI samples that were within the age range of the models they were being plotted against. The GPR model did not require smoothing at the edges of the age-range and thus can support a wider age range than the SWA. This is why we stated that the extension of the nomograms enabled more of the ADNI dataset to be used, i.e., because otherwise these samples were outside the range of the model and could not be used.

      We have changed the following lines in the manuscript to make this idea explicit:

      Lines 477-478 (end of GPR methods section): “For both SWM and GPR models, we only tested the ADNI samples that lay within the age range of each model respectively.”

      Regarding the accurate extension claim, we have edited the line (411-412) in the discussion so that it now reads:

      Lines 347-348 “In fact, our GPR model can potentially be extended a few years beyond those limits”

      Thank you for pointing out the discrepancy in the hippocampal growth around 48 with the results by Dima et al. 2022. Although sample sizes between the two studies are similar, the data availability in UKB for ages 45-50 is rather sparse (N<100; see new Figure 4 – Figure Supplement 3). Thus, the observed growth is likely due to under-sampling. The growth effect has been observed in other studies using UKB data7,8. We have noted this in the discussion:

      Lines 354-356: “However, there is a possibility that our results suffer from edge effects. For example, we suspect that the peak noted in the male nomogram is likely due to under-sampling in the younger participants.”

      Second, the drop in mean 'percentile' difference between high and low polygenic scoring individuals when one uses genetically adjusted nomograms seems nice, but this difference is currently just a number and the reader cannot see whether this difference is significant, or clinically relevant.

      We have now provided a new figure (Figure 5) that shows the boxplots behind those numbers. The MCI-to-AD conversion analyses in the ADNI explored the clinical benefit of genetically adjusted nomograms. However, adjusted and unadjusted percentiles performed equally well. In the discussion we argue that the MCI stage is already too late and earlier stages may benefit from the increased precision:

      Lines 373-378: “However, despite this sizable effect, genetically adjusted nomograms did not provide additional insight into distinguishing MCI subjects that remained stable or converted to AD. Nonetheless, the added precision may prove more useful in early detection of deviation among CN subjects, for instance in detecting subtle hippocampal volume loss in individuals with presymptomatic neurodegeneration.”

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      In this paper, the authors show that the turnover of centriole components is necessary for proper centriole maintenance within Drosophila cultured cells (during prolonged cell cycle arrest) and within Drosophila oocytes, where centrioles are normally degraded prior to fertilisation. They highlight Ana1 as an important player in centriole maintenance. The authors begin with a candidate screen to identify core centriole proteins that are required to properly maintain centrioles. They then focus on Ana1, given that its depletion had the strongest effect, and show that its depletion leads to a reduction in the levels of centriole components in Drosophila oocytes. They show that the previously observed ability of centriole-targeted Polo to counteract centriole loss depends at least in part on Ana1 and that targeting Ana1 to centrioles also counteracts centriole loss. The authors conclude that Ana1 is a component of the PCM-promoted centriole integrity pathway.

      Major comments

      1. The authors say that Plk4 depletion does not lead to centriole loss, but there are significant differences in centriole number between the control and Plk4 depletion cells in Fig 1F and S1D. Please comment.
      2. One of the main results is that depletion of centriole components leads to a reduction in centrosome numbers when measured 8 days after S-phase arrest. I wonder whether a restriction of centriole duplication could add to this effect? Any cells that were in G2 or M phase when the drugs were added would presumably progress into the following S-phase and duplicate their inherited centrioles, but not if centriole duplication proteins had been depleted. It's true that Plk4 depletion leads to a relatively mild centriole loss phenotype, but can the authors be sure that this is not due to variations in the efficiency of different RNAi constructs? Perhaps the authors can show that Plk4 depletion efficiently prevents centriole duplication under otherwise normal conditions.
      3. The authors show that Ana1 depletion has the strongest effect, but this could in theory be due to differences in RNAi efficiency. I don't expect the authors to show the efficiency of all RNAi constructs, but they could state in the text that this is a caveat e.g. "...although we cannot rule out the possibility that differences in RNAi efficiency lead to the observed differences in severity of phenotype..."
      4. A key conclusion is that core centriole components turnover to some extent and that the incorporation of new molecules is necessary for centriole maintenance. This is a very interesting and important point and so it would be nice to have more direct data to support it. This could be done in different ways, including transfecting fluorescently tagged centriole components after S-phase arrest and showing that some molecules become incorporated into the centrioles, or by performing FRAP experiments. Of course, it is possible that the turnover is so low that the incorporated fluorescent molecules cannot be detected...
      5. The authors show that depletion of Ana1 from oocytes leads to a reduction in the intensity of centriole markers. They do not measure centrosome numbers, as the centrosomes cluster too tightly. The authors therefore can't be certain that Ana1 depletion leads to a reduction in centrosome numbers. The authors could show this by inhibiting centrosome clustering while depleting Ana1. There is a recent BioRxiv paper showing that centrosome clustering can be inhibited by depletion of Kinesin-1.
      6. In Figure 3B the authors show that expression of GFP-Polo-PACT partially rescues the effect of "all PCM" depletion, but this seems strange given that Polo's role is presumably to recruit PCM (which has been depleted). Can the authors comment? Also, it would make sense to test whether GFP-Polo-PACT can rescue centriole loss after the depletion of Ana1 alone (not Ana1 and all PCM). If Ana1 has a role in recruiting Polo (either directly or indirectly), which has been shown previously in mitotic cells, then there should be a rescue to some extent.
      7. In Fig. 4A,C, the authors say that γ-tubulin levels at centrosomes increase when GFP-Polo is forced onto the centrosomes - the graph seems to show a big increase, but the pictures do not...? Are the authors measuring total levels at all centrosomes? If so, I think they should be measuring the average at individual centrosomes. Also, why is the level of GFP alone not much higher when expressed with GFPnanoPACT (Fig. 1B)? Presumably GFP should be recruited to the centrosomes by GFPnanoPACT.
      8. The authors show that tethering Ana1-GFP to the centrioles counteracts centriole loss in oocytes (Fig. 4G). They say that the centrosomes are most likely inactive because they don't recruit PCM, but they have only looked at γ-tubulin, which is a downstream component of the PCM. I think it is important to check whether Polo is recruited, given that tethering Polo to centrioles also counteracts centriole loss and that a recent paper showed that Ana1 has a role in recruiting Polo to centrosomes (Alvarez-Rodrigo et al., 2021). The authors also say that these centrosomes do not organise microtubules but do not show the data.
      9. The authors propose that Ana1 is downstream of the PCM, and so over-expressing Ana1 should at least partially rescue centriole loss after PCM depletion. But I don't really agree with this. If Ana1 relies on the PCM then how would its overexpression manage to rescue the phenotype in the absence of the PCM? The finding that over-expressing Ana1 partially rescues centriole loss may instead suggest that Ana1 is either upstream of the PCM or part of an independent pathway. Indeed, the authors show that depletion of both the PCM and Ana1 has a stronger effect than either depletion individually - this is indicative of two independent pathways.

      Minor comments

      1. When the authors say that the centriole wall and cartwheel components are "dynamic" I think that they need to make it clear that this "dynamicity" is not very fast. Using the term dynamic tends to suggest rapid turnover (like in the PCM). Perhaps the authors could use the term "slow exchange" or something similar.
      2. The authors currently use a 0 or 1 centriole categorisation - it would be nice to see the breakdown of what percentage of cells have 0, 1, 2, or >2 centrioles, perhaps in a supplementary excel file.

      Significance

      How centrioles are eliminated in certain cells is an interesting question and the data presented is also relevant to understanding centriole biology in general, because it seems that some apparently very stable structural proteins actually turn over. It is widely known that PCM proteins turn over relatively quickly, but core centriole proteins are considered to be stably incorporated. The data will therefore raise interest in the centrosome field. I do, however, feel that for the authors to make this point more strongly it would be good to show this more directly. Overall, this is a very interesting paper that is well written. The data is well presented and supports the conclusions that centriole components turn over and that Ana1 is involved in maintaining centriole integrity.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      In this manuscript Pimenta-Marques et al. build on their previous work addressing how centrioles are stabilized and maintained or destabilized and disassembled, depending on the cell type and developmental context. Using Drosophila cell culture and oogenesis as an in vivo model for centriole destabilization, they identify the centriole wall protein Ana1 as a central player in centriole stability. Its presence is required for the maintenance even of mature centrioles, suggesting that there is continued turnover of centriole structural components.

      Major comments:

      1. The experiments and results are very well described and most of the conclusions are supported by the data. One aspect needs clarification though. It is not clear to this reviewer how the authors envision the regulation and mechanism by which Ana1 functions in centriole stability. The data suggest that it can stabilize centrioles independent of PCM (Fig. 3B, 5B), yet the authors claim in the results and discussion that it functions downstream of PCM. As presented, this does not make sense. I would argue the opposite: it may function upstream of or in parallel to the PCM. Related to the above, the last sentence of the intro states: "Finally, we found that both Polo and the PCM require ANA1 to promote centriole structural integrity." This is shown for Polo, but where is the data showing that PCM requires ANA1 for promoting centriole stability?
      2. I have a concern regarding the number n used for statistics in the quantifications. In many cases it seems that the number n of cells etc. was used (e.g. n>100 cells) rather than the number of experiments (e.g. n=3). The statistics should measure variability between experimental repetitions, not between cells etc. If statistics were indeed not done on experiments and would have to be changed, some of the observed effects may not be statistically significant and would require additional experimental replicates, which would increase the time needed for revision.
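
      The distinction raised in point 2 can be illustrated with a minimal sketch. All numbers below are hypothetical and are not taken from the manuscript; the helper names `replicate_means` and `welch_t` are introduced only for this illustration. The sketch compares a Welch t statistic computed on per-experiment means (n = 3) with one computed on pooled cells (n = 300), to show how treating cells as independent replicates inflates the apparent strength of an effect.

```python
from math import sqrt
from statistics import mean, variance


def replicate_means(experiments):
    """Collapse each experiment (a list of per-cell values) to a single mean."""
    return [mean(cells) for cells in experiments]


def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical data: 1 = cell with centrioles, 0 = cell without.
# Three experiments per condition, 100 cells scored in each.
control = [[1] * 90 + [0] * 10, [1] * 80 + [0] * 20, [1] * 85 + [0] * 15]
depleted = [[1] * 60 + [0] * 40, [1] * 75 + [0] * 25, [1] * 50 + [0] * 50]

# Statistics on experiments: one value per biological replicate.
ctrl_means = replicate_means(control)
depl_means = replicate_means(depleted)
t_exp, df_exp = welch_t(ctrl_means, depl_means)

# Statistics on cells: every cell treated as an independent observation.
pooled_ctrl = [c for exp in control for c in exp]
pooled_depl = [c for exp in depleted for c in exp]
t_cell, df_cell = welch_t(pooled_ctrl, pooled_depl)

print(f"per-experiment: t = {t_exp:.2f}, df = {df_exp:.1f}")
print(f"per-cell:       t = {t_cell:.2f}, df = {df_cell:.1f}")
```

      In this made-up example the per-cell analysis yields a larger t statistic and far more degrees of freedom than the per-experiment analysis, even though the underlying data are identical.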

      Minor comments:

      1. I would advise the authors to improve the presentation of the figures. In particular the labels are in many cases very small and difficult to read. Readability is also reduced by the use of bold font in the labels and a mix of various font sizes within single figure panels.
      2. The results section could be shortened/become more readable by moving several paragraphs to the intro or discussion.
      3. The introduction is quite long and some parts read more like an introduction of a review on the topic.

      Significance

      This is a nice, focused study on the requirements underlying centriole stability and maintenance. The first part identifies the cartwheel, the centriole wall, and the PCM as important for centriole maintenance. The remaining parts identify and focus on the essential role of ANA1 in this process. This is an important finding, since the mechanisms underlying centriole stability and maintenance are poorly understood, yet highly relevant. Some cell types inactivate and/or disassemble centrioles during differentiation and this is likely important to their function. Providing more mechanistic insight, for example, regarding the relationship between ANA1 and PCM recruitment or the regulation of ANA1's centriole function by Polo, would have further strengthened the study. The audience interested in this work will be cell and developmental biologists. My expertise is in centrosome biology and microtubule organization.

      Referees cross-commenting

      I agree with the additional points raised by the other reviewers. I still think that overall the paper is fine and most things could be addressed in a reasonable time frame. The work does not provide much mechanism though. In this regard, the confusing placement of ANA1 downstream of PCM would be the only mechanistic aspect, and it seems the authors got it wrong, at least based on the provided data. Here, additional experiments could elucidate these relationships further, but if this is not the goal, text changes could also address this and it would remain a smaller, more focused study.

    1. Peer review report

      Reviewer: Yulia Karmanova

      Institution: Research Centre Kairos

      email: yulia.karmanova@gmail.com


      General assessment

      In my honest opinion the topic of intercultural competence (ICC) should be of great interest not only to researchers involved in linguistics and pedagogics but to a general reader as well. By developing ICC, which represents a set of skills needed when encountering people from various backgrounds, one can learn valuable communication skills and flexibility in behaviour, and become more aware of one’s lack of tact and tolerance.

      The manuscript is well written in an engaging and lively style, and it provides excellent context about linguistic cues of ICC that will help educators steer and stimulate the ICC development of their students.

      The manuscript cites relevant and sufficient literature that provides a very useful resource for current practitioners.

      I do not identify fundamental flaws in the manuscript and find nothing illogical or irrational, although I have a few suggestions for minor improvements. Please see my comments below for further details.


      Essential revisions that are required to verify the manuscript

      No essential revisions. The manuscript clearly describes the research methods of data collection and analysis as well as other meaningful parameters. Section 3 (Research Method and Results) is recipe-like, so the study can be reproduced.

      The data collected for the research is impressive: 1,635 blogs (on average 400 words each) written by 672 students majoring in Hotel Management.

      The data and analysis provided in the manuscript are clear and logical. No additional experiments are needed to validate the results presented in the manuscript.

      The discussion and conclusion section aligns with the objectives stated in the first section.

      The authors of the manuscript made a valuable contribution by identifying linguistic markers for ICC in the language use of students blogging about intercultural experiences: I-perspective lexemes, insight verbs and quantifiers. These language cues make ICC more «tangible» and as a result provide teachers with concrete tools for giving students more targeted ICC assessments in their reflective writing tasks. By giving certain linguistic prompts to students, educators may help them form a more thoughtful and personalised approach to describing their intercultural experience.


      Other suggestions to improve the manuscript

      The content of the manuscript is scientifically sound but has minor shortcomings that could be improved by further revisions.

      I do agree with the limitations of the research mentioned by the authors, especially with the lack of the explanatory value of a significant difference in frequency of use of the linguistic markers which I think can be resolved in future studies of this topic.

      I suggest that the authors should involve more assessors in their future research. Two lecturer-researchers and three senior students were involved in the process, which I assume is not enough for research on this scale. A bigger team of professional assessors could make a valuable contribution when analysing the data and resolving emerging research questions.

      I would also recommend providing the manuscript with brief comments on the meanings of the parameters in column 4 (Tables 3, 4, 5, 6) for readers’ clarity. What do t, p and n.s. stand for?

      I believe that the manuscript would benefit from correcting minor inaccuracies. I would recommend the following:

      • replace «his» with gender-neutral «their», page 6: In these blogs, the language use of students serves as a vehicle of information on the students’ development of ICC, offering the reader concrete cues – henceforth referred to as linguistic markers – of his reflective learning process.

      • add a space between that and are, page 19: In order to bring more focus to our research, we initially focused on word categories thatare characteristic of properties that can be linked to ICC and cultural sensitivity, such as openness, self- relativity, curiosity and reflection or analytical thinking.

      • add missing parentheses, page 22; Deardorff, D. 2006. Identification and Assessment of Intercultural Competence as a Student Outcome of Internationalisation. Journal of Studies in International Education, 10 (3), 241-266.

      All in all, I find the topic of the manuscript fascinating and the research question relevant and essential to the field.


      Decision

      Verified manuscript: The content is scientifically sound, only minor amendments (if any) are suggested.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript by O'Herron et al. describes an all-optical method combining optogenetic stimulation and 2-photon microscopy imaging to simultaneously manipulate and monitor brain microvasculature contractility in three dimensions. The method itself, which represents a microvasculature-targeted variation on a theme previously elaborated for simultaneous stimulation and monitoring of ensembles of neurons, employs a spatial light modulator (SLM) to create three-dimensional activation patterns in the brains of cranial window-model transgenic mice expressing the excitatory opsin, ReaChR, in mural cells (smooth muscle cells and pericytes) under control of the PDGFRβ promoter. The authors demonstrated that, by splitting a single 1040-nm stimulating beam into multiple beamlets using an SLM, this system is capable of optogenetically activating ReaChR at discrete depths in the neocortex, depolarizing mural cells and producing highly localized constrictions in targeted, individual microvessels. Using this system to investigate the kinetics of optogenetic-induced contraction and sensory-evoked dilation, the authors found that the onset of optogenetically evoked contraction was much more rapid than that of sensory-evoked dilation, concluding that the observed lag between sensory stimulation and vascular response does not reflect intrinsic limitations of mural cell contractile mechanisms but is instead attributable to the time course of neurovascular coupling mechanisms. They further found that by titrating the stimulation duration they could completely negate the vasodilatory response to a concurrent sensory stimulus.

      1) The red-shifted opsin, ReaChR, represents an improvement over opsins used in previously described 3D neuronal activation/monitoring systems. In particular, brief single-photon stimulation (100 ms) of ReaChR led to rapid, robust arteriole constrictions throughout the activation volume, whereas a previous generation ChR2 opsin required stimulation for seconds to achieve slowly appearing constrictions.

      Thank you for pointing out this key takeaway from our manuscript. In Figure 9 of the revised manuscript, we provide a comparison of ReaChR-induced vasoconstriction with data previously collected across microvascular zones using line-scanning in ChR2-expressing mice. These data show that ReaChR produces faster and more potent vasoconstriction in alpha-SMA expressing SMCs and ensheathing pericytes, but has similar effects on the slow contraction of capillary pericytes.

      2) Single-photon stimulation was capable of completely stopping blood flow in a "first order pre-capillary branch". (Not clear what is meant by the phrase "pre-capillary branch"; anatomically, penetrating arterioles feed capillary branches.) While this speaks to the effectiveness of the method, it also highlights potential supraphysiological effects of stimulation and the importance of titrating stimulus intensity/duration to achieve physiologically meaningful responses.

      We have removed the term “pre-capillary” to avoid causing confusion, and now use the term arteriole-capillary transition to denote the alpha-SMA positive segment that lies between the penetrating arteriole (0th order) and the alpha-SMA low/negative capillaries (>4th order). The rationale for this terminology is provided in our new review (PMID: 34672718), which explains why the transitional zone should be considered a separate vessel type that is not arteriole and not capillary.

      We agree with the reviewer that titration of stimulation power/duration will be important and will depend on the application. We addressed this point by performing measurements of arteriole diameter with graded laser powers (Figures 5 & 7). There are many parameters to explore, but for the purposes of this manuscript, we clarify that the effect is titratable and that users should define physiological ranges in their specific circumstances, which may differ based on the experimental goals, age of mice, arteriolar size and vascular zone, and other factors.

      We also note that some applications may want to mimic pathophysiological levels of constriction, for example to mimic the effects of arterial vasospasm after subarachnoid hemorrhage, or ensheathing pericyte contraction with MCAo stroke (PMID: 26119027), or to examine the neural consequences of transient small vessel occlusion.

      3) In assessing effects of laser power, the authors assert that "increasing the laser power only slightly expanded the range of constriction". This seems a bit of an overstatement, given that increasing power (30-fold) had a greater effect on the spread (3x) than the magnitude (2x) of the response.

      Thank you for pointing this out. We have re-worded this section to avoid the overstatement and to emphasize the results more clearly on the spatial spread of constriction relative to laser power.

      The difference images in Figures 4B-C, G-H demonstrated that there was very limited spread of the constriction beyond the stimulation spots. We tested the effect of laser power on the spatial spread of constriction by stimulating with a broad range of power levels. We found that increasing the laser power led to a small increase in the spread of constriction. For example, a 30-fold increase in power (from 5 mW to 150 mW total power) led to ~3-fold increase in the spread of constriction (from ~25 µm to ~75 µm) (Figure 5A-H).
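
      The fold changes quoted above can be checked with a short calculation. The power and spread values are the approximate figures stated in the response; the scaling exponent `k` is an illustrative assumption (a simple power-law relationship between the two reported points) and is not a quantity reported by the authors.

```python
from math import log

# Approximate values quoted in the response (total laser power and spread).
power_low_mw, power_high_mw = 5.0, 150.0
spread_low_um, spread_high_um = 25.0, 75.0

power_fold = power_high_mw / power_low_mw      # fold increase in power
spread_fold = spread_high_um / spread_low_um   # fold increase in spread

# Illustrative assumption: spread scales as power**k between these two points.
k = log(spread_fold) / log(power_fold)

print(f"power fold: {power_fold:.0f}x, spread fold: {spread_fold:.0f}x")
print(f"assumed power-law exponent k = {k:.2f}")
```

      Under that assumption the exponent is well below 1, which is one way to express that the spread grows much more slowly than the applied power.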

      4) The suggestion that penetrating brain arterioles possess a mechanism for upstream conduction of constrictive responses is intriguing (although this intrigue is tempered by the lack of experimental support for the operation of such a mechanism in the brain microvasculature).

      We are also intrigued by this hypothesis, which is supported by some evidence from a recent study of retinal vasculature. Using neurocytin tracer injections into capillary pericytes, Kovacs-Oller et al. showed that pericytes are linked through gap junctions and that there is upstream directional diffusion of tracer. Further, they showed that electrical stimulation of a pericyte could lead to directional constriction from capillaries back to the arteriole in the retina (PMID: 32566247). The planar orientation of retinal vasculature makes this phenomenon easier to see. However, the 3D architecture of cortical vasculature is more challenging to study, particularly since the propagation along arterioles occurs along the Z axis, where spatiotemporal resolution of imaging is limited.

      Given our new data on the effects of laser power on axial spread (see reply to points 10-13 below) and the difficulty in separating active propagation from out-of-focus activation, we think there is not sufficient evidence to claim that penetrating arterioles are propagating the signal through some active process. Further experiments, including studies of the mechanisms involved, will be needed to address this hypothesis. Therefore, we have removed any discussion of potential propagation of the signal, and instead focus on the relationship between laser power and axial resolution of activation.

      5) The authors' premise for comparing contractile kinetics with sensory-evoked kinetics is flawed. In attempting to use the kinetics of optogenetic-induced constriction to infer something about the kinetics of sensory-evoked dilation, they are implicitly assuming that the kinetics of contraction and dilation processes intrinsic to mural cells are the same. This is highlighted by their use of the phrase "kinetics of the vasculature", which elides the possibility that dilation and contraction kinetics intrinsic to mural cells are different. Support for this latter possibility is provided by a previous report on renal afferent arterioles showing that the kinetics of myogenic constriction in arterioles are "substantially faster" than those of dilation (PMID: 24173354). Thus, their data do not rule out the possibility that the delay between sensory stimulation and vascular response reflects a slower intrinsic dilatory response rather than the time course of neurovascular coupling mechanisms. Furthermore, arterioles have an internal elastic lamina (IEL), which also determines the rates and degree of constriction and dilation. The IEL ends with the arterioles, and vessels with ensheathing contractile pericytes (and downstream) lack the constraints of the IEL.

      We thank the reviewer for this constructive critique. We agree that there are many issues in comparing kinetics between sensory evoked dilation and our optogenetic constriction. We have re-worded this section to avoid any mechanistic implications in the discussion of the kinetics of the different processes. However, we wish to still incorporate the details about the rapid kinetics of constriction to highlight the utility of the approach to intervene/perturb sensory-evoked responses, given that contraction can be titrated and precisely timed. We discuss the utility of this approach further below.

      6) It's not at all clear how overriding sensory-evoked dilation with optogenetically generated constriction provides a means for distinguishing neural activity from vascular responses. In particular, it is not clear how performing this maneuver while monitoring neuronal activity can provide the suggested insight into "aspects" of functional hyperemia that are essential to neuronal function beyond the relatively trivial observation that there is a point at which blood flow is too low to support continued neuronal activity.

      Thank you for raising this point. We have added more detail to our thoughts on why over-riding functional hyperemia could provide insight into the dependence of neural activity on the blood flow increase. Neural circuits are extremely complex with many different sub-types of neurons playing different roles. These subtypes have been shown to have different metabolic sensitivities and thus, may be differentially affected by blocking functional hyperemia (PMID: 26284893). This could lead to altered circuit activity which could have profound consequences for neural processing. Additionally, the energy budgets of different cellular functions within neurons are quite different (PMID: 22434069) and reducing available energy by blocking functional hyperemia could lead to differing degrees of dysfunction across important cellular processes (e.g. re-establishing the membrane potential, recycling neurotransmitters) which could again have important consequences for neural coding. Furthermore, it has been shown that there is a steep gradient of oxygen moving away from penetrating arterioles, and so neurons at greater distances from vessels may be differentially affected by blocking the hyperemic response (PMID: 21940458).

      7) With the exception of vasculo-neural coupling, where it would be the method of choice, the technology described leaves the impression of a capability in search of an application. That said, the ability to control blood flow to the point of completely stopping it may ultimately have applications in pathological settings.

      In addition to our response above on the utility of over-riding arteriole dilation during functional hyperemia, we have added to the discussion more potential uses of the technique. These include: (1) The ability to manipulate blood flow without using pharmacology or having to induce neural activity could be useful for a variety of studies involving intrinsic reactivity and compliance of vessels in both health and disease. (2) The different microvascular zones have distinct contractile kinetics. There are details that remain unstudied, such as the kinetics of different-sized vessels, their location in the network, and their identity as collateral or pial arterioles. Vascular optogenetics can dissect the contractile characteristics of different vessel types, similar to probing a circuit board. (3) The technique could support studies of the physiological significance of vasomotion with respect to brain clearance of metabolic waste products. Being able to directly drive vasomotion and alter its amplitude and frequency will be an important tool for studies in this field. (4) Functional hyperemia is also impaired in many diseases, but this dysfunction could arise from impaired activity of neurons, astrocytes, or vessels. Therefore, a method to disentangle specific changes to blood vessels in vivo could be useful for understanding the vascular contributions to such diseases.

      Reviewer #2 (Public Review):

      The manuscript by O'Herron et al. describes a new technique for all-optical interrogation of the vasculature in vivo. They expressed optogenetic actuator ReaChR in vascular smooth muscle. They activated ReaChR using single-photon or 2-photon absorption. In both cases, they observed rapid and reversible constriction (presumably, due to Ca increase). Single-photon activation produced widespread constriction; two-photon activation allowed targeting of individual vessels. Using a commercial 2-photon system with a spatial light modulator on the photoactivation 1040-nm beam, they demonstrated localized constriction at multiple points along the small and large cerebral arterioles targeted simultaneously by individual beamlets. Overall, this is a very interesting paper that clearly lays out the methodology and experimental design and carefully considers a number of potential limitations and pitfalls. This paper will serve as a valuable resource for a large community of eLife readers interested in cerebrovascular physiology in health and disease as well as in neurovascular coupling and interpretation of noninvasive imaging.

      Given the chronic nature of the optical window, it is not clear why imaging was done under anesthesia. This point requires explanation. There is a concern that targeting of the vessel wall is not possible in awake animals due to brain motion. If so, that would be a serious limitation of the methodology.

      To ensure that our method is compatible with awake experiments, we have added awake data to the manuscript (Figure 10). We show that individual vessels can be independently targeted in the awake animal and the outcomes are not profoundly different than in the anesthetized state. As with all awake experiments, due diligence must be taken to ensure the preparation is as stable as possible, and the occasional trial may have to be removed if motion artifacts are too large.

      Reviewer #3 (Public Review):

      Strengths: In the vascular field, previous implementations of optogenetics to constrict and dilate blood vessels have used either single-photon full-field and fiber illumination, or alternatively confocal and 2-photon scanning of individual vascular segments with raster scanning. The former is limited in spatial precision, activating multiple vessels over a large area, whereas raster scanning is not ideal for accumulating currents and often results in slow temporal precision. Spatial light modulator (SLM) generated diffraction patterns to achieve patterned illumination have become increasingly used in neuroscience to achieve reliable 2-photon activation of targeted neuron populations. Here the authors use this technology to depolarize and constrict smooth muscle cells in vivo. By imaging and stimulating with 2 laser lines and different optical paths they are able to stimulate opsin-expressing cells and image simultaneously, which is advantageous. By using the red-shifted opsin ReaChR for their experiments, it is possible to combine this approach (cautiously) with imaging many of the classically used 2-photon fluorophores and genetic indicators, with excitation spectra <1040 nm. Future work using variations of the technique is likely to gain valuable insight into neurovascular biology.

      Weaknesses: A major limitation of the current study is that although the authors achieve high spatial precision of ReaChR activation in the xy plane, the axial precision appears extremely poor compared to what would have been expected. For example, in Fig. 5-1 (using a 0.8NA, 16x objective), the authors achieve equivalent levels of surface arteriole constriction even when the SLM is focused 200 µm above the brain, and even larger constrictions as they initially move the focus away from the imaging plane. Although the axial spatial resolution appears better with the 1.1NA - 25X objective, such a large point spread function largely limits the utility of the technique, as there will always be a concern as to whether the effects are spatially specific and not due to activation of vascular cells above and/or below the site of interest. This experiment that the authors have presented on axial precision is extremely important as it outlines a very important limitation of the technique (which is likely power dependent), but it remains to be completely characterized and understood. One possibility is that the power levels used by the authors are already above saturation, a problem raised by Rickgauer and Tank (2009; PMID: 19706471), and therefore they may be able to refine the axial precision by using lower power. Further controls would be valuable to understand the precise cause of this large axial spread as it doesn't quite add up with the diameter of the bleach spot shown in figure 5-1D (some suggestions outlined in recommendations to the authors).

      We agree with the reviewers on this point. We conducted several new experiments to help elucidate the limits of axial resolution. First, we have dropped the comparison between objectives with different NAs. This leads to unnecessary confusion, and it is common knowledge that lower NA objectives will have poorer resolution in the axial plane. We now mention this as a factor to consider, but have removed it from the figures. Second, we have shown, as the reviewer suggests below, that the stimulation power used has a dramatic effect on the axial spread of constriction (Figure 6E and Figure 7). Low powers indeed show a narrower axial spread. However, we typically use higher powers (near or above 100 mW) to generate large constrictions in penetrating arteries, and we also include these levels to show the greater axial spread they cause. In summary, we confirm with lower powers the 3D precision of the two-photon optogenetic technique, and we show that higher powers can be used to broadly constrict penetrating arterioles for studies seeking to modulate blood flow in columns of cortical tissue supplied by penetrating arterioles.

      Regarding the stated inconsistency with the bleached spots, we think this mostly has to do with the difference between photo-bleaching fluorescent material (requiring lots of laser power) and photo-activating opsin channels (which can be done with much less power for very sensitive opsins). Additionally, the slide we bleached is optimally activated at ~800 nm and so our 1040 nm stimulation required enormous power to burn the spot.

      The current version of the paper also lacks adequate quantification of the results as it is composed primarily of representative examples, which limits a proper assessment of reproducibility and variability of the effects.

      We agree that showing population averages will be more informative to the field. In the original submission, we showed mostly examples because the early data explored a large parameter space (size and number of spots, position on vessels, duration and intensity of stimulation; if a stimulation train, the duration, number, and inter-pulse interval of stimulation) rather than a single set of conditions. However, we have now collected new data where parameters were typically the same and included population average plots in the figures that previously had only individual examples (Figures 2G,I, 4I,M, 4-1C, 5I, 6E,F, 7, 11-2) as well as the new data (Figures 8, 9, 10).

    1. Author Response

      Reviewer #1 (Public Review):

      LaRue, Linder and colleagues present an automation (GLO-Bot) and analysis pipeline building on the previously developed GLO-Roots, which makes use of a constitutively expressed luciferase gene to image plant roots in thin soil containers (rhizotrons). After validation of the system using a set of 6 accessions, the authors then take advantage of the increased throughput to phenotype root system architecture (RSA) of 93 natural Arabidopsis accessions and perform genome-wide association to identify polymorphic genomic regions that are associated with specific RSA traits. I appreciate that the authors made all data available via zenodo.

      The authors succeeded in automating the GLO-Roots system. Overall, the GLO-Bot appears to be a nice platform to collect time-lapse images of root growth in soil-substrate using rhizotrons. The automation of the GLO-Roots system using the GLO-Bot is well described, although not in sufficient detail to be rebuilt by interested researchers, e.g. the software controlling the robot is not described or made available, precluding wide adoption of the method. The image processing pipeline is clearly described in the methods and in Figure 2. The pipeline is open source and available for use, and it appears to work well overall, although in some cases the vector representation of the root system appears to be incomplete.

      We thank reviewer #1 for raising these concerns. We have now made the general code for the software available (GitHub: https://github.com/rhizolab/rhizo-server). In addition, we uploaded the rhizotron laser cutting files (Zenodo DOI: https://doi.org/10.5281/zenodo.6694558) that would facilitate rebuilding the robot.

      We understand the concerns about the vector representations of the root system.

      These root system structures visible on the GLO-Bot images are indeed disconnected in many locations, due to variability in the reporter’s intensity and obstruction of the light path by soil particles. For traits like root angle, the disconnected nature of the root system is much less impactful as this method naturally uses “segments” of the root as individual elements for angle measurements.

      The authors then present a quantitative analysis of RSA using a set of 93 accessions, with 6 replicates per accession, generating a large dataset on the diversity of RSA in Arabidopsis. Using average angle per day, the authors identify SNPs that are significantly associated with angle at 28 days after sowing, and they describe a correlation between this trait and the mean diurnal temperature range at the site where the accession was originally collected. The main weakness of the manuscript in its current form lies in some details of the quantitative genetic analysis. In my opinion the quantitative genetic analysis would benefit from additional quality control, as there are peculiarities in the dataset that was used as the basis for GWAS.

      We understand the concerns from reviewer #1 about the quantitative genetic analysis. Ultimately, we performed the analyses in the way we explained in the paper with careful consideration. We have added descriptions of the rationale for choosing certain methods, which we hope clarify why we did the analyses in the way we did. We hope this paper serves as a resource for others to pursue additional studies on traits relevant to their research.

      Reviewer #2 (Public Review):

      Therese LaRue and colleagues have developed a second generation of the GLO-Roots system that had been developed in their lab and published in 2015. Importantly, the new system (GLO-Bot) and the analysis of the resulting images has now been largely automated and therefore provides a throughput allowing for genetic studies. In an impressive endeavor the authors have transformed more than 100 diverse accessions that had been selected using sensible criteria with the luciferase construct, which then allowed the RSA of these accessions to be measured using the GLO-Bot system. On a set of 6 diverse accessions, the authors carefully identify meaningful RSA traits that they then quantified in the accessions of a larger panel of almost 100 accessions. They also benchmarked the new imaging processing tools against gold-standard manual tools. Overall, they show that the data acquisition and analysis is reproducible and reasonably accurate. They then proceeded to conduct GWAS using the RSA traits and identified several significantly associated candidate SNPs. Finally, they correlated the RSA with environmental variables and found interesting correlations that are consistent with prior studies.

      Strengths:

      The manuscript presents interesting root phenotyping technology, a comprehensive atlas of RSA under rhizotron lab conditions in Arabidopsis, candidate genes potentially underlying RSA traits, and interesting associations of RSA and climate variables. This will be inspiring and useful to many other researchers and has the potential to be explored further in future studies.

      We thank the reviewer for the encouraging feedback.

      Weaknesses:

      Some aspects of the data analyses are not well described and should be described more. The trait data is heavily processed to "breeding values" and it is a bit unclear when unprocessed and processed trait data is used and why. Also, limitations and caveats are not discussed sufficiently. For instance, presenting and discussing the issues and caveats of measuring RSA that was generated in thin and not very wide soil sheets using the GLO-Bot system when natural growth in soil is usually largely unconstrained. Moreover, the analysis of potential candidate genes from the GWAS is not very well developed. Finally, the trait data was not available with the manuscript and a major impact of a resource like this will come from the data being fully available to the community.

      We appreciate the broad comments on the manuscript and have tried to address them through the specific responses below. Overall we believe the approaches we used are effective but with specific caveats and have used the revision as a means of better communicating the limitations of the approaches chosen.

      Reviewer #3 (Public Review):

      The authors provide a thorough description of a method to transform plants to be bioluminescent upon application of the required substrate such that roots are visible on the windows of rhizoboxes. They have expanded on previous work by automating the imaging process with a robot that moves rhizoboxes to an imager where images are captured. They have improved the image analysis pipeline to be mostly automated, with a user presumably needed to run various scripts in batch mode on directories of images. One novel aspect of the image analysis pipeline is the use of image subtraction to subtract the root system at the previous time point from the current one in order to identify new growth.

      We thank the reviewer for highlighting the strengths of the manuscript.

      Overall, I think the authors provide a great amount of detail in parts needed and the methods, but some recommendations to increase reproducibility are more information about actual root traits measured. For example, one concern would be if root length is only summing pixels without considering diagonal pixels having a length of square-root of two, sqrt(2).

      This is a valid concern. Rather than just summing the pixels, the length of the segments is actually calculated using the “Feret Diameter” (or caliper length) function in ImageJ, which does take diagonals into consideration.
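
      To illustrate the difference the reviewer raises, the following is a minimal sketch that compares a naive pixel count with a caliper-style length for one segment. It is not the authors' pipeline or ImageJ's implementation: the function names are hypothetical, and the caliper length is simplified to the largest distance between pixel centres, whereas ImageJ measures it on the selection boundary.

```python
import math
from itertools import combinations


def pixel_count_length(pixels):
    """Naive length: one unit per pixel, so diagonal steps are under-counted."""
    return len(pixels)


def feret_diameter(pixels):
    """Simplified caliper length: largest distance between any two pixel centres."""
    if len(pixels) < 2:
        return 0.0
    return max(math.dist(p, q) for p, q in combinations(pixels, 2))


# Hypothetical straight segments of five pixels each (row, column coordinates).
horizontal = [(0, i) for i in range(5)]
diagonal = [(i, i) for i in range(5)]

for name, segment in (("horizontal", horizontal), ("diagonal", diagonal)):
    print(name, pixel_count_length(segment), round(feret_diameter(segment), 3))
```

      Both segments contain the same number of pixels, but the caliper length of the diagonal one is larger by a factor of sqrt(2), which is the effect a plain pixel sum would miss.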

      While the methodological aspects of the paper are compelling, the authors have furthered the significance through a biological application for genetic analysis among accessions of Arabidopsis and correlating root traits to climatic 'envirotypes' or data from the origin site of the respective accession. This genetic analysis would be furthered by greater consideration of time series analysis and multi-trait analysis, which is possible in GEMMA. The authors could consider genetic analysis of the PCA traits as well. Given the novelty of this type of time-series, multi-trait data - the authors can reach further here.

      Absolutely, PCA approaches to disentangle the phenotype space would be highly interesting to investigate further, and we began doing so in Supplemental Figure 8. This figure decomposes all the data points, including replicates and temporal values of the same replicate. PC1 therefore mostly captures how plants change over time, while PC2 seems to capture the main trade-off of wide/horizontal vs deep/vertical root architectures that we describe throughout the text. We could make use of this PC space to quantify the average value per genotype in PC2 and utilize this value for GWA, although it is not obvious how replicated and temporal measurements behave in PCA or what the consequences would be when computing a genotype value. There will definitely be interesting work that we aim to pursue in this direction in the future.

      Regarding the additional capabilities of GEMMA: we are not aware of a subtool that is able to analyze time series directly in GEMMA, but we will look into it. The multi-trait analysis in GEMMA is also interesting. We have utilized the multi-trait feature in the past, but this is limited to very few traits. We have 8 time points, thus 8 traits. For reference, when we have run multi-trait LMM with 2 traits, we have typically seen runtimes of ~9 days on large clusters. New tools continue to emerge in the field of quantitative genetics, such as the use of summary statistics of multiple GWAS to gain new insights, which we will pursue in the future. We have added possible future directions to the discussion section (page 14).

      As far as the general structure of the manuscript, I struggled with the results mixing in the methods such that I was never sure if the lack of detail in methods there would be addressed later, along with the mixture of discussions. Perhaps these are personal choices, but the methods were also after supplemental. I simply ask the authors to consider the reader here by being honest with my own experience reading this manuscript.

      We appreciate this comment of reviewer #3. Since this is a “Tools and Resources” article, we believe that a substantial part of the results section should include the methods that were applied. The methodology mentioned in the results section should always help the reader to understand the illustrated results in the figures. If readers would like to apply certain methods, however, more details can be found in the materials and methods section. We apologize if this was not always successful and led to confusion. In the final formatted version, all supplemental figures would be linked to the main figures so that the materials and methods section would follow the discussion.

      Overall, I believe this manuscript advances root phenotyping by providing relatively high-throughput data (imaging is slow due to the long exposure times) and by doing the time-series, multi-trait genetic mapping. The authors mention imaging shoots but no data is presented; presumably it would be interesting to tie that in, but there may be reasons not to. The authors could also discuss further the advantages of this approach relative to color imaging, which has also advanced significantly since the original GLO-Roots paper was released. Last, I am not sure the description of the 6-accession study adds much value to the paper, and probably many other preliminary studies were done to prototype. Overall, this is fantastic and substantial work presented in a compelling way.

      Unfortunately, the shoot images that were taken did not have sufficient quality for further analysis and, due to technical problems, the set of shoot images is not complete. We removed the part on shoot imaging from the text. It now reads: “Inside the imaging system, the rhizotrons were rotated using a Lambda 10-3 Optical Filter Changer (Sutter Instrument®, Novato, CA). If it was the first imaging day or a designated luciferin day (every six days), GLO-Bot added 50 mL of 300 μM D-luciferin (Biosynth International Inc., Itasca, IL) to the top of each rhizotron immediately before loading the rhizotron into the imager.”

      The clear advantage of the GLO-Roots method over color imaging is that it can capture a more complete image of root systems with finer roots (like those of Arabidopsis). We have added the possibility of using RGB imaging for bigger root systems to the discussion section (page 13).

    1. Is maintenance a privilege?

      I think in many ways it has become a privilege. In an age when practical skills and ability to repair are relatively rare, and when it is often cheaper (in money and time) to buy new, I think maintenance is a privilege. Can we share it, teach people to fish so to speak? Perhaps knowing how to maintain simply isn't enough; slim margins of personal time may not be best spent maintaining things (as opposed to maintaining oneself).

    1. Reviewer #1 (Public Review):

      The key question that Huang et al. are addressing is which approach, paratransgenesis, transgenesis, or the combination of both, is the most promising to combat malaria, killing parasites without affecting the mosquito host. They explored this question by generating a transgenic mosquito line secreting two effector molecules in the midgut and salivary glands, and infecting mosquitoes with Serratia bacteria expressing effector molecules. Their major finding is that a combination of both strategies has the highest inhibition of parasite development compared to transgenesis or paratransgenesis alone. This is further confirmed by mouse infections with a rodent malaria model showing that a combination of both strategies inhibits transmission to naïve mice.

      This study is comprehensive and provides significant information on the possible use of these approaches for malaria control. The effects on parasite development are clear and convincingly confirm that these strategies have the potential for reducing malaria transmission. It cannot be ruled out, however, that the more pronounced effects on parasite development of the combined approaches may be due to differences in the fitness of these mosquitoes rather than a true additive or synergistic action between transgenesis and paratransgenesis. Another limitation is that the authors do not show when parasites are killed and do not provide direct evidence of the role of the bacterial-expressed factors in the killing mechanism.

      The authors show very convincingly that transgenic mosquitoes (all possible combinations) have comparable fitness to wild types. However, these fitness studies are lacking in Serratia-infected mosquitoes, and in the transgenic-paratransgenic combination. Are those mosquitoes as fit as WT? Fitness costs could negatively affect parasite development indirectly, rendering the comparison between the treatments impossible (and negatively impacting this possible strategy). These are key controls that need to be added to the manuscript in order to support the finding that the combination is the best approach.

      It is surprising that the Sg/E line inhibits oocyst development given it uses a salivary gland promoter. The authors hypothesize that this is most likely explained by mosquitoes ingesting saliva with the blood meal. This hypothesis is interesting but needs to be tested by determining the presence of Scorpine and MP2 protein in the blood bolus. Also, at what stage are parasites killed?

      While the authors test the expression levels of Scorpine and MP2 by qRT-PCR and western blot in transgenic mosquitoes, they did not test levels in paratransgenic ones. In which tissues are these factors produced in Serratia-infected mosquitoes? Are Scorpine and MP2 produced in the midguts and/or salivary glands? And at what level? A quantitative comparison of scorpine and MP2 protein levels in transgenic and paratransgenic mosquitoes is important to determine whether levels are correlated to the effects on parasite development.

      Related to this, the engineered Serratia bacteria appear to express 5 effector molecules rather than just MP2 and Scorpine. This obviously can affect the results and also makes a direct comparison less meaningful, but we couldn't find any information on the other effectors, or on whether they are expressed and potentially responsible for the observed anti-parasitic activity.

      More information about the experimental setup is needed. The authors used a piggybac approach that has led to multiple insertions in some of the mosquito lines. Which lines did they use for the experiments? This is not clear in the manuscript. If multiple insertions were used, this should be stated and the feasibility of maintaining them (and efficacy) over different generations should be discussed.

      Oocyst and sporozoite data are not normally distributed, and therefore presenting the median instead of the mean is more informative. Furthermore, the statistical analyses done do not appear to be appropriate for this data. The authors need to either FDR-correct for multiple comparisons or do a Kruskal-Wallis test with post hoc testing. It would also be important to do statistical analyses on the prevalence.
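
      As a sketch of the first option mentioned above (FDR correction), the following implements the Benjamini-Hochberg adjustment in plain Python. The p-values are hypothetical placeholders, not values from the manuscript, and the function name is an assumption made for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
print([round(p, 4) for p in benjamini_hochberg(raw)])
```

      A comparison would then be called significant when its adjusted value falls below the chosen FDR threshold. The alternative the reviewer names, a Kruskal-Wallis test followed by post hoc testing, is a rank-based omnibus test and would be run before any pairwise comparisons.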

      When discussing the ethical consequences of this approach, the authors should also discuss the possible effects of QF2, scorpine, and MP2 secretions in humans upon a blood feed.

      The authors show Serratia vertical transmission over three generations, but as the CFUs decrease over multiple generations, they should discuss whether low levels of Serratia can still block parasite development. In general, the manuscript lacks a thorough discussion of the limitations of this study.

      The discussion around line 280 should be more nuanced. I don't think the word 'protected' can be used as mice were not immunized but were simply not infected.

    1. Reviewer #1 (Public Review):

      The authors look at a few different nematode species to compare the dynamics of anaphase. They find that in some species the spindle oscillates transversely in anaphase, and in other species it does not. They ask what accounts for this different behavior. To address this question, they use ablation of the central spindle, and conclude from the result, correctly, that after the ablation the centrosomes are pulled to the opposite poles of the cell in all species. However, the magnitude, half-time and initial velocity of the recoil differ.

      To understand what accounts for the quantitative difference, the authors

      1) use a simple viscoelastic model of a constant force, F, pulling against a spring (with constant stiffness k), while the object moves through the viscous medium.

      2) estimate the cytoplasmic viscosity from tracking yolk granules,

      3) estimate parameters F and k from fitting the exponential recoil curves. They find that the greatest correlation between having transverse oscillation or not is with lower or higher viscosity, not with magnitude of the force or stiffness of the spring.
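
      The three steps above can be sketched as follows, assuming the simplest reading of the model: a constant force F acting against a spring of stiffness k and a viscous drag coefficient (called `drag` here, a name not taken from the manuscript), so that drag * dx/dt = F - k * x. Under that assumption the recoil is x(t) = (F / k) * (1 - exp(-t / tau)) with tau = drag / k. The numbers are hypothetical and unitless.

```python
import math


def recoil_position(t, force, stiffness, drag):
    """Displacement at time t for a constant force against a spring and viscous drag."""
    tau = drag / stiffness  # relaxation time of the exponential recoil
    return (force / stiffness) * (1.0 - math.exp(-t / tau))


def recoil_summary(force, stiffness, drag):
    """Magnitude, half-time and initial velocity of the recoil."""
    tau = drag / stiffness
    return {
        "amplitude": force / stiffness,     # plateau displacement
        "half_time": tau * math.log(2.0),   # time to reach half the plateau
        "initial_velocity": force / drag,   # slope at t = 0
    }


# Hypothetical parameter values chosen only to illustrate the relationships.
summary = recoil_summary(force=10.0, stiffness=2.0, drag=4.0)
print(summary)
print(recoil_position(summary["half_time"], 10.0, 2.0, 4.0))
```

      In this form the initial velocity depends only on the force and the drag, while the plateau depends only on the force and the stiffness, which is why an independent viscosity estimate is needed to separate F and k from a fitted curve.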

      Two major problems with this study can be identified:

      1) Meaning and significance: It is not clear if the transverse oscillations have a functional significance. In fact, they are more likely than not simply a byproduct of the complex nonlinear mechanics of the mitotic spindle. It is important to understand what we can learn about the spindle mechanics from these oscillations, but there may be no evolutionary significance here. If the authors were asking how, in many different species, the spindle scales with the cell size in the same way (as was done in Farhadifar et al 2020, which the authors do not cite) despite large parameter variations, that would be a different story. But asking which parameter change is responsible for the behavior change is less meaningful.

      2) The study is not convincing, mainly because the model used for the fit is overly simplistic. The force is not constant, the spring stiffness is not constant, the mechanics is not, etc. There are a few different, very complex models, of the anaphase spindle with transverse oscillations - comparing to simulations of these models would be more convincing. Also, I am not quite sure whether the volume fraction of yolk is a useful parameter. Does not measuring MSD give us the diffusion coefficient and viscosity directly? I think using the factor depending on the volume fraction artificially inflates the viscosity differences. Lastly, I do not understand the theoretical argument based on comparison with Nedelec's model: in that model, increasing viscosity only slowed the oscillations down, not abolished them.
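
      The direct route from MSD to viscosity that the reviewer alludes to can be sketched as below. This assumes free Brownian motion tracked in two dimensions (MSD = 4 * D * t) and spherical granules obeying the Stokes-Einstein relation; whether those assumptions hold for yolk granules in a crowded cytoplasm is exactly what is at issue. All numerical values are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def diffusion_coefficient_2d(msd, lag_time):
    """Diffusion coefficient from a 2D mean squared displacement at one lag time."""
    return msd / (4.0 * lag_time)


def stokes_einstein_viscosity(diffusion, radius, temperature):
    """Viscosity (Pa*s) of the medium around a sphere of the given radius (m)."""
    return K_B * temperature / (6.0 * math.pi * radius * diffusion)


# Hypothetical granule: MSD of 4e-12 m^2 after 1 s, radius 0.5 um, 298 K.
d = diffusion_coefficient_2d(msd=4e-12, lag_time=1.0)
eta = stokes_einstein_viscosity(d, radius=0.5e-6, temperature=298.0)
print(d, eta)
```

      No volume-fraction factor enters this calculation, so any additional correction of that kind would need a separate justification.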

      In short, much more thorough investigation would be needed to understand which differences between the species account for the presence or absence of the oscillations, and one may question whether the answer would have a deep impact on our understanding of spindle mechanics.

    1. Reviewer #2 (Public Review):

      A summary of what the authors were trying to achieve:

      The authors have developed an approach to prediction of T cell receptor:peptide-MHC (TCR:pMHC) interactions that relies on 3D model building (with published tools) followed by feature extraction and machine learning. The goal is to use structural and energetic features extracted from 3D models to discriminate binding from non-binding TCR:pMHC pairs. They are not the first to make such an attempt (e.g., Lanzarotti, Marcotili, Nielsen, Mol. Imm. 2018), but they provide a detailed critical evaluation of the approach that sets the stage for future attempts. The hope is that structure-based approaches may have better power to generalize from limited training data and/or to model unseen pMHCs.

      An account of the major strengths and weaknesses of the methods and results:

      The authors first report (section 4.1) that their structural and energetic features contain information on binding mode, highlighting complexes with reversed binding polarity, for example, and partly discriminating MHC class I from MHC class II structures. This is encouraging but not terribly surprising. Also, with regard to MHC I vs II discrimination, it is not clear how the class II peptides are registered with respect to one another. This needs to be done by alignment on MHC and mapping of structurally-corresponding peptide positions, since the extent of N- and C-terminal peptide overhangs varies between structures and is largely irrelevant to the docking mode. Interactions between the TCR and MHC are ignored in the feature extraction process; it's possible that including these interactions could improve performance. The authors state: "To be noted that not all structures could be successfully modelled by TCRpMHC models, and so we could not submit them to the feature extraction pipeline." It's unclear what effect this could have on the results: if the modeling failures are cases of structures for which no good CDR templates could be identified, then perhaps this could bias the results.

      Section 4.2 reports a negative result: unsupervised learning applied to the extracted features is unable to discriminate binding from non-binding complexes. This suggests that there is not likely to be a simple energetic feature, such as overall binding energy, that reliably discriminates the true binders. In Section 4.3, the authors turn to supervised learning, in which training examples inform prediction by a classifier. One finding is that the pure-sequence approach using Atchley-factor encoding of the TCR:pMHC outperforms the structure-based approaches, though not by much. A combined model incorporating Atchley factors and structural features does slightly better. These results are a little hard to interpret because we don't know how challenging the 10-fold internal cross-validation is. It doesn't sound like there is any attempt to avoid testing on TCR:pMHCs that are nearly identical to TCR:pMHCs in the training sets, and the structural database is highly redundant, containing many slight variants of well-studied systems. It's also not clear how overlap between the template database used for 3D modeling and the testing set was handled; my guess is that since the model building is an external tool this was not controlled. Together, these factors may explain why the results on independent test sets are, for the most part, significantly worse than the cross-validation results. Another take-home message from the independent validation is that the sequence-only method seems to outperform the sequence+structure or structure-only methods. Although these are described as "out-of-sample validation", it's not clear how different these independent TCR:pMHC examples are from the structure dataset on which the model was trained.

      Sections 4.4 and 4.5 report that prediction accuracy varies significantly across epitopes, and this is in part determined by sequence similarity to the structural database (which provides templates for modeling and also constitutes the training set for the model). In section 4.6, the authors determine that the model does not appear to be able to predict binding affinity (as opposed to the binary decision, binding versus non-binding). Finally, in section 4.7 the authors benchmark the predictor against two publicly available, sequence-based predictors. When predicting for epitopes present in their training sets, all methods do reasonably well, with the edge going to the sequence-based ERGO method. When predicting for epitopes not present in their training sets, none of the methods perform very well. The authors state that "these results suggest that the structure-based models developed in this study perform as well as the state-of-the-art sequence-based models in predicting binding to novel pMHC, despite learning from a much smaller training set." This may be true, but the predictions themselves are not much better than random guessing (AUROCs around 0.5-0.6).
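
      For readers less familiar with the metric, the following minimal sketch computes AUROC as the fraction of binder/non-binder pairs that a predictor ranks correctly, which makes clear why values near 0.5 are close to random guessing. The scores and labels are hypothetical, and this is not the evaluation code used in the study.

```python
def auroc(scores, labels):
    """AUROC as the probability that a positive outscores a negative (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predictor scores for two binders (1) and two non-binders (0).
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```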

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      I'm doubtful that the proposed methods will form the basis of a practical prediction algorithm. In the absence of ability to generalize to unseen epitopes, simpler sequence-based approaches that leverage the ever-growing dataset of TCR:pMHC interactions seem preferable. I still think the study has value as a template and roadmap for future efforts, and a baseline for comparison. For me, a key unanswered question is whether the model-derived structural features are just a different, slightly noisier way of memorizing sequence, or actually contain orthogonal information that can enhance predictions. It might be possible to gain insight into this question by looking more carefully at the impact of model-building accuracy on performance (the authors use sequence similarity as a proxy, but this is confounded by overlap between the training set and the template set used for modeling). If model-building really adds something, it seems plausible that it does so by accurately capturing physical features of the true binding mode.

      A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community:

      As stated above, I think the present work will have a positive impact on the field of TCR:pMHC prediction by critically evaluating the structure-based approach (and also by testing two previously published methods on independent data). I am less convinced of the utility of the specific methods than of the overall conceptual framework, evaluation procedures, and training/testing sets.

      Any additional context you think would help readers interpret or understand the significance of the work:

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): **Summary:** Techniques to probe the local environment of membrane proteins are sparse, although the influence of lipids on membrane protein function has been known for many years. Therefore, the paper by Umebayashi et al. is important. The environment-sensitive dye Nile red (NR) coupled to a membrane protein is an appropriate sensor for monitoring the local membrane fluidity. Linking of Nile red to the receptor via a flexible tether was achieved with the acyl carrier protein (ACP)-tag method. Experiments showed that, depending on the ACP site, a certain linker length is required to have NR inserted in the membrane and thus be an effective sensor for lipid disorder. This technology could be of general use for studying the environment of membrane proteins in the context of their function. As an example, the technique allowed insulin-induced membrane disorder in the close vicinity of the insulin receptor to be observed. Further, results suggested that tyrosine activity is required for this disorder to happen. The experimental results appear to be complete and controls were made.

      **Major comments:** 1) Sometimes technical terms are used without explanation: What is the GP value? What is ACP-IR? The spectrum was measured in number of rois? The reader can work out those abbreviations, but it would be nice to have them defined.

      We have made a list of abbreviations.

      2) Fig. 1d) is confusing. The ACP-IR labelling is evident in 3 panels, but there is no difference in the color (emission spectra of 1992-ACP-IR vs 2031-ACP-IR should be visible??). The DAPI staining is very different. When doing the latter, how difficult is it to get the staining equal?

      The differences in spectra cannot be seen because we used pseudo colors to display the DAPI and CoA-PEG-NR staining. The reviewer’s comment about the unequal DAPI staining is correct. The reason for this is most likely that the cell membrane is unequally permeabilized by PFA treatment. As the point of this figure is just to show that the plasma membrane is labeled, dependent upon the expression of the ACP-tagged insulin receptor, we don’t think that the variable intensities of the DAPI staining are important. DAPI is simply used to indicate the position of the cells.

      3) How can one interpret Fig. 4: a) Control goes over 4 frames, at 240" insulin is added, and 10 frames should show a fluctuation difference?

      We showed 4 frames after control treatment to demonstrate that it caused no significant change. We expected that insulin treatment would produce clear changes in the GP images; however, these changes, while visible in the GP images, are difficult for the untrained observer to see. This is the reason why we used the ZNCC method in the subsequent figures to better visualize the changes.

      b) A color shift from blue to green is visible after insulin addition. But it is faint - difficult to assess from the pseudo color scheme. What does 1000 pixel top/1000 pixel bottom mean in c). Is it an attempt to better visualize the fluctuation? It is difficult to recognize a difference before and after adding insulin. d) It seems that the kymograph set should show this. What is the color scale? Why is 3 so untypical, i.e., no change? Box 6 is also peculiar: the left side does not show a strong change upon insulin administration, the right side does. Why?

      We appreciate the helpful comments for improving our manuscript.

      As pointed out, the change in GP value before and after insulin addition is extremely small, so it is difficult to fully visualize the change with a normal pseudo-color display. To deal with this, we adopted the following two methods to visualize minute changes.

      1) Visualization of local changes of the statistical GP value, shown by ZNCC throughout the time-lapse images (Fig. 6 and Fig. S2B).

      2) Visualization of the top/bottom 1000 pixels of the sorted ZNCC values in each image (Fig. 7 and Fig. S2C). The top 1000 pixels are the ones that showed the largest changes. The bottom 1000 pixels are the ones that showed the smallest changes.

      Owing to these visualizations, we found that the level of the response to the insulin signal was spatially and temporally heterogeneous in the membrane.
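
The top/bottom pixel selection described in method 2) can be illustrated with a minimal sketch. It is not the code used for the manuscript: the function name `extreme_pixels`, the nested-list image layout and the use of `None` for excluded edge pixels are assumptions made for illustration, and how a ZNCC value is converted into a per-pixel "change" score is left to the caller because the response does not specify it.

```python
import heapq

def extreme_pixels(score_map, n=1000):
    """Return coordinates of the n highest- and n lowest-scoring pixels.

    score_map is a list of rows; each entry is a per-pixel change score,
    or None for pixels that were excluded (for example image edges).
    """
    scored = [
        (value, (row_idx, col_idx))
        for row_idx, row in enumerate(score_map)
        for col_idx, value in enumerate(row)
        if value is not None
    ]
    # nlargest/nsmallest avoid sorting the whole image when n is small.
    top = [pos for _, pos in heapq.nlargest(n, scored)]
    bottom = [pos for _, pos in heapq.nsmallest(n, scored)]
    return top, bottom
```

Calling `extreme_pixels(score_map, n=1000)` on each frame yields two coordinate lists that can be overlaid on the image to show where the response is strongest and weakest.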

      As for the color scale, to clarify the meaning of the color differences, we have added a description of the relationship between the color and the ZNCC value in the results section.

      4) How is the kymogram calculated? The legend says 'The horizontal dimension represents the averaged ZNCC inside the rectangular area, and the vertical dimension represents time'. The averaged ZNCC is a single value, so it is not clear why the kymogram shows a variation from left to right. May it be the ZNCC was averaged just vertically?

      We apologize for not providing information on how the kymograph was made.

      In the yellow rectangular area (Fig. 6B), the ZNCC values of the pixels sharing the same x coordinate were averaged vertically, and these averages are plotted along the horizontal direction of the kymograph. That is, each horizontal line of the kymograph holds the spatial distribution of the ZNCC value along the horizontal direction of the membrane, and the vertical direction shows how this distribution changes over time. To make this easier to understand, we refined the description of the kymograph in the legend of Fig. 6.
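
The vertical averaging described above can be sketched as follows. This is an illustration only, not the authors' implementation: the function name `kymograph`, the half-open rectangle bounds `x0, x1, y0, y1` and the nested-list frame layout are assumptions.

```python
def kymograph(zncc_frames, x0, x1, y0, y1):
    """Build a kymograph from a time series of 2D ZNCC maps.

    For every frame, the ZNCC values inside the rectangle
    [x0, x1) x [y0, y1) are averaged over y for each x coordinate,
    giving one horizontal line per time point.
    """
    height = y1 - y0
    rows = []
    for frame in zncc_frames:
        line = [
            sum(frame[y][x] for y in range(y0, y1)) / height
            for x in range(x0, x1)
        ]
        rows.append(line)
    return rows  # rows[t][x - x0]: time runs vertically, space horizontally
```

Each returned row keeps the left-to-right variation inside the rectangle, which is why the kymograph is not a single value per time point.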

      5) When calculating cross-correlation values on images, they need to be aligned. What fraction of the total image does the selected 19x19 box represent? As described, I imagine that a rolling CC over 19x19 pixels is calculated over an image from the time lapse series comparing it with the reference Iave(x,y). Compared to the 3x3 median filtered CP image, the ZNCC image should then be much more blurred??

      Below we provide more information regarding the calculation of ZNCC.

      Each local window for the ZNCC calculation is a 19x19-pixel region centered on a single pixel; this is done for every pixel except those at the edges of the image. The ZNCC value calculated in that window is assigned to the center pixel of the window. A new window centered on the adjacent pixel is then set and a new ZNCC value is calculated. That is, the calculation window is slid across the whole image. Because the calculated ZNCC value is assigned only to the center pixel of the window, and not to all the pixels in it, there is no blurring effect like that of median filtering.

      The figure below shows a schematic view of our ZNCC calculation.

      Schematic view of our ZNCC calculation
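
The sliding-window procedure described above can also be written as a short sketch. It uses the standard definition of zero-mean normalized cross-correlation and is not the code used for the manuscript; the names `zncc` and `zncc_map`, the nested-list image layout, the `None` marker for edge pixels and the value 0.0 returned for a flat window are all assumptions.

```python
import math

def zncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    den = math.sqrt(var_a * var_b)
    # A flat window has no defined correlation; 0.0 is an arbitrary choice.
    return num / den if den > 0 else 0.0

def zncc_map(image, reference, window=19):
    """Slide a window over the image and store each ZNCC at the center pixel."""
    half = window // 2
    n_rows, n_cols = len(image), len(image[0])
    out = [[None] * n_cols for _ in range(n_rows)]  # edges stay None
    for r in range(half, n_rows - half):
        for c in range(half, n_cols - half):
            coords = [
                (i, j)
                for i in range(r - half, r + half + 1)
                for j in range(c - half, c + half + 1)
            ]
            patch = [image[i][j] for i, j in coords]
            ref_patch = [reference[i][j] for i, j in coords]
            out[r][c] = zncc(patch, ref_patch)  # only the center is written
    return out
```

Because each output pixel receives exactly one value, computed from its own window, the result is a per-pixel similarity map rather than a smoothed copy of the input.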

      **Minor comment:** On page 16 supplementary is not spelled properly.

      corrected

      Reviewer #1 (Significance (Required)):

      The key point of this paper is convincing and the new technology appears to have a lot of potential. It can be applied to study membrane protein function in the context of its environment, the lipid bilayer.

      Membrane fluidity measurements have been developed (e.g., using fluorescent probes like laurdan). However, the trick to link a probe like nile red by ACP technology to the insulin receptor and to observe its activity is quite new.

      A most recent description of such a technology is in TrAC Trends in Analytical Chemistry Volume 133, December 2020, 116092.

      This is an interesting review, but not directly impacting on our work.

      **Referees cross-commenting**

      All comments are constructive and important. The paper is important but needs to be amended as proposed.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): **Summary:** In this manuscript, authors generated an ACP-attached Nile Red probe in order to specifically label Insulin receptor in the membrane. Owing to this specificity, one can measure the lipid membrane properties around a specific protein in the membrane. **Major comments:**

      For the conclusions in the manuscript to be convincing, in my opinion, these additional data need to be added. Some of these are new experiments, and some are detailed analysis of existing data. The new experiments are not for new line of investigation, instead it is to confirm their statements and conclusions. The major point is the reliability of spectral shift. In usual environment sensitive probes, it is certain that they are in the membrane whatever is done to the membrane. However, when the probe is attached to a protein, it is not trivial to have the same confidence that the probe is always inside the membrane, and it is in the same plane of the membrane. 1992-ACP-IR is a good example; authors state that it binds to the protein outside the membrane, but when there is cholesterol addition and -maybe more interestingly- cholesterol removal, the dye still reacts and changes its emission (even PreCT changes its emission quite a bit at the 570 nm region). This is a clear indication of a change in localization of the probe upon some changes in the membrane. This implies that observed spectral shifts may not be due to lipid packing differences, but due to localization of the probes. For this reason, it is crucial to know where any environment sensitive probe localize in the membrane with respect to membrane normal, and this knowledge is more important for this probe. Related to this, the spectral difference upon insulin treatment and activation of insulin receptor could be due to changes in probe's localization in the membrane. Especially because authors show in Fig1e, the spectra can change depending on the probe localization. Relatedly, quantum yield of NR should be significantly different when it is inside vs outside membrane. Authors should show QY for 1992-ACP-NR and 2031-ACP-NR with different PEG lengths and upon insulin treatment.

      We understand the logic of the request to measure the QY, since the QY of Nile red is much higher in organic solvents than in aqueous solutions, so it might be predicted that the QY of Nile red is higher in a lipid bilayer than when covalently bound to the protein in an aqueous environment. However, this argument depends upon the mechanism for the increase in quantum yield when going from aqueous to a non-polar solution. One possible explanation is based on the intrinsic properties of the dye under the two conditions. The alternative explanation would be that the dye would aggregate (be insoluble) in aqueous solution and therefore either not fluoresce or self-quench. In this case, we believe that the latter is the explanation because we and others have previously shown the turn-on properties of the probe when binding to proteins (SNAP-tag and others). It is not simple to measure QY in the cell under a microscope, but we have done something similar, shown in supplementary figure 4. We labeled the three ACP-receptor complexes with PEG11-Nile red and co-stained with antibody to the Insulin Receptor. We then calculated a relative quantum yield. There was very little difference between the relative quantum yields, so we conclude that it is not the environment of the probe that affects the quantum yield under these conditions, but the fact that it is covalently attached to a protein and incapable of forming aggregates. What distinguishes these constructs is the emission spectrum, not the quantum yield. In supplementary Table 2 we also did QY measurements in vitro and we could reproduce the increase of quantum yield by association with liposomes or in organic solvents. We tested whether non-covalent association with a protein would increase the QY by incubation with the lipid binding protein, BSA, in PBS. This was not the case, strongly pointing to the conclusion that it is the covalent association with the protein that increases the QY, not mere association with a protein. We believe that our demonstration of changes in fluorescent spectra with changes in cholesterol, large changes in fluorescent spectra with linker length for the 1992 construct and voltage sensitivity using patch-clamp prove that the Nile red is reporting on the membrane environment under the conditions we propose.

      **Minor comments:** - Fig 1d requires quantification

      We do not agree with this. This is simply to show that the labeling is dependent upon expression of the relevant ACP-IR constructs. There is no detectable labeling of the control.

      • Voltage sensitivity of different PEG length of 2031-ACP probe should be added.

      We have added this data in figure 2 panel E.

      • Fig 3a graph should show all data points, not only bar graphs. Also, the band in 3a for +CoA-PEG-NR is dimmer than other bands, is it specific to this particular gel since quantification does not show any difference?

      There is no significant difference.

      - Fig 4d, colour code is needed.

      Done

      • Fig 5b and Fig3d are basically the same experiments in terms of control measurement, why is the difference in 3b is 0.04 GP unit while it is 0.007 GP unit?

      We explain this in the MS, but have improved the title of the Y-axis in the Fig. 5b graph so that the difference in what is plotted is clear.

      - Why is inhibitor data so noisy? We should discuss.

      We don’t know the exact reason why the inhibitor data are noisy, but we speculate that the actin cytoskeleton and phosphoinositide-dependent signaling could affect the membrane stability, and that the membrane environment would fluctuate in the presence of latrunculin B or PI3K inhibitor.

      Reviewer #2 (Significance (Required)): Overall, this is a very useful approach, and this line of research will yield very useful tools to shed light on how lipids surrounding proteins can change their function. Major advance of the paper is the new chemical biology tool. There is also biological data on how insulin can change the insulin receptor's membrane environment which is contradictory to some old literature claiming that InsR becomes more "rafty" upon insulin treatment (e.g., PMID: 11751579).

      If this type of tagging proves robust and reproducible (limitations and concerns listed above and below), it could be used by other researchers to tag their protein of interest and investigate the lipid environment around those proteins.

      The downside of this method is that the probe requires ACP tag, a relatively less used tag than others in biology, therefore researchers interested in using this probe should have their proteins with ACP tag. Moreover, the linker length and ACP-tag position are quite crucial parameters (and probably should be optimized for each protein). Longer PEG lengths cannot report on changes efficiently (Fig3b), while shorter lengths are prone to artefacts as they can go out of membrane (Fig1 and Fig2). This might limit its widespread use.

      The reason for using the ACP tag is that neither the SNAP tag nor the HALO tag worked. The tethered Nile Red preferred to bind to the tag rather than insert into the membrane.

      **Referees cross-commenting** I agree with all comments and concerns of other reviewers. I see the usability and potential of this new technology along with its limitations as all three reviewers pointed out.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): See below. No concerns on any of these issues.

      Reviewer #3 (Significance (Required)): **Critique:** This MS reports a proof-of-principle for using site-directed environmentally sensitive probe technology to assess the local membrane environment of a receptor tyrosine kinase (IR) upon activation. This technology addresses a major gap in our arsenal of tools to study the mechanisms of membrane signaling as the parameters of interest are biophysical parameters rather than purely biochemical ones. How to do this with spatial and temporal resolution is a major challenge. This study builds on previous work by the Riezman group that develops an extrinsic labeling system to tether Nile Red to specific sites on the ectodomain of a signaling receptor and then probe local membrane environments as a function of receptor activity. This is a carefully done study is well-controlled, is clever in design and is well-described. Although the major issues to which such a general technology could contribute involve intracellular (and not extracellular) event, the advances described will be of general interest -- particularly that local membrane order decreases when IR becomes activated. Specific comments for the authors' consideration follow:

      **Specific Comments:** (i) As a general comment, the authors are measuring extracellular plasma membrane leaflet properties that may or may not translate to what is happening in the local inner leaflet environment. A general reader may well miss the significance of this. This point needs to be more explicitly emphasized in the Discussion.

      This has been discussed in the revised version.

      (ii) Why not treat cells with a PLC inhibitor to block PIP2 hydrolysis and ask if that inhibits membrane disorder. It is PIP2 hydrolysis/resynthesis that regulates the actin cytoskeleton at signaling receptors and this seems an attractive candidate for study.

      There is a long list of attractive post-signaling events of the insulin receptor and how this works in different cell types that could be tested. We believe that this is beyond the scope of this study and we encourage others to do this.

      (iii) The data acquisition time is at least 4 min which is long enough for activated receptors to be recruited to sites of endocytosis. Can the authors exclude the possibility that what they are measuring isn't reflective of such spatial reorganization? Does a clathrin inhibitor block the observed change in local membrane order for activated IR?

      We determined localization to AP2 adaptor containing clathrin coated pits at the cell surface and showed that during the time-course of the experiment there is no significant change in co-localization or evidence for endocytosis (new figure 9). Therefore, we decided not to do the clathrin inhibitor blocking experiment because we believe that it could only lead to indirect effects.

      (iv) Receptor activation is accompanied by other transitions such as dimerization, etc. Can the authors exclude the possibility that what they are measuring is related to changes in depth of insertion of the NR probe into the plasma membrane outer leaflet that is a consequence of IR conformational transitions associated with activation?

      This is highly unlikely given the fact that fluidification of the membrane environment is found with all linker lengths. Given the intervals in increases in linker length on the 2031 construct, which is the closest to the membrane, it is very difficult to conceive that any of the ones larger than 5 PEGs significantly restrict the membrane insertion of the dye.

      **Referees cross-commenting**

      I think we have a consensus opinion

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      See below. No concerns on any of these issues.

      Significance

      Critique:

      This MS reports a proof-of-principle for using site-directed environmentally sensitive probe technology to assess the local membrane environment of a receptor tyrosine kinase (IR) upon activation. This technology addresses a major gap in our arsenal of tools to study the mechanisms of membrane signaling as the parameters of interest are biophysical parameters rather than purely biochemical ones. How to do this with spatial and temporal resolution is a major challenge. This study builds on previous work by the Riezman group that develops an extrinsic labeling system to tether Nile Red to specific sites on the ectodomain of a signaling receptor and then probe local membrane environments as a function of receptor activity.

      This is a carefully done study is well-controlled, is clever in design and is well-described. Although the major issues to which such a general technology could contribute involve intracellular (and not extracellular) event, the advances described will be of general interest -- particularly that local membrane order decreases when IR becomes activated. Specific comments for the authors' consideration follow:

      Specific Comments:

      (i) As a general comment, the authors are measuring extracellular plasma membrane leaflet properties that may or may not translate to what is happening in the local inner leaflet environment. A general reader may well miss the significance of this. This point needs to be more explicitly emphasized in the Discussion.

      (ii) Why not treat cells with a PLC inhibitor to block PIP2 hydrolysis and ask if that inhibits membrane disorder. It is PIP2 hydrolysis/resynthesis that regulates the actin cytoskeleton at signaling receptors and this seems an attractive candidate for study.

      (iii) The data acquisition time is at least 4 min which is long enough for activated receptors to be recruited to sites of endocytosis. Can the authors exclude the possibility that what they are measuring isn't reflective of such spatial reorganization? Does a clathrin inhibitor block the observed change in local membrane order for activated IR?

      (iv) Receptor activation is accompanied by other transitions such as dimerization, etc. Can the authors exclude the possibility that what they are measuring is related to changes in depth of insertion of the NR probe into the plasma membrane outer leaflet that is a consequence of IR conformational transitions associated with activation?

      Referees cross-commenting

      I think we have a consensus opinion

    1. Author Response

      Reviewer #1 (Public Review):

      This study addresses the important question of understanding the cellular physiology of cholinergic interneurons in the striatum. These interneurons play a key role in learning and performance of motivated behaviors, and are central to movement disorders, psychiatric disease, and addiction. Their unique physiology, which includes tonic pacemaking activity and active conductances that shape integration of dendritic inputs, is critical to their function but is still incompletely understood. The authors cleverly integrate a series of innovative electrophysiological and optical approaches to gain insight into dendritic physiology of these neurons. Their creative approach yields some interesting and novel findings. However, there are technical and conceptual concerns that need to be addressed before these results can be readily interpreted. Some refinement of analysis and presentation, and potentially some additional experiments, will therefore be required to strengthen the conclusions and facilitate interpretation of the results.

      We believe that with several new sets of experiments and simulations, we have successfully refined the analysis and addressed the technical and conceptual problems. Indeed, we strengthened the conclusion with a novel pharmacological experiment that provided model-independent evidence of proximal-only boosting.

      Major concerns:

      1) This manuscript focuses on differential physiology of proximal and distal dendrites contribute to physiological activity and integration of inputs in cholinergic interneurons, suggesting that NaP and HCN currents act in concert to selectively boost inputs onto proximal dendrites (from thalamus), relative to inputs onto distal dendrites (from cortex). The results presented in Figures 1-4 are consistent with a distinct physiology of proximal-vs-distal dendrites based on purely electrical properties. Indeed, Figure 5 initially appears consistent with this model as well, since thalamic inputs (onto proximal dendrites) are boosted by an NaP conductance, while cortical inputs (onto distal dendrites) are not. This raises a key conceptual question: why are cortical inputs onto distal dendrites not boosted? Any depolarization of distal dendrites must pass through proximal dendrites before reaching the recording electrode at the soma. Shouldn't this signal be subject to the same active and passive conductances, and consequently the same boosting that shapes thalamic inputs onto proximal dendrites?

      You are absolutely right in the case of a linear model (passive or quasi-linear). However, for a nonlinear system, there can be preferential boosting of proximal inputs. The new Appendix 2 addresses this point with computer simulations.

      2) The quasi-linear approach to characterizing active and passive membrane properties is promising, and the choice of a cable-based model is well supported. However, the model itself is rather opaque, which limits confidence in the interpretation of the results. Additional analysis and description should be presented to alleviate concerns about whether the experimental data, which has a limited number of measurable values, may be over-fit by a model with too many free parameters. For example, why is the radius of the dendrite a free parameter that is allowed to vary in the full field vs proximal experiment (Lines 253-256) - and isn't it a serious red flag that the value returned for proximal dendrites is smaller than for the full field? Additional tables (e.g. fixed and free parameters and how they were determined), and figures (plots of how those parameters influence the fits, and how the parameters interact with one another) would considerably strengthen confidence in the conclusions drawn by the authors.

      Thank you very much for this comment. We have added in the new ms a table with all the parameters fit in the various figures, and have discussed the possible pitfalls of overfitting. Most importantly, we have provided a new appendix (#1) to the manuscript that explains the effects of the various model parameters in a systematic fashion, beginning with passive dendrites, followed by the effects of boosting and then the effect of restorative currents that give rise to resonances. This appendix addresses the questions raised by the reviewer regarding how the various parameters influence the fits.

      We apologize if we created confusion with respect to the meaning of the parameter r. It does not represent the radius of the dendrites (which is not explicitly represented at all, only implicitly through the space constant) but rather the electrotonic range of illumination. We indeed find that the fits consistently estimate a value of r for the proximal illumination which is smaller than that estimated for the full-field illumination, as it should be.

      Finally, our new pharmacological demonstration of differential boosting in the case of proximal vs. full-field illumination (see above) is entirely independent of the quasi-linear model fit. So the main thrust of the ms, which is to demonstrate a proximal localization of nonlinearities and its correspondence to the spatial localization of excitatory afferent inputs, is now achieved, at least vis-à-vis the NaP current, independently of the quasi-linear model. However, we still find the model useful as it is used to estimate the distribution of HCN currents and provides a framework to think about how to manipulate dendritic nonlinearities experimentally.

      3) Technically, the use of ChR2 to modulate dendritic currents is creative. While the authors rightly acknowledge that activation/deactivation kinetics of the ChR2 channel will contribute to filtering, this important point should be expanded with additional analysis and potentially with new experiments. Of particular concern is the transition of ChR2 channels to an inactivated state over the comparatively long oscillating light pulse in Figure 3. Inactivation of ChR2 is prominent over this timescale and would precisely co-vary with the shift in oscillation frequency. To address this, the authors should present a direct measurement of this inactivation and account for it in their analysis of the chirp data. Alternatively, the chirp stimulus could be presented backwards (starting at high frequency), so that comparison of forwards-vs-backwards chirp recordings could disentangle this artefact. Either one or both of these additional experiments would be critical for interpreting the roll-off in photocurrent responses at high frequencies reported in Figure 3.

      Touché! You were spot on with this critique and we were wrong. We have now conducted several new experiments (that appear in the main text and in Figure 3 and all its supplements) that show that including ChR2 kinetics explicitly in the model fits actually makes the fits more self-consistent and removes some of the glaring differences between the results from the somatic voltage perturbations (Figures 1–2) and the optogenetic illumination (Figure 3). So as per your request, we have now presented a direct measurement of the deactivation (Figure 3–figure supplement 1) and we have played the “chirp” backwards (Appendix 1–figure 2) to address the issue of inactivation.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      First we would like to express our deep gratitude to the reviewers for thoroughly and fairly reviewing our work.


      Reviewer #1:

      Major Concerns

      1. A major concern I have is with the use of DAPT to modulate Notch signaling, and investigate the impact on integrins, Yap, cadherins, etc. Gamma-secretase, the target of DAPT, cleaves not only Notch receptors, but also IntegrinB1, Nectins, Cadherins, Ephrins and more. This recent review lists 149 substrates (Guner & Lichtenthaler Seminars in Cell & Developmental Biology 2020). The risk that some of the results reflect DAPT impact on IntegrinB1, Cadherins etc themselves is significant. The authors should validate their findings with more specific modulation of Notch activity, for example with a Notch blocking antibody, with siRNA, or with SAHM1.

      We agree with the reviewer's comment and will add additional key experiments using SAHM1 as an alternative inhibitor of Notch activity.

      Furthermore, EGTA was used to "acutely destabilize VE-Cadherin". But EGTA chelates Calcium, which is essential for Notch structure, and EGTA is thus a well-known activator of Notch signaling (see eg Rand MD et al. (2000) Calcium depletion dissociates and activates heterodimeric notch receptors. Mol Cell Biol). The authors rightfully describe and cite this paper, but the use of EGTA nonetheless confounds interpretation. The authors check for NICD levels (at what timepoint?) but the staining is cytoplasmic (also not labelled in the figure per se, but described in the figure legend? - please label the staining in the panel). And in any case, NICD is very short-lived and nuclear staining cannot be taken as a hallmark of signaling activity. In particular if staining is performed at a time point at which the receptor and NICD may have been exhausted/depleted. The authors should validate these observations/conclusions with the Notch reporter to conclusively demonstrate whether EGTA does not activate Notch in their system.

      To test whether transient treatment with EGTA causes Notch activation we will repeat this experiment with Notch reporter activity as readout.

      Trans-endocytosis of NECD on different substrates: the authors suggest that trans-endocytosis of NECD by Dll4 increases on softer substrates. But the authors also show that soft substrates lead to spreading out of cells, which could confound interpretation (is overlapping membranes, not internalization). The authors could validate trans-endocytosis by FACS: check if red Dll4+ cells contain more NECD. It is also not clear to me in this experiment whether the authors are looking at green NECD, or Notch1 full length, since they write "overlap of Notch1 and Dll4", which would not reflect trans-endocytosis but interactions at the cell surface for both cells. Please also define "overlay intensity", or explain further.

      We will validate the trans-endocytosis by flow cytometry. In addition, we now describe the procedure for microscopic analysis more clearly (methods section, p 4; results section, p 17-19).

      The authors conclude their introduction with a statement that mechanosensitivity of Notch is linked to endocytosis, but their conclusion from Fig 6C was that Notch stiffness-dependence was independent of endocytosis, using the rhDll4..?

      We have now rephrased this sentence.

      Minor concerns

      1. In the introduction, the authors describe Dll3 as a Notch ligand that activates Notch signaling in trans. To my knowledge, Dll3 has only been described as a cis-inhibitor of Notch signaling. (I think this may have arisen during repeated edits of the manuscript!)

      This has now been corrected in the current version.

      In the introduction, the authors state that Notch1, Dll4 and Jag1 control angiogenesis, but then they only describe what Notch1/Dll4 do in the next few sentences. Perhaps one sentence to describe the role of Jag1 would help avoid the feeling of being "left hanging".

      This has now been corrected in the current version.

      Data presentation: please show all bar graphs with the individual replicates (dotplots).

      We have now changed all bar graphs into scatter plots.

      Data analysis/normalization: many graphs represent normalization of values in multiple steps which are not described in the methods/legends/results. For example, Notch reporter gene activity (Fig 1A) is Firefly divided by Renilla, and presumably normalized to the control condition at 1 (or an average of 1 for the three controls?). This is not explained. Also, it is not clear whether the data reported for the Control condition are Huvec on rhDll4 compared (normalized) to Huvec on control substrate (and similar for each other condition). What controls are included in this experiment? Please provide the full data to provide insight into the magnitude of activation by Dll4 itself. Perhaps "Control" is without rhDll4? But the bar underneath A/B implies this rhDll4 was used in all conditions.

      We have edited our manuscript accordingly to avoid these ambiguities.
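
To make the kind of multi-step normalization raised by the reviewer concrete, here is a minimal sketch of one common scheme for dual-luciferase data. It is an assumption for illustration and is not taken from the manuscript: each well's Firefly signal is divided by its Renilla signal, and every ratio is then divided by the mean ratio of the control wells. The function name `normalize_reporter` and its arguments are hypothetical.

```python
def normalize_reporter(firefly, renilla, control_firefly, control_renilla):
    """Return Firefly/Renilla ratios scaled so the control mean equals 1."""
    control_ratios = [f / r for f, r in zip(control_firefly, control_renilla)]
    control_mean = sum(control_ratios) / len(control_ratios)
    sample_ratios = [f / r for f, r in zip(firefly, renilla)]
    return [ratio / control_mean for ratio in sample_ratios]
```

Under this scheme the individual control replicates scatter around 1 rather than each being exactly 1, which is one of the distinctions the reviewer asks to have stated.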

      Statistics: data should be presented as means +/- standard deviation, not standard error of the mean (see for example Barde & Barde Perspect Clin Res. 2012): "SEM quantifies uncertainty in estimate of the mean whereas SD indicates dispersion of the data from mean. As readers are generally interested in knowing the variability within sample, descriptive data should be precisely summarized with SD."

      We now use SD instead of SEM.
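
The distinction behind this change can be shown with a short worked example. The replicate values below are made up for illustration; the only relationship used is the standard one, SEM = SD / sqrt(n).

```python
import math
import statistics

def sd_and_sem(values):
    """Return the sample standard deviation and the standard error of the mean."""
    sd = statistics.stdev(values)      # dispersion of the data (n - 1 denominator)
    sem = sd / math.sqrt(len(values))  # uncertainty of the estimated mean
    return sd, sem

# Hypothetical replicate measurements, not data from the manuscript.
sd, sem = sd_and_sem([1.0, 2.0, 3.0, 4.0, 5.0])
```

The SEM shrinks as replicates are added while the SD does not, so error bars based on SEM can understate the variability within a sample.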

      Statistics: In the Methods section, the authors state that one-way ANOVA was followed by Dunnett's multiple comparison test, and two-way ANOVA was followed by Tukey's multiple comparison test. Dunnett is used to compare every mean to a control mean, while Tukey is used to compare every mean with every other mean. Fig 1 describes using Dunnett for Fig 1B, but the end of the legend says Tukey was used. However Fig 1A,C show internal pairwise comparisons to plastic. Please be sure to explain which statistics were used where, and why, and if plastic was set as the comparator, please be explicit about this. Fig 3 uses "Sidak's corrected two-way ANOVA" and "Sidak's multiple comparison test"? I think Sidak is a method to correct alpha or p for multiple comparisons, as stated in the first instance, but it is not described why this was used here, and not in other analyses, and whether the authors then applied Tukey's post-hoc test as described in the methods section? Similar comments for Fig 6. It is counter-intuitive that the plastic-1.5kPa PDMS difference with no error-bar overlap in 1A would be 1-star significance, while the plastic-70kPa difference with almost overlapping error bars in 1B would be 4-star significance. Please check/show values. In Fig 1B Figure legend, the authors write "Data is presented in a bar plot and compared with the integrin β1 intensities without DAPT treatment", but this is not the statistical comparison presented. Fig 3B shows a very minor difference with overlapping error bars as 3-star significance? Is this correct?

      We have checked all statistical issues and corrected them where necessary. Since the sample size and variance were homogeneous in all comparisons, we now uniformly use ANOVA with Tukey's multiple comparison test as the post hoc test to keep things simple.

      How much nuclear NICD (NICD intensity) is there in control conditions? (Control missing from Fig 1B, D).

      We will repeat the experiment and compare the NICD levels with those in non-activated cells on plastic.

      A DAPI counterstaining for 1B/D right panels would facilitate evaluation of whether NICD nuclear intensity is increased. The same applies for nuclear YAP assessment in Fig 3B. I assume a nuclear counter-stain was done for quantification of nuclear NICD intensity and nuclear YAP intensity, but this is not described in the Materials and Methods. Please add a description of how intensity was quantified, and provide nuclear counterstain images. (Also, what is the unit on the y-axis of "intensity" graphs? Arbitrary units (a.u.)?)

      The counterstaining method with Hoechst as well as the use of the nuclear staining for quantitative analysis of images are now described in the Methods section and where needed in the figure legends. The y-axis of the intensity graphs now has a dimension (a.u.). We decided against overlay of the nuclear staining with the NICD or YAP images for graphical reasons (visibility of the respective staining).

      How much "overall" integrin B1 is there in DAPT-treated conditions in Fig 2C? (related to the concept that DAPT could be cleaving integrin B1, it could be depleted at 24 hours..?)

      We will additionally add this experiment and validate the effect of Notch inhibition on the overall integrin level using the alternative inhibitor SAHM1.

      More details regarding the analysis procedure need to be added to the Methods Section. Were cells segmented and then mean intensity estimated for the whole cell? Was this done by means of Intensity Ratio Nuclei Cytoplasm Tool plugin for Fiji alone? Were images background corrected, corrected for inhomogeneous illumination, normalized? In the case of Integrin beta 1 active, the expression seems to be patterned, was intensity expressed as mean intensity of every pixel corresponding to cytoplasm? For VE Cadherin staining, how was intensity estimated (only pixels corresponding to membrane were considered or every pixel of the cell)? Many figures are originated from a confocal microscope: were z-stacks acquired and then maximum projections done? Were z-stacks acquired and then fluorescence quantified in 3D images? Was a single plane acquired or analyzed, and if that is the case, how was this plane chosen?

      The requested information has now been inserted in the respective results and method sections.

      In Fig 4A, how is VE-Cadherin intensity quantified? As an average per field of view? Or per cell? And if per cell, how was each cell delineated? And if not per cell, how were equal cell numbers ensured? In FRAP experiment, how was intensity quantified? Was it per cell, per field of view or per region? Was each bleached region analyzed separately, or each cell? The datapoints should be either added to Figure 4C or as supplementary to assess the fitting. How many bleached regions per cell were done and how many cells were analyzed? In FRAP experiment, was bleaching done with an increased pixel dwell time? Was laser intensity increased? Do you have an estimation of laser power (not percentage) or flux?

      These issues are now described in more detail in the respective figure legend.

      Figure S2 is not referenced in the manuscript - I think a reference to "Figure S3" in the NECD transendocytosis section (no page numbers or line numbering) should be to Fig S2 instead?

      Sorry for this mistake! We corrected this now.

      In Figure 5A, NICD nuclear intensity is normalized somehow (normalization not explained), and stiffness no longer appears to regulate NICD levels as shown in Figure 1B.

      We have now described the normalization better in the figure legend. The difference to the results in Fig. 1B is that in Fig. 5A the cells were not activated by Dll4 sender cells or rhDll4 (endogenous Notch activity). This is now stated more clearly.

      Fig 6B: From the immuno at right there is a clear stiffness-dependent difference in Transferrin uptake. How were "single cell uptake" and "number of particles" quantified? (How were cell bodies identified?) Uptake could also be verified with FACS.

      In this point, we disagree with the reviewer: we really do not see a systematic difference in intensities between the different substrates. The process of image analysis is now better described in the figure legend. The result was so clear that we did not use FACS as complementary approach.

      Fig 6C: there appear to be very different numbers of cells in the brightfield image at right. Are the 70, 1.5, and 0.5 kPa Notch reporter activities different from one another or only different from plastic? Might these results reflect cell density/increased Notch signaling due to more cell-cell contacts?

      Unfortunately, with decreasing stiffness the PDMS gels become optically more and more cloudy, giving the false impression of a higher cell number. We tried to circumvent this by changing contrast and brightness of the images, but to no satisfying effect. We now mention this issue in the figure legend.

      How was the Dll4 coating of the different substrates done?

      The coating of the substrates is now described under a specific subheading in the Methods section.

      It would be helpful to describe the composition of Collagen G (Collagen I) in the text (it is a risk to expect vendor information to remain available indefinitely).

      The role and composition of the Collagen G coatings was included in the text (p 7). Further information on the manufacturer of the product used is included in the methods section.

      Please list catalog numbers for all reagents, and dilutions used for antibodies.

      We have added this information wherever possible.

      Instead of using red and green for images, maybe cyan, yellow and/or magenta could be used to help the reader see what is being shown (especially if the reader might be color blind).

      We will of course adhere to the respective policy of the publishing journal, once the manuscript is accepted.

      Packages and tools such as Intensity Ratio Nuclei Cytoplasm Tool plugin for FIJI should be referenced.

      We have now referenced respective tools.

      Reviewer #2:

      Major comments:

      Is there a difference in the growth rate of cells on softer vs. stiffer gels that could affect cell morphology/signaling pathways?

      This is an important point and we will perform additional respective experiments.

      Nuclear localization of NICD and YAP would be good to validate with western blot.

      Quantification of Western blots (especially after nuclear isolation) is, at least in our hands, much less sensitive and reliable than quantitative imaging. We do not think that this experiment would strengthen our study.

      In Figure 3 and Figure 5, siRNA experiments would strengthen the data. DAPT is not only an inhibitor of Notch but affects other proteins as well. This should be stated.

      A similar point was raised by Reviewer#1 with the suggestion to use SAHM1 as an alternative to DAPT. As suggested we will add these experiments.

      How was the mean VE-cadherin branch length determined? This term often refers to angiogenesis assay/sprout formation and maybe another one should be considered here to describe VE-cadherin junction morphology.

      Add to all figure texts how many cells were used for the analyses.

      The cell number is now added wherever appropriate.

      In Fig. 6C the cell morphology of HUVECs look abnormal in comparison to other images and should be re-done.

      In contrast to all other experiments, the cells were not confluent in this case. The different morphology is a sign of the lack of neighbours, not of some problem with the cells.

      Was all the data normally distributed and thus ANOVA was used? Please add more details on the statistics part. Did you remove outliers?

      As also suggested by Reviewer #1, we have added more information on statistics and streamlined the analysis. The data are normally distributed; outliers were not removed.

      MTT assay of DAPT would need to be presented as it can be cytotoxic. Cells are not well visible in Fig 2C with DAPT. DAPI and F-actin staining would help to see the cell morphology.

      We will add respective data on cell viability after DAPT (and SAHM1) treatment in a revised version of the manuscript.

      Minor comments:

      Please clarify how coating with rhDll4 is done, as this was unclear at least for this reviewer.

      The coating of the substrates is now described under a specific subheading in the Methods section.

      HUVECs are known to be hard to transfect. Please provide data on transfection efficiencies of all transiently transfected cells.

      We did not systematically monitor transfection efficiencies in this context, since there was always an internal control (e.g. co-reporter in the reporter gene assay) or the data were obtained by single-cell-based quantification. Generally, we achieve transfection efficiencies of around 30% with HUVECs.

      Reviewer #3:

      Major comments:

      1) The authors use recombinant Dll4 or Dll4-expressing ("sender") cells to activate Notch in co-cultured cells. This is per se fine; however, one might over-estimate all other observed downstream effects, as endogenous Notch activity is lower. It would be important to see how naïve HUVEC or other primary endothelial cells respond to changes in stiffness. qPCR of Notch target genes such as Hey1, Hey2, Hes5, Dll4 is frequently used as a readout of Notch activity in this context. Also, the Notch transcriptional reporter assay might be a suitable read-out.

      In Fig.5A we show data on endogenous Notch activity (- EGTA) on substrates with different stiffness. In this case NICD levels in the nucleus do not differ. It will definitely be interesting to repeat this experiment based on the reporter gene assay.

      2) As the authors mention in the Discussion, cell density could be of utmost importance given the fact that Notch signaling usually is assumed to be an in trans signaling event between adjacent cell membranes. However, other signaling modes (in cis, cis inhibition, JAG1 vs DLL4 ratio) might also be important. As such, the authors should carefully document and report on cell density in all experiments. Secondly, the authors should use other conditions such as sparse cell density, and thirdly the authors should measure transcriptional effects of stiffness on Notch ligand expression.

      In all experiments (with the exception of Fig. 6C) we used confluent cells. With the sparse cells (Fig. 6C) we also observe stiffness dependency. Investigating Notch ligand expression is definitely a good idea and will be investigated in the revised manuscript.

      3) The authors need to compare stiffness in their model with physiological conditions in developing tissues and ideally also in tumor which often have increased tissue stiffness.

      Good point! We have now integrated such comparisons in the Discussion.

      4) Is Notch activation due to changes in stiffness dependent on the presence of ligands or could it be that (unspecific) binding of Notch receptors to ECM could trigger cleavage just by conformational change?

      Since there is no stiffness dependent response on collagen (Fig. 6C, left panel), an effect of unspecific binding is highly unlikely.

    1. Author Response

      Reviewer #1 (Public Review):

      In this article, the authors investigated the role of sleep and brain oscillations in visual cortical plasticity in adult humans. The authors tested the effect of 2 hours of monocular deprivation (MD) on ocular dominance measured by binocular rivalry. In the main MDN session, MD was performed in the late evening, followed by 2 hours of sleep, during which EEG was measured. After the sleep session, ocular dominance was measured, which was followed by 4 hours of sleep, then ocular dominance was measured again in the morning. The results show that the effect of MD was preserved 6 hours after MD. The effect of MD correlated with sleep spindle and slow oscillation measures. The questions asked by the study are timely and findings are important in understanding the visual cortical plasticity in human adults, but I have some concerns regarding the experimental design, analysis, and interpretation of the results, which are listed below.

      Thank you for the positive summary of our results.

      • The authors investigated EEG activities in the central and occipital regions. The results of the relationship between slow oscillations / sleep spindles and deprivation index are very interesting. However, it appears that the activities were averaged across hemispheres in the occipital region. Previous studies (e.g. Lunghi et al., 2011; Binda et al., 2018) have demonstrated that MD is associated with up-scaling of the deprived eye and with down-scaling of the non-deprived eye (page 11). I wonder whether sleep slow oscillations and / or spindles are modulated locally in the deprived occipital region? To answer the first question raised by the authors (how MD affects subsequent sleep), wouldn't it be important to compare between deprived vs. non-deprived regions?

      In humans, the purely monocular recipient cortical regions are very small and represent only the very far visual periphery. These regions are impossible to locate with EEG and are difficult to locate even with high-resolution fMRI (ref to Koulla CB). Visual cortical organization is based on the visual field map: neurons whose visual receptive fields lie next to one another in visual space are located next to one another in cortex, forming one complete representation of contralateral visual space, independently of the eye from which the visual information comes. However, at finer scales ocular dominance columns exist, and Binda et al. (2018) showed that in adult humans MD boosts the BOLD response to the deprived eye, changing ocular dominance of V1 vertices, consistent with homeostatic plasticity. These are all well-known facts in the vision community, and we believe it is not worthwhile to discuss them.

      • To answer the second question (how sleep contributes to consolidation of visual homeostatic plasticity), the authors compared the deprivation index between two sessions, the main MDN and a control MDM session. The experimental designs for these two sessions were quite different. For example, MD was conducted in the evening in MDN, whereas it was conducted in the morning in MDM. Since there may be circadian effects on plasticity (Frank, 2016), the comparisons between these sessions may not be sufficient in investigating the effect of sleep itself (it could be merely due to circadian effect).

      Thank you for raising this important issue. We performed the dark exposure experiment in the morning because we wanted to minimize the occurrence of sleep during the two hours spent by participants lying down in complete darkness. Preventing sleep under these conditions in the late evening would have been extremely challenging. In order to investigate a possible influence of the circadian rhythm on visual homeostatic plasticity and its decay over time, we have performed an additional experiment. In this experiment, we tested the effect of 2h of monocular deprivation in the same participants either early in the morning or late at night (at a time of the day comparable to the MDnight and MDmorn conditions in the main study). We report the results of this control experiment in the supplementary materials (Figure S2). We found that the effect of monocular deprivation follows a similar timecourse for the two conditions (ocular dominance returns to baseline levels within 120 minutes after eye-patch removal). Moreover, we also report that the effect of MD is slightly (but significantly) larger in the morning, compared to the evening. The results of this experiment rule out a contribution of circadian effects and reinforce the evidence of a specific effect of sleep in maintaining visual homeostatic plasticity.

      • The authors argue that NREM sleep consolidates the effect of MD. However, consolidation may last days to months or even years (Dudai et al., 2015). Since the effect is gone in 6 hours or so, it may be difficult to interpret it as consolidation. Although the findings of the effects of sleep on ocular dominance plasticity are interesting, the interpretations of the results may need to be clarified or revised.

      We thank the reviewer for raising this issue. We agree that the data show a substantial delay in the decay process of the MD effects after the removal of the patch. The present data indicate that specifically the sleep condition, and not merely darkness, is responsible for the maintenance of the MD-induced effect during the night. Therefore, we gladly adhere to the request and propose to say that sleep stabilizes/maintains the effects of MD as long as sleep itself persists. Having said that, we would like to point out that the MD boost in amblyopic patients gets consolidated for up to one year and increases across night sleep, as we reported in Lunghi, Sframeli et al (2019). Although these data strongly suggest that real consolidation may occur, we agree with the reviewer that our data did not directly address this question, and we have changed the manuscript accordingly.

      Reviewer #2 (Public Review):

      This manuscript is an interesting follow up on a substantial literature on the role of sleep in promoting critical period ocular dominance plasticity, and the role of sleep in promoting adult V1 plasticity following presentation of a novel visual stimulus. For nearly all of that literature (i.e. coming from cats and mice), the focus has mainly been on Hebbian mechanisms. The authors here propose to advance the field by investigating plasticity in adult human V1, which the authors consider to be homeostatic rather than Hebbian, and which the authors consider to be a form of sleep-dependent consolidation. This is an exciting goal, and the overall study designs and control will test the effects of brief MD and subsequent sleep or wake in the dark on V1 processing for the two eyes.

      Thank you for the positive commentary on our study.

      However, the outcomes of the study suggest that the changes observed in V1 across sleep may actually be the opposite of consolidation - rather it is decay of an effect on V1 function caused by prior wake experience (MD), which disappears over subsequent hours.

      We thank the reviewer for raising this issue. We agree that the data show a substantial delay in the decay process of the MD effects after the removal of the patch. The present data indicate that specifically the sleep condition, and not merely darkness, is responsible for the maintenance of the MD-induced effect during the night. Therefore, we gladly adhere to the request and propose to say that sleep stabilizes/maintains the effects of MD as long as sleep itself persists. We have revised all sections of the manuscript to handle this important aspect and to consider that a classic correlate of memory consolidation during sleep (spindle density) also turns out to be associated with maintenance of the MD-induced ocular dominance effect.

      The authors claim differences due to sleep, but there is not a direct statistical comparison between sleep and awake-in-the-dark controls.

      We now directly compare the effect of monocular deprivation and its decay after two hours in the sleep vs dark exposure condition (MDnight vs MDmor). We now plot the results of the two conditions in the same graph (Figure 2). We found a significant interaction effect between the factors TIME (before and after) and CONDITION (MDnight and MDmor), indicating a specific role of sleep in prolonging the decay of short-term monocular deprivation.

      There is also no quantification of sleep architecture across the sleep period, to determine whether REM or NREM play a role.

      We have provided a summary table of sleep architecture in the revised version of the Supplementary Materials. The table shows descriptive statistics of sleep architecture on MDnight and CN. Also, we report the result of the paired comparison between the nights and the Spearman correlations between the deprivation indices (DI before and DI after) and the changes between the nights in sleep architecture. Tests indicate that MD does not produce any main effect on the sleep architecture and that there are no substantial associations found between sleep architecture parameters and deprivation indices. Thus, it appears that changes in SSO and spindle frequency and amplitude did not lead to an alteration in the amount of N2 or N3 sleep, as we might expect. At the beginning of the Results section we refer to the table and to the lack of statistically significant effects.

      Finally, while there are tests of changes in NREM oscillations with previous plasticity in wake, there are no direct tests of changes across sleep - i.e. the very changes that could be considered consolidation.

      We thank the reviewer for stimulating us to investigate whether there are any NREM parameters whose change within the sleep cycle can be related to the degree of plasticity maintenance observed at the end of the two hours of sleep.

      For this aim, we 1) partitioned SSO and spindle events into tertiles according to their occurrence time, 2) estimated the average measures of events belonging to the first and last tertile, and considered the variation between tertiles as an estimate of the changes across sleep. We then tested whether there is a consistent relationship between measures of individual retained plasticity (DI after) and changes in SSO and sleep spindles across sleep.
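
      The tertile procedure described above can be sketched as follows. This is only an illustration under stated assumptions, not the analysis code used in the study: it assumes that tertiles are equal-count groups of events ranked by onset time, and the function name `tertile_change` and the example values are hypothetical.

```python
from statistics import mean


def tertile_change(events):
    """Change of a sleep-event measure across the sleep period.

    events: list of (onset_time, measure) tuples for one participant,
            e.g. spindle amplitude per detected spindle.
    Returns mean(measure in last tertile) - mean(measure in first tertile).
    Assumption: tertiles are equal-count groups ranked by onset time.
    """
    if len(events) < 3:
        raise ValueError("need at least three events to form tertiles")
    ordered = sorted(events, key=lambda e: e[0])  # order by occurrence time
    k = len(ordered) // 3                          # events per tertile
    first = [m for _, m in ordered[:k]]
    last = [m for _, m in ordered[-k:]]
    return mean(last) - mean(first)


# Hypothetical events: (onset in minutes, measure in arbitrary units)
example_events = [(40, 4.0), (10, 1.0), (60, 6.0), (20, 2.0), (50, 5.0), (30, 3.0)]
change = tertile_change(example_events)
```

      One value of `change` per participant and per EEG feature could then be related to the individual DI after sleep, which is the relationship tested in the response.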

      We performed this across-sleep analysis of the SSO and spindle measurements, and none of the parameters showed an association between its change across sleep and the individual DI after sleep. We report these results in the supplementary materials (Figure S8).

      Finally, it is also not clear that the decay of response changes is due to homeostatic plasticity; it could be just that: decay of plasticity that occurred previously. The terminology used (e.g. consolidation, homeostatic vs. Hebbian) doesn't seem well founded based on the data.

      Thank you for raising an important point. In our study, homeostatic plasticity refers to the effect of short-term monocular deprivation (so the plasticity occurred before sleep). We have rephrased the interpretation of our results in terms of stabilization/maintenance rather than consolidation of plasticity.

      Regarding homeostatic vs Hebbian plasticity, there is broad agreement in the literature that the two effects are indeed different. Hebbian plasticity is usually associated with a boost of the signals that are most successful in driving a neuronal response or a behavior. Here, MD produced a boost of the unused, and probably silent, eye, and as such the boost is very difficult to explain in terms of Hebbian plasticity. We now make this clear in the Introduction.

      Reviewer #3 (Public Review):

      In this study, Menicucci et al. induced plastic changes in ocular dominance by applying an eye-patch to the dominant eye (monocular deprivation, MD). This manipulation resulted in a shift toward even more dominance of the deprived eye, as assessed through a binocular rivalry protocol. This effect was stabilized during sleep, whereas it quickly decreased during waking (in the dark). The authors interpret the MD effect as the result of cortical plasticity over primary visual areas and its maintenance during sleep as the consolidation of these changes. The authors thus connect their work to the literature on sleep consolidation. They further show that the magnitude of the MD effect is positively correlated with sleep markers that are involved in memory consolidation (slow oscillations and sleep spindles).

      However, I first have conceptual issues with this study. Indeed, previous findings on the replay of memories during sleep and their consolidation were mostly obtained in hippocampus-dependent forms of learning. Here, I do not really see what it is that would be replayed. Thus, I struggle to understand how rhythms such as sleep spindles, which have been linked to the transfer of hippocampal memories to the neocortex, would be mechanistically associated with low-level plastic changes restricted to primary visual areas. In addition, the effects were observed over occipital electrodes, where sleep spindles are far fewer and lower in amplitude than in other cortical regions. Furthermore, the association between MD-related plasticity and slow oscillations is interesting but, since these slow oscillations organize sleep slow waves, the lack of correlation with slow waves is surprising.

      We agree with the reviewer that many of our results are indeed surprising, especially those related to the involvement of the spindles, and for these reasons we believe that eLife would be the appropriate journal to present our work. At present, the fact that sleep spindles have been associated mainly with mediating the transfer of memory does not exclude a more general involvement in other sensory functions.

      Connected to these conceptual issues, I think the present work has some important methodological limitations. First of all, the analyses included a rather small number of participants, which could make some analyses, in particular correlational analyses, severely underpowered.

      We thank you for prompting us to emphasize this limitation. In the Participants section of the Materials and methods, we now point out that, given the complexity of the experimental design, the need to characterize sleep through several different parameters, the sample size used, and the need to correct for multiple tests, only associations with a strong effect size could be highlighted.

      Secondly, the approach used to explore the correlation between plasticity and sleep features focused on a subset of electrodes (ROI) defined a priori. It is therefore difficult to draw conclusions about the specificity of the results. Given the topographical maps provided by the authors, I am wondering whether a more exhaustive analysis of the effect at the electrode level could yield more robust findings.

      The need for ROIs is based on the interindividual variability of brain structures, in particular the large anatomical variability of V1 orientation implying a variably oriented dipole and a variable maximal representation of visual potentials over electrodes from Oz to CPz. Moreover, we have to cope with the volume conduction effect that limits EEG spatial resolution.

      With these limitations in mind, we very gladly adhere to the reviewer's request to evaluate the effects on individual electrodes in more detail. To this end we have prepared supplementary figures which show boxplots and scatterplots for the electrodes inside the ROIs to evaluate main effects and associations, respectively.

      Finally, given the number of features tested, I think it is important to clarify the strategy used to correct for multiple comparisons.

      We thank the reviewer for highlighting an unclear point. In the revised version of the Statistical analyses section, we have provided missing details of the procedure used for handling false positives due to multiple testing. Basically, we applied the FDR correction for each question we asked.

      For example, “at which time points does dominance remain significantly different from baseline?” or, “which EEG feature and in which area of the scalp shows changes significantly dependent on plasticity induced by monocular deprivation?” For each of these questions, we made a group of tests (for the first example, dependent on the number of points at which ocular dominance was assessed until the morning; for the second example, on the number of EEG features examined multiplied by the number of areas in which they were assessed) to which Benjamini & Hochberg's FDR correction was then applied.
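
      For readers unfamiliar with the procedure, the Benjamini & Hochberg step-up correction named above can be sketched as follows. This is a minimal illustration, not the code used in the study; the function name `benjamini_hochberg`, the FDR level `q`, and the p-values are assumptions made for the example. Following the description above, the function would be applied separately to each family of tests, one family per question.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure for one family of tests.

    Returns a list of booleans, True where the null hypothesis is
    rejected while controlling the false discovery rate at level q.
    """
    m = len(p_values)
    # Indices of the tests sorted by ascending p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical family of p-values, e.g. one per EEG feature and scalp area
family = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
decisions = benjamini_hochberg(family, q=0.05)
```

      Because the threshold depends on the number of tests in the family, the same p-value can survive correction in a small family and fail in a large one, so the way tests are grouped by question needs to be reported alongside the results.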

    1. Reviewer #1 (Public Review):

      The role of the parietal (PPC), the retrosplenial (RSP) and the visual cortex (S1) was assessed in three tasks corresponding to a simple visual discrimination task, a working-memory task and a two-armed bandit task, all based on the same sensory-motor requirements within a virtual reality framework. A differential involvement of these areas was reported in these tasks based on the effect of optogenetic manipulations. Photoinhibition of PPC and RSP was more detrimental than photoinhibition of S1, and more drastic effects were observed in presumably more complex tasks (i.e. working-memory and bandit task). If mice were trained with these more complex tasks prior to training in the simple discrimination task, then the same manipulations produced large deficits, suggesting that switching from one task to the other was more challenging, resulting in the involvement of possibly larger neural circuits, especially at the cortical level. Calcium imaging also supported this view, with differential signaling in these cortical areas depending on the task considered and the order in which they were presented to the animals. Overall the study is interesting and the fact that all tasks were assessed relying on the same sensory-motor requirements is a plus, but the theoretical foundations of the study seem a bit loose, opening the way to alternate ways of interpreting the data than "training history".

      1) Theoretical framework:
      The three tasks used by the authors should be better described at the theoretical level. While the simple task can indeed be considered a visual discrimination task, the other two tasks operationally correspond to a working-memory task (i.e. delay condition, which is indeed typically assessed in a Y- or a T-maze in rodents) or a two-armed bandit task (i.e. the switching task), respectively. So these three tasks are qualitatively different and are therefore reliant on at least partially dissociable neural circuits, and this should be clearly analyzed to explain the rationale of the focus on the three cortical regions of interest. For the working-memory task we do not know the duration of the delay, but this really is critical information; by definition, performance in such a task is delay-dependent, and this is not explored in the paper.

      Also, the authors heavily rely on "decision-making" but I am genuinely wondering if this is at all needed to account for the behavior exhibited by mice in these tasks (it would be more accurate for the bandit task) as with the perspective developed by the authors, any task implies a "decision-making" component, so that alone is not very informative on the nature of the cognitive operations that mice must compute to solve the tasks. I think a more accurate terminology in line with the specific task considered should be employed to clarify this.

      The "switching"/bandit task is particularly interesting. But because the authors only consider trials with highest accuracy, I think they are missing a critical component of this task which is the balance between exploiting current knowledge and the necessity to explore alternate options when the former strategy is no longer effective. So trials with poor performance are thus providing an essential feedback which is a major drive to support exploratory actions and a critical asset of the bandit task. There is an ample literature documenting how these tasks assess the exploration/exploitation trade-off.

      2) Training history vs learning sets vs behavioral flexibility:
      The authors consider "training history" as the unique angle to interpret the data. Because the experimental setup is the same throughout all experiments, I am wondering if animals are simply provided with a cognitive challenge assessing behavioral flexibility, given that they must identify the new rule while refraining from responding using previously established strategies. According to this view, it may be expected for cortical lesions to be more detrimental because multiple cognitive processes are now at play.

      It is also possible that animals form learning sets during successive learning episodes which may interfere with or facilitate subsequent learning. Little information is provided regarding learning dynamics in each task (e.g. trials to criterion depending on the number of tasks already presented) to have a clear view on that.

      3) Calcium imaging data versus interventions:
      The value of the calcium imaging data is not entirely clear. Does this approach add a new point to consider when interpreting the behavioral data, or is it to be considered convergent with the optogenetic interventions? Very specific portions of behavioral data are considered for these analyses (e.g. only highly successful trials for the switching/bandit task) and one may wonder if considering larger or different samples would bring similar insights. The whole take on noise correlation is difficult to apprehend because of the same possible interpretation issue: does this really reflect training history, or that a new rule now must be implemented, or something else? I don't really get how this correlative approach can help to address this issue.

    1. Reviewer #1 (Public Review): 

      This study compares concentrations of immune mediators in vaginal samples of young women who report having had or report not having had vaginal sex. The study finds that the concentration of many immune markers is higher in samples of women who report having had sex than in samples of women who report not yet having had sex. While the results are interesting and suggestive, I do not believe this result necessarily indicates that vaginal sex increases levels of these immune mediators (a causal relationship) and that the evidence presented here is strong enough to draw this conclusion. 

      This study presents many methodological strengths. The sample size is amply sufficient to achieve high statistical power for this research question. A particular strength of this analysis is the relatively large number of participants who provided paired before and after sex samples. These samples are particularly valuable because stronger conclusions can be drawn from them, as their comparison is less likely to be confounded by unmeasured confounders. The statistical methods are largely appropriate for the research question, with the use of random effects to account for the correlation in multiple measures per participant. 

      The reason I would not draw causal conclusions from this analysis is that there is a high potential for unmeasured confounding of the association between sex and the concentration of immune mediators. The variables that were included in the multivariable analysis were for the most part not confounders, so the authors cannot claim that their results are free from potential confounding. Confounders are in general variables which are common causes of both the exposure of interest (vaginal sex) and the outcome (level of immune markers), and which are not on the causal pathway and are not a downstream effect of the outcome (inverse causality). The only included variable that is a potential confounder is age. Most other variables (pregnancy, contraception, Nugent score, Chlamydia infection, and HSV-2 seropositivity) are either potential mediators of the effect of sex or downstream effects of the level of immune markers. It does not follow that adjustment for these variables would necessarily lead to an underestimation of the causal effect, as it is possible some of these variables have complex relationships with immune mediators, so it is difficult to predict how adjusting for these variables would influence results. Some of these variables are also potentially colliders, so adjustment for them may lead to bias (see an introduction to this topic in Holmberg MJ, Andersen LW. Collider Bias. JAMA. 2022;327(13):1282-1283. doi:10.1001/jama.2022.1820). There is no consideration of general social determinants of health that are more likely to be confounders because they potentially influence both sexual behavior and the immune system: socioeconomic status, ethnicity, education, employment, housing, food security, access to health care, etc. There is overwhelming evidence that young people who are sexually active tend to have very different socioeconomic characteristics than young people who are not sexually active. It is therefore difficult to assess whether the higher level of immune markers in women who are sexually active truly represents a causal effect of sex or simply reflects differences in the type of women who have sex.

      The paired analysis also suggests that the main analysis is likely to be confounded. The evidence from the paired analysis is much stronger than the evidence from the unpaired main analysis because the paired analysis inherently adjusts for many unmeasured confounders that lead to women having sex by a certain age; the differences in paired samples are likely much closer to the causal effect of sex than the differences from the unpaired samples. We see that, in the paired analysis, the differences in levels of immune mediators before and after sex are systematically much smaller and non-significant for most immune markers. This suggests to me that the main analysis is confounded and overestimates the effect of sex on immune markers. If there is a causal effect, it is likely to be much smaller than the one estimated in the main unpaired analysis.
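
      The argument that a paired comparison cancels stable unmeasured confounders can be illustrated with a small simulation. This is a sketch under invented assumptions, not a re-analysis of the study: the function `simulate`, the single confounder `u`, the true effect of 0.2, and all noise levels are hypothetical.

```python
import random
from statistics import fmean

def simulate(n=20000, true_effect=0.2, seed=0):
    """Compare an unpaired (between-person) and a paired (within-person)
    estimate when an unmeasured trait drives both exposure and outcome.

    Returns (unpaired_estimate, paired_estimate).
    """
    rng = random.Random(seed)
    exposed_after, unexposed, paired_diffs = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                      # unmeasured confounder
        ever_exposed = u + rng.gauss(0, 1) > 0   # confounder raises chance of exposure
        before = u + rng.gauss(0, 0.5)           # marker level before exposure
        if ever_exposed:
            after = u + true_effect + rng.gauss(0, 0.5)
            exposed_after.append(after)
            paired_diffs.append(after - before)  # u cancels within a person
        else:
            unexposed.append(before)
    unpaired = fmean(exposed_after) - fmean(unexposed)
    paired = fmean(paired_diffs)
    return unpaired, paired

unpaired, paired = simulate()
print(f"unpaired estimate: {unpaired:.2f}, paired estimate: {paired:.2f}")
```

      Under these made-up parameters the unpaired contrast absorbs the between-group difference in `u` and lands well above the true effect, while the paired difference stays close to it. The sketch only covers confounders that are constant within a person; it says nothing about time-varying ones.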

      The authors argue that the smaller effects seen in the paired analysis might be due to an effect of time, where samples closer to the start of sex show smaller differences. However, I would need more evidence to be convinced of this. Notably, they use a spline analysis in Figure 4 to show the effect of time since vaginal sex. However, I would have liked to see the p-values for the time-dependent spline effect, in order to see whether the data supports that a difference in slopes before and after sex significantly improves the model. I suspect many of the splines are not significant and may not lend strong support to the hypothesis that time since sex has an effect. It is however difficult to assess this visually without a formal test. 

      While the results from the systematic review and meta-analysis are interesting and show that at least two other studies have shown similar results, I wonder whether these other studies do not have similar issues of confounding. The other previous studies have even fewer paired samples, so are likely to have weaker evidence than the current study. 

      In summary, I think this study has some important methodological strengths in terms of sampling and study design. However, I believe the interpretation of the results should be more tempered and cautious; while there are differences in levels of immune markers in women who have had and not had sex, there is not to my mind sufficient evidence that this difference is the result of a causal effect of initiation of vaginal sex, as there is likely to be some collider bias and unmeasured residual confounding in the analysis.

    1. Author Response

      Reviewer #2 (Public Review):

      In this study, Radtke et al. use a model of helminth infection in IL-4-IRES-eGFP (4get) mice, in which transcription at the Il4 locus is reported by eGFP, in order to define the transcriptional signatures and clonal relatedness between Il4-licensed, CD4+ T cells in the mesenteric lymph nodes (mLN) and lungs. By infecting 4get mice with the hookworm Nippostrongylus brasiliensis, which is well described to induce a robust type 2 immune response, the authors isolated and sorted eGFP+CD4+ T cells from the mLN and lungs at 10 days post infection and performed single cell RNA-seq analysis using the 10X Chromium platform. Transcriptional profiling of activated CD4+ T cells with scRNA-seq has been performed in a murine model of allergic asthma, including the lung and lung-draining lymph nodes, but this study involved unbiased capture of all activated CD4+ T cells (Tibbitt et al., Immunity, 2019). Radtke et al. have used a distinct model with Nippostrongylus brasiliensis and have focused on sorting Il4-licensed, CD4+ T cells, allowing for a greater number of captured CD4+ T cells with a "type 2" lymphocyte program for single cell analysis. Furthermore, this study sought to identify distinct and overlapping transcriptional signatures and clonal relatedness between Il4-licensed, CD4+ T cells in two "distant" tissues. In support of such an approach, there is growing evidence for tissue-specific and model-specific features of CD4+ T cell differentiation (Poholek, Immunohorizons, 2021; Hiltensperger et al., Nature Immunol, 2021; Kiner et al., Nature Immunol, 2021).

      Upon dimension reduction, the authors found mLN- and lung-specific clusters, including two juxtaposed clusters that form a "bridge" between the mLN and lung compartments, suggesting immigrating and/or emigrating cells. Consistent with previous studies, the dominant lung cluster (L2) exhibited unique expression of Il5 and Il13, enhanced IL-33 and IL-2 signaling, and exhibited an effector/resident memory profile. The authors did find a small cluster in the mLN (ML4) with an effector/resident memory signature that also expressed CCR9, suggesting the potential for homing to the gut mucosa. Whether this population is specific to the mLN or would also be found in the lung-draining lymph nodes remains unclear. In the mLN, the authors also describe an iNKT cell cluster with CCR9 expression and a CD4+ T cell cluster with a myeloid gene signature, but the significance of these populations remains unclear.

      The authors then use RNA velocity analysis to infer the developmental trajectory of Il4-licensed, CD4+ T cells from the two tissue sites. Consistent with previous studies, the authors found that T cell proliferation was associated with fate decisions. Furthermore, among the two lung CD4+ T cell clusters, L1 represents highly differentiated, effector Th2 cells while L2, which is juxtaposed to the mLN clusters, represents a population likely entering the lung with the potential to differentiate into L1 cells.

      Next, the authors perform TCR repertoire analysis. The authors identified a broad TCR repertoire with the majority of distinct TCRs being found in only one cell. Among the TCRs found in more than one cell, a substantial number of clones can be found in both tissue sites, which is consistent with the findings that individual CD4+ T cells clones can produce different types of effector cells (Tubo et al., Cell, 2013). The authors find significant overlap of clones between the mLN and lung. In addition, they also identify clones enriched in a particular site and suggest that this represents local expansion. However, an alternative possibility is that certain CD4+ T cell clones are expanded at a particular site because the specific TCR preferentially instructs a particular cell fate. For example, fate-mapping of individual naïve CD8+ T cells suggests that certain T cell clones exhibit a greatly heightened capacity to form tissue-resident memory T cells over other cell fates (Kok et al., J Exp Med, 2020). Lastly, the authors analyze CDR3 sequences, finding the most abundant CDR3 motif belonging to the invariant TCRa chain of iNKTs. Among conventional CD4+ T cells, the abundant CDR3 motifs were not restricted to an exact TCRa/TCRb combination beyond a slight preferential usage of the Trbv1 gene. While TCR repertoire analysis allows for defining clonal relatedness among Il4-licensed, CD4+ T cells, the importance and relevance of the above findings to the in vivo type 2 immune response remain unclear.

      There are several limitations of the study:

      (1) The authors use the term "Th2 cells" to describe all Il4-licensed, CD4+ T cells. While CD4+ T helper cell nomenclature has evolved, Th2 cells and Tfh2 cells are generally used to describe distinct subsets driven by unique transcriptional programs (Ruterbusch et al., Annu Rev Immunol, 2020). While previous data suggested that Tfh2 cells are precursors to effector Th2 cells, subsequent studies support a model in which Tfh2 and Th2 cells represent distinct developmental pathways and should be designated as distinct subsets (Ballesteros-Tato et al., Immunity, 2016; Tibbitt et al., Immunity, 2019). Consequently, the authors' broad use of "Th2 cells" and a description of "Th2 cell heterogeneity" includes CD4+ T cell subsets with distinct developmental pathways that include canonical Th2 cells as well as Tfh2 and iNKT cells. The clarity of the manuscript would be improved by describing eGFP+CD4+ cells as Il4-licensed, CD4+ T cells rather than Th2 cells.

      We thank the reviewer for the helpful comment and state now that our IL-4 reporter positive population also includes cells that don’t meet the Th2 criteria in the introduction (lines 76-78).

      (2) The authors used perfused lungs to isolate Il4-licensed, CD4+ T cells for scRNA-seq of "Th2 cells" in the lung tissue. However, previous studies indicate that leukocytes, including CD4+ T cells, in lung vasculature are not completely removed by perfusion, which confounds the interpretation of a tissue cell profile due to contaminating circulating cells (Galkina, E et al., J Clin Invest, 2005; Anderson, KG et al., Nat Protoc, 2014). This is particularly true in the lung and relevant as the authors found a lung cluster (L2) with a circulating signature and suggested that L2 may represent recent immigrant "Th2 cells". Thus, it is unclear whether the L2 cluster identifies immigrant Th2 cells or simply reflects circulating Th2 cells trapped in the lung vasculature. The study would benefit from using intravascular staining to discriminate cells within the lungs from those in the circulation (Anderson, KG et al., Nat Protoc, 2014) for the proper isolation of Il4-licensed lung CD4+ T cells to truly define immigrant "Th2 cells" within the lung parenchyma.

      Following the reviewer's suggestion, we performed an intravascular staining to discriminate cells within the lungs from those in the circulation (new Figure 2—figure supplement 1). According to the vascularity staining method (with slightly increased time between i.v. and sacrifice compared to Anderson, KG et al., Nat Protoc, 2014 for higher probability of successful staining), the L2 lung cluster is a mixture of circulating cells and immigrating cells, which we describe in the text (lines 210-213). The finding that the cells from the vasculature and the cells we classified as “migrating” seem to cluster together based on the similarity of their expression profiles on our UMAP further supports the classification of the L2 tissue fraction as “recent immigrants”. We thank the reviewer for this helpful comment, which improved the quality of the manuscript.

      (3) The authors describe T cell exchange/trafficking across organs. However, in general, inter-organ trafficking refers to lymphocyte trafficking between distinct non-lymphoid tissues, rather than trafficking between lymph nodes and peripheral tissues (Huang et al., Science, 2018). Rather than inter-organ trafficking, the authors have described shared and distinct features of Il4-licensed, CD4+ T cells from a draining lymph node of one organ (gut) and a distant non-lymphoid organ (lung). The experimental approach used makes interpretation of some of the findings challenging. Specifically, canonical effector Th2 cell differentiation is well described to occur via two checkpoints, including the draining lymph node and the peripheral (non-lymphoid) tissue (Liang et al., Nature Immunol, 2011; Van Dyken et al., Nature Immunol, 2016; Tibbitt et al., Immunity, 2019). In the draining lymph node, Th2 cells acquire the capacity to express IL-4 alone, but do not complete effector Th2 cell differentiation until trafficking to the inflamed peripheral tissues and receiving additional inflammatory signals. Consequently, it is unclear whether the differences identified in the mesenteric lymph node and lungs simply reflect well-described differences between the two Th2 cell checkpoints or organ-specific differences (gut vs lung). Il4-licensed, CD4+ T cells from the intestinal mucosa and lung-draining lymph node would also be needed to truly define organ-specific differences during helminth infection.

      Following the reviewer's suggestion, we now avoid the term “inter-organ trafficking” and have replaced it with “at distant sites” in the title. As the reviewer points out, we chose the setup of comparing a lymphoid and a non-lymphoid organ to acquire a broad picture of Th2 developmental stages in Nb infection. The limited overlap in clusters on the UMAP shows that expression profiles between MLN and lung strongly differ. However, this notion is not in conflict with cells of the two organs being at different developmental stages. We added information to highlight this in the manuscript (lines 99-101). Lung and MLN (rather than medLN and MLN) were selected to enable clonal relatedness/distribution analysis of T cells at distant sites. As part of the revision we additionally provide newly generated single cell sequencing data that compares medLN and MLN cells at day 10 after Nb infection and find that UMAP clusters are largely overlapping between medLN and MLN (new Figure 1—figure supplement 3). This suggests that there is no broad medLN/MLN site-specific signature present that would force the medLN and MLN cells to cluster apart. Addition of the newly generated medLN/MLN data on the lung/MLN UMAP based on shared anchors (Stuart et al. Cell. 2019) also leads to a clear separation between all LN and lung cells, supporting that cells don’t cluster due to a site-specific respiratory tract vs intestinal tract signature but likely based on developmental stages (new Fig. 1C,D). Exceptions are defined effector clusters that show signs of a site-specific signature (L1 expresses Ccr8, MLN4 and MLN6 express Ccr9; differences are also suggested by clustering described in lines 247-252). A similar phenotype to the one observed on the transcriptional level is observed when we cluster medLN/MLN and lung cells based on scRNAseq-suggested surface marker expression after flow cytometric analysis, extending analysis to medLN on protein level (new Fig. 3). It would have also been interesting to include lamina propria T cells as the reviewer suggested, but we were not able to extract high quality cells at day 10 after Nb infection, which is a common limitation in the Nb model.

      (4) The study includes a single time point (day 10) whereas Tibbitt et al. performed scRNAseq in the lung and lung-draining lymph node at multiple time points during type 2 immunity (Tibbitt et al., Immunity, 2019). As a result, it remains unclear how similarities or differences between the mesenteric lymph node and lung response would change over the duration of helminth infection, especially given the helminth life cycle involves multiple infection stages.

      As part of the revision we screened for surface marker expression in the single cell sequencing dataset on transcript level and stained these on protein level (new Fig. 3 and Figure 3—figure supplement 1). This allows us to follow the populations defined by scRNAseq longitudinally (d0, d6, d8, d10) by flow cytometry during Nb infection. We compared medLN, MLN and lung. The dynamics of the response in the medLN and the MLN seem similar, with a small delay in the MLN compared to medLN.

      Nb with its relatively well defined migratory path through the body provides a relevant complex model antigen naturally present in the respiratory tract and the intestine during infection. However, analysis of complexity and relevance does often invoke limitations. While stage 4 larvae are found in lung and gut and certainly provide a shared antigen basis between both sites (migration stage from lung to intestine; Camberis et al. Curr Protoc Immunol. 2003), we also think that there is a reasonable number of antigens shared between different larval stages and antigen (either actively secreted or from dying larvae) that are systemically distributed. However, there are probably immunogenic differences between larval stages but to analyze these is beyond the scope of the manuscript.

      While Tibbitt et al., for example, nicely define cell clusters with a limited number of cells, they don’t include any TCR analysis or clonal information. Not much was known about the expansion of T cells in the different clusters in one organ and between organs, and we provide relevant data in this regard. Furthermore, HDM as an allergy model might invoke different Th2 differentiation pathways, as, for example, Tfh13 cells are found in allergic settings but not in worm models (Gowthaman U, Science. 2019). With our approach on single cell level we were able to show effective distribution of a number of T cell clones in a highly heterogeneous immune response and describe and functionally validate successfully expanded clones / expanded TCR chains later on (i.e. new Fig. 6). This kind of analysis has not been performed for a worm model before.

      (5) The study analyzed one scRNA-seq experiment that included two mice without validation via flow cytometry or other method to infer a role of a particular finding to the type 2 immune response in vivo.

      As noted above, we screened for surface marker expression in the single cell sequencing dataset on transcript level and measured these on protein level by flow cytometry as the reviewer suggested. This allows us to follow the populations defined by scRNAseq longitudinally (d0, d6, d8, d10) during Nb infection (new Fig. 3). Furthermore, we added a newly generated set of scRNAseq data which confirms and extends findings made in the initial sequencing experiment (Fig. 1C,D and Figure 1—figure supplement 3). We also included validation experiments based on the performed TCR analysis: we retrovirally expressed three TCRs from our study and confirmed Nb-specific expansion for one of them in vivo (new Fig. 6 and Figure 6—figure supplement 1).

    1. The third UDL principle is to provide multiple means of expression and action. We find it helpful to think of this as the principle that transcends social annotation: at this point, students use what they’ve learned through engagement with the material to create new knowledge. This kind of work tends to happen outside of the social annotation platform as students create videos, essays, presentations, graphics, and other products that showcase their new knowledge.

      I'm not sure I agree here as one can take other annotations from various texts throughout a course and link them together to create new ideas and knowledge within the margins themselves. Of course, at some point the ideas need to escape the margins to potentially take shape with a class wiki, new essays, papers, journal articles or longer pieces.

      Use of social annotation across several years of a program this way may help to super-charge students' experiences.

    1. “How might we, both individually and as a society, creatively generate new visions of what it means to grow old?”

      I agree with Minha's assessment of the project. Her research question is phrased perfectly for the overall topic of these combined videos. I can't stop, and I think I won't stop, thinking about what it truly means for me to age. Each voice represents a background that provides a resource for both the voice owner and the audience to answer this question. Aging for me means being more cautious with words and actions. I consciously do this because I see everyone around me go through this process and talk about it. Aging for me means looking at my grandparents and thinking about what I will do and what I will look like when I reach their age. I thought about this question a few times when I was much younger, then there was a long period of me not worrying about it at all, and in college, the question came back to me more frequently. I often ask myself if my future kids/grandkids (if I ever have them) would care about me, and life after death is something that seems to have been in my head for the longest time. Aging for me means carrying new responsibilities. I know that there are things that were acceptable when I was one year younger and became inapplicable for me the year after, and vice versa. "What it means to age?" is repeatedly asked throughout the video, motivating us to give it a try and craft our own response. This research question summarizes well the purpose that these 'storytellers' and collaborators embed in this project. Like taylortots, I may revisit this project from time to time with newer perspectives about the definition of growing old. Thank you for the insightful post!

    1. Author Response

      Reviewer #1 (Public Review):

      The authors examined the relationships between humans' heartbeats and their ability to perceive objects using touch.

      Strengths: This study is a large and sophisticated one, with great attention to detail and systematic analysis of the resulting data. The hypotheses are clear and the study was carried out well. The presentation of the data visually is very informative. With such a large and high-quality set of data, the conclusions that we can draw should be clear and strong.

      Weaknesses: The main drawbacks for me were first, exactly how the data were analysed, and second that there seem to be too many results reported to get an overall view of what the study has found.

      First, there are always a number of choices that researchers can make when analysing their data. Too many choices in fact. So we always need to see a consistent, principled, and transparent account of how those choices were made and what the effects on the data were. At present, I think this needs to be improved, partly in the justification of the analyses that were done; partly by re-doing some analyses and the presentation of results.

      Second, I admit to being a little lost when trying to understand all of the analyses - why they were done, what choices were made, and what the findings were. In some cases, it felt a little bit like the analyses were decided on only quite late - after exploring the data. One clear way to address this would be to divide the main results into two kinds: confirmatory (those that the authors expected to do before the study was run), and exploratory (those that the authors decided to do only after seeing the data). This would be both good practice and would help to focus the reader on what are the most critical findings.

      Achievements: I think the presentation of results needs to be strengthened before I can decide whether the aims are achieved.

      Impact: This will also depend on the revision of the results.

      We thank the Reviewer for these comments. In the original manuscript we thought we had been clear as to which analyses were planned and which were exploratory. The planned analyses are in keeping with the previous studies in the literature on which this study was based (Al et al. 2020; Al et al. 2021; Grund et al. 2021). The only exploratory analysis was the inclusion of touch variance as a co-variate. We had not expected that participants would differ so much in how long they held their touch.

      Reviewer #2 (Public Review):

      In this article, the authors set out to discover whether the cardiac cycle influences active tactile discrimination, to better understand the putative relationship between interoception rhythms and exteroceptive perception. While numerous articles have looked at these relationships in the passive domain, here the authors designed an innovative active sensing task to better understand the interaction of sensorimotor processes with the cardiac rhythm.

      The authors report a series of consecutive analyses. In the first, they find that while active discriminative touch is not modulated by the cardiac cycle, non-discriminative touch is such that the start, median duration, and end time of touches are shifted forward along the cardiac cycle towards diastole. Next, the authors examined the proportion of total start and end touches within systole versus diastole and found that across both discrimination and control conditions, touch was roughly 10-25% more likely to terminate during diastole. Further, examining the median holding time, the authors found that touches initiated during systole were lengthened in duration, consistent with a perceptual inhibition by this phase. This last effect appeared to be greatest for the highest stimulus difficulty levels, further supporting the notion that some cardiac inhibition of sensory processing may be at stake. Finally, when examining physiological responses, the authors found that cardiac inter-beat intervals were lengthened during active touch, consistent with the hypothesis that the brain may exploit strategic cardiac deceleration to minimize inhibitory effects.

      Overall, the key effects of the manuscript are fascinating and robust. A major strength of the approach here is the task itself, which utilizes a well-controlled stimulus with multiple levels of task difficulty, as well as an elegant positive control condition. This enabled the authors to look rigorously at difficulty and stimulus condition interactions with the cardiac phase. This clearly pays off in the analyses, as the authors are able to construct a more informative story about how precisely cardiac timing events modulate perception.

      Statistically speaking, I found the overall approach to be rigorous and sound. The study is well powered for a psychophysical investigation of this nature, and the interpretation of results is based on robust effects in the presence of a strong positive control.

      We thank the reviewer for these positive comments on the original version of this paper.

      Reviewer #3 (Public Review):

      The manuscript presents a carefully designed and well-controlled study on active tactile perception and its relationship to internal bodily rhythms - the cardiac cycle. This work builds on previous studies which also showed that active perception/voluntary actions occur in certain phases of the cardiac cycle, but the previous research failed to show/was not designed to show the significance of these synchronizations for perception or behaviour. To my knowledge, this is the first report that seems to experimentally show that active perception in the cardiac diastole leads to behavioural advantages - better tactile discrimination.

      The manuscript itself is very clearly written, the introduction is concise but sufficient, while the results section is very well organised and I especially like how the authors guide the reader through the analysis and additional steps taken to understand the findings even better.

      Yet, despite careful study design, effective visualisations, and elegantly constructed story, there are some analytical choices that, in my opinion, are not sufficiently justified or explained (e.g., selecting a diastolic window equal in length to the duration of systole, instead of using the whole duration of diastole). Such analytical decisions could have (at least some) effects on the obtained results and thus conclusions drawn.

      We thank the Reviewer for these comments. The analyses referred to here were planned, and specifically the choice of the windows for defining systole and diastole was identical to that of the studies in the literature on which this study was based (Al et al. 2020; Al et al. 2021).

    1. Reviewer #2 (Public Review):

      Context:

      The authors propose a new analysis of an already well-studied conceptual model of adaptation to a new environment. Individual genotypes are characterized by some (breeding value for) phenotype under Gaussian stabilizing selection (meaning that fitness is a Gaussian function of phenotype, centered around some optimum value). The scenario assumed is that an isolated population of fixed size is initially at equilibrium (between mutation, selection and genetic drift). This population is diploid and sexual with many unlinked loci acting additively on phenotype (across loci and between homologous chromosomes). This view simplifies the analysis but is also not inconsistent with various empirical analyses of locus-specific effects on quantitative traits (the empirical support is discussed and reviewed in both introduction and discussion).

      Then a change in the environment induces a shift in the optimum without affecting any other parameter (strength of selection, population size, mutation effects, existing phenotypes), see figure 1. We wish to know how the population responds to this change, both in terms of phenotype distributions, and the underlying genetic basis (how alleles of various effects change in frequency and contribute to the phenotypic response).

      This process has been at the core of the modelling of adaptation for more than a century, as it is maybe the most natural conceptual framework to describe adaptation to a new environment (a "niche shift" so to speak). It is relevant to both the study of demographic/ecological and phenotypic responses to changing conditions, and to the genomics of the changes associated with this process.

      However, in spite of this long history (reviewed in introduction in broad lines), we do not have an exact mathematical description of this process. The reason is that the problem is in fact very complex: the genome is a sea of various genes, each bearing various alleles (depending on the individual), that further interact mutually by selection (even though loci are additive on phenotype), because fitness is not a linear function of phenotype. The simple population genetics with two alleles and one locus seems far away...

      I think it is fair to say that the main route to handle this problem, in predominantly sexual species, has been through the approximations of quantitative genetics. There, each locus is assumed of small effect and linkage disequilibrium between them is neglected. This has led to empirically testable, and often quite accurate, predictions on the response to selection in terms of mean phenotypic change. Yet, even under this broad approximation strategy, there are various ways to derive predictions, each neglecting one force or another (genetic drift most of the time), or looking at the process over short or longer timescales.

      Aim and achievements:

      The authors include their work within this broad framework, but set out to derive new approximations that are intended to cover several of the existing approaches as subcases, and especially to handle genetic drift effects in finite populations (large ones), and short vs. longer timescales. I believe they succeed quite well in doing so: they provide clear approximation methods (in appendix mostly) and substantial simulations to show their accuracy. The derivations are fairly technical but most of the time they manage to give an intuition of where they come from and illustrate this intuition via figures in the main text. They produce a prediction of two main observable dynamics: that of the (breeding value for) phenotype itself (its mean over time, variance, third moment), and that of the genetic contribution of various loci and alleles along the genome (depending on the allelic effect on phenotype). They also describe two timescales where the dynamics are fairly different, a short timescale where the mean phenotype is shifting (quite rapidly over tens/hundreds of generations) towards the new optimum, and a longer timescale where the higher moments and mostly the genetic basis change while the mean phenotype merely wanders in a narrow vicinity of the new optimum. The connection between the two timescales is important as it is the slight differences in allele fates during the first one that result in differences in long term behavior in the longer one (illustrated in figure 3).

      The main achievement on the phenotypic response is mostly to reobtain previous approximations under somewhat different or broader assumptions. This is not useless as it may explain why these known predictions (the "Lande model") are surprisingly robust to deviations from the required conditions (e.g. figure 2). However, I think that some extra exploration of the parameter space (away from the required conditions) would make it possible to see when the Lande model does fail on mean phenotype dynamics over short timescales, as anticipated. Whether this range is relevant remains an open question for empirical measurement.

      Therefore, the main contribution of this ms is not on phenotypic responses but on the underlying genetic basis, and what we may expect to observe when measuring QTLs or GWAS between two populations separated by an environmental shift in the past: are there many loci contributing limited difference, or fewer loci contributing most of it? In that respect, eqs 20-21 and 25-26-27, and figures 5 and 6 display the main findings and their check by simulations. These findings, although stemming from quite elaborate derivations, yield a fairly simple and yet accurate outcome, at least in the parameter range studied. Various other parameter sets are also checked against simulations in the appendix, and the simulation code is made available for any further check (as exploring all the possible parameters is a fairly daunting task, for an article of its own probably).
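
      For readers unfamiliar with the "Lande model" mentioned above, the following is a minimal sketch of the textbook form of that prediction, not the manuscript's own derivation. The notation is assumed here rather than taken from the paper: $O$ is the optimum, $V_S$ the width of the Gaussian fitness function, $V_A$ the additive genetic variance, $\bar{z}$ the mean phenotype, $\Lambda$ the size of the optimum shift, and $t$ the number of generations since the shift. It also assumes $V_A \ll V_S$ and a roughly constant $V_A$ during the rapid phase.

```latex
% Gaussian stabilizing selection around an optimum O (notation assumed here)
W(z) = \exp\!\left(-\frac{(z - O)^2}{2 V_S}\right)

% Approximate per-generation change of the mean phenotype when V_A \ll V_S
\Delta \bar{z} \approx -\frac{V_A}{V_S}\,\bigl(\bar{z} - O\bigr)

% With V_A held constant, the distance to the new optimum after a shift
% of size \Lambda decays roughly exponentially in the number of generations t
\bigl|\bar{z}(t) - O\bigr| \approx \Lambda\, e^{-(V_A / V_S)\, t}
```

      Under these assumptions the mean closes most of the gap within a few multiples of $V_S / V_A$ generations, which is consistent with the short timescale described in the review.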

      Limits:

      I believe the main limit of this work is fairly explained in the discussion: to achieve mathematical tractability (a full numerical treatment being inherently impossible given the many parameters), many simplifying assumptions must be made (simple fitness landscape, simple effect of the environmental change, simple demography etc.). This means that it is possible that empirical observations will differ from the predictions for various reasons. However, quantitative genetics has already proven reasonably robust and accurate in predicting observed phenotypic dynamics, using comparable approximations, so it is not madness to hope that the same will happen concerning the genetic basis of adaptation. Also, I would suspect that the methods proposed in appendix will most likely extend fairly easily to some deviations from the model's assumptions: a change in phenotypic variance with the new environment (a form of plasticity), in the width of the fitness function, or in the population size, without too much effect on the main conclusions. Still, some other limits may not be overcome as easily (e.g. pleiotropy among multiple traits, or non-stationary optimum), but it seems (a priori) that part of the approach could still be adapted for these situations. The main "wall-hitting" limit of the paper is inherent in the very basis of the approach, namely assuming mild changes occurring in weakly linked polymorphic and numerous loci as opposed to strong changes occurring on more tightly linked and fewer loci. These limits are all fairly described in discussion.

      Overall, this paper is not an easy read, but not by lack of clarity, rather because the problem at hand is complex, and there is a lot of material to describe. Each part flows quite well in my opinion, but there are many parts to read.

      Potential impact:

      I believe that because it yields relatively simple analytic outcomes (at least the predictions in main text), the paper could be useful to data analysis, mostly in the field of genomics of adaptation where it may provide testable predictions for GWAS and QTL data. It could also be used to infer genetic distributions (v(a),f(a)) from observed QTL or GWAS data, if the model is deemed valid.

      In the field of theoretical population genetics, it may also provide a methodology to capture sexual adaptation dynamics in other contexts by mixing various approximation methods: connecting distinct timescales, connecting deterministic approximations for phenotype and diffusion approximations for allele frequencies. This may not be the first time of course (see e.g. "stochastic house of cards" and their extensions), but it is here used in the context of adaptation dynamics rather than equilibria, for the first time I think.

    1. Rule #7: Predict the future. The same way you would predict what's going to happen in the next season of your favorite show. Is Beechum going to kill the President? Figure out what you think is going to happen in the future based on the details of what's happened in the past. This can however very quickly lead into the mistake of Historicism. The predictive power of History has a limit. While it is true that you can identify historical trends and you can make educated guesses as to them happening again if the conditions are met, I don't think it's necessarily a good thing to rely on it. The reason for this is that we then begin to look for similarities and lose sight of other factors which may change the situation, and we run the risk of attempting to formulate historical laws. In this sense I agree with Karl Popper, in that these 'laws' aren't falsifiable, they aren't testable, unlike the sciences; history does not have this luxury.

      Predict with the objective of validating and changing your mental model, trying to be as comprehensive as possible. Try to think about the powers that rule and how their opposition might react to their actions.

      Do not extend your prediction too far. The further out you try to predict, the more the chance of making a critical error in your prediction grows exponentially.

      Some propms always have reveal valuable and memorable information about the period.

      Changing Your Mental Model, Not To Get IT

    1. Author Response

      Reviewer #2 (Public Review):

      This is an interesting and well-performed study that adds to the literature base. The authors investigated the role of a discrete brain pathway in binge drinking of alcohol. They adopted a multidisciplinary approach that overall suggested that alcohol-induced changes at synapses of anterior insula (AI) cortex inputs to the dorsolateral striatum (DLS) maintain binge drinking. Further, they suggest this may be a biomarker for the development of alcohol use disorder (AUD).

      Strengths:

      1. Extends previous studies and builds further evidence for AI→DLS involvement in aberrant alcohol intake.

      2. Adopts elegant approaches to isolate the defined connections. This included in vivo optogenetic stimulations (both open and closed loop), recording of defined synapses in slice preparations, and applying in vivo optogenetic stimulation parameters to isolated brain slices.

      3. Well-controlled for the most part, although at times the authors assert "specific" effects without unequivocal proof. For example, the insula also projects to the ventral striatum and this pathway has been implicated in regulation of alcohol intake in rodent models (Jaramillo et al., 2018), and is activated in heavy drinking humans during high threat related alcohol cue presentation (Grodin et al., 2018).

      4. Measures the microstructure of drinking behavior in subjects.

      5. Employed an artificial neural network and machine learning to interrogate data. After training the network it could predict both the fluid consumed (water vs alcohol) and the virus type based on drinking microstructure data.

      6. Applied a series of behavioral tests to confirm that stimulating the defined pathway was not in and of itself reinforcing, anxiogenic or altered locomotion.

      Weaknesses:

      1. Only used male mice. In humans, binge drinking in females is a major problem, and rates of AUD between males and females have been converging in recent times (Grant et al., 2015).

      We took age-matched female mice that were injected with AAV-ChR2 into AIC and had them undergo the same 3 weeks of Drinking in the Dark to replicate the male data displayed in Figure 1 with an experimental focus on AIC inputs. We then performed whole cell patch clamp electrophysiology in DLS brain slices from these female mice. We measured optically evoked input-output responses (oEPSCs), AMPA/NMDA current ratios (oNMDA/oAMPA), and paired pulse ratios (oPPR). These data are presented in supplemental figure 4. In contrast to males, we did not observe any effect of alcohol consumption on AIC inputs into the DLS of female mice. We also combined both male and female datasets to statistically determine if we had sex differences for these specific measures by the existence of a main effect and/or a sex x fluid interaction. We report these statistics in text from lines 180 to 195, where we note that we did not have a sex x fluid effect for oEPSCs but did have a sex x fluid effect for our oNMDA/oAMPA synaptic plasticity measure. This finding further justifies the behavioral data and circuit manipulations being conducted solely in male mice.

      While this is a fascinating sex difference and important data for the field, this manuscript is not specifically about exploring sex differences per se. We believe we have done our due diligence and correctly reported the existence of sex differences, or the possibility of sex differences, but the electrophysiological findings that we later modulate in vivo are only present in males. We point out that future work is needed to determine the contribution of circuit-specific changes in females at these synapses. Ultimately it will take much more work to fully elucidate sex difference circuit-specific mechanisms that we feel are far beyond the scope of this manuscript.

      2. At times over-interpreted, especially with regards to specificity.

      We are not exactly sure what the reviewer is referring to with “regards to specificity,” but we have done our best to address what we think they are asking and hope that we have adequately addressed this critique. We added sentences (lines 173-178) regarding alcohol-induced plasticity at other inputs to DLS that were not tested and (lines 442 - 446) how we are not sure whether these synapses control consumption of other non-alcohol substances (but point out our prior sucrose drinking data from Muñoz et al., Nat. Comm. 2018).

      3. Lacks a mechanism, although the authors do acknowledge this.

      This is just a first step towards discovering a mechanism. We previously identified an unusually alcohol-sensitive synapse and are now elucidating its behavioral role and some associated plasticity at that synapse that may be part of a mechanism. With our new single session alcohol data to compare our 3 week drinking data to, we are closer to beginning the process of discovering a mechanism. Additional work that is beyond the scope of this manuscript is needed.

      4. I would like some more discussion about the potential for this to be a biomarker in humans.

      We have removed language in the body of the manuscript and expanded on the implications of our findings at the end of our results and discussion from lines 514 to 548.

      Reviewer #3 (Public Review):

      Haggerty et al. assess how the projection from the agranular insular cortex to the dorsolateral striatum contributes to binge drinking in mice. The authors use whole-cell patch-clamp electrophysiology to examine synaptic adaptations following binge drinking (Drinking-in-the-Dark) in male mice, finding a constellation of changes that include increased AMPA and NMDA receptor function at insula synapses onto striatal projection neurons. They go on to assess a causal role for this projection in regulating binge drinking using optogenetics, finding that stimulating insula->striatal transmission in vivo reduces total ethanol consumed during DID, along with several specific behavioral measurements of drinking microstructure. One of the most interesting of these findings is a decrease in "front-loading", or drinking during the very beginning of the session, a phenotype that has been associated with problematic drinking and alcohol use disorder in humans. Finally, the authors use machine learning to build a predictive model that can reliably discern stimulated mice from controls. These studies improve our understanding of the neurocircuitry that mediates binge drinking and synaptic and circuit adaptations that occur following binge drinking. Experiments are blinded and performed in a rigorous manner, including physiological validation experiments in support of the in vivo optogenetic manipulation. Despite many strengths, there are significant limitations and gaps in the electrophysiology studies included in this version of the manuscript. As acknowledged by the authors, there are curious findings that are seemingly at odds with each other, and further studies addressing cell type specificity and/or feedforward inhibition would significantly improve the interpretation of this work. 
      Furthermore, the manuscript would be significantly improved by an expanded Introduction containing more specific background information along with a standalone Discussion to place these findings within the broader literature. Lastly, a major limitation of these studies is the low number of mice used for the in vivo optogenetic control experiments and the exclusion of female mice throughout.

      Major concerns:

      1) Expanded Introduction and Discussion. The Introduction does not discuss and/or downplays historical literature investigating neuroadaptations following binge drinking. Studies examining changes in glutamate receptor function within striatal circuits should be discussed in greater detail, rather than the broad pass and review citation included. Behavioral studies examining how the function of the insula and DLS regulate ethanol exposure should also be discussed, especially including work examining the insula to accumbens pathway. It would also be worthwhile to reference human studies implicating the insula and DLS in AUDs.

      We have expanded the introduction and discussion to include these topics.

      2) It is difficult to form a comprehensive picture of the electrophysiological changes reported in Figure 1. The data seem to indicate increased AMPAR function, even more increased NMDAR function, decreased glutamate release probability, and decreased population spikes. These conflicting findings are acknowledged and there are two possible factors mentioned in the manuscript - differential engagement of MSN populations and changes in feedforward inhibition through local interneurons. I disagree with the authors' dismissal of potential MSN subtype-specific effects contributing to these discrepancies. Although AIC inputs innervate D1 and D2 MSNs comparably under control conditions, it is quite possible that the pathways are differentially altered following DID, as has been observed in many reports of alcohol or drug exposure (e.g. Cheng et al. Biological Psychiatry 2017). On the other hand, I wholeheartedly agree with the authors that AIC-driven feedforward inhibition through local interneurons (or even MSNs) could explain the curious divergence between the synaptic and population-level changes depicted in Figure 1. I think additional experiments that help connect the dots are critical for interpreting the changes described in this manuscript. The authors could consider targeted recordings from specific cell types (e.g. D1, D2, and/or interneurons), measurements of AMPA/NMDA receptor subunit stoichiometry, and/or additional experiments in conditions where feedforward transmission is blocked (e.g. PTX or TTX/4AP).

      The reviewer has excellent points that will help elucidate a mechanism. Many of these suggestions are planned experiments in our laboratory, but are, in our opinion, beyond the scope of the present manuscript. Please see our response to Reviewer #2’s 3rd stated weakness. We have revised the text to incorporate some of the points raised here.

      3) N=2 mice in the ICSS experiment in Figure 4J is not sufficient to interpret, and including error bars on this data set is misleading. There also appears to be a difference in distance traveled between GFP and ChR2 mice in Figure 4C, but statistics are not reported. It is also hard to understand what that might mean given the way these data are normalized.

      For this revised manuscript we reran this experiment with 6 animals per group and updated Figure 4 I and J and the accompanying methods section titled “Intracranial self-stimulation” to reflect the change. We also note that the new, correctly powered experiment confirmed the previous claim that AIC inputs to the DLS do not modulate operant responding behaviors.

    1. Author Response

      Reviewer #2 (Public Review):

      The authors have used every possible combination and permutation of treatments at different stages of diapause and post diapause development in the mouse and used conditional gene knockouts at different stages to tease out the interactions of Foxa2 with Msx1 and LIF in the reactivation and implantation process in mice. The authors extend diapause further after treatments with progesterone and an estrogen-degrading chemical to show that this will prolong diapause in the presence of Msx1. Overall this study advances our knowledge of the cross-talk between uterine endometrium and the blastocyst during and after the remarkable phenomenon that is diapause.

      Strengths

      Demonstrating that Msx1 is critical to maintaining diapause, and that diapause is maintained in Foxa2 deficient mice have clarified their interactions. It is interesting that LIF triggers implantation on day 8 but cannot support the pregnancy to full term. Suppression of the estrogen effects by progesterone or fulvestrant increases the duration of diapause. Demonstrating that Foxa2 induces diapause via interactions with MSX1 shows Foxa2 plays such an important role in the control of diapause and adds another 'cog' to the complex wheel of its control.

      Weaknesses

      There is an assumption that everyone will understand the various manipulations that are done in this study - some effort needs to be made to clarify each experimental stage. How long are the embryos viable after the extension of the diapause by the various manipulations?

      The very positive review by a well-known expert in the field of diapause is reassuring, and we agree with her suggestions to improve the quality of the manuscript. As recommended, we now provide a scheme to summarize our findings to illustrate the length of embryo dormancy (see Fig. 7).

      Reviewer #3 (Public Review):

      Matsuo et al. have authored a manuscript describing the effects of depletion of the forkhead box gene, Foxa2, on embryogenesis and gestation in the mouse. The effects of this treatment are the induction of the diapause arrest in the development of the embryo and consequent dormancy. The manuscript is well-prepared, and the figures, for the most part, are didactic and interpretable. Although the conclusions are interesting, the principal weaknesses of the manuscript are the lack of novelty and the perceived absence of some controls and follow-up experiments.

      Controls and Follow-ups:

      1) The Cre/lox system depletes rather than deletes genes. Although in situ data are presented, these are not judged to be quantitative. The usual qPCR analysis of tissues could have established the quantity of depletion. Stupid but can be done. This is important because the frequency of implantation sites in both Cre/lox models (lines 111-113) may be attributable to the residual expression of Foxa2.

      The Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ mouse models used in the current study have been used in previous studies (refs 7 and 8 in the manuscript). The deletion efficiency of Foxa2 in Foxa2f/fPgrCre/+ mice was examined by RT-PCR and IHC (figure 2 in ref 7), while the deletion efficiency in Foxa2f/fLtfCre/+ mice was examined by IHC (figure S1 in ref 8). The deletion efficiency has been proven by hundreds of publications since the generation of Pgr-cre mice in 2005 and Ltf-cre mice in 2014.

      Although these mouse lines have been used before, we confirmed the deletion of Foxa2 at the beginning of our study at protein levels (fig 1c) and RNA levels (fig 1d). We understand that the reviewer is trying to link the observation that some of the knockout animals still carried implantation sites on day 8 of pregnancy with the possibility that the deletion of Foxa2 is not complete. However, it is not uncommon to observe such phenotypes that are not fully penetrant even in systemic knockout mouse models. Nonetheless, we now provide real time PCR results of uterine Foxa2 on day 4 of pregnancy in all mouse models used in the current manuscript in the new supplemental figure 1.

      2) The most novel and salient finding of the present study is that the depletion of Foxa2 results in embryos that are in a state that "morphologically resembled dormant blastocysts". A useful experiment would have been to transplant these embryos to normal recipients or to culture them in vitro to determine whether they were capable of reactivation from the dormant state.

      Whether dormant embryos in Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ uteri can be reactivated is the main question we studied. The results in figures 4-6 address this question. The blastocysts in Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ uteri can be activated on day 4 as shown in figure 4b. Without any support, blastocysts in Foxa2f/fLtfCre/+ uteri still can be reactivated on day 8 (figure 4b). In the following experiments and results shown in figures 5 and 6, we tried to improve the uterine environment by supplementing progesterone and estrogen. Dormant embryos were successfully reactivated by a LIF injection and the pregnancies proceeded to full term.

      This reviewer suggests using normal recipients to test the reactivation of dormant embryos. Given that dormant embryos can be reactivated in a knockout uterine environment, embryo transfer experiments using normal recipients are an additional measure to test the integrity of embryonic dormancy. The embryo transfer experiments may be a futile attempt in our studies for the following reasons.

      The numbers of mated mutant females that yield blastocysts are relatively meager and so are the numbers of blastocysts recovered, especially from diapausing donors. It is well known that implantation rates after blastocyst transfer are compromised due to the surgical trauma and anesthesia. Therefore, the results from these experiments may not provide meaningful information.

      Furthermore, during the pandemic our mouse colonies were drastically reduced, and we are still recovering from this downturn during this “New Normal”. Notably, pregnancy rate fluctuates throughout the year even if mice are housed in a controlled environment, and pregnancy rate is often relatively poor in mutant mice, which of course depends on the genetic background and diets (DOI: 10.1126/scisignal.aam9011). Most importantly, viability of diapausing embryos is amply evident from our experiments (Figs. 4-6).

      3) Figure 3C indicates that embryos recovered on Day 8 had an extensive proliferation of ICM cells, but not trophoblast. Previous studies have explored the progression of entry and exit from diapause in the mouse (DOI: 10.1093/biolre/ioz017) showing that reactivation of the embryo from diapause commences in the ICM and then proceeds to the trophoblast. It therefore may be possible that proliferation in the trophoblast is not suspended, rather than the recovered blastocyst has resumed development and that mitotic activity has not yet reached the trophoblast.

      It is common to see Ki67 expression in the ICM of dormant embryos. Figure 4D from the paper quoted by this reviewer presents Ki67 staining on embryos undergoing diapause at different stages. In our study, we showed Ki67 staining on dormant embryos collected on day 8, which equals D7.5 in their figure. Our data in figure 3C are consistent with the observation shown there. Without LIF, embryos remain dormant in Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ uteri.

      4) In Figure 4B, neither the Ltf nor the Pgr Cre treated uteri appear normal on Day 8. This is not consistent with the conclusion in lines 170 et seq. of the manuscript. It is difficult to discern normality from Figure 4C, but it is clear that the PgrCre-lox uterus does not conform to the controls. It is later noted that there is edema in the uteri at this time in the Day 8-treated PgrCre/lox mice (lines 217-218).

      We have clarified our description.

      Lines 173-176: Notably, implantation sites with a normal appearance were observed in Foxa2f/fLtfCre/+ uteri when LIF was given on day 8 of pregnancy (Figure 4b), albeit Foxa2f/fPgrCre/+ uteri with edema have only faint blue bands. Histology of implantation sites confirmed this observation.

      In line 217, we stated that “the uterine edema in Foxa2f/fPgrCre/+ females two days after LIF injection on day 8…”. Figure 4B showed that Foxa2f/fPgrCre/+ uteri with edema have some very faint blue bands, suggesting an implantation-like reaction. But we do not think they are real implantation sites, which is confirmed by figures 4c and e.

      5) In Figure 6B, the implantation sites appear substantially smaller in mice of both mutant genotypes. Supplemental Figure 4 suggests that this is not the case. It is unclear whether the samples chosen for figures are representative of the uteri and whether variation in the size of implantation sites was observed.

      In figure 6B, the Foxa2f/f uteri samples were collected on day 10 of pregnancy, which is the same time point at which Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ tissues were collected. Since embryos implanted in Foxa2f/f uteri on the night of day 4 but in Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ uteri on day 8 after LIF injections, the implantation sites are bigger in Foxa2f/f uteri. However, in supplemental figure 4 the implantation sites were collected from Foxa2f/f females on day 6 of pregnancy, and they are similar in size to implantation sites collected from Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ females 2 days after LIF injection.

    1. Reviewer #2 (Public Review):

      Transgenerational effects (TE) (usually defined as multigenerational effects lasting for at least three generations) have generated a lot of interest in recent years but the adaptive value of such effects is unclear. In order to understand the scope for adaptive TE we need to understand i) whether such effects are common; ii) whether they are stress-specific; and iii) if there are trade-offs with respect to performance in different environments. The last point is particularly important because F1, F2 and F3 descendants may encounter very different environments. On the other hand, intergenerational effects (lasting for one or two generations) are relatively common and can play an important role in evolutionary processes. However, we do not know whether intergenerational and transgenerational effects have the same underlying mechanisms.

      This study makes a big step towards resolving these questions and strongly advances our understanding of both phenomena. Much of the previous work on mechanisms of multigenerational effects has been conducted in C. elegans and this work uses the same approach. They focus on bacterial infection, Microsporidia infection, larval starvation and osmotic stress. I did not quite understand why the authors chose to focus on P. vranovensis rather than P. aeruginosa P14 that has been used in previous studies of transgenerational effects in C. elegans. However, this is a minor point because I guess they were interested in broad transgenerational responses to bacterial infection rather than in strain-specific ones. The authors used different Caenorhabditis species, which is another strength of this study in addition to using multiple stresses.

      They found 279 genes that exhibited intergenerational changes in all Caenorhabditis species tested, but most interestingly, they show that a reversal in gene expression corresponds to a reversal in response to bacterial infection (beneficial in two species and deleterious in one). This is very intriguing! This was further supported by similar observations of osmotic stress response.

      They also report that intergenerational effects are stress-specific and that they have deleterious effects in mismatched environments and, importantly, when worms were subject to multiple stresses. It is quite likely that offspring will experience a range of environments and that several environmental stresses will be present simultaneously in nature. I really liked this aspect of this work as I think that tests in different environments, especially environments with multiple stresses, are often lacking, which limits the generality of the conclusions.

      Another interesting piece of the puzzle is that beneficial and deleterious effects could be mediated by the same mechanisms. It would be interesting to explore this further. However, this is not a real criticism of this work. I think that the authors collected an impressive dataset already and every good study generates new research questions.

      Given these findings, I was particularly keen to see what comes of transgenerational effects. The general answer was that there aren't many, and the authors conclude that all intergenerational effects that they studied are largely reversible and that intergenerational and transgenerational effects represent distinct phenomena. While I think that this is a very important finding, I am not sure whether we can conclude that intergenerational and transgenerational effects are not related.

      In my view, an alternative interpretation is that intergenerational effects are common while transgenerational effects are rare. Because intergenerational effects are stress-specific, transgenerational effects could be stress-specific as well.

      Perhaps different mechanisms regulate intergenerational responses to, say, different forms of starvation (e.g. compare opposing transgenerational responses to prolonged larval starvation (Rechavi et al. doi:10.1016/j.cell.2014.06.020) and rather short adulthood starvation (Ivimey-Cook et al. 2021 https://doi.org/10.1098/rspb.2021.0701)). Perhaps some (most?) forms of starvation generate only intergenerational responses and do not generate transgenerational responses. But some do. Those forms of starvation that generate both intergenerational and transgenerational effects could do so via the same mechanisms and represent the same phenomenon. I am by no means saying this is the case, but I am not sure that the absence of evidence of transgenerational effects in this study necessarily suggests that inter- and trans-generational effects are different phenomena.

      My only real concern was the lack of phenotypic data on F3 beyond gene expression. Ideally, I would like to see tests of pathogen avoidance and starvation resistance in F3. However, given the amount of work that went into this study, the lack of a strong signature of potential transgenerational effects in gene expression, and the fact that most of these effects were shown previously to last only one generation, I do not think this is crucial.

      It would be very interesting to compare gene expression and other phenotypic responses in F1 and F3 between P. vranovensis and PA14. Also, it would be interesting to test the adaptive value of intergenerational and transgenerational effects after exposure to both strains in different environments. This would be very informative and would help with understanding the evolutionary significance of transgenerational epigenetic inheritance of pathogen avoidance as reported previously. Why is the response to P. vranovensis erased while the response to PA14 is maintained for four generations? Are nematodes more likely to encounter one species than the other? Again, however, this is not something necessary for this study.

      The main strengths of this paper are i) use of multiple stresses; ii) use of multiple species; iii) tests in different environments; and iv) simultaneous evaluation of intergenerational and transgenerational responses. This study is the first of its kind, and it provides several important answers while highlighting clear paths for future work. Excellent work and I think it will generate a lot of interest in the community.

    1. Reviewer #2 (Public Review):

      This paper seeks to test the extent to which adaptation via selective sweeps has occurred at disease-associated genes vs genes that have not (yet) been associated with disease. While there is a debate regarding the rate at which selective sweeps have occurred in recent human history, it is clear that some genes have experienced very strong recent selective sweeps. Recent papers from this group have very nicely shown how important virus interacting proteins have been in recent human evolution, and other papers have demonstrated the few instances in which strong selection has occurred in recent human history to adapt to novel environments (e.g. migration to high altitude, skin pigmentation, and a few other hypothesized traits).

      One challenge in reading the paper was that I did not realize the analysis was exclusively focused on Mendelian disease genes until much later (the first reference is not until the end of the introduction on pages 7-8 and then not at all again until the discussion, despite referring to "disease" many times in the abstract and throughout the paper). It would be preferred if the authors indicated that this study focused on Mendelian diseases (rather than a broader analysis that included complex or infectious diseases). This is important because there are many different types of diseases and disease genes. Infectious disease genes and complex disease genes may have quite different patterns (as the authors indicate at the end of the introduction).

      The abstract states "Understanding the relationship between disease and adaptation at the gene level in the human genome is severely hampered by the fact that we don't even know whether disease genes have experienced more, less, or as much adaptation as non-disease genes during recent human evolution." This seems to diminish a large body of work that has been done in this area. The authors acknowledge some of this literature in the introduction, but it would be worth toning down the abstract, which suggests there has been no work in this area. A review of this topic by Lluis Quintana-Murci [1] was cited, but diminished many of the developments that have been made in the intersection of population genetics and human disease biology. Quintana-Murci says "Mendelian disorders are typically severe, compromising survival and reproduction, and are caused by highly penetrant, rare deleterious mutations. Mendelian disease genes should therefore fit the mutation-selection balance model, with an equilibrium between the rate of mutation and the rate of risk allele removal by purifying selection", and argues that positive selection signals should be rare among Mendelian disease genes. Several other examples come to mind. For example, comparing Mendelian disease genes, complex disease genes, and mouse essential genes was the major focus of a 2008 paper [2], which pointed out that Mendelian disease genes exhibited much higher rates of purifying selection while complex disease genes exhibited a mixture of purifying and positive selection. This paper was cited, but only in regard to its findings on complex diseases. 
A similar analysis of McDonald-Kreitman tables [3] was performed around Mendelian disease genes vs non-disease genes, and found "that disease genes have a higher mean probability of negative selection within candidate cis-regulatory regions as compared to non-disease genes, however this trend is only suggestive in EAs, the population where the majority of diseases have likely been characterized". Both of these studies focused on polymorphism and divergence data, which target older instances of selection than the iHS and nSL statistics used in the present study (but should have substantial overlap since iHS is not sensitive to very recent selection like the SDS statistic). Regardless, the findings are largely consistent and, I believe, warrant a more modest tone.
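
For readers less familiar with McDonald-Kreitman tables, the following is a minimal sketch of the two summaries usually derived from one: the neutrality index and the proportion of adaptive substitutions (alpha). The function name and the counts are hypothetical and purely illustrative; they are not taken from any of the studies discussed here.

```python
def mk_summary(dn, ds, pn, ps):
    """Summarise a McDonald-Kreitman table (illustrative helper, not from the paper).

    dn, ds: nonsynonymous / synonymous fixed differences (divergence)
    pn, ps: nonsynonymous / synonymous polymorphisms
    """
    # Neutrality index: NI > 1 suggests an excess of nonsynonymous polymorphism
    # (consistent with purifying selection); NI < 1 suggests an excess of
    # nonsynonymous divergence (consistent with positive selection).
    neutrality_index = (pn / ps) / (dn / ds)
    # Proportion of substitutions attributed to adaptation under the MK framework.
    alpha = 1 - neutrality_index
    return neutrality_index, alpha


# Hypothetical counts for a single gene category.
ni, alpha = mk_summary(dn=20, ds=40, pn=30, ps=30)
print(ni, alpha)
```

Because these summaries rest on divergence as well as polymorphism, they integrate over a longer timescale than haplotype-based sweep statistics, which is the contrast drawn in the paragraph above.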

      There are some aspects of the current study that I think are highly valuable. For example, the authors study most of the 1000 Genomes Project populations (though the text should be edited: the admixed and South Asian populations are not analyzed, so not all 26 populations are included; only the populations from Africa, East Asia, and Europe are analyzed, for a total of 15 populations in Figures 2-3). Comparing populations allows the authors to understand how signatures of selection might be shared vs population-specific. Unfortunately, the signal that the authors find regarding the depletion of positive selection at Mendelian disease genes is almost entirely restricted to African populations. The signal is not significant in East Asia or Europe (Figure 2 clearly shows this). It seems that the mean curve of the fold-enrichment as a function of rank threshold (Figure 3) trends downward in East Asian and European populations, but the sampling variance is so large that the bootstrap confidence intervals overlap 1. The paper should therefore revise the sentence "we find a strong depletion in sweep signals at disease genes, especially in Africa" to "only in Africa". This opens the question of why the authors find the particular pattern they find. The authors do point out that a majority of Mendelian disease genes are likely discovered in European populations, so is it that the genes' functions predate the Out-of-Africa split? They most certainly do. It is possible that the larger long-term effective population size of African populations resulted in stronger purifying selection at Mendelian disease genes compared to European and East Asian populations, where smaller effective population sizes due to the Out-of-Africa Bottleneck diminished the signal of most selective sweeps and hence there is little differentiation between categories of genes ("drift noise"). 
It is also surprising to note that the authors find selection signatures at all using iHS in African populations while a previous study using the same statistic could not differentiate signals of selection from neutral demographic simulations [4].
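
The point about confidence intervals overlapping 1 can be made concrete with a minimal sketch. It assumes each gene is reduced to a 0/1 flag (1 if the gene falls in a candidate sweep at a given rank threshold) and resamples genes independently. This is a simplification for illustration only: it is not the authors' block-bootstrap with matched controls, and the function names and toy data are hypothetical.

```python
import random


def fold_enrichment(disease_flags, control_flags):
    """Sweep fraction among disease genes divided by that among control genes."""
    disease_rate = sum(disease_flags) / len(disease_flags)
    control_rate = sum(control_flags) / len(control_flags)
    return disease_rate / control_rate


def bootstrap_ci(disease_flags, control_flags, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the fold enrichment (simple gene-level resampling)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        d = rng.choices(disease_flags, k=len(disease_flags))
        c = rng.choices(control_flags, k=len(control_flags))
        if sum(c) == 0:
            continue  # ratio undefined for this replicate; skip it
        ratios.append(fold_enrichment(d, c))
    ratios.sort()
    lower = ratios[int((alpha / 2) * len(ratios))]
    upper = ratios[int((1 - alpha / 2) * len(ratios)) - 1]
    return lower, upper


# Toy data: 5 of 100 disease genes and 20 of 100 control genes flagged as sweeps.
disease = [1] * 5 + [0] * 95
control = [1] * 20 + [0] * 80
print(fold_enrichment(disease, control))
print(bootstrap_ci(disease, control))
```

A depletion would be called significant under this scheme only if the upper bound of the interval falls below 1; an interval that spans 1, as described for the East Asian and European populations, does not support a depletion regardless of how the mean curve trends.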

      The authors find that there is a remarkably (in my view) similar depletion across all but one of the MeSH disease classes. This suggests that "disease" is likely not the driving factor, but that Mendelian disease genes are a way of identifying where strongly selected deleterious variants recurrently arise and prevent positively selected variants from sweeping. This is a fascinating hypothesis, and is corroborated by the finding that the depletion gets stronger in genes with more Mendelian disease variants. In this sense, the authors are using Mendelian disease genes as a proxy for identifying targets of strong purifying selection, and are therefore not actually studying Mendelian disease genes. The signal could be clearer if the test set were based on the factor that is actually driving the signal.

      One of the most important steps that the authors undertake is to control for possible confounding factors. The authors identify 22 possible confounding factors, and find that several confounding factors have different effects in Mendelian disease genes vs non-disease genes. The authors do a great job of implementing a block-bootstrap approach to control for each of these factors. The authors talk specifically about some of these (e.g. PPI), but not others that are just as strong (e.g. gene length). I am left wondering how interactions among other confounding factors could impact the findings of this paper. I was surprised to see a focus on disease variant number, but not a control for CDS length. As I understand it, gene length is defined as the entire genomic distance between the TSS and TES. Presumably genes with larger coding sequence have more potential for disease variants (though number of disease variants discovered is highly biased toward genes with high interest). CDS length would be helpful to correct for things that pS does not correct for, since pS is a rate (controlling for CDS length) and does not account for the coding footprint (hence pS is similar across gene categories).

      The authors point out that it is crucial to get the control set right. This group has spent a lot of time thinking about how to define a control set of genes in several previous papers. But it is not clear if complex disease genes and infectious disease genes are specifically excluded or not. Number of virus interactions was included as a confounding factor, so VIPs were presumably not excluded. It is clear that the control set includes genes not yet associated with Mendelian disease, but the focus is primarily on the distance away from known Mendelian disease genes.

      Minor comments:

      On page 13, the authors say "This artifact is also very unlikely due to the fact that recombination rates are similar between disease and non-disease genes (Figure 1)." However, Figure 1 shows that "deCode recombination 50kb" is clearly higher in disease genes and comparable at 500kb. The increased recombination rate locally around disease genes seems to contradict the argument formulated in this paragraph.

      1. Quintana-Murci L. Understanding rare and common diseases in the context of human evolution. Genome Biol. 2016 Nov 7;17(1):225. PMCID: PMC5098287
      2. Blekhman R, Man O, Herrmann L, Boyko AR, Indap A, Kosiol C, Bustamante CD, Teshima KM, Przeworski M. Natural selection on genes that underlie human disease susceptibility. Curr Biol. Elsevier BV; 2008 Jun 24;18(12):883-889. PMCID: PMC2474766
      3. Torgerson DG, Boyko AR, Hernandez RD, Indap A, Hu X, White TJ, Sninsky JJ, Cargill M, Adams MD, Bustamante CD, Clark AG. Evolutionary processes acting on candidate cis-regulatory regions in humans inferred from patterns of polymorphism and divergence. PLoS Genet. Public Library of Science (PLoS); 2009 Aug;5(8):e1000592. PMCID: PMC2714078
      4. Granka JM, Henn BM, Gignoux CR, Kidd JM, Bustamante CD, Feldman MW. Limited evidence for classic selective sweeps in African populations. Genetics. Oxford University Press (OUP); 2012 Nov;192(3):1049-1064. PMCID: PMC3522151

    1. Author Response:

      Reviewer #1:

      The manuscript by Bellio and colleagues is based on the experimental model of T. cruzi infection in WT, MyD88-/- and IL-18-/- mice previously described by the same group in a 2017 eLife publication. The main message of the current study is that, in addition to IFN-g+ Th1 effectors, T. cruzi infection induces an even larger population of cytotoxic CD4+ T cells.

      The characterization of the cytotoxic CD4+ T cells is well documented. The data shown are convincing. However, since Burel et al. (2012) described the existence of a similar population in humans infected with P. falciparum (an intracellular pathogen), the authors should modify the statement (line 35-36) in the abstract.

      First, we would like to thank Reviewer #1 for the positive comments on our work.

      Please note that our statement in the abstract, “Here, for the first time, we showed that CD4CTLs abundantly differentiate during mouse infection with an intracellular parasite”, refers to mouse experimental models of parasite infection and not to human studies. We could not find any article with Burel JG as first author published in 2012; we believe that Reviewer #1 is referring to a study published in 2016 (Burel et al. PLoS Pathog. 2016 Sep 23;12(9):e1005839), in which a population of CD4 T cells with cytotoxic properties was described in humans after primary exposure to blood-stage malaria parasites. Please note that the finding of the important role of T-cell intrinsic IL-18R/MyD88 signaling for the development of a strong CD4CTL response is also part of the main message of our manuscript.

      Similarly, the title "Cytotoxic CD4+ T cells… predominantly infiltrate Trypanosoma cruzi-infected hearts" is an overstatement. If cytotoxic CD4+ T cells outnumber 10:1 IFN-g-secreting population (in lymphoid tissue) their higher representation in hearts of infected mice is not a selective phenomenon but rather expected.

      We would like to thank Reviewer #1 for this comment, which gives us the opportunity to clarify this point. Of note, we were not referring to the ratio of CD4CTL to Th1 cells, but to the frequency of CD4CTL among all the CD4+CD44+ (activated/memory) T cells. In fact, as shown in Figure 7-figure supplement 2 (now added to the revised manuscript), we found that the frequency of GzB+ cells among all activated/memory CD4+CD44hi T cells is significantly higher in the heart than in the spleen. Please also note that the frequency of CD4+ T cells expressing both GzB and PRF also increases in the heart compared to the spleen (Fig. 7F, middle panel and Fig. 1D left panel). We are now including this information in the revised manuscript, clarifying this point.

      My major concern is that the function of these cells remains undefined. Are they beneficial or detrimental for the host? It appears that the authors themselves could not make up their minds. The GzB+ CD4+ T cells protect but do not decrease the parasite load (Fig 6G).

      Our results in the mouse model of infection with T. cruzi, employing the adoptive transfer of WT CD4+GzB+ T cells to the susceptible Il18ra-/- mouse strain, indicate a clear beneficial role of CD4CTLs in the acute phase of experimental T. cruzi infection. Significantly extended survival was observed in the group of mice receiving sorted CD4+GzB+ cells, without, however, a decrease in parasite load (Figure 6G). We would like to comment here that, in order to be beneficial to the host, an immune response need not always decrease the pathogen load. In fact, in certain circumstances, hindering an excessive inflammatory response (which can lead to host death) is an advantage for the host, even if this does not result in the reduction of the pathogen numbers. The advantage conferred to the host by regulating the inflammatory response was probably also exploited in pathogen/host co-evolution, giving rise to chronic infections, where the host can survive for a longer period and the pathogen increases its chances of transmission (Schneider DS & Ayres JS., 2008, Nat Rev Immunol;8(11):889; Medzhitov R, et al, 2012, Science; 335(6071):936). Therefore, the results shown in Figure 6G are fully compatible with a potential regulatory role exerted by CD4CTLs, previously proposed by other authors (Mucida et al, Nat. Immunol. 2013), and point to the beneficial role of CD4CTLs for the host in the acute phase of infection with T. cruzi, probably by contributing to the decrease of immunopathology, the detrimental side of an exacerbated immune response, as discussed. Also favoring this hypothesis, the frequency of CD4CTLs expressing immunoregulatory molecules is increased when compared to other activated CD4+T cell subsets (Figure 3 and new Figure 7-figure supplements 3 and 4). Please see our complete discussion on this subject in the revised manuscript.

      On the other hand, during the chronic phase of the disease, the persistence of the immune response against the parasite might involve functional changes in the CD4 T cell response. This hypothesis could explain the association found between CD4CTLs and cardiomyopathy in chronic Chagas patients. Therefore, a beneficial role for CD4CTLs in the acute phase is totally compatible with the hypothesis that, during the chronic response in a persistent infection, CD4CTLs might acquire a detrimental role, contributing to immunopathology. Of note, several studies in the literature have shown a beneficial role for Th1 cells during the acute phase of infection with T. cruzi, while the Th1 response has also been associated with a pathologic outcome during the chronic phase of Chagas disease (reviewed in Ferreira et al, 2014, World J Cardiol 6(8):7820 and in Fresno & Girones, 2018, Front.Immunol. 9;351). Therefore, it is not implausible that the CD4CTL subpopulation could also display different roles in the acute versus the chronic phases of the infection with T. cruzi. However, at present, this hypothesis remains speculative, as stated in the manuscript discussion. An extensive investigation of the role of CD4CTLs, as well as of the immunoregulation mechanisms acting in chronic Chagas patients, needs to be conducted to fully answer this question, which is beyond the scope of the present work. Nevertheless, we acknowledge that the alternative possibility remains, in which the higher levels of CD4CTLs in chronic patients reflect elevated parasite burden and/or inflammation in the heart, without a direct involvement of this cell subset in the pathology. Please see our answer to Reviewer #2 on this topic and the inclusion of discussion clarifying this point in the revised manuscript.

      Are they terminally differentiated or "exhausted" effectors? GzB+ CD4+ T cells can be found in the hearts of chronically infected mice, but we do not know if they are specific for pathogen or self Ags. Do they express the markers of exhaustion on day 14 in the heart?

      1) We have commented in the first version of the manuscript that one of the limitations of our work is the fact that very few CD4 epitopes of T. cruzi presented by I-Ab have been described so far, and this limits the investigation on the specificity of CD4CTLs in our model. This is a very interesting and important question, which, however, is not possible to address in the present work.

      We would like to thank Reviewer #1 for the suggestion of performing a broader analysis on the expression of immunoregulatory markers associated with exhaustion and/or terminal differentiation, which adds to the understanding of CD4CTL biology in the model of acute infection with T. cruzi. Whether GzB+CD4+ T cells are terminally differentiated or "exhausted" effectors is an interesting and debated question. It was initially hypothesized that since exhausted T cells share features with terminally differentiated T cells, this would suggest a developmental relationship between these cell states (Akbar, A.N. & Henson, S.M., 2011 Nat. Rev. Immunol.11:289; Blank, C.U. et al, 2018, Nat.Rev.Immunol,19:665). However, subsequent studies showed that exhausted T cells seem to be derived from effector cells that retain the capacity to be long-lived (Angelosanto, J.M. et al., 2012, J. Virol. 86: 8161). In the first version of our manuscript, we investigated the expression of several markers associated with exhaustion such as 2B4, Lag-3, Tim-3 and CD39, besides the downregulation of CD27 on GzB+ CD4+ T cells (Figures 1E, 3B, 3D-E and 5E). In general, cells losing the expression of CD27 have been characterized as Ag-experienced, further differentiated cells (Takeuchi and Saito, 2017, Front.Immunol. 8:194). Our finding that, unlike GzB-negative cells, most GzB+CD4+ T cells had lost the expression of CD27 suggested to us that CD4CTLs present in the spleen of mice infected with T. cruzi might be further differentiated T cells (Figure 3E). The transcription factor Blimp-1 controls the terminal differentiation of cells in a variety of immunological settings and its high expression in CD4+ and CD8+ T cells is associated with the expression of immunoregulatory markers (Chihara, N. et al, 2018, Nature 558:454). The observed high expression of Blimp-1 by GzB+CD4+ T cells (Figure 5D) is also compatible with the hypothesis that CD4CTLs are terminally differentiated. 
Of note, most of the exhaustion studies were performed on CD8+ T cells and it is still not well established if this phenomenon is equally regulated in CD4+ T cells. We have now extended the investigation on the expression of terminal differentiation/exhaustion markers, including PD-1 staining, on GzB+PRF+ CD4+ T cells in the spleen and in the heart of infected mice. Results in Figure 7-figure supplement 3 show that CD44hiGzB+PRF+ CD4+ T cells compose the subset of activated cells among which the highest frequency of cells expressing these markers is found, both in the spleen and in the heart, at day 14 pi. The only exception was the equal ratio of cells expressing PD-1, and at equivalent levels, when comparing CD44hiGzB-PRF- and CD44hiGzB+PRF+ CD4+ T cells in the spleen. Non-significant differences in the percentages of cells expressing PD-1 among CD44hiGzB-PRF- and CD44hiGzB+PRF+ CD4+ T cells were found in the heart. However, the intensity of expression of the PD-1 marker (MFI) was significantly higher among CD44hiGzB+PRF+ compared to CD44hiGzB-PRF- CD4+ T cells infiltrating the heart. Furthermore, we also compared the frequency of CD44hiGzB+PRF+ CD4+ T cells expressing Lag-3, Tim-3, CD39 and PD-1, and their corresponding MFI values, between the spleen and the heart (Figure 7-figure supplement 4). Of note, while MFI values of Tim-3, CD39 and PD-1 expression were increased on CD4CTLs (CD44hiGzB+PRF+) in the heart compared to CD4CTLs in the spleen, Lag-3 expression levels were decreased on CD4CTLs infiltrating the cardiac tissue. Despite exhaustion being often seen as a dysfunctional state, it is important to note that the expression of these inhibitory molecules allows strongly activated T cells to persist and partially contain chronic viral infections without causing immunopathology and that highly functional effector T cells can also express such inhibitory receptors (reviewed in Wherry, E.J., 2011, Nat. Immunol.,12:492; Blank, C.U. et al, 2018, Nat. Rev. 
Immunol., 19:665). Interestingly, only PD-1, but not Lag-3, Tim-3 or CD39 expression is upregulated on CD8CTLs in the heart relative to the spleen, an indication that the T. cruzi-infected cardiac tissue is a less so-called exhaustion-inducing environment compared to certain tumors (Figure 7-figure supplement 4). It is known that many immunomodulatory molecules, including Lag-3, Tim-3, PD-1 and CD39, are co-expressed as part of a module composing a larger co-inhibitory gene program, which is expressed in both CD4+ and CD8+ T cells under certain activation conditions, driven by the cytokine IL-27 (Chihara, N. et al, 2018, Nature 558:454). The opposing behavior of Lag-3 expression, which is downmodulated on CD4CTLs in the heart in comparison to the spleen, indicates that CD4CTLs infiltrating the heart are not typically exhausted cells. Of note, a recent study has shown that exhausted CD8+T cells can partially reacquire phenotypic and transcriptional features of T memory cells, in a process that includes the downmodulation of Lag-3 expression (Abdel-Hakeem, M.S. et al, 2021, Nat.Immunol., 22:1008). As requested, these new data were included (Figure 7-figure supplements 3 and 4) and discussed in the revised manuscript.

      The factors that control differentiation of cytotoxic CD4+ T cells are the same as for IFN-g- Th1 cells. MyD-88-/- and IL-18-/- mice significantly lack both populations and succumb to T. cruzi infection. In their 2017 eLife publication, this group reported that survival of infected MyD-88-/- and IL-18-/- mice can be rescued by adoptive transfer of purified total WT CD4+ T cells, which was attributed entirely to their ability to secrete IFN-g (at least in the case of MyD-88-/- recipients). In the current study, the authors only used infected IL-18-/- recipients and show that this time transfer of GzB+ CD4+ T cells is sufficient to confer the protection. When compared with the old data, the rescue of the infected IL-18-/- with only GzB+ CD4+ T cells looks weaker (2 surviving animals out of 10 pooled from 2 experiments), strongly suggesting that IFN-g Th1 cells do play a significant role. It is unclear when the parasite load in Fig 6G was evaluated. It would be good to show deltaCT values for individual mice.

      We thank Reviewer #1 for the opportunity to clarify the point on the protective role of Th1 and CD4CTL cells during T. cruzi infection and to better discuss our data. Please note that we do not question the beneficial role of Th1 cells in this infection model. In our paper published in 2017 in eLife, we showed that the adoptive transfer of IFN-g-deficient CD4+ T cells does not result in the decrease of parasite loads in susceptible recipient mice. These results are totally in agreement with the known beneficial role of Th1 cells during infection with T. cruzi, through the microbicidal action of IFN-g, which was also described by other groups.

      The new information that our present study brings is that the adoptive transfer of GzB+CD4+ T cells with poor (GzB-YFP+) or no (Ifng-/-) capacity for IFN-g secretion also significantly extended survival of infected Il18ra-/- mice, which have lower levels of both Th1 and CD4CTLs compared to WT mice (Figure 6G and Figure 6-figure supplement 2). Please note that 3 (not 2) out of 10 mice that received GzB+CD4+ T cells survived. We stated in our discussion that, together, our present and past data demonstrate that both Th1 and CD4CTL are important for improving survival, although through different mechanisms, since adoptively transferred GzB+CD4+ T cells (as well as Ifng-/- CD4+ T cells) were not capable of reducing parasite load but, notwithstanding, extended survival.

      Following the guidelines of the Animal Care and Use Committee, in order to prevent/alleviate animal suffering, all laboratory animals found near death must be euthanized. Therefore, parasite load in the hearts was evaluated in mice found in a moribund condition (a severely debilitated state that precedes imminent death, as defined in Toth, L.,2000; ILAR J, 41:72), presenting unambiguous signs that the experimental endpoint had been reached. We have now included 2^DeltaCT values for individual mice in Figure 6G, as requested.
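
As an illustration of how per-mouse values of this kind are typically derived from qPCR data, the following is a minimal sketch. It assumes the common convention in which DeltaCT is the cycle threshold of the parasite target minus that of a host reference gene, and the relative load is 2 raised to the negative of that difference. The sign convention, target, reference gene and numbers used for Figure 6G are not specified here, so everything below is hypothetical.

```python
def relative_load(ct_target, ct_reference):
    """Relative abundance from qPCR cycle thresholds (illustrative helper only).

    Assumes DeltaCT = Ct(target) - Ct(reference) and ~100% amplification efficiency,
    so that each cycle of difference corresponds to a two-fold change.
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


# Hypothetical Ct pairs (parasite target, host reference) for three mice.
mice = {"mouse_1": (25.0, 20.0), "mouse_2": (23.0, 20.0), "mouse_3": (27.5, 20.5)}
for name, (ct_target, ct_reference) in mice.items():
    print(name, relative_load(ct_target, ct_reference))
```

Reporting one value per animal in this way, rather than only a group mean, lets readers judge the spread of parasite loads across recipients.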

      Because donor IFN-g-/- CD4+ T cells do express IFN-gR (Supp Fig 6-2), IFN-g produced by IL-18-/- host cells could enhance the activity and/or help expand cytotoxic CD4+ T cells among the IFN-g-/- CD4+ donor population. To directly test the protective role of cytotoxic CD4+ T cells in the absence of IFN-g, the authors should treat infected IL-18-/- mice that have received IFN-g-/- CD4+ T cells with anti-IFN-gamma Ab.

      It is known that IFN-g is critically important for resistance against infection with T. cruzi. Accordingly, Ifng-/- mice are extremely susceptible, dying at early time points of infection (Campos, M. et al, 2004, J.Immunol, 172:1711). Of note, IFN-g production by other cell types, and not only by CD4+ T cells, is relevant for resistance against infection, as demonstrated for CD8+ T cells (Martin D & Tarleton R. Immunol Rev. 2004, 201:304). In our present work, we performed experiments where Ifng-/- CD4+ T cells were adoptively transferred to susceptible Il18ra-/- mice, with the goal of testing whether the transferred cells would be able to confer some increment in the survival time of infected mice, despite not being able to decrease parasite loads, a direct consequence of their deficiency in IFN-g production, as previously shown (Oliveira et al., 2017, eLife). In fact, this turned out to be the case and we showed that the transfer of purified Ifng-/- CD4+ T cells extended survival (Figure 6-figure supplement 2). Of note, our data demonstrate that the percentage of GzB+CD4+ T cells is not affected in the total absence of IFN-g, since Ifng-/- mice display the same frequency of this cell population as found in WT mice (Figure 4B). The increased survival of adoptively transferred mice is compatible with a regulatory function of GzB+CD4+ T cells, which additionally express several immunoregulatory molecules, as shown. Whether IFN-g produced by the host is enhancing the activity and/or expanding cytotoxic CD4+ T cells among the transferred T cell population is not an essential point here, since we were not aiming to test the protective role of cytotoxic CD4+ T cells in the total absence of IFN-g in the host mice.

      The intracellular cytokine staining in this study appears to be suboptimal. Instead of stimulating with PMA/ionomycin in the presence of Golgi block, Roffe et al. (2012) stimulated lymphocytes with anti-CD3 prior to adding Brefeldin A, an important technical difference which may explain the rather low frequencies of IFN-g+ and IL-10+ cells in this study.

      We respectfully disagree with Reviewer #1 on this point. The frequency of IFNg+ CD4+ and IL-10+CD4+ T cells in the spleen of mice infected with T. cruzi Y strain obtained in our experiments is in the same range as what was previously described by other research groups investigating the immune response to this parasite, including studies that have employed anti-CD3 stimulation and brefeldin A, such as Jankovic, D. et al, 2007, JEM 204:273 (Fig.S1), cited in our manuscript (page 9, lines 218-219), among others (Nihei J et al, 2021, Front. Cell. Infect. Microbiol.11:758273; Martins GA et al, 2004, Microbes Infect 6:1133 – Fig.6B; Hamano S. et al, 2003, Immunity, 19:657- Fig. 2A). In the present work, we used the combination of monensin and brefeldin A after PMA/iono treatment, and found the same frequency of IFN-g+CD4+ T cells described in a previous study of our group, where staining was performed after incubation of splenocytes with parasite-derived protein extract and brefeldin A alone (Oliveira AC et al., 2010, PLoSPath 6(4):e1000870 –Fig. 8D). On the other hand, please note that the study cited by Rev. #1 (Roffe et al., JI 2012) employed a different strain of T. cruzi, the Colombiana strain, which differs in several aspects from the Y strain used in our work. Colombiana induces a different pathology, with distinct kinetics. In that study, intracellular IFN-g and IL-10 detection was performed at a much later time point of infection (day 30 pi), and in cells infiltrating the heart, not the spleen. In summary, the frequencies of IFN-g and IL-10 secreting CD4+ T cells described in our manuscript are comparable to the ones found in the spleen of mice infected with the same or similar strains of T. cruzi and reported by other groups in the articles cited above.

      Reviewer #2:

      In this work, Professor Bellio and her colleagues provide compelling evidence to show unusually strong induction of cytotoxic CD4 T cells (CD4CTLs) in Trypanosoma cruzi-parasitized mice. Using genetic models and mixed bone marrow chimeras they dissect the signals responsible for CD4CTL induction in this infection and identify T cell-intrinsic IL-18R/MyD88 signaling as the key inducer. The CD4CTLs that clonally expand in T. cruzi infection outnumber CD4 cells with typical Th1 profile (IFN-γ secretion) and bear the hallmarks of CD4CTLs described in other model systems and in humans. Utilizing GzmbCreERT2/ROSA26EYFP reporter mice, the authors show that adoptive transfer of CD4 cells that have made GzB can increase the survival of T. cruzi parasitized Il18ra-/- mice. Finally, the authors describe a clear correlation between the frequency of CD4CTLs in the circulation of patients with T. cruzi-induced chronic Chagas cardiomyopathy, implying a pathogenic role for these cells in chronic disease.

      The findings reported here are an important addition to the understanding of both the origin of CD4CTLs and their potential role in host protection or disease. The evidence provided in support of the main claims is very strong and the association between CD4CTLs and Chagas disease quite intriguing. There are, however, some aspects of the work that would benefit from further clarification or experimental support, so that alternative interpretations of the data can be excluded.

      The defining characteristic of CD4CTLs that separates them from other CD4 subsets is the production of granzymes and perforin and, by extension, the ability to kill target cells in a granzyme/perforin-dependent manner. In contrast, all T cells can kill target cells via alternative mechanisms that are not dependent on granzyme/perforin, for example through expression of TNF family members. It would appear that much, if not most, of the killing activity of T. cruzi-induced CD4CTLs can be attributed to FasL (Fig. 1B). FasL-mediated killing is not restricted to CD4CTLs and as the title of one of the cited studies (Kotov et al., 2018) states, "many Th cell subsets have Fas ligand-dependent cytotoxic potential". It would be important to ascertain if expression of granzyme/perforin by CD4CTLs in T. cruzi infection is also associated with granzyme/perforin-dependent cytotoxicity. This affects the direct and indirect in vitro cytotoxicity assays, as well as the interpretation of in vivo protection.

      Similarly, the protective effect of transferring GzmbCreERT2/ROSA26EYFP reporter-positive cells to Il18ra-/- mice may not be necessarily mediated in a granzyme/perforin-dependent manner or by CD4CTLs for that matter. The reporter will mark cells that express GzB at the time of tamoxifen administration but does not guarantee that these cells will continue to express GzB or that they will prolong survival of recipients in a granzyme/perforin-dependent manner.

      While the authors provide evidence that GzB-producing cells are largely distinct from IFN-γ-producing cells, the reporter-positive cells may still contain genuine Th1 cells. Given Th1 cells have been previously found necessary for protection of Il18ra-/- mice in the T. cruzi model, can a role for Th1 cells in this transfer model be formally excluded? The authors do convincingly demonstrate that IFN-γ itself is not essential for protection, but that does not leave granzyme/perforin-dependent as the only other alternative. For example, the experiment described in Fig. 6G lacks an important control, the transfer of reporter-negative cells. What would the conclusion be if reporter-negative (but T. cruzi-specific) cells proved as protective as reporter-positive cells?

      We would like to thank Reviewer #2 for the positive comments on our study and for giving us the opportunity to better discuss and clarify the relevant points raised in this review.

      (i) Concerning the role of GzB/PRF in cytotoxicity: as explained in more detail in our next answer to Reviewer #2, we have now shown that the cytolytic activity of the CD4 T cell subset differentiating in the murine T. cruzi-infection model is totally dependent on a GzB- and PRF-mediated mechanism.

      (ii) Concerning a possible role for Th1 in the adoptive transfer experiments: please note that the parasite load is not decreased by the adoptive transfer of CD4+GzB+ T cells (Figure 6G). Additionally, we showed that the adoptive transfer of Ifng-/- CD4+ T cells also extends the survival of infected mice (Figure 6-figure supplement 2), but does not decrease parasite levels (Oliveira et al., 2017). These results exclude a role for Th1 cells, which are known to exert an important microbicidal function through the production of IFN-g, as previously demonstrated by us (Oliveira, 2017) and other groups. Together, our present and past data support the notion that both Th1 and CD4CTL are important for extending survival, although through different mechanisms. Our results are in accordance with an immunoregulatory role played by CD4CTLs, likely through the GzB/PRF/FasL-mediated killing of infected APCs in an IFN-g-independent manner, although it is not possible to attribute the beneficial role of the adoptively transferred CD4CTLs exclusively to their cytolytic function, as discussed in the revised manuscript. Of note, we also show here that most CD4+GzB+PRF+ T cells express high levels of immunomodulatory molecules, raising the possibility that the beneficial role of adoptively transferred CD4CTLs might rely on the concerted action of their cytolytic function and immunomodulatory activity. Please see the full discussion on this point in the revised version of the manuscript.

      (iii) Concerning the adoptive transfer of GzB-EYFP-negative cells: unfortunately, GzB-EYFP-negative cells cannot be employed as a control, since in the GzmBCreERT2/ROSA26EYFP mouse lineage, only 1 - 3 % of total splenic CD4+ T cells express EYFP after induction by tamoxifen (Figure 2-figure supplement 3). This contrasts with the 10-40% of GzB+ and PRF+ cells among CD4+ T lymphocytes observed by intracellular staining. Consequently, the majority of the CD4+GzB+ T population is EYFP-negative in this system and thus, cells sorted as “GzB-EYFP-negative”, based on the absence of expression of EYFP, would not be bona-fide GzB-negative cells. If it were possible to sort GzB reporter-negative cells, Th1 cells would be among the sorted cells and upon adoptive transfer they would secrete IFN-g and, consequently, decrease the parasite load in recipient mice (Oliveira, 2017). However, in the absence of the proposed immunoregulatory action of CD4CTLs, Th1 cells transferred alone might also increase pathology and, consequently, it is possible that they would not extend survival, albeit diminishing parasite load. It is expected that higher levels of extended survival would be attained when both Th1 and CD4CTLs are transferred, as discussed in the manuscript and in answer (ii) above. Importantly, please note that one current hypothesis is that CD4CTLs differentiate from Th1 and, therefore, the adoptive transfer of Th1 cells will not guarantee that Th1-derived CD4CTLs would not be developing in vivo, unless specially engineered mouse strains, not available at present, were employed for these experiments.

      Reviewer #3:

      By modelling Trypanosoma cruzi infection in mice, the authors highlighted the presence of a subset of CD4 T cells expressing canonical markers and transcription factors of CTLs and capable of exerting antigen specific and MHC class II restricted cytotoxic activity. Mechanistically, using KO mice, the authors have shown that Myd88 expression is required for strengthening the CD4 CTLs phenotype during the infection.

      Moreover, by investigating the presence of a previously published CD4 CTLs gene signature in a mixed bone marrow chimera setting they highlighted a cell intrinsic role for Myd88 in imprinting the signature. The study also identifies IL18R as a Myd88 upstream receptor potentially responsible for CD4 CTLs development by showing that lack of IL18R phenocopied Myd88 deficiency in failing to promote a CD4 CTLs phenotype.

      Finally, by showing the direct correlation between perforin expressing CD4 T cells in Chagas infected individuals and parameters of heart dysfunction the authors hinted at a possible involvement of CD4CTLs in a clinical setting.

      • The core finding of the paper, providing the first evidence of CD4 CTLs development in a mouse model of intracellular parasite infection, is well supported by the data. The expression of markers correlated to CD4 cytotoxicity in other settings and gene signatures fit the phenotype described well and suggest possible common features for CD4 CTLs development across infection with different pathogens.

      This manuscript will advance knowledge of the involvement of non-canonical CD4 types in the immune responses to parasites. Moreover, the finding that CD4 CTLs are the predominant phenotype in organs important for parasite replication implies an involvement of these cells in the development of the pathology that will have to be taken into account in future studies.

      • The understanding of the parental relationship between CD4CTLs and Th1 remains unclear and is complicated by the low numbers of IFNg (regarded as a hallmark of functional Th1) producing CD4 T cells detected in the model. IFN-g production by CD4 is lower than 10% even when achieved by PMA/Iono stimulation and half of Gzb+ CD4 stain positive for the cytokine. On the other hand the putative transcription factor of Th1 development, Tbet, is expressed by all Gzb positive CD4s. This discrepancy and the low number of IFNG+ should be better discussed by the authors.

      First, we would like to thank Reviewer #3 for the constructive criticism on our manuscript. Regarding the apparent discrepancy in the frequencies of IFN-g+ and Tbet+ CD4+ T cells in our model, please first note that the percentage of IFN-g+ CD4+ T cells detected in the present study is comparable to the ones found in the spleen of mice infected with the same or similar strains of T. cruzi and reported by other groups (please see our complete answer to Reviewer #1 on this topic). That said, we think that the apparent discrepancy between the expression of T-bet and the low fraction of GzB+CD4+ T cells producing IFN-g is a very interesting question. It is known that T-bet is a key transcription factor associated with the development of IFN-g-producing CD4+ T cells and that it also coordinates the expression of multiple other genes in CD4+ T cells and in other cell types. Also, T-bet can interact with other proteins, resulting in the induction or inhibition of key factors in T cell differentiation (reviewed in Hunter, 2019, Nat. Rev. Immunol, 19:398). Importantly, it has been shown that during the late stages of Th1 cell activation, T-bet recruits the transcriptional repressor Bcl-6 to the Ifng locus to limit IFNg transcription (Oestreich, 2011, JEM, 208:1001). Therefore, T-bet action is not limited to transactivation of the Ifng gene, but can also act as part of a negative-feedback loop to limit IFN-g production in certain cells. We do not believe that Bcl-6 is playing a role in CD4+GzB+ T cells in our model, since we found that the majority of CD4+GzB+ T lymphocytes express Blimp-1 (Figure 5D), and Blimp-1 and Bcl-6 are known to be reciprocally antagonistic transcription factors.

      However, the possibility remains that another repressor factor is downregulating Ifng gene transcription in the majority of T-bet+ CD4+GzB+ T cells, with the participation of T-bet or not. Of note, Blimp-1 was shown to be a critical regulator for CD4 T cell exhaustion during infection with T. gondii, and CD4+ T cells deficient in Blimp-1 produced higher levels of IFN-g in infected mixed-bone marrow chimeric mice reconstituted with WT and Blimp-1 conditional knock-out cells (Hwang, S., 2016, JEM 213:1799). Furthermore, Blimp-1 attenuates IFN-g production in CD4 T cells activated under nonpolarizing conditions and chromatin immunoprecipitation showed that Blimp-1 binds directly to a distal regulatory region in the Ifng gene (Cimmino, L. et al. 2008, JI 181:2338). We have also shown that, like Blimp-1, Eomes is expressed by around 60% of the GzB+CD4+ T cells (Figure 2G). It is known that Eomes controls the transcription of cytotoxic genes and promotes IFN-g production in CD8+ T cells, binding to the promoter of the Ifng gene. Interestingly, Eomes was also shown to participate in the induction of immunoregulatory/exhaustion receptors, such as PD-1 and Tim-3. Furthermore, deficiency of Eomes led to increased cytokine production (Paley, M.A. et al., 2012, Science 338: 1220). More recently, evidence in favor of the participation of Eomes in the repression of IFN-g production in TCR-gamma-delta T cells was also published (Lino, C. et al., 2017, EJI 47:970). Therefore, these studies indicate the complex control of the Ifng gene, in which T-bet, Eomes, Blimp-1 and possibly other TFs might play concerted roles. We think it would be interesting to investigate the role of Eomes and/or Blimp-1 in the repression of the Ifng gene in GzB+CD4+ T cells. Kinetic studies of the expression of these TFs may contribute to a better understanding of the parental relationship between CD4CTLs and Th1 cells, a fundamental question that is not yet completely understood. A comment on this subject was included in the revised manuscript.

      On the same note, while the confirmation of a CD4 CTLs gene signature in the model is very convincing, it must be noted that the one used as a reference was obtained by performing single cell RNA seq, taking into account only IFNg+ CD4 cells and then comparing Gzb+ and Gzb- cells in the setting. The authors are instead using bulk RNA seq and comparing populations of cells that would have none VS low levels of Th1. In this view, while the confirmation of the CD4 CTLs signature is striking, addressing the relative relationship with Th1 cells is complicated. Using Gzb YFP reporters in the setting could help improve the resolution between the 2 subsets.

      Our analysis clearly demonstrated the presence of the CD4CTL signature among WT CD4+ T cells, and its absence among Myd88-/- CD4+ T cells from the same mixed-BM chimeric mice. Together with our past work (Oliveira, 2017) and results included in the present manuscript, this analysis strongly contributes to demonstrating the importance of T-cell intrinsic IL-18R/MyD88 signaling for the development of a robust CD4CTL response to infection with an intracellular parasite. Although these results argue in favor of a common origin for CD4CTLs and Th1 cells during infection, an interesting point is that Ifng-/- mice display the same percentage of GzB+CD4+ T cells as WT mice (Figure 4B), suggesting that GzB+CD4+ T cells might emerge independently of IFN-g-dependent Th1 cells. Therefore, the possibility remains that not all CD4CTLs are derived from the putative terminal differentiation of Th1 cells but that, instead, a divergence between the Th1 and CTL differentiation programs might occur at an earlier step. Although addressing this fundamental question goes beyond the possibilities of the present study, we believe that our results make an important and substantial contribution to the understanding of the biology of CD4CTLs in response to infection and highlight the importance of IL-18R/MyD88 signaling for the reinforcement and/or stabilization of CD4+ T cell commitment into the CD4CTL phenotype. Regarding the use of GzB-YFP reporters, please see our answer below.

      • The dependency on the Myd88/IL18r axis to promote CD4 CTLs is well characterized and the prolonged survival rate of IL18r-/- after the adoptive transfer of Gzmb YFP+ CD4 is very convincing. However, instead of using PBS as control the authors could have used YFP- or total CD4 cells for the task. While in a previous publication it was already shown that protection was achieved by transferring the total CD4 population, comparing GzB+ VS GzB- would have added useful insights into the amount of protection conferred by the subtypes and the relative roles of CD4 CTLs and Th1 in the model. Parasitemia could also be reassessed in this view.

      We have already discussed the impossibility of sorting bona-fide GzB-negative cells from the reporter mouse strain available. Please see our complete answer to Reviewer 2 on this issue (iii) in this point-by-point letter. Moreover, due to the low percentage of GzB-EYFP cells labeled in the tamoxifen-treated reporter mice, a high number of mice is necessary for performing these adoptive transfer experiments. Unfortunately, due to the COVID-19 pandemic and its consequences on our animal facility, at present it is impossible to repeat this experiment including total CD4+T cells within a reasonable time. However, we have already shown in our past study (Oliveira, 2017), that the transfer of total WT CD4+T cells to Il18ra-/- mice, increased survival and lowered parasite load. On the other hand, our current data demonstrate that the adoptive transfer of GzB+CD4+ T cells increases survival but does not change the parasite load (Figure 6G). Therefore, these data strongly support that GzB+CD4+ T cells act in an IFN-g-independent way and, hence, differ from Th1 in the effector mechanism employed for extending survival of the recipient mice. In summary, our results favor the notion that CD4CTLs and Th1 cells have complementary roles, both being able to extend survival of recipient mice, although only Th1 are effective in lowering parasite load.

    1. Author Response

      Reviewer #1 (Public Review):

      The results are quite interesting and potentially have important therapeutic implications. Nevertheless, in the current form there are several weaknesses that diminish the strength of the findings.

      1) As the authors note, they do not provide direct evidence for the ultimate conclusion of the study that assembly with β2a and β2e subunits are necessary for CaV2.3 channels to contribute to pacemaking in SN DA neurons. The authors state siRNA knockdown experiments in SN DA neurons are technically challenging. Nevertheless, shRNA knockdown studies in SN neurons have been previously published. Such a study is critical to provide direct evidence for what would be a very important and impactful finding.

      Please refer to our detailed response to essential revision 1 above.

      2) Relative contribution of CaV1.3 (L‐type) and CaV2.3 channels to pacemaking in SN DA neurons. As the authors note, a phase III clinical trial for the L‐type channel blocker, isradipine, showed no efficacy for neuroprotection, even though some mouse studies suggested this might be efficacious. On the other hand, the authors' previous work with CaV2.3 knockout mice suggests inhibition of this channel would be more appropriate for a neuroprotective response. It would be useful to get a direct comparison of the impact of isradipine and SNX‐482 on pacemaking in SN DA neurons (Figs. 1 and 2). If their impacts on pacemaking (and Ca2+ oscillations) are similar it would suggest something beyond the pacemaking Ca2+ influx could be responsible for neuroprotection (e.g. changes in NCS‐1 expression as previously suggested by the authors).

      The question about the relative contribution of Cav1.3 and Cav2.3 on pacemaking is complex due to the finding that different results have been obtained regarding the role of L‐type channels on pacemaking. In Cav1.3 knockout mice pacemaking frequency is normal (7, 8). Inhibition (of Cav1.2 and Cav1.3) by dihydropyridine Ca2+ channel inhibitors (e.g. isradipine, nimodipine) was found to inhibit pacemaking in some (e.g. 9‐11) but not in all (8, 12) reports. This seems to be dependent on experimental conditions, but the reasons for these discrepancies are currently unclear. Similarly, we find inhibition of pacemaking by SNX‐482 in cultured midbrain neurons (this paper) but, as previously reported, not in Cav2.3‐deficient mice (1). While this toxin is well suited to isolate Cav2.3‐mediated Ca2+ current components, effects on pacemaking in DA neurons have to be interpreted with more caution because (as clearly outlined in our original MS and our previous paper, 1), SNX‐482 is also a potent inhibitor of Kv4.3 channels. We consider this limitation even more in the discussion of SNX‐482 effects on pacemaking in cultured neurons (data now moved to Suppl Fig. 5) in the revised MS (end of page 15, top of page 16), although the SNX‐482 changes suggest an involvement of Cav2.3 for AP generation.

      Although we acknowledge the relevance of the question raised by the reviewer, based on our previous findings (1) the absence of an obvious role of Cav2.3 for pacemaking in SN DA neurons (despite their role for Ca2+ transients) as an experimental read‐out prevents a straightforward approach to study the contribution of different β‐subunits and their splice variants for this process.

      3) The slice recording data (Fig. 9) are confusing and raise concerns about adequacy of pharmacological isolation of CaV2.3 currents in this preparation. The accuracy of interpretation of the data in Fig. 9 rests critically on the idea that the cocktail of CaV channel blockers given successfully isolates CaV2.3 currents. Yet, the amplitudes of the exemplar currents shown for plus or minus the CaV channel blocker cocktail are almost the same. This cannot be due to CaV2.3 providing the dominant current in the slice preparation since addition of SNX‐482 only decreased Ca2+ current amplitude by 13% (Suppl Fig. 5). It is not clear to me why the steady‐state activation and inactivation curves experiments were not conducted in the cultured neuron preparation (Figs. 1 and 2) where there seems to be better control of pharmacological block of different Cav channel isoforms.

      We have now performed the isolation of SNX‐482‐sensitive currents not only in the cultured neuron preparation as suggested but, in addition, also in SN DA neurons. The latter experiments gave essentially identical steady‐state inactivation parameters as compared to our "R‐type" current (current remaining in the presence of all other channel blockers). This now also allows a direct comparison of SNX‐482‐sensitive current properties in cultured neurons and in slices (see response above). We now also specifically discuss previous reports of SNX‐482‐sensitive R‐type components in the introduction to allow comparison of these reports with our findings. Please also note that in our legend to Fig. 9A (original MS, now Fig. 6) we have explicitly stated that recordings of "similar amplitudes were chosen" to facilitate comparison of current kinetics. We still think that this makes sense and kept this part of the figure but now strengthened this point even more in the figure legend (Fig. 6).

      4) While the transcript data show that β2a and β2e are present in SN DA neurons, numerically they would still represent only a minority of the beta subunits present (<25%). I don't think sufficient thought has been given to this in the discussion of the results. Unless there is some preferential association of CaV2.3 with β2a and/or β2e, there would be a mix of channels with the majority incapable of supporting pacemaking in SN DA neurons. Given this, one would not necessarily expect that the gating characteristics of CaV2.3 would be the same as what is obtained with reconstituted channels in tsA201 cells where all the channels are assembled with β2a or β2e (see point #5 below).

      We now give this important point more thought in the discussion and mention that our data would imply such a preferential association of Cav2.3 with β2a and/or β2e and provide possible explanations. In addition, as in the original MS, we also provide alternative interpretations (Discussion, pg 14, 2nd and 3rd paragraph).

      5) The V0.5,inact of putative CaV2.3 channels in SN DA neurons of ‐52.4 mV was said to be 'very similar' to the value of ‐40 mV that was observed in tsA201 cells. A difference of +12 mV in voltage‐dependence gating of ion channels is substantial and should not be brushed off. A more nuanced interpretation would be that in SN DA neurons CaV2.3 likely associates with other beta subunits in addition to b2a and b2e and so one would not necessarily expect the V0.5,inact to be the same as what is observed in reconstituted channels in tsA201 cells.

      The V0.5,inact of ‐52.4 mV refers to the control current. We correctly stated that the V0.5,inact of R‐type current was ‐47.5 mV (as also shown in Table 3), i.e. only about 7 mV more negative than in tsA‐cells. We have now rephrased this section because we also included the new inactivation data for SNX‐482‐sensitive currents in cultured neurons and in SN DA neurons recorded in slices (Discussion, page 13, 2nd paragraph). As suggested, we no longer refer to these values (difference ~5 mV) as "very similar".
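
      To make the size of such a shift concrete, the following is a minimal sketch, assuming the standard Boltzmann description of steady‐state inactivation. The slope factor (6 mV) and the test potential (‐50 mV) are hypothetical values chosen for illustration only and are not taken from the manuscript; only the two V0.5,inact values (‐40 mV and ‐47.5 mV) come from the text above.

```python
import math


def boltzmann_inactivation(v_mv, v_half_mv, slope_mv=6.0):
    """Fraction of channels still available (not inactivated) at voltage v_mv.

    Standard Boltzmann form: 1 / (1 + exp((V - V0.5) / k)).
    slope_mv (k) is a hypothetical value, not a measured one.
    """
    return 1.0 / (1.0 + math.exp((v_mv - v_half_mv) / slope_mv))


# Hypothetical test potential used only to compare the two curves.
v_test = -50.0

# V0.5,inact values quoted in the response above.
for label, v_half in [("tsA-201 cells", -40.0),
                      ("R-type current, SN DA neurons", -47.5)]:
    available = boltzmann_inactivation(v_test, v_half)
    print(f"{label}: V0.5 = {v_half} mV, "
          f"available fraction at {v_test} mV = {available:.2f}")
```

      Under these assumptions, a more negative V0.5,inact leaves a smaller fraction of channels available at any fixed potential; how much smaller depends on the slope factor, which is why the curves themselves rather than a single number are the fairer comparison.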

      Reviewer #2 (Public Review):

      This reviewer is very enthusiastic about the work but notes that most of the conclusions are based on data obtained by overexpressing Cav2.3 and accessory subunits in a heterologous expression system. The authors make a good argument for cross‐correlation between data in tsA‐201 cells and dopaminergic neurons, but it is unclear that the results will translate from one system to another. More data may be needed to do so (the reviewer does understand that these are challenging experiments), which the authors acknowledge in a section about the study's limitations. Based on this, it seems that the title is misleading without additional data supporting the role of Cav2.3 in dopaminergic neurons. Along the prior line, statements linking the study results to potential pathological implications seem a big stretch not supported by current data, and therefore should be eliminated.

      An issue with this manuscript is that the narrative and organization of the data are difficult to follow. The reviewer understands that the authors are weaving a complex story that involves using multiple techniques and approaches. Still, the way the data is organized and described makes the reader go back and forth to compare and contrast results constantly. This is further complicated by the fact that some experiments are done in dopaminergic neurons and others in tsA‐201 cells (the identity of the cell type used should be made clearer), the order of some figures is not appropriate (Supp Fig 1 for example) and some figure panels are not discussed (Supp Fig 5E to 5J).

      The MS has been completely rewritten, based on the additional SNX‐482 experiments we have now performed both in the cultured DA neurons as well as in the midbrain slices. We therefore also moved data on effects on the spontaneous activity of cultured neurons by SNX‐482 into the supplement to make the key results easier to follow. The identity of neurons is indicated in all headers of table and figure legends to identify cell types. We also changed the title to “β2‐subunit alternative splicing stabilizes Cav2.3 Ca2+ channel activity during continuous midbrain dopamine neuron‐like activity” to attenuate our previous statement regarding a role in dopaminergic midbrain neurons.

    1. Author Response

      Reviewer #1 (Public Review):

      As we lack empirical data on the response of most species to environmental changes, developing predictive tools based on traits that are easier to access or infer may help us develop better management tools. This is the case even for terrestrial mammals, a rather well studied group but with a large study bias towards temperate Europe and North America. This study uses maximum longevity, litter size and body mass to predict the sign and size of the relationships between annual temperature and precipitation anomalies and population growth rates, using the Living Planet database for time series of abundance and Chelsa for weather anomalies. The authors use a Bayesian framework to relate the size and absolute magnitude of the relationships between detrended population growth rates and weather anomalies, the framework accounting for the uncertainty in estimates as well as phylogenetic dependencies. They did not find any systematic effects -- on average the slopes of the relationships were close to 0 -- but the magnitude of the coefficients decreases for species with high maximum longevity and low litter size. Therefore, this study points to possible predictions of the magnitude of the response to weather variability using simple demographic indices such as longevity and litter size. The study has clear limitations that are common to similar "meta-regressions" using publicly available databases, but they are not ignored when discussing the results. One would hope that such limitations would lead to improving the quality of such databases, both in terms of taxonomic and geographic coverage as well as quality of data.

      We would like to thank Reviewer 1 for their overall positive feedback and constructive comments on the method and our predictions. We have now included complementary analyses based on high-quality subsets (≥ 20-year records; using life history traits estimated from structured population models), have clarified our set of hypotheses and discussed our results accordingly. Detailed responses are given below.

      I would like to challenge the authors in terms of why one would expect relationships of a given sign or magnitude. First, with respect to the sign of relationships, even for the same species and the same weather parameters, one could expect different signs depending on where the study is done with regard to the climatic niche. If one is close to the warm (or wet) edge, any positive temperature (or precipitation) anomalies would probably have a negative effect, but the reverse would happen when close to the cold or dry edge. There are studies showing such differences in demographic and growth rate variability. I therefore find it hard to interpret the sign of such weather anomalies and what it tells us about the "effect" of weather variability.

      We think that this is an important point to discuss with respect to the importance of within-species variability in population dynamics. Certainly, from the results L203-206 it is clear that populations of the same species can have responses of differing signs. It is also interesting to note that this may be the result of a population’s position in the climatic niche. However, aside from exploring this for species with long-term demographic monitoring across the range, we do not feel that exploring this was in the scope of the current study across species. We agree fully however that adding this perspective to studies of how populations are responding to changing climates is critical. As well as the paper mentioned below by Gaillard et al. (2013), recent work in Plantago lanceolata with extensive spatial replication has also begun to reveal these within-range dynamics as a function of latitudinal or climatic gradients (Römer et al. 2021). We have added further discussion of this to the manuscript L330-340. We believe that this point adds to the context of our results highlighting variability within-species. In addition, we have clarified in the introduction that no clear directional responses of populations to weather anomalies were expected among and within species L133-135.

      Römer, G., Christiansen, D. M., de Buhr, H., Hylander, K., Jones, O. R., Merinero, S., ... & Dahlgren, J. P. (2021). Drivers of large‐scale spatial demographic variation in a perennial plant. Ecosphere, 12(1), e03356.

      Second with regards to the magnitude, it is clear that the maximum growth rate is strongly linked to maximum longevity and litter size -- slow species have a much lower maximum rate of growth than fast species. So, one would expect that variability of population growth rates is larger in fast species than slow species, and therefore the magnitude of their response to environmental variability. Now the question might also be whether weather variability explains a smaller or larger proportion of the variability in population growth rates -- that is, does weather have a relatively larger influence in fast species than slow species? You might have the answer but with the multiple standardizations of the response and predictor variables it is not obvious (that is, when you standardize the response and predictor variables, coefficients are correlations, but this is across species, not for a given population).

      The reviewer raises a very interesting and important point on whether the patterns we observe are simply a result of larger variability in growth rates in short-lived species. We have two responses to this point: 1) while there is indeed larger variation in the population growth rates of short-lived species, we believe that this variability is likely an evolved life-history strategy in response to the environment, and thus a key component of patterns we observe, 2) we also feel that our use of models that included annual effects, and of state-space models with explicit process-noise terms, accounts for any confounding effect of this variation.

      To address the first point in more detail, we expect that life-histories (and thus population dynamics) are evolved responses to the environment (Stearns, 1992). For ‘fast’ organisms therefore, their intrinsic life-history strategy results in boom-bust population dynamics relative to ‘slow’ species. This is clearly observable in transient or non-asymptotic dynamics, where short-lived species more often have short-term population dynamics with a greater magnitude (Stott et al. 2011). On this point, we therefore argue that this variation in population growth is part of what we are trying to capture. Anomalies in the weather are therefore expected to act more strongly in ‘fast’ species. Following this point and the comments of Reviewer #3, we have now included more explicit hypotheses in terms of life-history L133-144.

      For the second point, while we may expect this variability to be the result of dynamics we are trying to capture, this does not preclude other sources of variation in population size confounding the patterns we could observe. For example, hunting pressure may influence both short-term population variability and long-term trends. As a result, we aimed to capture this residual variation using auto-regressive terms for year in our GAMs. While these terms do not explicitly model variability in population growth, they do account for a component of the trend, with variation (error around the trend, which is expected to be larger for fast species), and auto-regressive components of population change. Moreover, we did additional analyses using a state-space modelling approach. In the state-space approach, process noise, which in our case would equate to variability in population growth, is explicitly modelled and accounted for. We therefore believe that our analyses account for residual variability in population growth rates. State space models were also highly correlated with our auto-regressive GAMs, and we can therefore conclude that we do not expect that this variability influences our findings. We have now asserted this in the Methods section L531-535.

      Stearns, S.C., 1992. The evolution of life histories (No. 575 S81).

      Stott, I., Townley, S. and Hodgson, D.J., 2011. A framework for studying transient dynamics of population projection matrix models. Ecology Letters, 14(9), pp.959-970.

      Your analyses remove trends -- that is, climate or other systematic change as opposed to weather anomalies (yearly differences) -- and trends might be the main concerns in terms of conservation. This is made clear in the discussion but perhaps not as much in the introduction where you seem to focus on climate change (the title reflects this well, however, as you mention weather, not climate). This confusion between weather and climate is often made in the literature, when reference is made to climate effects rather than weather effects.

      We agree with the reviewer that climate and weather are often conflated in ecological studies. We apologise for this oversight in the introduction, and agree that the narrative and link to weather was not made explicit in the previous version. Following this point and the suggestions of Reviewer #3, we have now restructured large sections of the introduction to improve the clarity of our hypotheses. To address this point, we have now included specific introduction of different components of climate that species populations may respond to, including short-term extreme weather patterns as we explore in this study. Please find this revised section L80-97.

      Finally, I would like to see a measure of how good is the prediction you can make using traits. You may have "significant effects" but not helping much in terms of prediction (see PB Adler et al. 2011 in Science, for an example with species richness and productivity).

      On this point we disagree with the reviewer. The core of our analysis framework was to examine the predictive performance of models. We do not report any significant effects, and instead use Bayesian inference. Throughout the analysis framework, we used explicit tests of out-of-sample predictive performance with leave-one-out cross validation (Vehtari et al. 2017). This is asserted in the manuscript title and results section when introducing our spatial analysis L188-191. Cross validation was combined with model selection to test the predictive performance of a set of candidate models with respect to base models excluding predictors of interest. This predictive performance framework was not applied to examine the directional effects (question 1), as these models did not contain key predictors. However, model selections using predictive performance were done throughout questions 2 and 3, to explore spatial and life-history effects. We highlight this point in both the results L188-191 and methods sections L608-615. In the case of life-history, we found that out-of-sample predictions were improved relative to the base model when including univariate life-history traits, and thus life-history traits aid in predicting weather responses.

      We did not explore the relative predictive performance of life-history traits with respect to other traits such as dietary specialisation, which have been shown to be important in climate responses (Pacifici et al. 2017). We believe that this would have been out of scope for the purpose of the current study, where we aimed to test specific hypotheses established in life-history theory.

      Pacifici, M., Visconti, P., Butchart, S.H., Watson, J.E., Cassola, F.M. and Rondinini, C., 2017. Species’ traits influenced their response to recent climate change. Nature Climate Change, 7(3), pp.205-208.

      Vehtari, A., Gelman, A. and Gabry, J., 2017. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and computing, 27(5), pp.1413-1432.

      Reviewer #2 (Public Review):

      Jackson et al. present a global analysis of the effects of life history on the response of terrestrial mammal populations to weather, showing that litter size and longevity significantly alter how populations respond to anomalies in temperature and rainfall. The topic is highly interesting, as it has implications for what data we should monitor to make more reliable predictions about species' responses to climatic change, and how we should prioritise which species to conserve by identifying those which might be at greatest risk.

      The authors comprehensively validate their results with substantial secondary analyses, and I believe that their assertions are supported by the results presented here. Whilst global scale analyses such as this provide useful generalities, they should be taken as that: an investigation of the general trends observed across large spatial scales, and caution should be taken extrapolating too far away from the species which have been analysed for this study.

      We thank the reviewer for their positive feedback, and agree with not drawing too many generalities from our findings. In the first paragraph of the discussion L253-262, we now explicitly refer to the results in the context of mammal population-dynamics/conservation.

      Reviewer #3 (Public Review):

      In this study, the authors aim to investigate how mammalian species are likely to respond to climate change. To this end, they investigate the effects of weather anomalies on the growth rates of mammalian populations. They use long-term population records for 157 terrestrial mammals from the Living Planet database. They explore three different questions using a two-step modelling approach: (1) whether temperature and precipitation anomalies have significant effects on population growth rates across species; (2) whether responses differ among species and biomes; and (3) whether life-history traits explain species responses to weather anomalies.

      The work undertaken in this manuscript is of broad appeal in the field and has the potential to inform conservation. Overall, the methodology is sound and the modelling framework robust; the authors took care to test the robustness of their models by fitting alternative sets of models. The two-step design of this study is interesting and the choice of the study system is relevant for the questions the authors aim to tackle. The authors also paid attention to some important points that are at times overlooked such as resolving taxonomy before running their analyses. I also appreciated the fact that the authors made their code available.

      We thank the reviewer for their positive feedback on the manuscript, which highlights many of our key goals with the paper.

      I nevertheless think that, in its present form, the main weakness of this manuscript is the clarity of the writing, the framing of the study and the overall flow. I found the manuscript at times a bit difficult to follow. That said, I think there is much scope for the authors to improve it. First, I think the work would benefit from better explanation of the underlying hypotheses. Second, in some places I think the authors go into a lot of details at the expense of clarity. As such, I think the authors should strive to better balance clarity with detailed information (notably in the results and methods; adding summary sentences, for example, could help clarify these sections). Third, I think there is room for improvement in the narrative and the flow of the introduction and the discussion. Finally, I think stronger justifications are sometimes required regarding specific points of the analysis.

      I believe that the conclusions of this work are supported by the data and the analyses, and think they are of interest and relevant to the field. However, I think the discussion should highlight the main limitations of the study. In particular, I think the biases in the data should be discussed, and notably whether these biases are expected to affect the results (and if so, in what way).

      To conclude, I think that beyond the aforementioned weaknesses of this study, the results and the methods are of interest for the field. I think the modelling framework is applicable to other study systems and relevant to the field as well.

      We warmly thank the reviewer for their positive words and thorough constructive feedback. We have extensively re-worked large sections of the manuscript (particularly the discussion and introduction) based on these points, and done our best to address all of them. Generally, we have strived to improve the clarity and succinctness of the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, Guggenmos proposes a process model for predicting confidence reports following perceptual choices, via the evidence available from stimuli of various intensities. The mechanisms proposed are principled, but a number of choices are made that should be better motivated - I develop below a number of concerns by order of importance.

      I’d like to thank the reviewer for their thorough and excellent review. It’s no set phrase that this review substantially improved the manuscript.

      1) Lack of separability of the two metacognitive modules.

      Can the author show that the proposed model can actually discriminate between the noisy readout module and the noisy report module? The two proposed modules have a different psychological meaning, but seem to similarly impact the confidence output. Are these two mutually exclusive (as Fig 1 suggests), or could both sources of noise co-exist? It will be important to show model recovery for introducing readout vs. report at the metacognitive level, e.g., show that a participant best-fitted by a nested model or a subpart of the full model, with a restricted number of modules (some of the parameters set to zero or one), is appropriately recovered? (focusing on these two modules) This raises the question of how the two types of sigma_m are recoverable/separable from each other (and should they both be called sigma_m, even if they both represent a standard deviation)? If they capture independent aspects of noise, one could imagine a model with both modules. More evidence is needed to show that these two capture separate aspects of noise.

      Testing the separability of the two noise types (readout, report) is a great idea and I have now performed a corresponding recovery analysis. Specifically, I have simulated data with both noise types for different regimes of sensory and metacognitive noise. As shown in the new Figure 7—figure supplement 6, the noise type can be precisely recovered in the most typical regimes.

      I now refer to this analysis in the subsection 2.4 Model recovery (Line 521ff):

      “One strength of the present modeling framework is that it allows testing whether inefficiencies of metacognitive reports are better described by metacognitive noise at readout (noisy-readout model) or at report (noisy-report model). To validate this type of application, I performed an additional model recovery analysis which tested whether data simulated by either model are also best fitted by the respective model. Figure 7—figure supplement 6 shows that the recovery probability was close to 1 in most cases, thus demonstrating excellent model identifiability. With fewer trials per observer, recovery probabilities decrease expectedly, but are still at a very good level. The only edge case with poorer recovery was a scenario with low metacognitive noise and high sensory noise. Model identification is particularly hard in this regime because low metacognitive noise reduces the relevance of the metacognitive noise source, while high sensory noise increases the general randomness of responses.”
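      The logic of such a model recovery analysis can be illustrated with a deliberately simple stand-in. The sketch below does not implement the noisy-readout or noisy-report models; as an assumption made purely for illustration, it uses two toy noise models (zero-mean Gaussian and zero-mean Laplace), simulates data from each, fits both by maximum likelihood, and counts how often the generating model wins. Both toy models have one free parameter, so an AIC comparison reduces to comparing log-likelihoods. All function names are hypothetical.

```python
import math
import random


def loglik_normal(xs):
    """Maximised log-likelihood of a zero-mean normal (ML variance plugged in)."""
    n = len(xs)
    var = sum(x * x for x in xs) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def loglik_laplace(xs):
    """Maximised log-likelihood of a zero-mean Laplace (ML scale plugged in)."""
    n = len(xs)
    b = sum(abs(x) for x in xs) / n
    return -n * (math.log(2 * b) + 1)


def simulate(model, n_trials, rng):
    """Draw n_trials samples from the named toy model."""
    if model == "normal":
        return [rng.gauss(0.0, 1.0) for _ in range(n_trials)]
    # The difference of two Exp(1) variables follows a Laplace(0, 1) distribution.
    return [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n_trials)]


def recovery_probability(model, n_trials, n_datasets=200, seed=0):
    """Fraction of simulated datasets for which the generating model fits best."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_datasets):
        xs = simulate(model, n_trials, rng)
        best = "normal" if loglik_normal(xs) > loglik_laplace(xs) else "laplace"
        hits += best == model
    return hits / n_datasets


for generating_model in ("normal", "laplace"):
    print(generating_model, recovery_probability(generating_model, n_trials=500))
```

      Re-running the loop with a smaller `n_trials` typically lowers the recovery probability, which mirrors the trial-number dependence described in the quoted passage.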

      In principle, both noise modules can co-exist and model inversion should be possible (though mathematically more complicated). On the other hand, I anticipate that parameter recovery would be extremely noisy in such a scenario. For this work, I decided to not test this possibility as it would add a lot of complexity, with a high probability of ultimately being unfeasible.

      2) The trade-off between the flexibility of the model (modularity of the metacognitive part, choice of the link functions) and the generalisability of the process proposed seems in favor of the former. Does the current framework really allow one to disambiguate between the different models? Or at least, the process modeled is so flexible that I am not sure it allows us to draw general conclusions? Fig 7 and section 3 of the results explain that all models are similar, regardless of the module or functions specified; Fig 7 supp shows that half of participants are best fitted by noisy readout, while the other half is best fitted by noisy report; plus, idiosyncrasies across participants are all captured. Does this compromise the generalisability of the modeling of the group as a whole?

      This is a fair point and I understand the question has two components: a) is the model too flexible, potentially preventing generalized conclusions? b) is the flexibility of the model recoverable?

      Regarding a), I should emphasize that the manuscript (and toolbox) provides a modeling framework, rather than a single specific model. In other words, researchers applying the framework/toolbox must make a number of decisions: which noise type? which metacognitive biases should be considered? which link function? To ensure interpretability / generalizability, researchers have to sufficiently constrain the model. Due to this framework character, it makes sense that the manuscript is submitted under the Tools & Resources Article format rather than the Research Article format.

      On the other hand, I agree that it is the duty of the manuscript introducing the framework to provide all necessary information to help the researcher make these decisions. This is where the reviewer’s point b) is critical and I hope that with the new parameter and model recovery analyses in the present revision (see other comments) I meet this requirement to a satisfactory degree.

      To clarify the scope and aim of the paper, I now put a new subsection in front of the example application to the data from Shekhar and Rahnev, 2021 (Line 534ff):

      “It is important to note that the present work does not propose a single specific model of metacognition, but rather provides a flexible framework of possible models and a toolbox to engage in a metacognitive modeling project. Applying the framework to an empirical dataset thus requires a number of user decisions: which metacognitive noise type is likely more dominant? which metacognitive biases should be considered? which link function should be used? These decisions may be guided either by a priori hypotheses of the researcher or can be informed by running a set of candidate models through a statistical model comparison. As an exemplary workflow, consider a researcher who is interested in quantifying overconfidence in a confidence dataset with a single parameter to perform a brain-behavior correlation analysis. The concept of under/overconfidence already entails the first modeling decision, as only a link function that quantifies probability correct (Equation 6) allows for a meaningful interpretation of metacognitive bias parameters. Moreover, the researcher must decide for a specific metacognitive bias parameter. The researcher may not be interested in biases at the level of the confidence report, but, due to a specific hypothesis, rather at metacognitive biases at the level of readout/evidence, thus leaving a decision between the multiplicative and the additive evidence bias parameter. Also, the researcher may have no idea whether the dominant source of metacognitive noise is at the level of the readout or report. To decide between these options, the researcher computes the evidence (e.g., AIC) for all four combinations and chooses the best-fitting model (ideally, this would be in a dataset independent from the main dataset).”

      In addition, the website of the toolbox now provides a lot more information about typical use cases: https://github.com/m-guggenmos/remeta

      3) More extensive parameter recovery needs to be done/shown. We would like to see a proper correlation matrix between parameters, and recovery across the parameter space, not only for certain regimes (i.e. more than fig 6 supp 3), that is, the full grid exploration irrespective of how other parameters were set.

      The recovery of the three metacognitive bias parameters is displayed in Fig 4, but what about the other parameters? We need to see that they each have a specific role. The point in the Discussion "the calibration curves and the relationships between type 1 performance and confidence biases are quite distinct between the three proposed metacognitive bias parameters may indicate that these are to some degree dissociable" is only very indirect evidence that this may be the case.

      A comprehensive parameter recovery analysis is indeed a key analysis that was missing in the first version of the manuscript. I now performed several analyses to address this, rewrote and extended section 2.3 on parameter recovery. The new parameter recovery analysis was performed as follows (Line 455ff):

      “To ensure that the model fitting procedure works as expected and that model parameters are distinguishable, I performed a parameter recovery analysis. To this end, I systematically varied each parameter of a model with metacognitive evidence biases and generated data. Specifically, each of the six parameters (σs, ϑs, δs, σm, 𝜑m, δm) was varied in 500 equidistant steps between a sensible lower and upper bound. The model was then fit to each dataset. To assess the relationship between fitted and generative parameters, I computed linear slopes between each generative parameter (as the independent variable) and each fitted parameter (as the dependent variable), resulting in a 6 x 6 slope matrix. Note that I computed (robust) linear slopes instead of correlation coefficients, as correlation coefficients are sample-size-dependent and approach 1 with increasing sample size even for tiny linear dependencies. Thus, as opposed to correlation coefficients, slopes quantify the strength of a relationship. Comparability between the slopes of different parameters is given because i) slopes are – like correlation coefficients – expected to be 1 if the fitted values precisely recover the true parameter values (i.e., the diagonal of the matrix) and ii) all parameters have a similar value range which makes a comparison of off-diagonal slopes likewise meaningful. To test whether parameter recovery was robust against different settings of the respective other parameters, I performed this analysis for a coarse parameter grid consisting of three different values for each of the six parameters except σm, for which five different values were considered. This resulted in 3^5 · 5^1 = 1215 slope matrices for the entire parameter grid.”
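      Two ingredients of this procedure can be sketched in a few lines: the size of the coarse grid (three levels for five parameters, five levels for σm, giving 3^5 · 5 = 1215 combinations) and a robust slope between generative and fitted values. The quoted text does not name the robust estimator, so the Theil-Sen estimator (median of pairwise slopes) used below is an assumption, as are the level labels, the parameter range 0.1 to 1.0, and the noisy toy "fit" that stands in for actual model fitting.

```python
import itertools
import random
import statistics

# Coarse grid: symbolic placeholder levels (not the paper's values).
three_levels = ["low", "mid", "high"]
five_levels = ["l1", "l2", "l3", "l4", "l5"]
levels_per_parameter = {
    "sigma_s": three_levels,
    "theta_s": three_levels,
    "delta_s": three_levels,
    "sigma_m": five_levels,  # the one parameter with five levels
    "phi_m": three_levels,
    "delta_m": three_levels,
}
grid = list(itertools.product(*levels_per_parameter.values()))
print(len(grid))  # one slope matrix would be computed per grid point


def theil_sen_slope(x, y):
    """Robust slope: median of slopes over all pairs of points (assumed estimator)."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in itertools.combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    return statistics.median(slopes)


# Toy stand-in for "fit the model to each simulated dataset": the fitted value
# equals the generative value plus small estimation noise.
rng = random.Random(0)
generative = [0.1 + k * (1.0 - 0.1) / 499 for k in range(500)]  # 500 equidistant steps
fitted = [g + rng.gauss(0.0, 0.05) for g in generative]

slope = theil_sen_slope(generative, fitted)
print(round(slope, 2))  # close to 1 when recovery is unbiased
```

      In a full analysis, this slope would be computed for every pair of generative and fitted parameters to fill the 6 x 6 matrix, with diagonal entries near 1 and off-diagonal entries near 0 indicating good recovery.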

      In addition, I computed additional supplementary analyses assessing a case with fewer trials, a model with confidence biases, and models with mixed evidence and confidence biases. For details about these analyses, I kindly point the reviewer to section 2.3. Together, these new analyses demonstrate that parameter recovery works extremely well across different regimes and for all model parameters, including the metacognitive bias parameters mentioned in the reviewer’s comment.

      1.8: It would be important to report under what regimes of other parameters these simulations were conducted. This is because, even if dependence of Mratio onto type 1 performance is reproduced, and that is not the case for sigma_m, it would be important to know whether that holds true across different combinations of the other parameter values.

      I now repeated this analysis for various settings of other parameters and include the results as new Figure 6—figure supplement 2. While the settings of other parameters affect the type 1 performance dependency of Mratio (with some interesting effects such as Mratio > 1), parameter recovery of sigma_m is largely unaffected. The same basic point thus holds: Mratio shows a nonlinear dependency with type 1 performance, but sigma_m can be recovered largely without bias under most regimes (the main exception is a combination of low sensory noise and high metacognitive noise under the noisy-readout model, which is also mentioned in the manuscript).

      Is lambda_m meaningfully part of the model, and if so, could it be introduced into the Fig 1 model, and be properly part of the parameter recovery?

      I now reworked the part about metacognitive biases to make it more consistent and to introduce lambda_m on equal footing with the other metacognitive bias parameters. I now distinguish between metacognitive evidence biases (the two main bias parameters of the original model, phi_m and theta_m) and metacognitive confidence biases, i.e. lambda_m and a new additive confidence bias parameter kappa_m. The schematic presentation of the model framework in Figure 1 is updated accordingly.

      This change also complies with reviewer 2, who rightfully pointed out that the original model framework put much stronger emphasis on bias parameters loading on evidence than on confidence. The metacognitive confidence bias parameters are now also part of the parameter recovery analyses (Figure 7—figure supplement 2).

      While it is still feasible to combine the two evidence-related bias parameters and lambda_m – as queried by the reviewer – not all mixed combinations of evidence- and confidence-related bias parameters perform well in terms of model recovery (in particular, combining all four parameters; cf. Figure 7—figure supplement 3). Hence, a decision on the side of the modeler is required. I comment on this important aspect at the end of the section 1.4 about metacognitive biases (Line 276ff):

      “Finally, note that the parameter recovery shown in Figure 4 was performed with four separate models, each of which was specified with a single metacognitive bias parameter (i.e., 𝜑m, δm, λm, or κm). Parameter recovery can become unreliable when more than two of these bias parameters are specified in parallel (see section 2.3; in particular, Figure 7—figure supplement 3). In practice, the researcher thus must make an informed decision about which bias parameters to include in a specific model (in most scenarios one or two metacognitive bias parameters are a good choice). While the evidence-related bias parameters 𝜑m and δm have a more principled interpretation (e.g., as an under/overestimation of sensory noise), it is not unlikely that metacognitive biases also emerge at the level of the confidence report (λm, κm). The first step thus must always be a process of model specification or a statistical comparison of candidate models to determine the final specification (see also section 3.1).”

      4) An important nuance in comparing the present sigma_m to Mratio is that the present model requires that multiple difficulty levels are tested, whereas instead, the Mratio model based on signal detection theory assumes a constant signal strength. How does this impact the (unfair?) comparison of these two metrics on empirical data that varied in difficulty level across trials? Relatedly, the Discussion paragraph that explained how the present model departs from type 2 AUROC analysis similarly omits to account for the fact that studies relying on the latter typically intend to not vary stimulus intensity at the level of the experimenter.

      I thank the reviewer for this comment which made me realize that I incorrectly assumed that my model requires multiple stimulus difficulty levels. The only parameter that would require multiple stimulus intensities is the sensory threshold parameter, but for this parameter I already state that it requires additional stimulus difficulties close to threshold (Line 147ff). Otherwise I now made extensive tests that the model works just fine with constant stimuli. My reasoning mistake (iirc) was related to the fact that I fit a metacognitive link function, which I thought would require variance on the x-axis; but of course there is already plenty of variance introduced through noise at the sensory level, so multiple difficulty levels are not required to fit the metacognitive level. I now removed the relevant references to this requirement from the manuscript.

      Nevertheless, I agree that it is interesting to perform the comparison between Mratio and sigma_m also for a scenario with constant stimuli. See both the new Figure 6–supplement 1 with constant stimuli, and the (updated) main Figure 6 with multiple stimulus levels for comparison.

      The general point still holds also for constant stimuli: Mratio is not independent of type 1 performance. Thus, the observed dependence on type 1 performance is not due to the presence of varying stimulus levels. I now reference this new supplementary figure in Result section 1.8 (Line 389).

      5) 'Parameter fitting minimizes the negative log-likelihood of type 1 choices (sensory level) or type 2 confidence ratings (metacognitive level)'. Why not fitting both choices and confidence at the same time instead of one after the other? If I understood correctly, it is an assumption that these are independent, why not allow confidence reports to stem from different sources of choice and metacognitive noise? Is it because sensory level is completely determined by a logistic (but still, it produces the decision values that are taken up to the metacognitive level)?

      The decision to separate the two levels during parameter inference was deliberate. I now explain this choice in the beginning of Result section 2 (Line 416ff):

      “The reason for the separation of both levels is that choice-based parameter fitting for psychometric curves at the type 1 / sensory level is much more established and robust compared to the metacognitive level for which there are more unknowns (e.g., the type of link function or metacognitive noise distribution). Hence, the current model deliberately precludes the possibility that the estimates of sensory parameters are influenced by confidence ratings.”

      Indeed, I would regard it as highly problematic if the estimates of sensory parameters were influenced by confidence ratings, which are shaped by a manifold of interindividual quirks and biases and for which computational models are still in a developmental stage. Yet, from a pure simulation-based parameter recovery perspective, in which the true confidence model is known, using confidence ratings would indeed make sensory parameter estimation more precise (because of the rich information contained in continuous confidence ratings which is lost in the binarization of type 1 choices).

      6) Fig 4 left panels: could you clarify the reasoning that due to sensory noise, overconfidence is expected, instead of having objective and subjective probability correct aligning on the diagonal? Shouldn't the effects of sensory noise average out? In other words, why would the presence of sensory noise systematically push towards overconfidence rather than canceling out on average?

      As an intuitive explanation consider the case that no signal is present in a stimulus, e.g., a line grating in a clockwise/counterclockwise orientation discrimination task with an angle of 0 degrees. Since there is no true information in the stimulus, type 1 performance will be at chance level irrespective of sensory noise.

      However, sensory noise matters for the metacognitive level. Assuming no sensory noise (i.e., sigma_s = 0), the observer’s stimulus/decision variable would be zero and thus confidence would be zero. Thus, confidence would exactly match type 1 performance. Yet, assuming the presence of sensory noise, the stimulus estimate (“decision value”) will be always different from point-zero, if ever so slightly. While the average estimate of the stimulus variable across trials will indeed cancel out to zero, each individual trial will be different from zero (in either direction) and hence also the confidence will be different from zero in each trial. Since confidence is unsigned, the average confidence will be greater than zero and thus give the impression of an overconfident observer.

      Note that this explanation was implicitly included in the paragraph on the 0.75 signature of confidence (“When evidence discriminability is zero, an ideal Bayesian metacognitive observer will show an average confidence of 0.75 and thus an apparent (over)confidence bias of 0.25. Intuitively this can be understood from the fact that Bayesian confidence is defined as the area under a probability density in favor of the chosen option. Even in the case of zero evidence discriminability, this area will always be at least 0.5 − otherwise the other choice option would have been selected, but often higher.”, Line 257ff).
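
The 0.75 signature can be checked with a short simulation. The sketch below is illustrative only and is not taken from the ReMeta toolbox: it assumes that confidence is the area under the observer's Gaussian sensory density in favor of the chosen option, i.e. Phi(|y| / sigma_s), and that the decision value y is pure noise because the stimulus carries no signal. The function names and the value of sigma_s are hypothetical.

```python
import math
import random


def std_normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_zero_signal(sigma_s, n_trials=200_000, seed=0):
    """Return (mean decision value, mean confidence) for a zero-signal stimulus.

    Assumption: confidence = Phi(|y| / sigma_s), the area under the sensory
    density that favors the chosen option (probability-correct scale).
    """
    rng = random.Random(seed)
    sum_y = 0.0
    sum_conf = 0.0
    for _ in range(n_trials):
        y = rng.gauss(0.0, sigma_s)  # decision value: sensory noise only
        sum_y += y                   # signed values cancel out on average
        sum_conf += std_normal_cdf(abs(y) / sigma_s)  # unsigned evidence does not
    return sum_y / n_trials, sum_conf / n_trials


mean_y, mean_conf = simulate_zero_signal(sigma_s=0.5)
print(f"mean decision value: {mean_y:.3f}, mean confidence: {mean_conf:.3f}")
```

Under these assumptions the mean decision value is close to zero while the mean confidence is close to 0.75, because Phi(|z|) is uniformly distributed between 0.5 and 1 for a standard normal z. The result does not depend on the chosen sigma_s as long as it is greater than zero.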

      7) The same analysis as Fig 6 but for noisy readout instead of noisy reports do not show the same results: both sigma_m and m-ratio vary as a function of type 1 performance. Does this mean that the present model with readout module does not solve the issue of dependency upon type 1 performance?

      I refer to this in the Result section: “The exception is a regime with very high metacognitive noise and low sensory noise under the noisy-readout model, in which recovery becomes biased” (Line 391ff). Indeed, in this edge case the recovery of sigma_m depends more strongly on type 1 performance than in the noisy-report model. However, note that recovery is stable across a large range of d’, including the range typically aimed for in metacognition experiments (i.e., medium performance levels to ensure sufficient variance in confidence ratings).

      It is also important to point out that a failure to recover true parameters under certain conditions is not a failure of the model, but a reflection of the fact that information can be lost at the level of confidence reports. For example, if sensory noise is very high, the relationship between evidence and confidence becomes essentially flat (Figure 3), producing confidence ratings close to zero irrespective of the level of stimulus evidence. It becomes increasingly difficult to recover any parameters in such a scenario. Conversely, if sensory noise is extremely low, confidence ratings approach a value of 1 irrespective of stimulus evidence, and the same issue arises. In both cases there is no meaningful variance for an inference about latent parameters. This issue is more pronounced in the noisy-readout case because it requires an inversion of precisely the relationship between evidence and confidence.

      8) In Eq8, could you explain why only the decision values consistent with the empirical choice are filtered. Is this an explicit modeling of the 'decision-congruence' phenomenon reported elsewhere (eg. Peters et al 2017)? What are the implications of not keeping only the congruent decision values?

      I apologize, this was a mistake in the manuscript. The integration is over all decision values, not just those consistent with the choice. I corrected it accordingly.

      Reviewer #2 (Public Review):

      This paper presents a novel computational model of confidence that parameterises links between sensory evidence, metacognitive sensitivity and metacognitive bias. While there have been a number of models of confidence proposed in the literature, many of these are tailored to bespoke task designs and/or not easily fit to data. The dominant model that sees practical use in deriving metacognitive parameters is the meta-d' framework, which is tailored for inference on metacognitive sensitivity rather than metacognitive biases (over- and underconfidence). This leaves a substantial gap in the literature, especially as in recent years many interesting links between metacognitive bias and mental health have started to be uncovered. In this regard, the ReMeta model and toolbox is likely to have significant impact on the field, and is an excellent example of a linked publication of both paper and code. It's possible that this paper could do for metacognitive bias what the meta-d' model did for metacognitive sensitivity, which is to say have a considerable beneficial impact on the level of sophistication and robustness of empirical work in the field.

      The rationale for many of the modelling choices is clearly laid out and justified (such as the careful handling of "flips" in decision evidence). My main concern is that the limits to what can be concluded from the model fits need much clearer delineation to be of use in future empirical work on metacognition. Answering this question may require additional parameter/model recovery analysis to be convincing.

      I thank the reviewer for these encouraging and constructive comments!

      Specific comments:

      • The parameter recovery demonstrated in Figure 4 across range of d's is impressive. But I was left wondering what happens when more than one parameter needs to be inferred, as in real data. These plots don't show what the other parameters are doing when one is being recovered (nor do the plots in the supplement to Figure 6). The key question is whether each parameter is independently identifiable, or whether there are correlations in parameter estimates that might limit the assignment of eg metacognitive bias effects to one parameter rather than another. I can think of several examples where this might be the case, for instance the slope and metacognitive noise may trade off against each other, as might the slope and delta_m. This seems important to establish as a limit of what can be inferred from a ReMeta model fit.

      This is an excellent point and was also raised by reviewer #1. See major comment 3 of reviewer #1 for a detailed response. In short, I now provide comprehensive analyses that demonstrate successful parameter recovery across different regimes and both noisy types (noisy-readout, noisy-report). See Figure 7.

      Regarding the anticipated trade-offs between the confidence slope (now referred to as multiplicative evidence bias) and metacognitive noise / delta_m (now additive evidence bias), there is a single scenario in which this becomes an issue. I describe this in the Results section as follows (Line 480ff):

      “Here, the only marked trade-off emerges between metacognitive noise σm and the metacognitive evidence biases (𝜑m, δm) in the noisy-readout model, under conditions of low sensory noise. In this regime, the multiplicative evidence bias 𝜑m becomes increasingly underestimated and the additive evidence bias δm overestimated with increasing metacognitive noise. Closer inspection shows that this dependency emerges only when metacognitive noise is high – up to σm  0.3 no such dependency exists. It is thus a scenario in which there is little true variance in confidence ratings (due to low sensory noise many confidence ratings would be close to 1 in the absence of metacognitive noise), but a lot of measured variance due to high metacognitive noise. It is likely for this reason that parameter inference is problematic. Overall, except for this arguably rare scenario, all parameters of the model are highly identifiable and separable.” In my experience, certain trade-offs in specific edge cases are almost inescapable for more complex models. Overall, I think it is fair to say that parameter recovery works extremely well, including the ‘trinity’ of metacognitive noise / multiplicative evidence bias / additive evidence bias.

      • Along similar lines, can the noisy readout and noisy report models really be distinguished? I appreciate they might return differential AICs. But qualitatively, it seems like the only thing distinguishing them is that the noise is either applied before or after the link function, and it wasn't clear whether this was sufficient to distinguish one from the other. In other words, if you created a 2x2 model confusion matrix from simulated data (see Wilson & Collins, 2019 eLife) would the correct model pathway from Figure 1 be recovered?

      Great point. I introduced a new subsection 2.4 “Model recovery”, in which I demonstrate successful recovery of noisy-readout versus noisy-report models. See also my response to the first comment of Reviewer #1, which includes the new model recovery figure and the associated paragraph in the manuscript. The key new figure is Figure 7—figure supplement 6.

      • Again on a similar theme: isn't the slope parameter rho_m better considered a parameter governing metacognitive sensitivity, given that it maps the decision values onto confidence? If this parameter approaches zero, the function flattens out which seems equivalent to introducing additional metacognitive noise. Are these parameters distinguishable?

      Indeed, the parameter recovery analysis shows a slight negative correlation between the slope parameter (now termed multiplicative evidence bias) and metacognitive noise (Figure 7). As the reviewer mentions, this is likely caused by the fact that both parameters lead to a flattening/steepening of the evidence-confidence relationship. For reference, in the empirical dataset by Shekhar & Rahnev, the correlation between AUROC2 and the multiplicative evidence bias is almost absent at r = −0.017. Critically, however, while an increase of the metacognitive noise parameter σm will ultimately lead to a truly flat/indifferent relationship between evidence and confidence, the multiplicative evidence parameter 𝜑m only affects the slope (i.e., asymptotically confidence will still reach 1). This is one reason why parameter recovery for both σm and 𝜑m works overall very well. The differential effects of σm and 𝜑m are now better illustrated in the updated Figure 3.

      Conceptually as well, the multiplicative evidence parameter 𝜑m plausibly represents a metacognitive bias under either interpretation that I suggest in the manuscript: as an under/overestimation of the evidence or as an over/underestimation of one’s own sensory noise, leading to under/overconfidence, respectively. In sum, I think there are strong arguments for the present formalization and interpretation.

      • The final paragraph of the discussion was interesting but potentially concerning for a model of metacognition. It explains that data on empirical trial-by-trial accuracy is not used in the model fits. I hadn't appreciated this until this point in the paper. I can see how in a process model that simulates decision and confidence data from stimulus features, accuracy should not be an input into such a model. But in terms of a model fit, it seems odd not to use trial by trial accuracy to constrain the fits at the metacognitive level, given that the hallmark of metacognitive sensitivity is a confidence-accuracy correlation. Is it not possible to create accuracy-conditional likelihood functions when fitting the confidence rating data (similar to how the meta-d' model fit is handled)? Psychologically, this also makes sense given that the observer typically knows their own response when giving a confidence rating.

      While I agree of course that metacognitive sensitivity quantifies the confidence-accuracy relationship, a process model is a distinct approach and requires distinct methodology. Briefly, the current model fit cannot be improved upon, as it is based on a precise inversion of the forward model. Computing accuracy-conditional likelihoods would lead to biased parameter estimates, because it would incorrectly imply that the observer has access to the accuracy of their choice. While the observer knows their choice, as the reviewer correctly notes, they do not know the true stimulus category and hence not their accuracy.

      I argue in the manuscript that both approaches (descriptive meta-d’, explanatory process model) have their advantages and disadvantages. The concept of meta-d’ / metacognitive sensitivity does not care why a particular confidence rating is the way it is, or whether an incorrect response is caused by sensory noise or by an attentional lapse. On the one hand, this implies that one cannot draw any conclusions about the causes and mechanisms of metacognitive inefficiency, which could be perceived as a major drawback. In this respect, it is a purely descriptive measure (cf. last comment of Reviewer #1). On the other hand, because it is descriptive, it can simply compare the confidence between correct and incorrect choices and thus, in a sense, capture a more thorough picture of metacognitive sensitivity; that is, being metacognitively aware not only of the consequences of one’s own sensory noise (as in typical process models), but also of all other sources of error (attentional lapses, finger errors, etc.). I have now added a paragraph in which I summarize the comparison of type 2 ROC / meta-d’ and process models along these lines (Line 800ff):

      “In sum, while a type 2 ROC analysis, as a descriptive approach, does not allow any conclusions about the causes of metacognitive inefficiency, it is able to capture a more thorough picture of metacognitive sensitivity: that is, it quantifies metacognitive awareness not only about one’s own sensory noise, but also about other potential sources of error (attentional lapses, finger errors, etc.). While it cannot distinguish between these sources, it captures them all. On the other hand, only a process model approach will allow to draw specific conclusions about mechanisms – and pin down sources – of metacognitive inefficiency, which arguably is of major importance in many applications.”

      • I found it concerning that all the variability in scale usage was being assumed to load onto evidence-related parameters (eg delta_m) rather than being something about how subjects report or use an arbitrary confidence scale (eg the "implicit biases" assumed to govern the upper and lower bounds of the link function). It strikes me that you could have a similar notion of offset at the level of report - eg an equivalent parameter to delta_m but now applied to c and not z. Would these be distinguishable? They seem to have quite different interpretations psychologically: one is at the level of a bias in confidence formation, and the other at the level of a public report.

      I substantially reworked the section about metacognitive biases, including an additive metacognitive bias (κm) also at the level of confidence. The previous version of the manuscript already included a multiplicative bias parameter loading onto confidence (previously referred to as ‘confidence scaling’ parameter, now multiplicative confidence bias λm), but it was considered optional and e.g. not part of the parameter recovery analyses.

      My previous emphasis on biases that load onto evidence-related variables was due to a more principled interpretation (e.g. ‘underestimation of sensory noise’), but I agree that metacognitive biases must not necessarily be principled and may be driven e.g. by the idiosyncratic usage of a particular confidence scale. Updated Figure 1 sketches the new, more complete model.

      Is a mix of evidence- and confidence-related metacognitive bias parameters distinguishable? I tested this in Figure 7—figure supplement 3.

      The slope matrices show that e.g., the model suggested by the reviewer (two evidence-related bias parameters 𝜑m and δm + an additive confidence-based bias parameter κm) is to some degree dissociable, although slight tradeoffs start to emerge with such a complex model. By contrast, a mix of only one evidence-related and one confidence-related bias parameter is much more robust. In general, I thus recommend using at most two metacognitive bias parameters, which are selected either based on a priori hypotheses or on a model comparison. I comment on the necessity of choosing one’s bias parameters in a new paragraph in section 1.4 about metacognitive biases (Line 276ff):

      “Finally, note that the parameter recovery shown in Figure 4 was performed with four separate models, each of which was specified with a single metacognitive bias parameter (i.e., 𝜑m, δm, λm, or κm). Parameter recovery is more unreliable when more than two of these bias parameters are specified in parallel (see section 2.3; in particular, Figure 7—figure supplement 3). In practice, the researcher thus must make an informed decision about which bias parameters to include in a specific model (in most scenarios 1 or 2 metacognitive bias parameters is a good choice). While the evidence-related bias parameters 𝜑m and δm have a more principled interpretation (e.g., as an under/overestimation of sensory noise), it is not unlikely that metacognitive biases also emerge at the level of the confidence report (λm, κm). The first step thus must always be a process of model specification or a statistical comparison of candidate models to determine the final specification (see also section 3.1).”

    1. Author Response

      Reviewer #1 (Public Review):

      The paper correctly identifies two biophysical properties that may impact an OHC contribution to cochlear amplification. These are the membrane RC time constant and prestin kinetics. The RC problem was identified by Santos-Sacchi 1989 (1) based on measures of OHC membrane capacitance, electromotility (eM) and published OHC resting and receptor potential data. At issue was a 20 dB disparity between threshold BM measures and eM when the resting potential (RP, ~ -70 mV) is displaced from the voltage at maximal eM gain or peak NLC (Vh; ~ -40 mV). If RP were actually at Vh then the problem would not have been identified, assuming that prestin's voltage-responsiveness were frequency-independent, which was not in question at that time. Over the last two decades several groups have found prestin performance to be low pass. Isolated OHCs, macro-patch and OHCs in situ cochlear explants all show this low pass behavior. To date, no manipulations of load have pushed the voltage responsiveness to frequency-independent. This manuscript tries to avoid the kinetics issue and attempts to focus on the RC problem that has been dealt with extensively since 1989, including at that time a suggestion that the RC problem points to the dominance of the stereocilia bundle (2).

      The authors suggest that kinetics of prestin is not addressed in the current manuscript, but this is not the case. In ignoring the paper from Santos-Sacchi and Tan 2018 (3), reliance on Frank et al.'s (4) data explicitly utilizes their kinetic results. OHC84 (so-called short cell, 51 um long) is essentially frequency-independent after microchamber voltage roll-off correction. The authors choose 1 nm/mV gain at 50 kHz to work with in their arguments. As it turns out, the corrected eM of OHC84 is wrong since it does not fix the reported 23 kHz microchamber voltage roll-off. While OHC65 is appropriately fixed, OHC84 is over compensated. Gain at 50 kHz should be about half the chosen gain. This is not the most problematic issue for their arguments, however.

      In Santos-Sacchi and Tan 2018 (3) we show that low frequency (near DC) eM gain for OHCs averaging 55.3 um long is about 15 nm/mV. This indicates, as noted in that paper, that the resting potential of OHC84 was far shifted from Vh, accounting for its wide-band frequency response. If indeed, the authors still maintain that OHC eM is frequency-independent, ala Frank et al. (and in disregard to other publications where, to the contrary, eM gain would be far less at 50 kHz - see (5, 6)), then the eM gain at 50 kHz should be closer to 15 nm/mV; large enough, I think, to make their RC problem exercise overkill. That is, even in 1989 such a gain would not have suggested an RC problem. This is assuming that the normal resting potential is at Vh. Of course, at Vh membrane capacitance would be about twice that of linear capacitance (due to peak NLC) - the cell time constant does not discriminate against source of capacitance. All in all, isolated OHC biophysics that provides the voltage dependence and the kinetics of prestin cannot be ignored to deal with the RC problem in isolation. Doing so will give a false sense of how the cochlea works, and will encourage others to neglect, without rationale, published pertinent data, as with the Sasmal and Grosh 2019 (7) model where the OHC is treated as a frequency-independent PZE device.

      Finally, to scorn the significance of component characteristics comprising the whole cochlea, e.g., based on isolated OHC biophysics or prestin's cryo-EM structure, as a fallacy of composition suffers itself from hasty generalization. Of course, knowing the biophysics of single OHCs informs on the system response. Otherwise, the prestin KO would have been an unfunded goal, never allowed to pass beyond a system modeler's review. Indeed, the authors would have none of the "carefully" chosen data to present their RC counter argument. Pertinent, published biophysical characteristics must be included in any critical discussion on OHC performance. For that matter, cochlear modelers must follow the same rule.

      We thank reviewer #1 for the suggestions on the kinetics of prestin and previous literature.

      Although there are no data (to the best of our knowledge) on electromotility (eM) in isolated basal murine OHCs, a more thorough review of the existing literature on the topic suggests that the assumed parameters are indeed a reasonably conservative estimate of eM in situ.

      Additionally, the OHC parameters are pessimistic enough to account for a doubling of effective capacitance due to NLC.

      Regarding the fallacy of composition, we are puzzled that the reviewer interpreted it as a “scorning” of the OHC biophysics, which is obviously important for cochlear function. The point raised is simple and rather obvious: building a system from low-pass filters does not make the system a low-pass filter. This is illustrated with the analogy, familiar to electrical engineers, that high- and band-pass filters are often built by cascading and mixing the responses of low-pass filters. The “fallacy of composition” therefore lies in the conclusion that since eM is “low-pass”, it can’t possibly contribute to high-frequency amplification. Strikingly, this conclusion is often based on measured vibrations near the OHCs showing transfer functions with >30 dB peak-to-tail ratio, and that are somewhat consistent with the inner workings of cochlear models. That is, we are criticizing one specific interpretation of the biophysical data, and are certainly not suggesting that collecting and analyzing the data in the first place is unimportant.
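
The filter analogy can be made concrete with first-order transfer functions. The sketch below is a generic signal-processing illustration, not a model of the cochlea or of the OHC: it assumes ideal first-order low-pass filters, and the cut-off frequencies are hypothetical. A high-pass response is obtained by subtracting a low-pass output from the input, and a band-pass response by subtracting the outputs of two low-pass filters with different cut-offs.

```python
def low_pass(freq_hz, cutoff_hz):
    """Complex frequency response of an ideal first-order low-pass filter."""
    return 1.0 / (1.0 + 1j * freq_hz / cutoff_hz)


def high_pass(freq_hz, cutoff_hz):
    """High-pass response built by mixing: input minus low-pass output."""
    return 1.0 - low_pass(freq_hz, cutoff_hz)


def band_pass(freq_hz, cutoff_low_hz, cutoff_high_hz):
    """Band-pass response built as the difference of two low-pass filters."""
    return low_pass(freq_hz, cutoff_high_hz) - low_pass(freq_hz, cutoff_low_hz)


CUTOFF_LOW_HZ = 1e3   # hypothetical lower cut-off
CUTOFF_HIGH_HZ = 1e5  # hypothetical upper cut-off

for freq in (1e1, 1e4, 1e7):
    hp_gain = abs(high_pass(freq, CUTOFF_LOW_HZ))
    bp_gain = abs(band_pass(freq, CUTOFF_LOW_HZ, CUTOFF_HIGH_HZ))
    print(f"{freq:10.0f} Hz  high-pass gain {hp_gain:.3f}  band-pass gain {bp_gain:.3f}")
```

With these values the combined band-pass response is small well below 1 kHz and well above 100 kHz and close to unity in between, although every building block is low-pass.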

      Reviewer #2 (Public Review):

      In the inner ear, the cochlea transforms sound-induced vibrations into electrical signals that are sent to the brain. Cochlear outer hair cells (OHCs) are thought to amplify these vibrations, but it is unclear how amplification works. Sound-induced vibrations modulate the current entering an OHC, which drive its receptor potential, causing the OHC to change length. The change in length owing to the receptor potential variation, known as the OHC's electromotile response, depends on the size of the receptor potential. However, the receptor potential decreases with increasing sound frequency, because of the resistance (R) and capacitance (C) of the OHC's membrane. This paper addresses the RC problem, limitations on high-frequency amplification owing to the OHC's receptor potential decreasing with frequency.

      The authors use a well-known simplification of the RC problem and some back-of-the-envelope calculations to argue that OHCs can amplify sufficiently well at high frequencies to match experimental data, despite the decrease in their receptor potentials. They argue that changes to OHC properties along the cochlea allow them to amplify at high frequencies and that OHCs reduce noise and distortion. They argue against OHCs as being cochlear impedance regulators and that OHCs do not limit cochlear tuning.

      Figure 1 and Equations 1-6 are useful teaching tools but are not novel. The back-of-the-envelope calculations use these equations and a limited number of data points from the literature. There are many prior models that show amplification despite the RC problem, but they are not analyzed or discussed in much detail.

      How RC OHC filtering reduces noise without reducing the signal is not explained. The type of noise calculation done in Appendix 1 is well-known and the application is again a rough back-of-the-envelope calculation. Most of the statements about noise are not fleshed out or supported by calculations.

      The discussion about tonotopic variations has little new data. Fig. 2 uses two data points from the literature and an unpublished data point from a colleague. The fact that BM displacement is smaller at the base than at the apex is well known. There is speculation that reduced OHC motion is "effectively counteracted" by gradients in OHC capacitance and MET current, but no evidence is presented.

      The discussion about distortions is pedagogical but is again speculation without new or strong-supporting evidence. Fig. 3 argues that OHCs might reduce high-frequency distortions, but don't limit the cochlear amplifier. The plots shown are either well-known consequences of filtering or a summary of the authors' previous model data.

      The arguments against OHCs as regulators and that they don't limit tuning are not well fleshed out, speculative, and unsupported by new calculations or data.

      This paper does not clarify OHC operation or the RC problem, because it mixes speculation, limited data, and topics that are not clearly related to the problem.

      We agree with reviewer #2 that there are no new physics principles elucidated here, and that most of the discussion relies on simple calculations. But we believe that such simple calculations are the missing piece (absent in the literature) that allows one to appreciate the magnitude of the problem under examination, a magnitude that is typically inflated by focusing on quantities whose physical significance is uncertain. In other words, we believe that the simplicity of the calculations and physical reasoning is not a bug, but a feature of the paper.

      We believe that in his criticism regarding various topics of discussion presenting little or speculative new evidence, this reviewer might not have fully considered that most of the evidence provided here is fundamentally a physics-based review of the recent experimental data, incidentally the same type of data previously employed to argue that the RC problem is dramatic in the first place. Likely we didn't convey this message clearly enough in the manuscript.

      While the arguments against OHCs as regulators are not all new, they are often ignored (or perhaps forgotten) and we believe there is a value in synthesizing them all in one place. The support for these arguments comes from fundamental hydrodynamic principles, previous modeling studies, and most importantly from OCT data collected over the last 6 years. Of course, the discussion on the plausibility of suggested mechanisms lacking a concrete proposal cannot be 100% “analytic”.

      About noise and signal amplification, the missing piece perhaps is that distributed internal noise sources (e.g., thermal and shot noise) are independent of each other and hence spatially incoherent. While the manuscript doesn’t specifically deal with signal vs. noise amplification in cochlear models, spatially distributed amplification is known to boost signals more than internal noise—a principle universally used in telecommunications and addressed in >60-year-old literature.
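
The general principle behind this statement can be illustrated with a toy simulation. The sketch below is not a cochlear model and is not taken from the manuscript: it assumes an idealized case in which every source contributes the same (coherent) signal amplitude plus independent Gaussian noise, and all names and values are hypothetical. Under these assumptions the summed signal grows with the number of sources N, whereas the summed noise grows only with the square root of N.

```python
import math
import random


def summed_signal_and_noise(n_sources, n_samples=20_000, seed=1):
    """Return (summed signal amplitude, RMS of summed noise) for n_sources.

    Assumptions: each source adds a signal of amplitude 1 in phase with the
    others, plus zero-mean Gaussian noise (SD 1) independent across sources.
    """
    rng = random.Random(seed)
    signal_amplitude = 1.0
    noise_sd = 1.0
    squared_sum = 0.0
    for _ in range(n_samples):
        noise = sum(rng.gauss(0.0, noise_sd) for _ in range(n_sources))
        squared_sum += noise * noise
    summed_signal = n_sources * signal_amplitude     # coherent: adds linearly
    summed_noise_rms = math.sqrt(squared_sum / n_samples)  # incoherent: ~sqrt(N)
    return summed_signal, summed_noise_rms


for n in (1, 4, 16):
    signal, noise_rms = summed_signal_and_noise(n)
    print(f"N = {n:2d}  signal-to-noise ratio = {signal / noise_rms:.2f}")
```

In this idealized setting the signal-to-noise ratio improves by a factor of about sqrt(N), i.e. roughly 1, 2 and 4 for the three cases shown.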

      Reviewer #3 (Public Review):

      This paper discusses the effect of the low-pass filtering between outer hair cell transducer current and receptor voltage. The filter's cut-off frequency (where the response is down by a factor of 0.71 of its maximum) can be quantified by the resistance and capacitance of the hair cell's basolateral membrane. The capacitance value is determined mainly by the lipid membrane and is augmented by the charge movement of the piezoelectric prestin molecule, which endows the OHC with its electromotile properties. The OHC's capacitance (C) value is pretty well known. The resistance (R) is determined mainly by K+ channels in the basolateral membrane, a value that is also known reasonably well. The low-pass cut-off frequency is equal to 1/(2πRC) and has a value of ~1 to a few kHz - a value that has both experimental and theoretical support. The low-pass filtering of membrane voltage is important because the cell responds to membrane voltage by shortening and lengthening - this electromotility is thought to be key to the cochlea's operation and in particular to cochlear amplification, the process that enhances the magnitude and tuning of the cochlea's passive response to sound. However, the auditory system works to 80 kHz and even higher in some animals. Thus, it has been posed (let's say by team A) that the RC cut-off frequency value of a few kHz makes electromotility too slow to operate "cycle-by-cycle" up to several 10s of kHz. The article under review, representing team B, supports "cycle-by-cycle" action, arguing that the several kHz cut off frequency is not a problem and is even an advantage.
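
As a worked example of the cut-off formula, the sketch below computes the corner frequency of a first-order RC filter and its attenuation at a higher frequency. The resistance and capacitance values are hypothetical round numbers chosen only so that the cut-off falls in the "~1 to a few kHz" range mentioned above; they are not measurements from the paper or from the literature.

```python
import math


def rc_cutoff_hz(resistance_ohm, capacitance_farad):
    """Cut-off frequency of a first-order RC low-pass filter: 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * resistance_ohm * capacitance_farad)


def low_pass_gain(freq_hz, cutoff_hz):
    """Magnitude response of a first-order low-pass filter (1 at DC)."""
    return 1.0 / math.sqrt(1.0 + (freq_hz / cutoff_hz) ** 2)


RESISTANCE_OHM = 10e6        # hypothetical membrane resistance (10 megaohm)
CAPACITANCE_FARAD = 10e-12   # hypothetical membrane capacitance (10 pF)

cutoff = rc_cutoff_hz(RESISTANCE_OHM, CAPACITANCE_FARAD)
gain_at_cutoff = low_pass_gain(cutoff, cutoff)
gain_at_50khz_db = 20.0 * math.log10(low_pass_gain(50e3, cutoff))

print(f"cut-off frequency: {cutoff:.0f} Hz")
print(f"gain at cut-off:   {gain_at_cutoff:.2f}")
print(f"gain at 50 kHz:    {gain_at_50khz_db:.1f} dB")
```

For these assumed values the cut-off is about 1.6 kHz, the gain at the cut-off is about 0.71, and the receptor potential at 50 kHz is attenuated by roughly 30 dB relative to its low-frequency value.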

      The arguments put forward in favor of cycle-by-cycle action are:

      1. The size of the motions, even with the low-pass-filtered attenuation, is as large as or larger than those measured in the cochlea at high frequencies.

      2. Noise is often increasing as frequency decreases, thus low-pass-filtering is actually good, to reduce the predominantly low frequency noise.

      3. Harmonic distortion is at supra-CF frequencies, so it's good if the hair cell is low-pass-filtering to reduce harmonics.

      These three points are reasonable, and the quantification relating to statement 1 is convincing. However, the quantification associated with point 2 is muddled. The hair cell voltage signal is expressed in volts, but the noise value is given in terms of the current mediated by 1-5 channels. A quantitative comparison should be made, with signal and noise expressed in the same units, preferably volts and volts/root(Hz), with a bandwidth estimated. The appendix attempts to be more quantitative and something like that short appendix should be incorporated into the paper. If a quantitative comparison in standard units is not possible with current data, that can be stated and underscores that we really don't know whether the noise is a problem for cycle-by-cycle amplification. Point 3 is reasonable and nicely illustrated in Fig. 3B. I did not get anything from Fig. 3A and the corresponding discussion on page 8 lines 320-335. Panels C and D were under-explained and could be removed, and the caption's reference to "short wave hydrodynamics" was also under-explained.

      The arguments put forward to challenge gain control mechanics, which employ DC shifts to set effective operating conditions:

      4. Operation based on DC and quasi-DC operating points is sensitive to noise, which as noted above often increases as frequency decreases.

      5. Operation that employs a DC shift for operating point is likely to work in such a way to reduce stiffness, which has been shown to be inconsistent with active cochlear responses. For example, stiffness reduction would reduce traveling wave wavelength and thus alter the response phase and timing to a degree that has not been observed experimentally. This has long been known and relevant papers are cited.

      Point 4 was not convincing to me because the motions related to setting operating conditions could be larger than the nanoscale cycle-by-cycle response motions - thus these operating point motions could be above the noise values that seem limiting to cycle-by-cycle amplification. Point 5 is a nice reminder of the conclusion that, based on experimental findings and physics-based basic cochlear models, the cochlear amplifier must work by means of energy injection. This point was made clearly by Kolston (well cited in this paper) and later supported by other work.

      The present paper is informative in many ways and offers useful insights for further exploration. It is nicely written and illustrated. Because the signal and noise values are not quantified, the basic claim, that the cochlear amplifier can amplify a noisy signal effectively, is not convincing and that basic question is still unsettled. Overall, the paper would be improved if the claims and arguments were presented more tightly, with fewer digressions, and more modestly.

      We thank reviewer #3 for the many comments and suggestions.

      We agree that plotting the spectral density of a “near-threshold” OHC signal vs. inherent electric noise results in much simplification. Regarding noise and signal amplification, previous work on transmission lines points out that amplification is the way to increase SNR along the line.

      We believe that part of the ongoing confusion is that the problem is not how OHCs can amplify a “noisy signal” (the cochlea amplifies “noisy” sounds much as it amplifies pure tones) but how OHCs can amplify signals in the presence of internal noise. Amplification and detection are two distinct things, and signal amplification does not rely on detection. Detection is an intrinsically nonlinear decision process (e.g., signal present/absent). Amplification in relevant frequency ranges is what allows signals to be detected in the real world (e.g., radio receivers). The cochlea (as portrayed by classic theories) does not seem exceptional in this regard.

      We agree that the effect of noise on DC responses is not very clear in the manuscript. Although it is difficult to make quantitative statements on a hypothesis that lacks a concrete mechanistic proposal, ~63% of (inherent) electric noise power is confined below the RC corner frequency, i.e., the frequency band of the regulatory OHC. In the presence of (unavoidable) flicker and brown noise (e.g., Brownian motion of stereocilia), this percentage can only increase. Conversely, in the frequency band of OHC cycle-by-cycle amplification, the noise power is only a tiny fraction of the total.

    2. Reviewer #1 (Public Review):

      The paper correctly identifies two biophysical properties that may impact an OHC contribution to cochlear amplification. These are the membrane RC time constant and prestin kinetics. The RC problem was identified by Santos-Sacchi 1989 (1) based on measures of OHC membrane capacitance, electromotility (eM) and published OHC resting and receptor potential data. At issue was a 20 dB disparity between threshold BM measures and eM when the resting potential (RP, ~ -70 mV) is displaced from the voltage at maximal eM gain or peak NLC (Vh; ~ -40 mV). If RP were actually at Vh then the problem would not have been identified, assuming that prestin's voltage-responsiveness were frequency-independent, which was not in question at that time. Over the last two decades several groups have found prestin performance to be low pass. Isolated OHCs, macro-patch and OHCs in situ cochlear explants all show this low pass behavior. To date, no manipulations of load have pushed the voltage responsiveness to frequency independence. This manuscript tries to avoid the kinetics issue and attempts to focus on the RC problem that has been dealt with extensively since 1989, including at that time a suggestion that the RC problem points to the dominance of the stereocilia bundle (2).

      The authors suggest that kinetics of prestin is not addressed in the current manuscript, but this is not the case. In ignoring the paper from Santos-Sacchi and Tan 2018 (3), reliance on Frank et al.'s (4) data explicitly utilizes their kinetic results. OHC84 (so-called short cell, 51 um long) is essentially frequency-independent after microchamber voltage roll-off correction. The authors choose 1 nm/mV gain at 50 kHz to work with in their arguments. As it turns out, the corrected eM of OHC84 is wrong since it does not fix the reported 23 kHz microchamber voltage roll-off. While OHC65 is appropriately fixed, OHC84 is over compensated. Gain at 50 kHz should be about half the chosen gain. This is not the most problematic issue for their arguments, however.

      In Santos-Sacchi and Tan 2018 (3) we show that low frequency (near DC) eM gain for OHCs averaging 55.3 um long is about 15 nm/mV. This indicates, as noted in that paper, that the resting potential of OHC84 was far shifted from Vh, accounting for its wide-band frequency response. If indeed the authors still maintain that OHC eM is frequency-independent, à la Frank et al. (and in disregard of other publications where, to the contrary, eM gain would be far less at 50 kHz - see (5, 6)), then the eM gain at 50 kHz should be closer to 15 nm/mV; large enough, I think, to make their RC problem exercise overkill. That is, even in 1989 such a gain would not have suggested an RC problem. This is assuming that the normal resting potential is at Vh. Of course, at Vh membrane capacitance would be about twice that of linear capacitance (due to peak NLC) - the cell time constant does not discriminate against source of capacitance. All in all, isolated OHC biophysics that provides the voltage dependence and the kinetics of prestin cannot be ignored to deal with the RC problem in isolation. Doing so will give a false sense of how the cochlea works, and will encourage others to neglect, without rationale, published pertinent data, as with the Sasmal and Grosh 2019 (7) model where the OHC is treated as a frequency-independent PZE device.

      Finally, to scorn the significance of component characteristics comprising the whole cochlea, e.g., based on isolated OHC biophysics or prestin's cryo-EM structure, as a fallacy of composition suffers itself from hasty generalization. Of course, knowing the biophysics of single OHCs informs on the system response. Otherwise, the prestin KO would have been an unfunded goal, never allowed to pass beyond a system modeler's review. Indeed, the authors would have none of the "carefully" chosen data to present their RC counter argument. Pertinent, published biophysical characteristics must be included in any critical discussion on OHC performance. For that matter, cochlear modelers must follow the same rule.

      1. J. Santos-Sacchi, Asymmetry in voltage-dependent movements of isolated outer hair cells from the organ of Corti. J. Neurosci. 9, 2954-2962 (1989).
      2. A. J. Hudspeth, How the ear's works work. Nature 341, 397-404 (1989).
      3. J. Santos-Sacchi, W. Tan, The Frequency Response of Outer Hair Cell Voltage-Dependent Motility Is Limited by Kinetics of Prestin. J. Neurosci. 38, 5495-5506 (2018).
      4. G. Frank, W. Hemmert, A. W. Gummer, Limiting dynamics of high-frequency electromechanical transduction of outer hair cells. Proc. Natl. Acad. Sci. U. S. A. 96, 4420-4425 (1999).
      5. J. Santos-Sacchi, D. Navaratnam, W. J. T. Tan, State dependent effects on the frequency response of prestin's real and imaginary components of nonlinear capacitance. Sci. Rep. 11, 16149 (2021).
      6. J. Santos-Sacchi, W. Tan, Complex nonlinear capacitance in outer hair cell macro-patches: effects of membrane tension. Sci. Rep. 10, 6222 (2020).
      7. A. Sasmal, K. Grosh, Unified cochlear model for low- and high-frequency mammalian hearing. Proc. Natl. Acad. Sci. U. S. A. 116, 13983-13988 (2019).

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Response to Reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This is a well-executed and interesting study addressing a still controversial issue in clathrin-mediated endocytosis, namely the nature of curvature generation during formation of endocytic clathrin coated vesicles. The authors have applied new techniques to this old question, including state-of-the-art high resolution 3D single-molecule localization microscopy (SMLM, i.e. super-resolution microscopy), a new maximum-likelihood based fitting framework to fit complex geometric models into localized point clouds (Wu et al., 2020, bioRxiv) and mathematical modeling leading to a new cooperative curvature model of clathrin coat remodeling and temporal reconstruction of CCP structural dynamics based on the distribution of static super-resolution images. This is an important contribution, but will it resolve the controversy of constant curvature vs constant area for CCP invagination? I doubt it. In some ways the controversy is somewhat contrived and, as this paper shows, the answer is unlikely to be either/or. Below are some specific comments, in somewhat random order, from someone (a curmudgeon?) who has reviewed and/or carefully read these papers since 1980. Points that the authors should address are in bold. All can be addressed with modifications to the text, as the one experiment I asked for (quantification of clathrin recruitment) is impossible with this approach.

      • I wonder how many people who cite Heuser's 1980 paper have ever read it carefully. Indeed, many of the observations made here were also made by Heuser. Below, for example, is a summary I wrote, but then removed from a review as it was too lengthy: "While Heuser favored the model that CCPs assemble first as flat structures and then rearrange during invagination, he was also careful to note several caveats. First, he observed that the edges of CCPs were 'ragged', likely reflecting sites of assembly of new polygons, and that pentagons were more abundant at the edges. Thus, he argued that 'if even a few of these edge pentagons were destined to become completely surrounded with hexagons, it would be necessary to conclude that some degree of curvature can be built into coats as soon as they form'. Second, by examining tilted sections he observed that 'even the flattest baskets have a small degree of inward curvature, and many were complete hemispheres'. Finally, he cautioned that his images were snap-shots and a precursor-product relationship could not, therefore, be unambiguously established and that the very large flat lattices he observed might well 'prove to be some sort of dead end'. We now know that fibroblasts, in particular, have large numbers of static flat clathrin plaques."

      Thus, many of the authors' conclusions, i.e. that 'completely flat clathrin coats are rare' (pg 12, although they're not numbered), and that curved structures can be seen to emerge from the edges of flat lattices (see Supplemental Figure 1a, 3 examples on the right), are indeed consistent with Heuser's observations. In many ways, Heuser's 1980 paper is used as a straw man argument for the constant area model. The authors should more accurately cite and acknowledge this seminal paper.

      Response: We thank the reviewer for this insightful and constructive input on the interpretation of the constant area model (CAM). We have revised the discussion (Page 14, Lines 397-402), citing Heuser’s observations more carefully, along the lines eloquently suggested by the reviewer. We agree that the strict interpretation of the CAM is misleading, and early evidence already suggests its flawed approximation of the endocytic mechanism (further mentioned now on Page 15, Lines 429-431).

      • As Heuser did in his 1980 classic, the authors here would do well to note several caveats related to their analyses. These include:

      Like Heuser, they have assembled static images to create a pseudotemporal model, albeit using a much more quantitative approach. Nonetheless, it seems that this assumes only a single, stereotypic pathway for CCV formation. How good is this assumption? We know from dynamic imaging that there exists significant heterogeneity in both the kinetics and the molecular composition of CCPs. The authors should acknowledge this limitation.

      Response: We agree with the reviewer that the lack of direct temporal information is a clear limitation of our approach.

      We now introduce this limitation on Page 16, Lines 474-484, where we discuss the disadvantage of reconstructing an average trajectory based on static images. Here, the assumption of a single, stereotypic pathway of endocytosis is addressed. We cannot exclude the possibility of slight mechanistic variations being averaged out using our approach. However, we want to highlight the fact that our approach seems sensitive enough to distinguish between structures that originate via endocytosis and structures that derive from a different pathway, potentially from the Golgi.

      We further address the kinetic variability in terms of abortive events on Page 14, Lines 405-411, and discuss their effect on the mechanistic interpretation of our results. Generally speaking, abortive events are characterized as dim and short-lived structures in live-cell acquisitions. As the earliest structures in our data set already contain half the final coat area, we are most likely not capturing these abortive events in the first place (potential technical reasons for not capturing earlier structures are discussed on Page 14, Lines 385-395).

      • The method, which required that they 'optimized the sample preparation to densely label clathrin at endocytic sites', involves labeling cells to near saturation with rabbit polyclonal antibodies to both clathrin light chains and clathrin heavy chains followed by detection with a second polyclonal donkey anti-rabbit. This gives 20 nm of additional and presumably flexible linker on the label. How might this affect the measurements and modeling? The Wu et al paper, which BTW has not been peer-reviewed, shows high precision fitting of the nuclear pore structure, but using endogenously tagged NUP-95, not two layers of antibodies. The authors will need to discuss this limitation; it is my biggest concern regarding the analysis shown.

      Response: We acknowledge the limitations imposed by indirect immunolabelling and formulated a hypothesis on how this could affect our model fit (mentioned on Page 13, Line 363, illustrated in Supplementary Figure 6). A larger linkage error between label and target molecule would broaden the distribution of localizations around the true underlying structure. As LocMoFit fits our spherical model directly to the localization coordinates, it is able to take this distribution into account, and will weigh the fit results based on the uncertainty of the localization estimation. A uniform distribution of labels around the true underlying structure should therefore be fitted accurately even at larger linkage error. Non-uniform labeling could occur if, for example, the densely crowded space between the coat and the plasma membrane did not allow the antibody to diffuse to the clathrin epitopes. In that case, labeling would be one-sided, and instead of the true underlying structure, LocMoFit would optimize the spherical model to the highest probability density of label, around +10 nm from the true clathrin coat. This would result in an overestimation of the radius by the model, which we could correct by subtracting 10 nm from the experimentally determined radius. This was done in Supplementary Figure 6 for the hypotheses of (1) uniform displacement by the antibodies; (2) biased displacement of the antibodies towards the cytosol; and (3) biased displacement of the antibodies towards the plasma membrane. Whilst we see that the fitting parameters scale with the corrected radii, the mechanistic interpretation of partial flat pre-assembly on the membrane, and subsequent bending and surface area growth, still holds true.
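
      The radius correction described in this response can be sketched numerically. The snippet below is a minimal illustration, assuming the coat is a spherical cap of radius R and closing angle θ (0° for a flat coat, 180° for a closed sphere). The function names, the example radius and angle, and the sign convention for the offset are assumptions made for illustration; they are neither LocMoFit's API nor values from the manuscript.

```python
import math


def corrected_radius(fitted_radius_nm: float, label_offset_nm: float = 10.0) -> float:
    """Subtract a one-sided label displacement from the fitted radius.

    Assumed convention: a positive offset means the label density sits
    outside the true coat, so the fit overestimates the radius.
    """
    corrected = fitted_radius_nm - label_offset_nm
    if corrected <= 0:
        raise ValueError("offset must be smaller than the fitted radius")
    return corrected


def cap_geometry(radius_nm: float, theta_deg: float) -> dict:
    """Curvature, surface area and rim length of a spherical cap."""
    theta = math.radians(theta_deg)
    return {
        "curvature_per_nm": 1.0 / radius_nm,
        "area_nm2": 2.0 * math.pi * radius_nm ** 2 * (1.0 - math.cos(theta)),
        "rim_length_nm": 2.0 * math.pi * radius_nm * math.sin(theta),
    }


# Hypothetical site: fitted radius 50 nm, closing angle 90 degrees.
fitted = cap_geometry(50.0, 90.0)
corrected = cap_geometry(corrected_radius(50.0), 90.0)
area_ratio = corrected["area_nm2"] / fitted["area_nm2"]

print(f"fitted area:    {fitted['area_nm2']:.0f} nm^2")
print(f"corrected area: {corrected['area_nm2']:.0f} nm^2")
print(f"area ratio:     {area_ratio:.2f}")
```

      At a fixed closing angle, the area scales with the square of the radius and the curvature with its inverse. In this simplified picture a uniform radius offset therefore rescales the fitted quantities without changing the shape of the trajectory, which is in line with the response's statement that the fitting parameters scale with the corrected radii.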

      • One reason for continued controversy in this field is the lack of any attempt to resolve findings obtained using different methods. Can a parsimonious explanation be found, or are there artifacts or misinterpretations of previous findings that can explain the discrepancies? Any valid model should fit all of the valid data. For example, the authors fail to cite a recent paper by Willy et al in Dev Cell (PMID 34774130), which has been on bioRxiv since 2019 (doi: https://doi.org/10.1101/715219). Here, similar to this present study, the authors used high resolution SIM-TIR to analyze ~1000 CCPs in 3 different cell lines (sadly non-overlapping with the cells used herein) and in Drosophila embryos to quantitatively test the two models. They conclude that their findings unambiguously support a constant curvature model. The authors would do the field a favor if they carefully read this paper and identified areas of commonality (i.e. that curvature is detected at early stages in both cases) and possible explanations for the discrepancies. Certainly, they should not ignore it.

      Response: We agree with the reviewer on the importance of consolidating findings from different studies to converge on a generally accepted mechanism of clathrin coat formation. We had indeed cited Willy et al in the introduction, but agree that further discussion of their findings should be included. We therefore discuss their findings in more detail, also in comparison to our work, on Page 17, Lines 502-511. We agree that we reach contradictory conclusions, which we think is due at least in part to the way that Willy et al. analyze their data. Willy et al. acquire 2D projections of the endocytic clathrin structures, whose size is just at the limit of their image resolution. They then compare their projected sizes to a purist constant area model, which assumes that a coat has to grow to its entire surface as an entirely flat structure and then instantaneously snaps to an increased curvature, resulting in a sudden drop of the projected area (footprint). As we and others (e.g. Bucher et al 2011, Heuser, 1980) have observed, completely flat lattices are rare, and curvature is initiated before final surface area is acquired. We do not agree that the absence of a purist constant area model implies that clathrin mediated endocytosis follows a constant curvature trajectory. Instead, we imagine that our cooperative curvature model is likely to fit well with the observations of Willy and colleagues.

      • An important body of evidence that is not considered in their model or discussion is that derived from live cell imaging. In addition to the heterogeneity mentioned above, studies have shown that clathrin addition to CCPs (i.e. the growth phase) is complete within the first ~20-30 s, followed by a plateau phase of variable length (0 to >100 s) (Loerke et al, PMID 21447041). Both the current study and the Willy et al study admit that they may not be able to detect the earliest intermediates in CCP assembly. Indeed, in this study the smallest surface area CCPs are only 2-fold smaller than the largest CCPs, suggesting that over half of the triskelions have been recruited before a CCP can be distinguished from the background of clustered, nonspecifically-bound antibodies. Could the authors be monitoring events during the plateau phase and not the earliest events? Regardless, the findings are important as they address the nature of curvature generation during this plateau phase. While monitoring curvature generation during early events in CME, a recent study (Wang et al., eLife, PMID 32352376) showed that the acquisition of curvature within the first 20 s of CCP assembly was a distinguishing feature between abortive and productive events. The authors might discuss how these studies on CCP dynamics might (or might not) inform their models.

      Response: We thank the reviewer for this very insightful comment and discuss this hypothesis on Page 16-17, Lines 485-511. We suggest that part of the initiating/growth phase observed in live-cell dynamics falls into the fast, flat assembly that we are unable to capture with our approach. It is challenging to clearly identify at which point in real-time we are detecting our earliest sites. We would however argue that the plateau phase in real-time could coincide with curvature generation and final addition of triskelia at the lattice rim. The variability in the duration of this plateau phase could therefore result from variable recruitment speed of triskelia and other factors during the finalizing of the vesicle neck.

      • The authors advertise 'quantitative' description of clathrin coated structure and indeed their measurements and models are quantitative; but there is no measure of intensity/numbers of triskelions and CCP growth: an important piece of quantitative data. I expect this is impossible with indirect immunofluorescence but should be considered as a limitation of the approach. Indeed, to my knowledge no one has yet quantitatively measured curvature generation in parallel to clathrin addition at CCPs (closest is Saffarian and Kirchhausen, PMID 17993495), but they don't discuss the relationship.

      Response: We agree with the reviewer that quantifying the number of triskelia would be an essential piece of information to correlate area growth and curvature generation with dynamic information retrieved from fluorescence intensity in live-cell studies. Unfortunately, the indirect immunolabelling approach used in this work complicated this quantification, and direct comparison between number of localizations and fluorescence intensity cannot be made. However, we do observe a correlation between coat surface area and number of localizations in our data and show this in the newly added Supplementary Figure 7. This allows us to formulate the hypothesis on Page 16-17, Lines 485-511, which suggests that the plateauing of fluorescence intensity coincides with curvature generation and final triskelia addition to the coat rim. We further highlight the necessity of capturing both high spatial and temporal resolution simultaneously, to ultimately overcome this limitation.

      • On page 7 equation 1, you assume a constant growth rate for addition of triskelia, but later describe that the rate might be cooperative (as the number of edges increases). How would this affect your modeling?

      Response: We formulate the surface area growth rate of the clathrin coat to be proportional to the rim length, with a constant rate. We consider the cooperativity between clathrin molecules to affect the rate of curvature generation: the more molecules are present, the more the entire coat is inclined to bend. We rephrased that section to emphasize this distinction (Page 8, Line 217).
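
      Written out, the distinction drawn in this response might look as follows. This is a sketch under the assumption that the coat is a spherical cap with curvature H and closing angle θ. The symbols A (coat area), ℰ (rim length) and k_on (constant growth rate per unit rim length) are introduced here for illustration and may differ from the manuscript's notation.

```latex
\begin{align}
\frac{\mathrm{d}A}{\mathrm{d}t} &= k_{\mathrm{on}}\,\mathcal{E}, \\
A &= \frac{2\pi\,(1-\cos\theta)}{H^{2}}, \qquad
\mathcal{E} = \frac{2\pi\,\sin\theta}{H}.
\end{align}
```

      In this form k_on stays constant throughout, and cooperativity would enter only through how H evolves as the coat grows, not through the area growth law itself.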

      Minor points:

      • Can you indicate in the first paragraph of the results that you are using indirect immunofluorescence with rabbit anti-CLCA, anti-CHC and detection with donkey anti-rabbit for labeling, to augment the rather vague statement 'we optimized the sample preparation to densely label clathrin at endocytic sites'.

      Response: We added a clear indication of the labelling strategy used in this work on Page 4, Lines 109-110.

      • I'm not comfortable with the conclusion on page 5 that your data 'indicates that at the time point of scission, the clathrin coat of nascent vesicles is still incomplete'. Other explanations might be the relative kinetics of scission vs CCP growth (i.e. these structures are too transient to detect), or that deeply invaginated pits are sheared off the membrane during sample preparation (there is evidence that most biochemically isolated CCVs are derived from sheared CCPs).

      Response: We extended the explanation for the absence of fully closed vesicles with the hypotheses mentioned by the reviewer on Page 5, Lines 159-161.

      • Bottom of page 5, can you briefly mention what data is shown in Supplemental Figure 2 (ie. Figure 2D and examples of likely non-endocytic CCPs shown in Supplemental Figure 2). When I read this, I questioned your speculation.

      Response: We clarified the cross reference to (now) Supplementary Figure 3 accordingly on Page 6, Lines 184-185.

      • Can you indicate N CCPs from N cells in the data in Tables 2-3 for fibroblasts and U2OS cells? Do you observe and have to ignore a larger number of flat/clustered CCPs in the fibroblasts?

      Response: We indicated the number of cells and sites per data set in the Table captions on Page 36, Lines 51; 959; and 967. We did not quantify the number of flat/clustered, plaque-like structures in our data sets. During data acquisition, we would specifically select cells with a minimal number of these structures present, and even within these cells chose an area in the periphery exhibiting a low number of plaques. Our data is therefore not ideal to reliably quantify plaque density between different cell lines. Qualitative observations showed that whilst we had to disregard a few cells from the U2OS and SK-MEL-2 cell lines due to high plaque formation, the 3T3 fibroblasts were relatively straightforward to image, as few cells showed high plaque density. A recent study by Hakanpää et al., 2022 (bioRxiv) showed decreased formation of plaques when cells were seeded on fibronectin. The fact that fibroblasts excrete their own fibronectin agrees well with our observations of relatively few 3T3 cells exhibiting extensive plaque formation.

      • The last 3 paragraphs of the Introduction are results. The Introduction might best be used to review literature in more detail, discuss the reasons why uncertainty still exists and perhaps indicate how the methods applied here will help.

      Response: We re-wrote the last 3 paragraphs of the introduction, now clearly stating the knowledge gap in the field, and what methods would be required to bridge it (Page 3, Lines 80-102).

      Reviewer #1 (Significance (Required)):

      This is another excellent addition to a growing list of papers seeking to define the process of curvature generation at endocytic clathrin coated pits. In my opinion, its impact would be increased by better integrating the results presented here with other studies and methods, including the recent paper by Willy et al and the large body of literature on coated pit dynamics, some of which might be relevant in interpreting results, or at least placing them in a real vs pseudo-temporal perspective. The methods introduced and the quality of imaging, modeling and quantification further increase the study's significance. The findings will be of interest to those in the CME field, those studying membrane curvature generation in other contexts, those modeling CME, vesicle formation and curvature generation, and those using SMLM to discern the structure of macromolecular assemblies.

      Reviewer expertise: Clathrin-mediated endocytosis (Sandra Schmid)

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary In this article, the authors aimed to investigate the dynamics of the clathrin lattice during clathrin-mediated endocytosis (CME). Overall, they successfully achieved the goal by observing a large number of clathrin spots from several cell lines with 3D single-molecule localization microscopy (SMLM). With the help of this high-resolution imaging technique, they were able to describe the physical properties of each spot and reconstruct the assembly and remodeling of the clathrin coat. Moreover, by comparing the constant area/curvature models with their own data, the authors highlighted that neither of the prevailing models perfectly explained what they observed and proposed a 'cooperative curvature model'. With the novel model, the authors were able to reconstruct the clathrin coat remodeling in different cell lines and concluded that the simultaneous bending and assembly of the clathrin coat is a homogeneous property of endocytosis.

      The experiments and analytical procedures are well-designed and performed, and the manuscript is well-organized. The conclusion 'cooperative curvature model' was deduced from a large amount of data analysis and clearly stated in the text. I would like to recommend its publication if the following issues will be clarified.

      Major comments:

      1. The authors compared the morphological dynamics of clathrin-coated pits among three different cell lines (SK-MEL-2, U2OS, and 3T3) and found slight differences. As U2OS cells were derived from bone tissue, they have different mechanical properties (membrane tension, elasticity of cortical layer, etc.). It would be interesting to consider those mechanical properties in understanding the morphology (Figure 2) and progress (Figure 4) of CME. Considering the fact that the bending energy of the plasma membrane is dependent on the membrane tension, they may be able to find some relationships between mechanical properties of the cell cortex and CME.

      Response: We thank the reviewer for this comment and very much agree that the relationship between mechanical properties and structural adaptation of the endocytic machinery is a highly interesting question. We came to the same conclusion and are therefore exploring this relationship at the moment. This is however not a straightforward task, and the complex nature of plasma membrane mechanics necessitates careful experimental design. It is therefore outside the scope of this publication. We do think this point further highlights the potential of the method presented here, as it allows the investigation of additional principles in clathrin-mediated endocytosis mechanics. We do hope to share our insights on this topic soon.

      In Figure 4, the authors estimated the progression of the CME using the frequency distribution of theta. However, I wonder how they handled the events which were aborted in the middle of the CME. It had been suggested that some CME are aborted during the initial step of the CME. The authors should consider (at least discuss) those abortive events, which can disturb the analysis.

      Response: Generally speaking, abortive events (now discussed on Page 14, Lines 405-411) are characterized as dim and short-lived structures in live-cell acquisitions. As the earliest structures in our data set already contain half the final coat area, we are most likely not capturing these abortive events in the first place (potential technical reasons for not capturing earlier structures are discussed on Page 14, Lines 385-395).

      Abortive events throughout the later process of endocytosis would, according to our data, still follow the same mechanistic trajectory as other sites. They could potentially slightly skew our pseudotime analysis, as they would result in an overestimation of specific endocytic stages. The overall mechanistic insight of our work would not be greatly affected, as curvature generation would still occur according to the same trajectory. Due to the low impact on our overall results we do not discuss these late abortive events further.

      Minor comments:

      1. Page 5, result section 2. The authors should further explain why vesicles from the trans Golgi could be responsible for the small disconnected set of data points corresponding to the vesicles with larger curvatures.

      Response: We extended our explanation for the presence of non-endocytically derived structures in our data set on Page 6, Lines 184-189. We further extended the supplementary information with an additional experiment (Supplementary Figure 4), highlighting the absence of AP2-positive structures within the disconnected population. As AP2 is a specific marker for CME, these results further solidify our hypothesis. Further experiments would be required to determine their exact origin, and are outside of the scope of this publication.

      Page 7, line 6. The authors assumed that the clathrin coat starts growing on a flat membrane. However, as is mentioned in the discussion, clathrin has been shown to have curvature-sensing ability, which could be further amplified several-fold by adapter proteins (Zeno et al., 2021). So, it seems that clathrin prefers a highly curved membrane over a flat one. Is it still reasonable to make this assumption?

      Response: Whilst our assumption states that the clathrin coat grows on flat membranes, we do not restrict our model to an intercept through 0, and it would therefore still hold true even in the case of growth starting on slightly bent membranes. The impact of the preference of clathrin for curvature is considered as a potential mechanistic explanation for the positive feedback in curvature generation described by our model. We therefore already cite the reference mentioned by the reviewer on Page 8, Line 224.

      As we do observe flat structures in our data set (discussed more in detail now on Page 14, Lines 396-404), we still think the assumption of early flat growth holds true.

      Page 9, result section 4. In the sentence: "we effectively generated the average trajectories of how curvature, surface area, projected area and lattice edge change during endocytosis in SK-MEL-2 cells (Figure 4B-E)." Here I think the authors are describing Figure 4C-F.

      Response: That is correct, an oversight on our part. We changed the cross-reference.

      Page 11, discussion. In the sentence: "A deviation of the cross-sectional profile from a circle is nevertheless preserved in the averaging (Supplementary Figure 5)." I didn't see supplementary figure 5 in the article.

      Response: We changed the cross-reference. We were addressing a subsection of Supplementary Figure 8.

      Reviewer #2 (Significance (Required)):

      From a vast amount of microscopic images and data analysis, the manuscript gives a clear model on the progress of the CME, which integrates two opposing models; constant area and constant curvature models. This is a big progress in our understanding of the molecular mechanism of CME, and will attract many researchers in the field of cell biology. From a viewpoint of my expertise (molecular imaging of plasma membrane and endocytic processes), this manuscript has significant impact on the related research fields.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: The authors used single-molecule localization microscopy of clathrin in fixed cells (2 human cell lines, one mouse) to capture snapshots of a clathrin-mediated endocytosis (CME), fitted these localizations to a geometric model of a forming vesicle, and used these fitted measurements to test existing models of clathrin-mediated vesicle formation before refining their own. Specifically, the closing angle, a measure of vesicle completeness, was used as a proxy for growth-stage of the vesicle such that the many captured snapshots could reconstruct a pseudo-timeline with an unknown parameterization of time on closing angle. Two standard models of CME vesicle formation, where the surface area is kept constant or where the curvature is kept constant, were examined and determined to be incommensurate with the pseudo timelines of curvatures and surface area. The authors then describe their own model for CME vesicle formation, in which neither surface area nor curvature are constant in evolution of the vesicle, and cooperative forces are hypothesized to non-linearly modulate the curvature-growth as a function of closing angle. Additionally, by binning snapshots and then aligning, scaling, and azimuthally smoothing each bin, they reconstruct representations of distinct endocytic stages.

      Major comments:

      Most results are quite convincing, and the authors do a nice job of displaying examples of SMLM data, both with fit results as well as example clathrin assemblies that are too far removed from their budding-vesicle model to be included for analysis, for example. It is also worth noting that the clathrin images themselves appear to be very high-quality - clearly, as detailed in the methods, attention was given to each step of the imaging and reconstruction process.

      While the presented cooperative curvature model seems reasonable and surely fits the curvature-, surface area-, and rim length-vs. closing angle data better than the simplistic constant surface-area and constant curvature models, it also has more parameters, namely: gamma (the initial rate of curvature change with closing angle) and H_0 (the final preferred curvature). It would be appropriate to calculate an information criterion (e.g. Bayesian), using an assumption of Gaussian-distributed errors (presumably the data fitting in R was least squares, so this would match) to justify the additional parameters.

      Response: This is an important observation by the reviewer. Indeed, our model uses one more parameter compared to the models we compare it with. To justify this, we performed the calculation as suggested by the reviewer, and found that the cooperative curvature model (CoopCM) indeed results in the lowest BIC (Supplementary Notes). We therefore are confident that out of the three models tested in this work, our CoopCM fits best to the underlying experimental data (Page 8, Lines 232-235).
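      The kind of comparison described in this response can be sketched as follows. This is a minimal illustration, assuming least-squares fits with independent Gaussian errors as the reviewer suggested; the function name, the residual values, and the parameter counts are hypothetical placeholders and are not taken from the manuscript or its Supplementary Notes.

```python
import math

def bic_least_squares(residuals, n_params):
    """BIC for a least-squares fit, assuming i.i.d. Gaussian errors.

    Uses BIC = n * ln(RSS / n) + k * ln(n), which drops additive
    constants that are identical for all models fitted to the same data.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + n_params * math.log(n)

# Hypothetical residuals of two candidate models on the same data set.
residuals_simple = [0.9, -1.1, 1.0, -0.8, 1.2, -1.0]
residuals_richer = [0.3, -0.2, 0.4, -0.3, 0.2, -0.4]

bic_simple = bic_least_squares(residuals_simple, n_params=1)
bic_richer = bic_least_squares(residuals_richer, n_params=2)
```

      The model with the lower value is preferred. In this toy example the extra parameter is justified because the drop in residual sum of squares outweighs the `k * ln(n)` penalty.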

      A related issue relates to the error in the extracted value of the closing angle from a single 3D reconstruction - the error distribution should be quantified for this very important parameter. The errors in the other parameters extracted from the fits are less important, but would enhance the paper.

      Response: We thank the reviewer for pointing out the importance of the estimation error of the key parameter closing angle. To address this point, based on the geometrical model, we simulated clathrin-coated structures with closing angles evenly distributed across the entire range (0-180°). This realistic simulation represents the data quality (e.g., localization precision and labeling efficiency) of the experimental data (corresponding methods are included in Pages 22-23, Lines 679-706). The result of fitting these structures using LocMoFit shows an unbiased estimation with small spread of the error (overall STD = 2.82°; see the newly included Supplementary Figure 2a).

      Pseudo-temporal sorting on closing angle makes sense and I appreciate the authors mentioning potential caveats to the monotonicity, etc. However, a comment about the impact of closing angle errors on the pseudo-time determinations would be helpful. The agreement of theta-rank plots with the hypothesized sqrt(t) scaling is reassuring.

      I additionally appreciate the robustness of fitting a geometric structure from localizations rather than relying on pseudo-temporal sorting on clathrin count extracted from localization-merging of multi-blinking emitters.

      Response: The pseudo-temporal sorting is based on the precisely estimated closing angle, and therefore is also precise, as the distribution of the fitted closing angle has no significant distortion compared to the expectation (Supplementary Figure 2b).
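      For readers unfamiliar with rank-based pseudo-temporal sorting, the idea discussed above can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: it assumes that the closing angle increases monotonically over time and that fixed-cell snapshots sample sites uniformly in time, so that the rank of a closing angle maps onto a normalized pseudotime. The function name and the example angles are hypothetical.

```python
def pseudotime_from_closing_angles(angles_deg):
    """Assign each site a pseudotime in (0, 1] from the rank of its closing angle."""
    n = len(angles_deg)
    # Indices of the sites, ordered from smallest to largest closing angle.
    order = sorted(range(n), key=lambda i: angles_deg[i])
    pseudotime = [0.0] * n
    for rank, site_index in enumerate(order, start=1):
        pseudotime[site_index] = rank / n
    return pseudotime

# Hypothetical closing angles in degrees; not data from the manuscript.
example_angles = [90.0, 10.0, 50.0, 130.0]
example_pseudotime = pseudotime_from_closing_angles(example_angles)
```

      Because only the ordering of the angles matters, an estimation error changes a site's pseudotime only when it is large enough to swap that site with a neighbor in the sorted list.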

      The authors did a nice job of qualifying their more speculative claims, in particular I appreciated their mentioning the possibility that smaller clathrin coats could be below their detection limit.

      The authors state a set of data points in suppl. figure 2D (and suppl. Fig 3A-C) are "likely" small clathrin-coated vesicles from the trans Golgi. I appreciate the examples rendered in that figure so a reader can appraise, but if they have my background they might not know how reasonable exclusion of this data is from model testing. This claim could be rephrased or the rationale expanded upon to justify the Golgi hypothesis.

      Response: We agree with the reviewer and further expanded on our hypothesis on the origin of the structures within the disconnected cloud of data points (Page 6, Lines 184-189). We further performed an additional experiment (Supplementary Figure 4), where we simultaneously imaged the clathrin coat at high resolution, and the CME specific AP2 complex tagged with GFP at diffraction limited resolution. We observed that there were no AP2-GFP positive structures present in the disconnected cloud of our data set, and conclude that these structures indeed must originate via a different pathway.

      The data and methods are presented such that they could be reproduced, and replicating their experiment in multiple cell lines, across multiple species, would seem to be adequate replication. As mentioned above, the statistical analysis of whether the model complexity is justified by improved goodness of fit is currently missing but can readily be checked and added.

      Minor comments:

      Last paragraph of the introduction, positive feedback is mentioned but not the slowing down as preferred curvature is realized (inclusion of which might help foster a clearer understanding of the model early on).

      Response: We now mention the slowing down towards a preferred curvature in our introduction on Page 3, Lines 100-102.

      In Fig. 1, please state in the figure caption what is being displayed in the two large panels and what is the color map. Is this the 3D data from the overlapping elliptical Gaussians projected on the plane in a "hot" map? Further, in the top right small panels, are the x-y images projections of all z, or measured at a specific z?

      Response: We adjusted Figure 1 and the figure caption to clearly explain what is mentioned in each superresolution panel. The exact details for image rendering, including the color map and Gaussian blurring of the localization coordinates are now described in the methods on Page 21, Lines 625-627. Ultimately, the x-y images represent an enlarged view of the projections as visible in the previous two panels. We hope that rephrasing of Figure 1 legend clarifies this accordingly.

      In Eqn. (1), epsilon is not defined.

      Response: The definition is mentioned on Page 8, Line 210, right before the equation, same as for kon.

      For the theta-rank plots (Fig4 B, SFig D-F ii) moving the theta(t)=sqrt(t) red curves behind sorted theta data would make the data easier to see.

      Response: We adjusted the Figures according to the reviewer's suggestion.

      "Laser" in sentence about the speckle reducer should probably be plural.

      Response: We corrected this grammar mistake, and changed “laser” to “lasers” on Page 20, Line 586.

      I would like to see the "custom" algorithm based on redundant cross-correlation for drift correction briefly described.

      Response: We added an explanation on the algorithm used for the drift correction on Pages 20-21, Lines 611-617.

      A legend for supplemental figure 3 A-C would be nice.

      Response: We added a legend for the various models in (now) Supplementary Figure 5, and further made some clarifications in the figure caption.

      If the definition of the abbreviation flat-to-curved-transition as FTC was explicit I missed it.

      Response: As we do not use this abbreviation anywhere else in the manuscript, we removed it from the Supplementary Note to avoid confusion.

      Resolution of 20 and 30 nm (laterally and axially, respectively) was quoted once towards the beginning of the manuscript as being an improvement resulting from the localization method described in Li et al., 2018. Resolution can be difficult to speak about precisely, but the methods section would seem to indicate that localizations are filtered at 20 nm lateral localization precision (potentially 30 nm axially?), and I think the authors could consider rephrasing to depict this unless I am missing elsewhere a description of the resolution metric being used.

      Response: The original 20 and 30 nm resolution (laterally and axially) was calculated based on the median localization precision values in x-y and z for a representative image, using the FWHM approach (described in Methods Page 21, Lines 621-624). After consideration of the reviewer's question, we found the modal value to be a better quantity to calculate the resolution, and changed this in the text accordingly (Page 4, Lines 113-115, and Methods Page 21, Lines 621-624).
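      As a hedged illustration of how a resolution figure can follow from localization precision, the sketch below assumes that the "FWHM approach" means converting a summary localization precision (a Gaussian standard deviation) into a full width at half maximum via FWHM = 2·sqrt(2·ln 2)·σ. That reading is an assumption; the function name and the example precision values are hypothetical and do not come from the manuscript.

```python
import math
import statistics

# Conversion factor between the standard deviation and the FWHM of a Gaussian.
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))

def fwhm_resolution(localization_precisions_nm, summary=statistics.median):
    """Summarize per-localization precisions (sigma, in nm) and convert to a FWHM."""
    return FWHM_PER_SIGMA * summary(localization_precisions_nm)

# Hypothetical lateral localization precisions in nm; not values from the manuscript.
example_precisions_nm = [6.0, 8.0, 8.5, 9.0, 12.0]
example_resolution_nm = fwhm_resolution(example_precisions_nm)
```

      Passing a different `summary` function, for example one that returns the peak of a binned histogram, would give a mode-based value instead of a median-based one.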

      Reviewer #3 (Significance (Required)):

      Proteins involved with inducing curvature in membranes are in general very exciting targets for localization microscopy, yet still for many systems questions remain unanswered. The authors tackle one such question in this manuscript. In other, unresolved, discussions, the posed hypotheses are quite similar to the simplistic models surpassed in this work (e.g. that curvature scales linearly with local protein copy number, or that surface area scales linearly with local protein copy number). The idea of cooperativity may be useful for others to consider, and the authors additionally demonstrate a seemingly smooth workflow using their separately described tools (primarily LocMoFit; Wu et al. 2021).

      I myself am not an expert on CME or vesicle trafficking. My background is primarily in SMLM method development and SMLM / fluorescence image analysis. From my perspective, the novelty of the biological conclusions appears to be the authors' specific cooperative model and the presence of two structural states which are enriched (closing angle 70° and 130°). As referenced, and authors F. Frey and U. S. Schwarz nicely present in Bucher et al. 2018, the constant curvature and constant surface area models are known to be inaccurate descriptions of CME evolution, and further it is also known that clathrin first assembles small flat structures before beginning to curve the membrane. However, the 3D super-resolution imaging and direct evaluation of a 3D model geometry in this work is a nice extension of the 2D super-resolution imaging and projection evaluation in the authors' previous work studying endocytosis through ensemble averaging in yeast (Mund et al. 2018) as well as the analysis on projections in Bucher et al. 2018. Fully 3D treatment of the clathrin structures allows the authors to orient asymmetric assemblies such that they are averaged out in their ensemble reconstruction, and as they point out the molecular specificity afforded by a fluorescence-based technique ensures unbiased segmentation of clathrin-involved endocytic sites. In other words, while this work does not describe a technical advance not already described elsewhere, it sets a nice example for those researching protein-membrane interactions of how to leverage the right tools to clearly and directly answer their questions. With their additional work to make these tools extensible to other geometries, multiple color channels, etc., I expect their work to inspire quality studies in other systems. That significance is complementary to their proposal of a reasonable model for the geometric evolution of CME.

      References:

      Maximum-likelihood model fitting for quantitative analysis of SMLM data, Yu-Le Wu, Philipp Hoess, Aline Tschanz, Ulf Matti, Markus Mund, Jonas Ries, bioRxiv 2021.08.30.456756; doi: https://doi.org/10.1101/2021.08.30.456756

      Bucher, D., Frey, F., Sochacki, K.A. et al. Clathrin-adaptor ratio and membrane tension regulate the flat-to-curved transition of the clathrin coat during endocytosis. Nat Commun 9, 1109 (2018). https://doi.org/10.1038/s41467-018-03533-0

      Markus Mund, Johannes Albertus van der Beek, Joran Deschamps, Serge Dmitrieff, Philipp Hoess, Jooske Louise Monster, Andrea Picco, François Nédélec, Marko Kaksonen, Jonas Ries, Systematic Nanoscale Analysis of Endocytosis Links Efficient Vesicle Formation to Patterned Actin Nucleation, Cell, 174, 4, (2018). https://doi.org/10.1016/j.cell.2018.06.032.


    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary

      In this article, the authors aimed to investigate the dynamics of the clathrin lattice during clathrin-mediated endocytosis (CME). Overall, they successfully achieved the goal by observing a large number of clathrin spots from several cell lines with 3D single-molecule localization microscopy (SMLM). With the help of this high-resolution imaging technique, they were able to describe the physical properties of each spot and reconstruct the assembly and remodeling of the clathrin coat. Moreover, by comparing the constant area/curvature models with their own data, the authors highlighted that neither of the prevailing models perfectly explained what they observed and proposed a 'cooperative curvature model'. With the novel model, the authors were able to reconstruct the clathrin coat remodeling in different cell lines and concluded that the simultaneous bending and assembly of the clathrin coat is a homogeneous property of endocytosis. The experiments and analytical procedures are well-designed and performed, and the manuscript is well-organized. The conclusion, the 'cooperative curvature model', was deduced from a large amount of data analysis and clearly stated in the text. I would like to recommend its publication if the following issues are clarified.

      Major comments:

      1. The authors compared the morphological dynamics of clathrin-coated pits among three different cell lines (SK-MEL-2, U2OS, and 3T3) and found slight differences. As U2OS cells were derived from bone tissue, they have different mechanical properties (membrane tension, elasticity of the cortical layer, etc.). It would be interesting to consider those mechanical properties in understanding the morphology (Figure 2) and progress (Figure 4) of the CME. Considering the fact that the bending energy of the plasma membrane is dependent on the membrane tension, they may be able to find some relationships between mechanical properties of the cell cortex and CME.
      2. In Figure 4, the authors estimated the progression of the CME using the frequency distribution of theta. However, I wonder how they handled the events which were aborted in the middle of the CME. It had been suggested that some CME are aborted during the initial step of the CME. The authors should consider (at least discuss) those abortive events, which can disturb the analysis.

      Minor comments:

      1. Page 5, result section 2. The authors should further explain why vesicles from the trans Golgi could be responsible for the small disconnected set of data points corresponding to the vesicles with larger curvatures.
      2. Page 7, line 6. The authors assumed that the clathrin coat starts growing on a flat membrane. However, as is mentioned in the discussion, clathrin has been shown to have curvature-sensing ability, which could be further amplified several-fold by adapter proteins (Zeno et al., 2021). So, it seems that clathrin prefers a highly curved membrane over a flat one. Is it still reasonable to make this assumption?
      3. Page 9, result section 4. In the sentence: "we effectively generated the average trajectories of how curvature, surface area, projected area and lattice edge change during endocytosis in SK-MEL-2 cells (Figure 4B-E)." Here I think the authors are describing Figure 4C-F.
      4. Page 11, discussion. In the sentence: "A deviation of the cross-sectional profile from a circle is nevertheless preserved in the averaging (Supplementary Figure 5)." I didn't see supplementary figure 5 in the article.

      Significance

      From a vast amount of microscopic images and data analysis, the manuscript gives a clear model on the progress of the CME, which integrates two opposing models; constant area and constant curvature models. This is a big progress in our understanding of the molecular mechanism of CME, and will attract many researchers in the field of cell biology. From a viewpoint of my expertise (molecular imaging of plasma membrane and endocytic processes), this manuscript has significant impact on the related research fields.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors attempt to optimize the FluoroSpot assay to allow for the assessment, at the B-cell level, of cross-reactive antibodies targeting conserved epitopes shared by multi-allelic antigens and of antibodies specific to a unique antigen variant. This is a critical aspect to consider when identifying targets of a broad range of cross-reactive antibodies for vaccine development, and the antigen VAR2CSA used in this work is one that will benefit from the method described in the manuscript.

      Overall, this is a method manuscript with extensive detail of the assay validation process. The description of the assay performance steps using first monoclonal antibodies and later hybridoma/immortalized B cells was important to understand conditions that can influence the antigen-antibody interactions in the assay. This multiplex approach can assess the cross-reactivity of antibodies to up to four allelic variants of an antigen, with the possibility to explore the affinity of an antibody to a particular variant using the RSV measurements. The validation of the assay with PBMC from malaria-exposed donors, both men and women (who naturally acquired high titers of antibodies to VAR2CSA during pregnancy), is a strength of this work, as this is in the context of polyclonal antibodies with more heterogeneous antibody binding specificities.

      The ability of the assay to detect cross-reactive antibodies using all four tags appears highly variable, even in the context of a monoclonal antibody targeting the homologous antigen labelled with all 4 tags.

      We understand the concern for variability, but we think that in general the assay was very consistent. Regardless of the configuration used, we detected strikingly comparable numbers of spots/well, especially when the homologous antigen labelled with four tags was used (Figure 2A). Similar consistency has been previously reported when a similar assay was used to study cross-reactivity in dengue-specific antibodies.

      Overall, it appears that the assessed antibody reactivity with TWIN tagged antigens was relatively low and this needs to be explained and discussed as the current multiplex method, as it is, might just be optimized for study of cross-reactive antibodies to 3 antigens.

      The LED380 (used to detect and visualize the TWIN tag) indeed gave more background than the other three detection channels. We normally observed a ring of fluorescence at the edge and the middle of the wells, accompanied by lower intensity of the spots. These two characteristics are apparent in the figures and RSV plots presented in the manuscript. To reduce these issues, we tried to substitute the TWIN tag with a BAM tag detected with a peptide-specific antibody (data not presented). However, that approach did not improve the readout and we therefore decided to keep the TWIN-StrepTactin pair for all the experiments. Importantly, even with these issues, routine manual inspection of the wells confirmed that the Apex software automatically and efficiently counted “real” spots, giving us confidence in the performance of the assay. We acknowledge that exclusion of the LED380 data would lead to higher assay accuracy. However, it would result in reduced ability to assess broad antibody cross-reactivity, which was the primary objective of our study. We have added text briefly discussing this to the revised manuscript (lines 154-160).

      As acknowledged by the authors, the validation of this assay on PBMC from only 10 donors (7 women and 3 men) is a caveat to the conclusion and increasing this number of donors (the authors have previously excelled in B cells analyses of PfEMP1 proteins and would have PBMC readily available) will strengthen the validity of this assay.

      We thank the reviewer for this comment and agree the number of donors tested is far from sufficient to provide any conclusive evidence regarding frequencies of VAR2CSA-specific and cross-reactive B cells in the context of placental malaria. However, we firmly believe that the validation of the assay, which was the objective of the study, is sufficient, especially because we included human B-cell lines isolated from donors naturally exposed to VAR2CSA-expressing parasites. Future studies including more donors and full-length VAR2CSA antigens are certainly warranted. As the performance of the assay has now been validated (this manuscript) to our satisfaction, we are indeed planning such studies.

      Reviewer #2 (Public Review):

      The manuscript describes the development of a laboratory-based assay as a tool designed to identify individuals who have developed broadly cross-reactive antibodies with specificity for regions that are common to multiple variants of a given protein (VAR2CSA) of Plasmodium falciparum, the parasite that causes malaria. The assay has potential application in other diseases for which the question of acquisition of antibody-mediated immunity, either through natural exposure or through vaccination, remains unresolved.

      From a purely technical/methodological viewpoint, the work described is of high quality, relying primarily on the availability of custom-designed, in-house-derived protein and antibody reagents that had, for the most part, been validated through use in earlier studies. The authors demonstrate a high degree of rigour in the assay development steps, culminating in a convincing demonstration of the ability to accurately and reproducibly quantify cross-reactive antibody types under controlled conditions using well-characterized monoclonal antibodies.

      In a final step, the authors used the assay to assess the content of broadly cross-reactive antibodies in samples from a small number of malaria-exposed African men and women. Given that VAR2CSA is a parasite-derived protein that is exclusively and intimately involved in the manifestation of malaria during pregnancy, with specific localisation to the maternal placental space, the premise is that antibodies, including those with cross-reactive specificities, should be almost exclusively detectable in samples from women, either pregnant at the time of sampling or having been pregnant at least once. The assay functioned technically as expected, identifying antibodies predominantly in women rather than men, but it failed to identify broadly cross-reactive antibodies in the women's samples used, only revealing antibodies with specificity for just one of the different variants used. The latter result could have two mutually non-exclusive explanations. On the one hand, the small number of women's samples (7) screened in the assay could simply be insufficient, demanding the use of a much larger panel. On the other hand, for technical reasons the assay involves the use of only relatively restricted parts of the VAR2CSA protein, and this particular aspect may represent its primary limitation. In earlier work, the authors did identify broadly cross-reactive antibodies in samples from African women, but that work relied on the use of the whole VAR2CSA protein present in its natural state embedded in the membrane of the infected red cell, or as a complete protein produced in the laboratory. The important point is that the whole protein likely interacts with antibodies that recognize protein structures that the isolated smaller parts of the whole protein used in the assay fail to reproduce, and that the cross-reactive antibodies identified recognize these structures, which are conserved across different VAR2CSA variants. The authors recognize these potential weaknesses in their discussion of the results. It is also possible that VAR2CSA variants expressed by parasites from geographically-distinct regions (Africa, Asia, South America) are themselves distinct, and this aspect could also have affected the outcome, since the variant protein sequences used in the assay were derived from parasites originating in these different regions.

      The assay could find application in the malaria research field in the specific context of assessments of antibody responses to a range of different parasite proteins that are, or have been, considered candidates for vaccine development but for which their extensive inherent allelic polymorphism has effectively negated such efforts.

      We thank the reviewer for the kind evaluation. We fully acknowledge the need for more comprehensive studies to assess the robustness of the pilot data regarding antibody cross-reactivity after natural exposure in the present study, which was aimed to document the performance of the complicated multiplexed assay rather than to provide such evidence. As mentioned above, we are currently planning such a study. We also acknowledge the need to assess the degree of cross-reactivity to full-length antigens rather than domain-specific components of them. This is obviously particularly true for large, multi-domain antigens such as PfEMP1 (including VAR2CSA). Such an exercise is complicated by the need for appropriately tagged antigens. We are intrigued by the apparent discrepancy between the degree of antibody cross-reactivity in depletion experiments using individual DBL domains of VAR2CSA (low cross-reactivity) versus full-length VAR2CSA antigens (very substantial cross-reactivity) reported by Doritchamou et al., and are keen to apply our approach to explore that finding. Therefore, as also mentioned above, we are currently planning a study employing tagged full-length VAR2CSA allelic variants as well.

    1. Reviewer #1 (Public Review):

      The authors test whether neurons in V1 show "multiplexing", which means that when two stimuli A and B are presented inside their receptive fields (RFs), the neuronal response fluctuates across trials between coding one of the two, leading to a bimodal spike count histogram. They find evidence for this "mixture" model response in a subset of V1 neurons. They next test whether the spike count noise correlations (Rsc) vary between pairs of neurons that prefer the same versus different stimuli, and show that Rsc is positive for neurons that prefer the same stimulus but negative for neurons that prefer different stimuli.
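      For reference, the spike count noise correlation (Rsc) discussed in this review is conventionally the Pearson correlation between two neurons' trial-by-trial spike counts for repeated presentations of the same stimulus. The sketch below is a minimal illustration under that standard definition; it is not the authors' analysis code, and the function name and spike counts are hypothetical.

```python
import math

def spike_count_correlation(counts_a, counts_b):
    """Pearson correlation of trial-by-trial spike counts for one stimulus condition."""
    if len(counts_a) != len(counts_b) or len(counts_a) < 2:
        raise ValueError("need two equally long lists with at least two trials")
    n = len(counts_a)
    mean_a = sum(counts_a) / n
    mean_b = sum(counts_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(counts_a, counts_b))
    var_a = sum((a - mean_a) ** 2 for a in counts_a)
    var_b = sum((b - mean_b) ** 2 for b in counts_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("spike counts must vary across trials")
    return cov / math.sqrt(var_a * var_b)

# Hypothetical spike counts on six "AB" trials; not recorded data.
neuron_1 = [12, 3, 11, 4, 13, 2]
neuron_2_same_pref = [10, 2, 9, 3, 11, 1]   # fluctuates together with neuron_1
neuron_2_diff_pref = [2, 10, 3, 9, 1, 11]   # fluctuates opposite to neuron_1

rsc_same = spike_count_correlation(neuron_1, neuron_2_same_pref)
rsc_diff = spike_count_correlation(neuron_1, neuron_2_diff_pref)
```

      In this toy example, a pair that switches between "A"-like and "B"-like responses on the same trials yields a positive value, while a pair with opposite preferences yields a negative one, which is the pattern the review summarizes.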

      While this paper shows some intriguing results, I feel that there are a lot of open questions that need to be addressed before convincing evidence of multiplexing can be established. These points are discussed below:

      1. The best spike count model shown in Figure 2C is confusing. It seems that the number of "conditions" is a small fraction of the total number of conditions (and neurons?) that were tested. Supplementary Figure 1 provides more details (for example, the "mixture" corresponds to only 14% of total cases), but it is still confusing (for example, what does WinProb>Min mean?). From what I understood, the total number of neurons recorded for the Adjacent case in V1 is 1604, out of which 935 are Poisson-like with substantially separated means. Each one has 2 conditions (for the two directions), leading to 1870 conditions (perhaps a few less in case both conditions were not available). I think the authors should show 5 bar plots - the first one showing the fraction for which none of the models won by 2/3 probability, and then the remaining 4 ones. That way it is clear how many of the total cases show the "multiplexing" effect. I also think that it would be good to only consider neurons/conditions for which at least some minimum number of trials are available (a cutoff of say ~15) since the whole point is about finding a bimodal distribution for which enough trials are needed.

      2. More RF details need to be provided. What was the size of the V1 RFs? What was the eccentricity? Typically, the RF diameter in V1 at an eccentricity of ~3 degrees is no more than 1 degree. It is not enough to put 2 Gabors of size 1 degree each to fit inside the RF. How close were the Gabors? I am confused about the statement in the second paragraph of page 9 "typically only one of the two adjacent gratings was located within the RF" - I thought the whole point of multiplexing is that when both stimuli (A and B) are within the RF, the neuron nonetheless fires like A or B? The analysis should only be conducted for neurons for which both stimuli are inside the RF. When studying noise correlations, only pairs that have overlapping RFs, such that both A and B are within the RFs of both neurons, should be considered. The cortical magnification factor at ~3-degree eccentricity is 2-2.5mm/degree, so we expect the RF center to shift by at least 2 degrees from one end of the array to the other.

      3. Eye data analysis: I am afraid this could be a big confound. Removing trials that had microsaccades is not enough. Typically, in these tasks the fixation window is 1.5-2 degrees, so that if the monkey fixates on one corner in some trials and another corner in other trials (without making any microsaccades in either), the stimuli may nonetheless fall inside or away from the RFs, leading to differences in responses. This needs to be ruled out. I do not find the argument presented on pages 18 or 23 completely convincing, since the eye positions could be different for a single stimulus versus when both stimuli are presented. It is important to show that the eye positions are similar in "AB" trials for which the responses are "A" like versus "B" like, and these, in turn, are similar to when "A" and "B" are presented alone.

      4. Figures 5 and 6 show that the difference in noise correlations between the same preference and different preference neurons remains even for non-mixture type neurons. So, although the reason for the particular type of noise correlation was given for multiplexing neurons (Figure 3 and 4), it seems that the same pattern holds even for non-multiplexers. Although the absolute values are somewhat different across categories, one confound that still remains is that the noise correlations are typically dependent on signal correlation, but here the signal correlation is not computed (only responses to 2 stimuli are available). If there is any tuning data available for these recordings, it would be great to look at the noise correlations as a function of signal correlations for these different pairs. Another analysis of interest would be to check whether the difference in the noise correlation for simply "A"/"B" versus "AB" varies according to neuron pair category. Finally, since the authors mention in the Discussion that "correlations did not depend on whether the two units preferred the same stimulus or different", it would be nice to explicitly show that in figure 5C by showing the orange trace ("A" alone or "B" alone) for both same (green) and different (brown) pairs separately.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      We thank the reviewers for their constructive comments and are pleased that all reviewers share our opinion, that the present study “makes an important contribution to the molecular architecture of mitochondria”, is in addition “an important advancement in our understanding of the mechanism by which Cqd1 regulates CoQ distribution” and will “thereby appealing to the broad readership of the journals”. We are convinced that addressing the important points raised by the reviewers will further strengthen the manuscript and result in additional significant insights in the molecular function of Cqd1.

      Reviewer #1:

      The major concerns affecting the conclusions are: 1) Experimental evidence is lacking on the contribution of contact site formation by Cqd1 to the effects on mitochondrial architecture and respiration-dependent growth. Determining the effects of the overexpression of the kinase-dead mutant on mitochondrial morphology and contact site formation with Por1-Om14 can address that.

      We thank reviewer #1 for raising these important points. Indeed, the various functions of Cqd1 might be independent from each other and so far we cannot distinguish between them. As suggested by the reviewer we will analyze the effect of overexpression of CQD1 in the Δups1 deletion mutant and make use of the point mutant in the conserved ATP binding domain which cannot complement the phenotype of the Δups1 Δcqd1 double deletion mutant. We generated a yeast mutant strain expressing Om14-3xHA in the absence of wild type Cqd1. Expression of the cqd1(E330A) mutant in the Om14-3xHA background and subsequent immunoprecipitation will allow us to test whether ATP binding is also essential for contact site formation. Preliminary experiments showed that the overexpression of cqd1(E330A) in the Δcqd1 deletion background results in a growth defect comparable to that caused by overexpression of CQD1 WT. Therefore, we think it might be more promising to analyze the interaction of Om14 and Cqd1 E330A at wild type level in order to avoid pleiotropic effects.

      In addition, we will further characterize the cqd1(E330A) mutant by analyzing the effect of its overexpression on mitochondrial morphology, cell growth and assembly of MICOS and F1FO ATP synthase in the Δcqd1 deletion background.

      2) Related to point #1, Cqd1 overexpression in Δups1 cells could have addressed whether the role of Cqd1 in contact sites and mitochondrial architecture is independent of its role on CoQ distribution and phospholipid metabolism. Further characterization of the kinase-dead Cqd1 mutant on CoQ distribution, contact sites, mitochondrial architecture and phospholipid metabolism might help discerning how these activities can be separated.

      We agree that the related points 1) and 2) raised by reviewer #1 are important and addressed our plans in the response on point 1).

      3) It is unclear how both Cqd1 overexpression and deletion induce mitochondrial fragmentation. Performing live cell imaging with a mitochondrial photoactivatable GFP to measure mitochondrial fusion rates could help discerning the causes for fragmentation. It is a possibility that overexpression induced fragmentation by activating fission without changing fusion, while deletion induced fragmentation by blocking fusion.

      We thank reviewer #1 for bringing up this point. Perhaps our explanation in this respect was too short. Fig. 4E shows that deletion of CQD1 does not result in altered mitochondrial morphology; however, deletion of CQD1 in the Δups1 background leads to virtually complete fragmentation of the mitochondrial network. This is likely due to inhibition of mitochondrial fusion through disturbed processing of the fusion protein Mgm1 (see Fig. 4D). In contrast, overexpression of CQD1 does NOT result in formation of small mitochondrial fragments, but in formation of huge mitochondrial clusters which in addition contain a large proportion of ER membranes. So, we don’t think that this phenotype is related to either enhanced fission or reduced fusion. We will clarify this point in the text of the revised manuscript.

      Minor comment:

      1) Figure 4 claims that mitochondrial function is impaired by ups1 deletion, which Cqd1 deletion exacerbates. However, no respiration data is shown in figure 1, only measurements of mitochondrial architecture are shown. Thus, oxygen consumption measurements are needed to claim effects on mitochondrial function.

      We did not want to claim that mitochondria lose respiratory competence upon simultaneous deletion of CQD1 and UPS1. Actually, our results indicate that the Δups1 Δcqd1 double deletion mutant grows like wild type on complete medium containing glycerol. Therefore, respiration is not impaired in this mutant. However, mitochondrial function is not restricted to ATP production by oxidative phosphorylation. The reviewer probably refers to Figure 4 where we show that mitochondrial biogenesis and dynamics are impaired in the Δups1 Δcqd1 double deletion mutant – the heading of the legend summarizes this as "mitochondrial function". We will be more precise in the revised version on this point and add a panel showing growth of the mutant strain on non-fermentable carbon source to avoid any further confusion.

      2) Some Western blots lack quantifications and statistical analyses of independent experiments.

      It is correct that some quantification and the respective statistics were missing in the initially submitted manuscript. We will add the requested information in the revised version of the manuscript.

      Reviewer #2:

      I have the following concerns for the authors to consider. (1) Although biochemical evidence shows that Cqd1 is likely a factor that forms CS structures in mitochondria, it would make the manuscript stronger if the authors can observe uneven distribution of Cqd1 in the mitochondrial membranes (assessed by fluorescent microscopy or ideally high-resolution microscopy) and the presence of Cqd1 in the region of close apposition of the OM and IM by immunogold labeling for electron microscopy.

      Two independent lines of evidence show that Cqd1 is a novel contact site protein: (i) it is found in the contact site fraction in density gradients (Fig. 6A), and (ii) it can be co-immunoprecipitated with outer membrane proteins (Fig. 6G, H, I). Furthermore, the co-IP is supported by cross-links of expected size (Fig. 6F). In sum, we feel that this is solid evidence to support our claim that Cqd1 is present in mitochondrial contact sites. However, it still might be interesting to check an uneven distribution of Cqd1 in mitochondria, as suggested by the reviewer. We will do this by 3D deconvolution fluorescence microscopy.

      (2) Since the structural characterization of Cqd1 is important to understand its interactions with the OM proteins and other UbiB protein kinase-like family proteins, Coq8 and Cqd2, take different orientations, the membrane topology of Cqd1 should be experimentally analyzed. The authors state, "two hydrophobic stretches can be identified in the Cqd1 sequence, of which the first one (amino acids 125-142) might be a bona fide transmembrane segment" (lines 97-100); then is Cqd1 a single membrane spanning protein or two-membrane spanning protein?  

      Unfortunately, it was not possible to test the location of the N terminus experimentally because an N-terminally tagged variant of Cqd1 (tag inserted between presequence and mature part) turned out to be unstable. We consider it very unlikely that the second hydrophobic stretch is a transmembrane domain as it is rather short (only 11 amino acids). Furthermore, several Cqd1 homologs in other fungi, including Yarrowia lipolytica, Aspergillus niger and Schizosaccharomyces pombe, are lacking the second hydrophobic stretch. Therefore, we propose that the major part of Cqd1 including the protein kinase-like domain is exposed to the intermembrane space. We will point this out more clearly in the revised manuscript.

      (3) The authors state, "conserved GxxxG dimerization motif (amino acids 504‐508)" (Fig. 1A caption), but this description needs a reference. The GxxxG motif was proposed to mediate transmembrane helix-helix association (https://doi.org/10.1006/jmbi.1999.3489), which is not consistent with the membrane topology proposed by the authors.

      We thank reviewer #2 for this comment. It is correct that GxxxG motifs are usually present in transmembrane α-helices. However, there is information available indicating that these motifs may also be present in soluble proteins and are stabilizing dimeric interactions for instance in the homodimeric Holliday-junction protein resolvase (Kleiger et al., 2002; doi: 10.1021/bi0200763). However, as this point is not critical for our conclusions we will remove the discussion of the GxxxG motif from the revised manuscript.

      (4) What is the role of the kinase activity of Cqd1 in the CS formation? The effects of overexpression of Cqd1 (Fig. 7) should be tested for its E330A mutant.

      We also thank reviewer #2 for raising this important point similar to reviewer #1. Please see our response to point 1) of reviewer #1.

      (5) Is there stoichiometric as well as quantitative information on the 400 kD complex consisting of Cqd1, Por1 and Om14? Does the stoichiometry and amount of the complex depend on the growth condition? Does the complex contain other Por1 interacting IM proteins like Mdm31?

      We appreciate that reviewer #2 points out this important aspect. It might well be that the amount of the Cqd1 containing complex depends on growth conditions since its presence might be important for phospholipid homeostasis, CoQ distribution and mitochondrial architecture and morphology which for sure strongly depend on growth conditions. Therefore, we will try to analyze the amount of the Cqd1 complex present in mitochondria isolated from yeast cells grown on different media by BN-PAGE. So far we do not have any information on the stoichiometry of this complex and we feel that an analysis would go beyond the scope of this study. We agree with reviewer #2 that Mdm31 is an obvious candidate for an interaction partner of Cqd1. We actually tested this by co-immunoprecipitation using Cqd1-3xHA or Mdm31-3xHA. However, none of these approaches resulted in successful co-isolation of the potential interaction partner. We will mention this result in the revised manuscript.

      (6) For Fig. 7E, the authors state, "consistently, we observed dramatically increased mitochondria‐ER interactions Cqd1 overexpression", but this observation could be due to secondary effects because overexpression of Cqd1 itself already caused abnormal morphology of mitochondria.

      We thank reviewer #2 for bringing up this important point. To check whether the increased mitochondria‐ER interactions are a secondary effect due to altered mitochondrial morphology we will analyze the mitochondria‐ER interactions in other mitochondrial morphology mutants by fluorescence microscopy. This will reveal whether abnormal mitochondrial morphology generally leads to disturbed ER structure.

      (7) Since the antagonistic role of Cqd2 to Cqd1 was proposed, the results of the experiments for Cqd1 can be compared with those for Cqd2. For example, what will become of overexpression of Cqd2 instead of Cqd1 for Fig. 7? What is the lipid composition of the cqd1Δ cqd2Δ double deletion mutant cells (the decreased PA level is recovered?)? Lines 424-425: In summary, overexpression of Cqd1 causes severe phenotypes on growth, formation of mitochondrial structural elements, and mitochondrial architecture and morphology. Is this phenotype affected by overexpression of Cqd2?

      This point raised by reviewer #2 is very interesting. Our preliminary experiments and previously published data (Tan et al., 2013) indicate that overexpression of Cqd2 is also toxic and results in the formation of huge mitochondrial clusters. Therefore, we will extend our study and analyze the effect of overexpression of CQD2, either alone or in combination with overexpression of CQD1.

      Reviewer #3:

      1) The central point of the paper is that Cqd1 is part of a novel contact site between the inner and the outer membrane. Om14 and Por1 were identified as outer membrane components of this contact site by immunoprecipitation. The data look convincing but they were generated from targeted experiments to test the involvement of suspected proteins. Ideally, one would like to see a cross-linking mass spectrometry (XL-MS) experiment that identifies the physical interactions of Cqd1 without bias.

      We thank reviewer #3 for acknowledging the presented data as convincing. Considering the significant amount of experiments planned for the revised version of the manuscript, we hope that reviewer #3 agrees that this point is not essential.

      2) Could an analogous blot of the MICOS complex be added to Figure 6D?

      Of course, we are happy to include BN-PAGE analysis showing the running behavior of MICOS next to the Cqd1 containing complex in Fig. 6D.

      3) In the Introduction, a host of contact sites is mentioned, which are partly from older papers. I'm not sure whether this is the accepted view of the field. Also, newer data suggest that the permeability transition pore is derived from complex V rather than ANT, CK, and VDAC. The authors should double check in order to represent the current state of the art

      We thank reviewer #3 for this comment. We will update this part according to the more recent literature.

    1. Author Response

      Reviewer #2 (Public Review):

      First, I want to congratulate the author team on this manuscript, which I read with great pleasure. I think this will be a fine addition to the literature!

      The present MS by Clement et al. provides a comprehensive overview of the brain shapes of lungfishes. Besides previously known/described brain endocasts, the work includes models and descriptions of previously undescribed taxa. Notably, all CT data are deposited online following best practices when working with digital anatomy. The specimen sample is impressive, especially as the sampled material is housed in museum all over the world. Although the sample size may seem numerically low (12 taxa), this actually is a comprehensive sample of fossil (and extant) lungfishes in terms of what's preserved in the first place.

      The study at hand has several goals: (1) The description of lungfish brains for taxa that were previously undescribed; (2) the quantification of aspects of brain shape using morphometric measurements; (3) the characterization of brain shape evolution of lungfishes using exploratory methods that ordinate morphometric measurements into a morphospace.

      The provided 3D data and descriptions will serve as valuable comparisons in future lungfish work. This type of data is imperative for palaeontological studies in general, and the anatomical information will be extremely valuable in the future. For example, anatomical characters related to brain architecture have been shown to be informative about phylogeny in the past, and the presented data may inform future phylogenetic studies. The quantification of brain shape via (largely linear) measurements is relatively simplistic, and can thus only detect gross trends in brain shape evolution among lungfishes. The authors describe several such trends - such as high variation in the olfactory brain region in comparison to other parts of the brain. The results and interpretations drawn from the authors are supported by their data, and the approach taken is valid, even if more sophisticated shape quantification methods (e.g. 3D landmarking) and analytical methods (e.g. explicit phylogenetic comparative methods) are available, which could provide additional insights in the future.

      We agree with Reviewer #2 that 3D geometric morphometrics could have provided more sophisticated analytical methods. However, geometric morphometrics has some limitations with regard to the type of data that we analysed: (1) low sample size and (2) missing/incomplete data. Comprehensive coverage of brain shape would have required numerous landmarks (and semilandmarks) to represent its complexity.

      First, our sample size (12 taxa) is low (although it is an impressive sample size when considering the type of data). Although there is no universal rule concerning the ratio “number of specimens / number of landmarks” (Zelditch et al., 2012), ideally the sample size should be two to three times the number of landmarks. Thus, with a sample size of 12 we could have used ca. 4-6 landmarks, which is very limited for describing complex shapes. In addition, in order to use geometric morphometrics (2D or 3D), the landmarks should be present on all the specimens. Because of the partial completeness of the studied fossils, the brain endocasts are not uniformly known for each species. Incomplete and deformed specimens prompt the removal of potential landmarks from analyses. Even using right-left reflection of the endocasts, most specimens do not share all neurocranial information.

      We agree with Reviewer #2 that a phylogenetic PCA could have provided interesting analytical perspectives. Phylogenetic PCA is available for standard PCA, but it is uncertain whether it can be applied to Bayesian PCA and InDaPCA (the latter method was published very recently, and we have not found much literature about it). We did not find an adaptation of phylogenetic PCA to either BPCA or InDaPCA; we even contacted Liam Revell, who created the phylogenetic PCA, about this issue.

      The presented results and interpretations in this regard must be seen as a preliminary assessment of lungfish brain evolution, but it is clearly written and generally well performed.

      A potential shortcoming of the paper is the lack of explicit hypothesis testing, which is not problematic per se, but puts limits on the conclusions the authors can draw from their data.

      We decided to address the issues using exploratory methods rather than testing hypotheses. It is a more conservative approach, since this is the first quantitative analysis of dipnoan endocasts. Future analyses will be able to formulate hypotheses based on our interpretation of our exploratory approach. We hope to stimulate such hypothesis testing when further dipnoans are added in the future; however, one has to remember that ossified neurocrania are known only in Devonian dipnoans, with one partially ossified neurocranium known from the Carboniferous. The remaining dipnoans have cartilaginous neurocrania, which limits the sample from which endocast data could be gathered.

      For example, the authors state that different anatomical parts of the labyrinth (particularly, the utricle with respect to the semicircular canals or saccule) may show modular dissociation from other labyrinth modules, based on the polarity of eigenvalue signs of the PCA analysis. I think this is fine as a first approximation, but of course there are explicit statistical tools available to test for modularity/integration, such as two-block partial least squares regression analysis (Rohlf & Corti 2000, Syst. Biol.). I don't see the lack of usage of such methods as problematic, because you cannot do everything in one paper, and the authors remain careful in their interpretation.

      We agree with Reviewer #2 that different geometric morphometrics methods have been developed to look at variational modularity; one of the co-authors (RC) has published a few papers on patterns of morphological integration and modularity in fishes (see Larouche, Cloutier & Zelditch, 2015, Evol. Biol.; Lehoux & Cloutier, 2015, J. Exp. Zool. Mol. Dev. Evol.; Larouche, Zelditch & Cloutier, 2018, Sci. Rep.). Interesting a priori hypotheses of brain modules could have been formulated and tested for modularity using, for example, the Covariance Ratio (CR) and a distance matrix approach. But the low sample size and the incompleteness of the data remain major constraints on testing modularity. We will, however, endeavour to use such methods in future work as more complete material becomes available.
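
      For readers unfamiliar with the two-block partial least squares approach mentioned by the reviewer, the core computation can be sketched as a singular value decomposition of the cross-covariance matrix between two blocks of measurements. This is only an illustrative sketch: the function name, the synthetic data, and the block sizes are assumptions made here, not the authors' data or analysis, and a real test of integration would add a permutation test that is omitted below.

```python
import numpy as np

def two_block_pls(X, Y):
    """Two-block PLS via SVD of the cross-covariance matrix between blocks X and Y."""
    Xc = X - X.mean(axis=0)          # centre each variable in block 1
    Yc = Y - Y.mean(axis=0)          # centre each variable in block 2
    n = X.shape[0]
    C = Xc.T @ Yc / (n - 1)          # cross-covariance matrix (p x q)
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    # Project specimens onto the paired axes to obtain scores for each block
    return U, s, Vt.T, Xc @ U, Yc @ Vt.T

# Synthetic example (hypothetical): 12 specimens, two blocks sharing one latent factor
rng = np.random.default_rng(0)
shared = rng.normal(size=(12, 1))
X = shared @ rng.normal(size=(1, 3)) + 0.1 * rng.normal(size=(12, 3))
Y = shared @ rng.normal(size=(1, 2)) + 0.1 * rng.normal(size=(12, 2))

U, s, V, x_scores, y_scores = two_block_pls(X, Y)
# The covariance between the first pair of score vectors equals the first singular value
first_pair_cov = np.cov(x_scores[:, 0], y_scores[:, 0])[0, 1]
```

      The singular values measure how much covariance each pair of axes captures between the blocks; with only 12 specimens, as the authors note, any such estimate would be weakly constrained.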

      It may be advisable, however, to add the odd sentence or statement about how some findings are preliminary or hypothesized, and that these should receive further treatment and testing using other methods in the future. I think this approach is actually very rewarding, because then you can inspire future work by outlining outstanding research problems that arise from the new data presented herein.

      We have now included an additional sentence early in the Discussion section stating: “We acknowledge that our investigation of lungfish brain evolution as elucidated from morphometric analysis of cranial endocasts is still preliminary in several respects. We hope that our study can inspire future work on the neural evolution of both fossil and extant lungfish.”

      In the following, I comment on a few aspects of the manuscripts. These represent instances where I had additional thoughts or ideas on how to slightly improve various aspects of the manuscript.

      1) Presentation of PCA results

      The authors provide several PCA analyses (preliminary analyses on partial matrices, BPCA, InDaPCA), and are very explicit about the procedures in general. For instance, I appreciate they explicitly state using correlation matrices for PCA analyses due to the usage of different measurement units among their data.

      Visually, the BPCA and InDaPCA are presented in figures 2 and 3, whereas the preliminary partial matrix PCAs are only reported as supplementary figures. While I don't object to any of this, I find the sequence of information given in the results section suboptimal.

      The figures have now been substantially reorganised to include more within the main body text and not as Supplementary Information, and we hope that this improves the sequence of information within the manuscript.

      The authors start by discussing the partial matrix analyses, although none of these analyses are visually/graphically depicted in the main text figures, and although their results do not seem to be of real importance for the narrative of the discussion. The other two PCA analyses actually are presented afterwards and separately, but they convey some common signals, particularly that the major source of variation seems to be a decreasing olfactory angle with increasing olfactory length, and a scaling relationship between all linear measurements (which all have the same eigenvector signs on the first PC axis). I wonder if an alternative way of presenting the PCA results would be better for this particular MS. For example, the authors could give "first level observations" first ("PCA analyses agree in X,Y,Y"), and then move to second order observations ("Morphospace of BPCA has some interesting taxon distribution with regard to chirodipterids"; "InDaPCA axis projections continuously retrieve clustering of specific variables"). I suspect this would shorten the text somewhat and could serve as a clearer articulation of the take home messages?

      In accordance with Reviewer #2's suggestion, we have now provided “first level” observations based on the standard PCA. We added some further comments on the species distribution in the morphospaces.

      2) Selection of PC axes for interpretation

      You describe how you use the broken-stick method to decide how many PC axes are retained for the interpretation of results, which I agree is a good procedure. However, I have a few questions regarding this. First, in line 331 (description of InDaPCA) you state that the first three axes are non-trivial "based on the screeplot" - which got me confused because it sounds a bit like eyeballing off the screeplot. Have you used the broken stick method for all your PCA analyses?

      Originally, we used both the screeplot and the broken-stick method; however, we are now solely using the broken-stick method to determine the number of non-trivial axes. We agree with Reviewer #2 that this method is more rigorous than the scree plot. Our choice is greatly inspired by the studies of Jackson (1993, Ecology) and Peres-Neto et al. (2005, Computational Statistics & Data Analysis). We have now edited the text so that our methods are clearer (and removed the text relating to the screeplot such as “based on the screeplot…”).

      The second question relates to the results of the broken stick method, which I did not find reported. Unless I am mistaken, for the xth axis, the method sums the fractions of 1/i (whereby i = x..n; n = number of axes), and divides this number by n to get a value of expected variation per axis. This number is then compared with the actual value of variance explained by the axis. So for the 1st of 17 axes, the broken-stick expectation is = (1 + 1/2 + .. + 1/17) / 17. If you apply this to your BPCA, the third axis' value (i.e., (1/3 + ... + 1/17)/17) is 0.114, which is smaller than the reported 0.120 that PC3 explains. Thus, following the broken stick method, PC3 does explain more variation than expected (and should thus be retained, contra your comment in line 311 which refers to two non-trivial axes)?

      We thank Reviewer #2 for the insightful evaluation of our paper and for taking the time to validate each step of our analyses. Indeed, we agree with Reviewer #2 that, based on the broken-stick method, the third axis is non-trivial. The value for the third axis is 1.0531310. Thus, we are presenting these results as well as discussing the three PCA projections (axis 1 versus axis 2, axis 2 versus axis 3, axis 1 versus axis 3).
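
      The broken-stick expectation described by the reviewer can be checked with a few lines of code. The sketch below implements the formula exactly as stated in the comment (sum of 1/i from the axis index to n, divided by n) for 17 axes; the function name is an assumption introduced here for illustration, and 0.120 is the PC3 proportion quoted in the review.

```python
def broken_stick(n_axes):
    """Expected proportion of variance per axis under the broken-stick model."""
    return [
        sum(1.0 / i for i in range(k, n_axes + 1)) / n_axes
        for k in range(1, n_axes + 1)
    ]

expected = broken_stick(17)        # 17 axes, as in the reviewer's example
pc3_expected = expected[2]         # third axis: (1/3 + ... + 1/17) / 17
pc3_observed = 0.120               # proportion quoted for PC3 in the review
pc3_is_nontrivial = pc3_observed > pc3_expected

print(round(pc3_expected, 3), pc3_is_nontrivial)  # prints: 0.114 True
```

      The expected proportions sum to 1 across all axes, so they can be compared directly with observed proportions of variance explained.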

      Related to this potential issue is the presentation of the BPCA results in Fig. 2: You present loadings of three PC axes, although only the first two are considered in morphospace bi-plots and although the text also mentions only two non-trivial axes. If the third axis is indeed non-trivial, then the loading-presentation could be retained in the figure, but then the authors should consider showing a PC1 vs. PC3 plot in addition to the currently presented biplot showing the first and second axis only. If the third axis indeed is trivial, as currently suggested by the text, then showing the loadings is unnecessary.

      We consider showing a biplot of PC1 vs PC3 unnecessary as those shown (PC1 vs PC2) already account for 83.4% of the variation captured. We have edited these figures so that the loadings related to PC3 have also now been omitted.

      It would be great if you clarify the usage/application of the broken stick method for all your PCAs. An easy way to report the results may be to add a row to each of your PCA loading tables in the supplements, in which you divide the actual value of variation explained by the value expected under the broken stick method - this way, all axes which explain more variation than expected by the stick method have values larger than 1, and axes which explain less have values lower than 1.

      We have taken this suggestion from Reviewer #2 on board and have now recalculated all values for the broken-stick method for each analysis; we also provide broken-stick values in their respective loading tables in the SI.
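
      The extra table row suggested by the reviewer (observed proportion divided by broken-stick expectation) could be produced along the following lines. The observed proportions used here are hypothetical values chosen purely for illustration and are not taken from the manuscript; the function name is likewise an assumption.

```python
def broken_stick_ratios(observed):
    """Ratio of observed to broken-stick expected variance for each axis.

    Values above 1 mark axes that explain more variance than expected.
    """
    n = len(observed)
    expected = [sum(1.0 / i for i in range(k, n + 1)) / n for k in range(1, n + 1)]
    return [obs / exp for obs, exp in zip(observed, expected)]

# Hypothetical proportions of variance for a four-axis PCA (illustrative only)
observed = [0.60, 0.25, 0.10, 0.05]
ratios = broken_stick_ratios(observed)
retained = [r > 1.0 for r in ratios]
```

      Reporting the ratio rather than the raw expectation lets a reader see at a glance which axes pass the criterion, which is the reading the reviewer proposes for the supplementary tables.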

      3) Missing commentary on allometry

      In basically all PCA analyses, the first PC axis seems to be dominated by allometric size effects, given that all linear measurements have the same eigenvalue signs. The authors do acknowledge this (lines 314-316; 335-336), but offer no further comment on size effects/allometry.

      We agree that normally the first axis represents variation related mainly to size changes and shape changes related to size (allometry). However, we are reluctant to assume that our first axis corresponds to evolutionary allometry. Among others, Klingenberg & Zimmermann (1992) and Klingenberg (1996) used standard PCA (or multi-group PCA) to disentangle evolutionary and ontogenetic allometry (as well as static allometry) mainly by analysing multiple specimens for each group (or species) in order to have a better repartition of the covariance. Since our sample is limited to 12 species, each represented by a single specimen (except for Dipterus), it would be difficult to clearly discriminate variation associated with allometry. Even in a case of ontogenetic allometry, a sample size of 12 would have been too limited to draw unambiguous conclusions about such variation.

      For example, it would be interesting to see how the linear measurements scale with overall head size. Similarly, the authors note that the semicircular canal measurements covary strongly, as do the utricle and saccule height/length measurements (paragraph line 346). Basically, it seems that the semicircular canal measurements scale with one another: as one gets bigger, so does the other. It is interesting that the utricle does not seem to follow the same scaling pattern as the saccule and semicircular canals, and it would be good to hear if the authors think that there is a functional implication for this. Increases in utricular/saccular/semicircular canal sizes are usually explained by increased sensitivity - so is an increased utricular size a compensatory development to decreased semicircular canal+saccule size to retain an overall level of sensitivity, or is it perhaps related to a relative change in importance of the specific functions, e.g. increased importance of linear accelerations in the horizontal plane with a simultaneous decrease in importance of angular and vertical accelerations?

      We thank Reviewer 2 for this suggestion about scaling endocast measurements against overall head size. Our original study design also included measurements of dermal skulls, but we omitted these from the final version as the material available was far too incomplete to conduct meaningful analyses. Some of us (AC, RC) have already discussed this as a potential future project.

      With respect to the functional implications of the modular dissociation of the labyrinths, we have expanded the final paragraph of the “implications for sensory abilities” within the Discussion, and similarly added the sentence “However, we acknowledge that it is difficult to determine if increased relative utricular size results from greater reliance of sensitivity in the horizontal plane alone, or if it expands to compensate for e.g. relative stagnation of the sacculus + semicircular canals in some way. Further studies, such as investigation of neuronal densities in extant lungfish labyrinths, may potentially in part clarify this uncertainty in future.”

      4) Labyrinth size

      With the above-mentioned utricular exception, labyrinth size measurements, particularly of the semicircular canals, seem to imply that there is a relatively consistent scaling relationship between the canals. When one canal gets larger, so do the others, perhaps thereby retaining canal symmetry across different absolute labyrinth sizes. Labyrinth size in tetrapods is often interpreted in relation to body size/mass or head size (e.g. Melville Jones & Spells 1963, Proc. R. Soc. Lond. Biol. Sci.; Spoor & Zonneveld 1998, Yearb. Phys. Anthr.; Spoor et al. 2002, Nature; Spoor et al. 2007, PNAS; Bronzati et al. 2021, Curr. Biol.), as deviations from the expected labyrinth size per head size indicate increased or decreased relative labyrinth sensitivities. Large relative head sizes of birds and (within) mammals have generally been interpreted as indicative of "active" or "agile" behaviour, although doubt has been cast on these relationships recently (e.g., Bronzati et al. 2021). Increased sampling of relative labyrinth size from various vertebrate groups would be important to better understand labyrinth size-function relationships. Melville Jones & Spells (1963) have shown that fishes have large labyrinth sizes compared to most tetrapods, but they don't have lungfish data, and the large labyrinth sizes of fishes have often remained uncommented on in tetrapod works. I think this study offers a fantastic opportunity to provide comparative labyrinth size data for lungfishes. In this regard, it would be really interesting to quantify labyrinth size relative to head size, and show a respective (phylogenetic) regression analysis. Ideally, the size of the labyrinth could be quantified along the arc lengths of the semicircular canals, but other ways are also conceivable (for example a box volume of labyrinth size by the existing measurements, contrasted with a box volume of the skull, i.e. height × width × length).

      Firstly, many thanks for the suggested reading of Bronzati et al. (2021). While we consider a labyrinth-skull size regression analysis to be a worthwhile suggestion, we have chosen not to include one in this study, firstly because there is no phylogenetic regression based on the new methods that we are using, and secondly because it forms the basis of another study currently underway by some of the authors.

    1. Reviewer #1 (Public Review):

      In this study, the authors aimed to address the important question of the mechanism of deep brain stimulation (DBS) in treating Parkinson's disease, based on a mouse model that the authors established previously.

      The strengths of the study lie in 1) avoiding the stimulation artefacts that interfere with electrophysiological recording techniques, and 2) examining effects on cell-type- or projection-specific targets.

      However, there are several critical problems in this study. First, the low temporal resolution of the fibre photometry data, and the fact that they report an averaged population signal rather than the activity of individual neurons, prevent an analysis of the effects of DBS on the target areas that is deep enough to draw useful conclusions. All interpretations were therefore based on an average rise in GCaMP-reported calcium signals with low temporal resolution. As a result, important readouts that were analysed in many previous studies, such as firing patterns (e.g. rhythmic firing) or synchrony among neurons, were missed by this approach. To take one example, the conclusion that antidromic activation is excluded as a possible mechanism is based partly on the lack of good correlation between the averaged calcium signal and the behavioral improvement. However, such a lack of correlation is also evident between the averaged calcium signal and the improvement in movement behavior under 60 Hz and 100 Hz stimulation (Figure 2). While a higher average calcium signal is observed under 60 Hz DBS than under 100 Hz DBS, the improvement in motor behavior is lower than that induced by 100 Hz DBS. This highlights the severe limitation of the fibre photometry data in revealing the therapeutic mechanism of DBS.

      Second, there is no clear elucidation of the pathological changes revealed by the fibre photometry in PD mice to illustrate what is normal and what is abnormal, and how DBS rectifies the abnormal changes. For example, when we need to interpret the effect of DBS on calcium activities in the subthalamic nucleus (STN), the substantia nigra pars reticulata (SNr) and the primary motor cortex (M1), what abnormal GCaMP signal did the authors find, compared with healthy control mice? Without such information, it is difficult to get a sense of what an increase in GCaMP signal in STN, SNr and M1 means with respect to motor control, and therefore what it means with respect to the effect of DBS. In the specific context of a peak (actually a biphasic waveform) of the calcium activity in the PD animals, it is puzzling that a surge of STN activity is correlated with movement onset, while in principle it should result in movement termination. Therefore, it is critical to know whether there is such a correlation in healthy animals. If yes, this may not indicate a pathological change that needs to be rectified by DBS. If no, how the pathological appearance of such a change leads to parkinsonian motor symptoms (akinesia, bradykinesia etc.) must be established.

      Third, it is well known that clinical DBS employs stimulation of at least 120 Hz. In fact, the authors had also demonstrated in their previous report that the optimal stimulation frequency in the mouse model is around 180 Hz. But the present study utilised clearly suboptimal frequencies (60 and 100 Hz only) to address the mechanism. It is possible that different mechanisms or combinations of mechanisms may take place under different stimulation frequencies. As such, any conclusion drawn from this study may not represent the whole picture.

      Given the above considerations, I do not think that the authors have achieved the aim of their study, as the results cannot convincingly support their conclusions.

    1. Reviewer #3 (Public Review):

      Zadbood and colleagues investigated how key information used to update interpretations of events alters patterns of activity in the brain. This was cleverly done by the use of "The Sixth Sense," a film featuring a famous "twist ending," which fundamentally alters the way the events in the film are understood. Participants were assigned to three groups: (1) a Spoiled group, in which the twist was revealed at the outset, (2) a Twist group, who experienced the film as normal, and (3) a No-Twist group, in which the twist was removed. Participants were scanned while watching the movie and while performing cued recall of specific scenes. Verbal recall was scored based on recall success, and evidence for descriptive bias toward two ways of understanding the events (specifically, whether a particular character was or was not a ghost). Importantly, this allowed the authors to show that the Twist group updated their interpretation. The authors focused on regions of the Default Mode Network (DMN) based on prior studies showing responsiveness to naturalistic memory paradigms in these areas and analyzed the fMRI data using intersubject pattern similarity analysis. Regions of the DMN carried patterns indicative of story interpretation. That is, encoding similarity was greater between the Twist and No-Twist groups than in the Spoiled group, and retrieval similarity was greater between the Twist and Spoiled groups than in the No-Twist group. The Spoiled group also showed greater pattern similarity with the Twist group's recall than the No-Twist group's recall. The authors also report a weaker effect of greater pattern similarity between the Spoiled group's encoding and the Twist group's recall than between the Twist group's own encoding and recall. Together, the data all converge on the point that one's interpretation of an event is an important determinant of the way it is represented in the brain.

      This is a really nice experiment, with straightforward predictions and analyses that support the claims being made. The results build directly on a prior study by this research group showing how interpretational differences in a narrative drive distinct neural representations (Yeshurun et al., 2017), but extend an understanding of how these interpretational differences might work retrospectively. I do not have any serious concerns or problems with the manuscript, the data, or the analyses. However, I have a few points to raise that, if addressed, would make for a stronger paper in my opinion.

      1) My most substantive comment is that I did not find the interpretive framework to be very clear with respect to the brain regions involved. The basic effects the authors report strongly support their claims, but the particular contributions to the field might be stronger if the interpretations could be made more strongly or more specifically. In other words: the DMN is involved in updating interpretations, but how should we now think about the role of the DMN and its constituent regions as a result of this study? There are a number of ideas briefly presented about what the DMN might be doing, but it just did not feel very coherent at times. I will break this down into a few more specific points:

      While many of us would agree that the DMN is likely to be involved in the phenomena at hand, I did not find that the paper communicated the logic for singularly focusing on this subset of regions very compellingly. The authors note a few studies whose main results are found in DMN regions, but I think that this could stand to be unpacked in a more theoretically interesting way in the Introduction.

      Relatedly, I found the summary/description of regional effects in the Discussion to be a bit unsatisfying. The various pattern similarity comparisons yielded results that were actually quite nonoverlapping among DMN regions, which was not really unpacked. To be clear, it is not a 'problem' that the regional effects varied from comparison to comparison, but I do think that a more theoretical exploration of what this could mean would strengthen the paper. To the authors' credit, they describe mPFC effects through the lens of schemas, but this stands in contrast to many other regions which do not receive much consideration.

      Finally, although there is evidence that regions of the DMN act in a coordinated way under some circumstances, there is also ample evidence for distinct regional contributions to cognitive processes, memory being just one of them (e.g., Cooper & Ritchey, 2020; Robin & Moscovitch, 2017; Ranganath & Ritchey, 2012). The authors themselves introduce the idea of temporal receptive windows in a cortical hierarchy, and while DMN regions do appear to show slower temporal drift than sensory areas, those studies show regional differences in pattern stability across time even within DMN regions. Simply put, it is worth considering whether it is ideal to treat the DMN as a singular unit.

      2) I think that some direct comparison to regions outside the DMN would speak to whether the DMN is truly unique in carrying the key representations being discussed here. I was reluctant to suggest this because I think that the authors are justified in expecting that DMN regions would show the effects in question. However, there really is no "null" comparison here wherein a set of regions not expected to show these effects (e.g., a somatosensory network, or the frontoparietal network) in fact do not show them. There are not really controls or key differences being hypothesized across different conditions or regions. Rather, we have a set of regions that may or may not show pattern similarity differences to varying degrees, which feels very exploratory. The inclusion of some principled control comparisons, etc. would bolster these findings. The authors do include a whole-brain analysis in Supplementary Figure 1, which indeed produced many DMN regions. However, notably, regions outside the DMN such as the primary visual cortex and mid-cingulate cortex appear to show significant effects (which, based on the color bar, might actually be stronger than effects seen in the DMN). Given the specificity of the language in the paper in terms of the DMN, I think that some direct regional or network-level comparison is needed.

      3) If I understand correctly, the main analyses of the fMRI data were limited to across-group comparisons of "critical scenes" that were maximally affected by the twist at the end of the movie. In other words, the analyses focused on the scenes whose interpretation hinged on the "doctor" versus "ghost" interpretation. I would be interested in seeing a comparison of "critical" scenes directly against scenes where the interpretation did not change with the twist. This "critical" versus "non-critical" contrast would be a strong confirmatory analysis that could further bolster the authors' claims, but on the other hand, it would be interesting to know whether the overall story interpretation led to any differences in neural patterns assigned to scenes that would not be expected to depend on differences in interpretation. (As a final note, such a comparison might provide additional analytical leverage for exploring the effect described in Figure 3B, which did not survive correction for multiple comparisons.)

      4) I appreciate the code being made available and that the neuroimaging data will be made available soon. I would also appreciate it if the authors made the movie stimulus and behavioral data available. The movie stimulus itself is of interest because it was edited down, and it would be nice for readers to be able to see which scenes were included.

      To sum up, I think that this is a great experiment with a lot of strengths. The design is fairly clean (especially for a movie stimulus), the analyses are well reasoned, and the data are clear. The only weaknesses I would suggest addressing are with regards to how the DMN is being described and evaluated, and the communication of how this work informs the field on a theoretical level.

    1. we need to treat one another with respect despite our differences like this is like an aspiration for people 00:41:01 right except for they thought it was in the bottom quarter of stuff for everybody else so what happens if i'm like i i would like to get back to treating other people's respect but i don't think 00:41:12 they care about that for me back to that ambiguous interactions that we have all the time i'm gonna read disrespect into most everything i see right and so i i think it's really critical like 00:41:25 like i talked about this as like congruence right this need for our private selves and our public selves to be as as closely aligned as possible we've known for a long time that that's that's a critical part of fulfillment 00:41:37 and self-actualization i mean how how do you get there you're the expert on that like how do you how do you get there if you have a divided self like my private self is different than my public self like so we know that at an individual 00:41:48 level but given the the fact of collective illusions i believe this idea of congruence may be the most important thing you can do for other people right because it doesn't help anyone when we misread each 00:42:00 other so profoundly

      Congruence is the antidote to collective illusion.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      We thank all reviewers for their very helpful comments. We feel that the comments pointed to a few main issues that we could remedy. First, we found that many comments and concerns could be addressed with work from our previous paper (doi.org/10.1101/2020.11.24.396002). To fix this, we added additional descriptions of experiments done previously and additional citations. We discussed more in depth an experiment that shows that ciliary membrane and membrane proteins can indeed come from the cell body plasma membrane, we talked more about how we determined that the actin puncta are representative of membrane remodeling functions like endocytosis, and we discussed some of the mechanistic insights provided by our previous work that are applicable here. We hope that this helps to answer several of the reviewer questions. Second, there were a few experiments we thought would be useful to add. These are represented in bold in our responses below. Briefly, we added a measure of internalization or endocytosis in the drp3 mutant, we added some images of cilia to the phalloidin figure to orient readers’ views of the cell, we added some additional mechanistic insight (supplemental figure 3), and we added an axoneme stain to confirm that the axoneme was extending (supplemental figure 4). Finally, we fixed some of our wording in the paper to represent our findings more accurately. Together, we hope that these revisions will address the reviewer concerns.

      Additionally, we added some data that we collected while waiting on reviews. We investigated the requirement for myosin in this pathway and include this data in the supplement.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The current manuscript by Bigge et al. demonstrates that chemical inhibition of GSK3 causes ciliary elongation in Chlamydomonas reinhardtii. They show that lithium-induced ciliary lengthening is mainly due to GSK3 inhibition. Consistent with earlier reports, they show that new protein synthesis is not required for lithium-induced ciliary elongation. The authors report that targeting endocytosis, either with chemical inhibitors (dynasore and CK-666) or in genetic mutants (drp3 and arpc4), prevents lithium-induced ciliary elongation. They further reveal enhanced actin dynamics in lithium-treated cells and show that such activity is lost in arpc4 mutants. Based on these results, the authors conclude that endocytic pathways may be involved in lithium-induced ciliary lengthening. The results are interesting, and this work is important in understanding more about ciliary length regulation. However, more experimental evidence addressing the current interpretation that endocytic pathways may be involved in lithium-induced ciliary lengthening is required.

      Major comments: 1 The authors use chemical inhibitors as major tools for their study. However, the specificity of these inhibitors is a concern. How specific are these GSK3 inhibitors such as LiCl? Can authors show that LiCl mediated ciliary lengthening is due to inhibition of GSK3? Authors used BFA and Dynasore to show that not the Golgi, but the endocytosis derived membrane is required for ciliary lengthening. Again, here the specificity of these inhibitors is a concern. Especially as Dynasore has been shown to have non-specific effects.

      We agree that the specificity of chemical inhibitors can be a concern. This is why we used 4 separate inhibitors of GSK3, each showing elongation of cilia and an increase in actin puncta (suggesting an increase in actin dynamics at the membrane). While these different inhibitors may have different off-target effects, their intended target, GSK3, is the same, suggesting that the phenotype shared by all of the inhibitors results from GSK3 inhibition. The ability of LiCl to affect GSK3 activity in Chlamydomonas was also investigated in depth with a kinase assay and a western blot in Wilson, 2004 (doi: 10.1128/EC.3.5.1307-1319.2004). To address the off-target effects of Dynasore, we employed the drp3 mutant to confirm genetically what we saw from the chemical inhibition. We also show in our previous paper that Dynasore and PitStop2 have similar effects in Chlamydomonas, both of them inhibiting the internalization of a dye-labelled membrane, suggesting that they both function to block endocytosis (doi.org/10.1101/2020.11.24.396002). While no mutant or alternative inhibitor is available to look at the effects of BFA, this inhibitor and its effects on cilia have been well-characterized in Dentler, 2013 (doi.org/10.1371/journal.pone.0053366).

      Does inducing/enhancing endocytosis independent of GSK3 by other means have any effect on ciliary length regulation?

      Our concern with the proposed experiment is that even if elongation requires endocytosis, not all endocytosis would necessarily lead to ciliary elongation. For example, endocytosis could occur for other purposes, like nutrient uptake, that have no effect on cilia. The plasma membrane to cilium pathway may be a targeted pathway triggered by specific disruptions. Therefore, we don’t feel that the proposed experiments will add to our model.

      The major claim of this paper is that LiCl mediated ciliary lengthening is due to enhanced endocytosis. Although authors showed that inhibition of endocytosis results in reduced ciliary length, it is important to show if GSK3 inhibition by LiCl (or any other inhibitor) causes any increased cellular endocytosis? Similarly, what is the effect of GSK3 mutants on endocytosis?

      We show an increase in actin dynamics at the membrane and actin puncta following treatment with LiCl and the other GSK3 inhibitors. We show here and in our previous paper (doi.org/10.1101/2020.11.24.396002) that these puncta are likely endocytic based on the timing of their appearance and the proteins required for puncta formation (including the Arp2/3 complex and Clathrin) (Figure 7, previous paper). We updated our latest version to reflect the data we have already collected and presented as follows:

      “Further, they rely on proteins typically thought to be involved in endocytosis including the Arp2/3 complex and clathrin, and they form at times when it makes sense for endocytosis to be occurring, like immediately following deciliation when membrane and protein must be recruited to cilia in a timeframe too short for new protein and membrane synthesis, sorting, and trafficking (Bigge et al. 2020). Thus, we stained cells with phalloidin to visualize filamentous actin and these endocytosis-like punctate structures when cells are treated with GSK3 inhibitors.”

      A phenotypic mutant of GSK3 does not currently exist in Chlamydomonas, nor do methods of reliably introducing mutations in Chlamydomonas. Thus, we used the array of GSK3 inhibitors.

      Are these endocytic processes enhanced specifically at/or around the cilium during the ciliary lengthening process?

      Based on our phalloidin staining data, these processes are primarily enhanced near the cilium, but puncta also exist throughout the cell. To more clearly show this and in response to a comment from reviewer 2, we added a set of images with brightfield to demonstrate where the dots are in relation to cilia. We also added arrows to the images in the figure to point out the apex of the cell as determined by the filamentous actin structures in the cells.

      The authors claim that drp3 is a target of GSK3 and, similar to the canonical dynamin, functions in endocytosis. While it is an important observation, experiments are required to show the role of drp3 in endocytosis and also to show that it is indeed a target of GSK3.

      To address this comment, we are employing an experiment that was designed in our previous paper (doi.org/10.1101/2020.11.24.396002, Figure 5B-E). This experiment uses a lipophilic membrane dye, FM4-64FX. The dye binds to the membrane but is unable to enter the cell alone. It is quickly endocytosed and results in vesicular-like structures within the cell. We added a panel to Figure 3 where we do this experiment in wild-type and drp3 mutant cells. This shows that endocytosis is affected by the mutation in DRP3. The discussion of this new data is summarized in the text as follows:

      “Additionally, we showed that this DRP is required for internalization of a lipophilic membrane dye, FM4-64FX through endocytosis. This dye binds to the membrane but is unable to enter the cells on its own and must be endocytosed. In wild-type cells it is quickly endocytosed and visible as puncta within the cell (Figure 3F, H) (Bigge et al. 2020). However, in drp3 mutants the amount of dye endocytosed is significantly lower (Figure 3G-H), suggesting that DRP3 is required for optimal endocytosis in these cells.”

      Mechanistic insights into how endocytosis/actin dynamics regulate ciliary lengthening would be interesting to see. Further, it is interesting to see if the ciliary signaling defects caused by abnormal ciliary length can be rescued by inhibition of endocytosis.

      In our previous paper (doi.org/10.1101/2020.11.24.396002), we dive into the mechanisms tying together actin dynamics, endocytosis, and cilia. We find that Arp2/3 complex-nucleated actin networks are required for endocytosis to reclaim ciliary membrane and membrane proteins from a pool in the plasma membrane for the rapid early stages of ciliary assembly. We believe that this is a similar mechanism to what is occurring when cells elongate following lithium treatment. This is because there are several parallels in phenotypes:

      -The Arp2/3 complex is required for both ciliary assembly (Figure 1, previous paper) and ciliary elongation resulting from lithium treatment. In the case of ciliary assembly, treating with cycloheximide to block the synthesis of new protein fully eliminates regrowth in the absence of the Arp2/3 complex, suggesting this Arp2/3 complex dependent mechanism in early ciliary assembly does not involve new protein synthesis (Figure 2, previous paper). Similarly, the process of ciliary elongation in response to lithium does not require new protein synthesis.

      -A burst in actin dynamics/actin puncta occurs immediately following deciliation during early regrowth and during growth initiated by lithium treatment. We know these puncta are Arp2/3 complex and clathrin dependent (Figures 4 and 7, previous paper).

      -Both initial ciliary assembly or ciliary maintenance and elongation of cilia due to lithium treatment require endocytosis (Figures 5, 7-8, previous paper) but do not require Golgi-derived membrane (Figure 3, previous paper).

      -Also in the previous paper, we find that this mechanism is required for the internalization and relocalization of a ciliary membrane protein for mating (Figure 6, previous paper). We also find that ciliary membrane proteins move from the plasma membrane to the cilia during ciliary assembly (Figure 7-8, previous paper).

      This is summarized in the text as follows:

      In the introduction we added:

      “Previous data from our lab suggest that the Arp2/3 complex and actin are involved in reclaiming material from the cell body plasma membrane that is required for normal ciliary assembly (Bigge et al. 2020). We show that the Arp2/3 complex is required for the normal assembly of cilia and for endocytosis of both plasma membrane and plasma membrane proteins in various contexts. Further, we find that deciliation triggers Arp2/3 complex-dependent endocytosis by observing an increase in actin puncta immediately following deciliation (Bigge et al. 2020).”

      And in the discussion we added:

      “Previous work has shown that while the Golgi is required for ciliary maintenance and assembly (Dentler 2013), it is not the only source of membrane. Instead, we found that membrane reclaimed through actin and Arp2/3-complex dependent endocytosis is required for ciliary assembly or growth from zero length (Bigge et al. 2020). More specifically, we found that the Arp2/3 complex is required for normal ciliary maintenance and ciliary assembly, especially in the early stages when membrane and protein are needed quickly. The Arp2/3 complex is also required for the internalization of membrane and a specific ciliary membrane protein required for mating. Further, we show that endocytosis-like actin puncta form immediately following deciliation in an Arp2/3 complex and clathrin-dependent manner, and that membrane from the cell body plasma membrane can be reclaimed and incorporated into cilia (Bigge et al. 2020). This led us to question whether that same mechanism might be required for ciliary elongation from steady state length induced by lithium treatment.”

      Minor comments: 1. The paper needs thorough proofreading, as it contains many spelling mistakes, grammatical errors, and poorly formed sentences.

      The paper was thoroughly read, and spelling mistakes and grammar were fixed.

      Supplemental Figure S2A and S2B should be quoted separately from S2C and S2D.

      This was updated in the latest version of the paper.

      On page 6, paragraph 2, the authors wrote: "To determine if GSK3 could be a potential kinase for this protein, we employed ScanSite4.0, which confirmed that of the 9 DRPs of Chlamydomonas, the only one with a traditional GSK3 target sequence was DRPs (Supplemental Figure 2)." No data are shown in S2 with regard to this. Either the data need to be shown, or the text should be changed to avoid confusion.

      The text was changed in a way to avoid confusion.

      It would be nice to see if GSK3 can actually phosphorylate DRP3.

      This would be interesting; however, there is currently no simple way to test this. There is no antibody for DRP3 that shares enough of its immunogen sequence with the Chlamydomonas DRP3 sequence to use for a western blot.

      The authors observe that arpc4 mutants do not form actin puncta upon LiCl treatment. Could this phenotype be rescued by complementing with WT ARPC4?

      *We showed in our previous paper (doi.org/10.1101/2020.11.24.396002) that the actin puncta could be rescued by re-expression of wild-type ARPC4 (Figure 4).*

      The concentration of inhibitors is described differently in the text and figure legends (for example Fig. 4A)

      *In the legend of Figure 4, the concentration of 6-BIO was accidentally reported as 100 µM instead of the correct value (100 nM) used throughout the rest of the paper. This was addressed in the latest version.*

      The p values are not significant in some of the figures (Fig. 4D & Fig. 5C).

      P values were provided for all comparisons in an effort to be transparent and so that readers could draw their own conclusions about the data.

      Reviewer #1 (Significance (Required)):

      The current manuscript by Bigge et al. demonstrates that endocytosis is required for GSK3 inhibition mediated ciliary lengthening. Maintenance of proper length of cilia is crucial and its dysregulation results in pathogenesis. This work takes the field forward and helps in our understanding of how ciliary length is regulated. This work is of interest to researchers working in the field of ciliary biology as well as to those working on endocytosis.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: The authors show in this study that lithium and other GSK3-beta inhibitors induce cilia elongation in Chlamydomonas. They further demonstrate that inhibition of endocytosis by Dynasore prevents the induced elongation of cilia. They speculate that a Dynamin-related protein might be involved in this process, and identify 9 Dynamin-related proteins (DRPs) in Chlamydomonas, of which DRP3 shows the highest sequence similarity. Lithium-induced ciliary elongation is prevented in DRP3 mutants, supporting the authors' hypothesis and indicating that DRP3 might be a GSK3-beta target, similar to some animal Dynamins. Since Dynamins interact with the F-actin regulator ARP2/3-complex, and because F-actin reorganization is observed in cells after GSK3-beta inhibition, they test the induction of ciliary elongation in arpc4 mutants and after blocking the ARP-complex with CK-666. Indeed, F-actin remodeling and cilia elongation were prevented after loss of ARP-complex function. The induction of ciliary elongation and F-actin remodeling also correlates with the emergence of strong F-actin punctae in cells, and the authors interpret that as induction of Dynamin-dependent endocytosis (also addressed in a current preprint from the group). From that, they conclude that endocytosis is required for delivering membrane to the growing cilium and that this is required for the observed effects. While this claim is somewhat supported by a lack of cilia elongation inhibition after treatment to prevent protein synthesis or Golgi function, direct evidence for membrane delivery to the cilium, the need for membrane delivery for ciliary elongation, and the presence of bona fide endocytotic vesicles is sadly missing.
      Therefore, this study sheds new light on an important process in ciliary functional regulation and also furthers our understanding of why GSK3-beta inhibition induces elongated cilia in many cell systems, but I am not convinced that the conclusions are actually supported by the data, as the two key points in question were not experimentally addressed at this point.

      Main points: 1. The authors need to demonstrate that new membrane is delivered to the growing cilium in the process. For example, this could be done with membrane stains (pulse) and static or live-cell imaging analysis in untreated cells, in GSK3-beta inhibitor-treated cells, and in mutants.

      *In our previous paper (doi.org/10.1101/2020.11.24.396002), we performed an experiment similar to the one described here (Figure 8, previous paper). We biotinylated all surface proteins, then removed the cilia (and therefore all labelled ciliary surface proteins) and allowed them to regrow. We then isolated the new cilia and probed for biotinylated proteins, because any biotinylated proteins must have come from the surface of the cell. We found that the cilia did contain membrane proteins from the surface of the cell. This experiment shows that membrane and membrane proteins derived from the plasma membrane enter growing cilia during regeneration. We added a description of this experiment to the text as follows:*

      “Conversely, when treated with Dynasore to inhibit endocytosis, cilia could not elongate to the same degree as untreated cells (Figure 3A-B), implying endocytosis is required for lithium-induced elongation and that endocytosis requires dynamin. This is consistent with results from our previous studies which show that ciliary membrane and membrane proteins are delivered from the cell body plasma membrane to the cilia. In an experiment first performed in Dentler 2013 and then later in Bigge et al. 2020, we biotinylated all cell surface proteins. We then deciliated the cells and allowed cilia to regrow. We then isolated cilia and probed for biotinylated proteins. Any biotinylated proteins present must have come from the cell body plasma membrane, and we found that indeed biotinylated proteins exist in the newly grown cilia, suggesting that ciliary membrane and membrane proteins can be recruited from the cell body plasma membrane (Dentler 2013; Bigge et al. 2020).”

      However, this experiment cannot be done in the case of lithium because the cilia are not removed, meaning they will already contain labelled surface proteins. Additionally, cells do not regrow cilia in the presence of lithium, meaning that we cannot add a regeneration step. Regardless, the work from our previous paper described above does establish that ciliary membrane and membrane proteins are able to come from the cell body plasma membrane, as the reviewer requested.

      Along the same line, the authors need to demonstrate that the punctae are truly endocytotic vesicles. For that uptake assays/stains could be used and additional markers. Furthermore, there are multiple modes of endocytosis (e.g. Clathrin) besides Dynamin. The authors should determine if blocking other modes of endocytosis has similar or divergent effects on cilia elongation.

      *In our previous paper (doi.org/10.1101/2020.11.24.396002), we supplemented the actin puncta data with membrane labelling to show that the puncta are likely endocytic pits (Figure 5, previous paper). We also showed that the puncta require both the Arp2/3 complex and active clathrin to form, further suggesting that they are endocytic (Figure 7, previous paper). We added this to the paper as follows:*

      “Further, they rely on proteins typically thought to be involved in endocytosis including the Arp2/3 complex and clathrin, and they form at times when it makes sense for endocytosis to be occurring, like immediately following deciliation when membrane and protein must be recruited to cilia in a timeframe too short for new protein and membrane synthesis, sorting, and trafficking (Bigge et al. 2020). To provide additional evidence that these are endocytic puncta, we also showed that a corresponding increase in membrane internalization occurs during this same timeframe using a fluorescent membrane dye that is endocytosed in wild-type cells (Bigge et al. 2020).”

      Additionally, Dynamin is required for most forms of endocytosis, including clathrin-mediated endocytosis. In the previous paper (doi.org/10.1101/2020.11.24.396002), which we cite here, we examine in detail which endocytic proteins are present in Chlamydomonas. We found that clathrin-mediated endocytosis is the most highly conserved of the endocytic processes we looked at (Figure 5, previous paper).

      We did add a new figure to this paper (Figure 4) using a dye that labels membrane in lithium-treated cells. This dye binds to the plasma membrane but is unable to enter cells by itself and must be endocytosed. We found that membrane dye internalization increases during the first 30 minutes of lithium treatment.

      No cilia are actually shown in the study. Personally, I would like to see what these cilia look like, especially in relation to the sites of F-actin remodeling and punctae formation. What comes first? Please also provide an axoneme staining to confirm elongation of the ciliary core. What happens to the tubulin pool when cilia cannot elongate any more? Is it accumulating at the ciliary base?

      *We added a panel to Figure 4, with a brightfield overlay, demonstrating where the puncta are in relation to cilia. We also look at the appearance and timing of these puncta in more depth in our previous paper (doi.org/10.1101/2020.11.24.396002, Figure 7). We find that puncta form immediately following deciliation and start to return to normal after about 10 minutes of regrowth. We think that the mechanism of ciliary elongation in lithium is similar to what occurs during those early steps of ciliary assembly, suggesting that the dots likely form very early on.*

      *We also included axoneme staining in Supplemental Figure 4. We show that the axoneme does continue to elongate with the cilia. After about 90 minutes, the cilia actually stop growing and detach from the cells (doi: 10.1128/EC.3.5.1307-1319.2004, doi: 10.1247/csf.12.369). However, we are interested in the more acute mechanisms that result in ciliary elongation.*

      The authors also claim that the method of GSK3 inhibition is not important. It would be more correct to say that the mode/drug of GSK3 inhibition is not important, but discuss how some of the minor variance between treatments could be explained (incl. the timeline and temporal dynamics of the diverging effects; and the dose-dependency as low concentrations of BIO seem to induce shortening but high doses induce elongation of cilia).

      *We further discussed this in the text as follows:*

      “The minor variances between the drugs could be explained by the timeline in which we tested cilia (90 minutes) or the exact dosages we used. An example of this is 6-BIO where treatment with a low dose of 100 nM caused ciliary lengthening, but treatment with a higher concentration of 2 µM reportedly caused ciliary shortening (Kong et al. 2015). Together, the data suggest that the mode of inhibition by chemical targets of GSK3 is not important for ciliary lengthening. Whether GSK3 was inhibited via competition for ATP binding or phosphorylation, cilia were able to elongate.”

      They propose here a positive effect of F-actin build-up in cilia length regulation, while most studies to date report that ciliary shortening correlates with increased F-actin at the ciliary base. I believe that this is not highlighted and discussed enough, which I find reduces the overall quality of the paper (but is easy to improve). It might also be interesting to test whether other F-actin inducers/stabilizers have the same effect.

      *This is addressed in depth in the discussion of the latest version as follows:*

      “One important detail to point out is that Chlamydomonas differs from mammalian cells in that it has a cell wall. The stability afforded by the cell wall means that Chlamydomonas does not require a cortical actin network as mammalian cells do. Thus, in Chlamydomonas, we are able to investigate actin dynamics and functions without the interference of the cortical actin network. This also means that some of the effects we see might be masked in mammalian cells by the presence of the cortical actin network and the effect that it has on ciliary assembly and maintenance.”

      *We also added a section to the introduction to address this concern early on so that readers will have this difference in mind as they read the paper:*

      “Additionally, unlike mammalian cells, Chlamydomonas lacks a cortical actin network which simplifies the relationship between cilia and actin and makes this an ideal model to study such interactions.”

      *Also, F-actin inducers/stabilizers do not typically have the same effect because the filamentous actin needed for these processes must be dynamic, that is, able to undergo rapid depolymerization and repolymerization as needed during this fairly quick timeframe. This is demonstrated in Avasthi, 2014 (doi.org/10.1016/j.cub.2014.07.038). Cells were treated with several actin-targeting inhibitors, including LatB, which results in depolymerization of filaments, and Jasplakinolide, which results in stabilization of filaments. In both cases, ciliary regeneration is impaired, suggesting that actin must be dynamic for its functions related to cilia.*

      Minor points: 1. In many Figures, the x-axis is labeled "Number of values", but I think that maybe number of observations might be more appropriate.

      We discussed this point and decided to change the axis titles to “Number of cilia”.

      The authors often use the word "normally" elongating, but in all cases the elongation is induced = abnormal situation. Maybe the authors could use a different term.

      We originally used “normally” because there are cases in which elongation is defective but not absent. In the latest version we changed this to “elongation consistent with untreated wild-type cells” or similar wording.

      It is puzzling as to why DRP3 was chosen, while DRP2 actually is most similar in terms of domain composition. Maybe they could discuss that. They also could explain a bit better how the mutants were generated in which a "cassette was inserted early in the gene". What kind of disruption is expected?

      DRP3 was chosen because it has the highest sequence identity (and similarity). DRP2, while containing all domains, has low overall sequence conservation. DRP3 is also the only DRP that showed a potential GSK3 target site when investigated with ScanSite4.0. This was all made clearer in the text as follows:

      “Chlamydomonas contains 9 DRPs with similarity to a canonical dynamin (DRP1-9). Despite lacking 2 of the canonical dynamin domains, the DRP with the highest sequence similarity and identity to canonical dynamin is DRP3 (Supplemental Figure 2C-D). To determine if GSK3 could be a potential kinase for this protein, we employed ScanSite4.0, which confirmed that of the 9 DRPs of Chlamydomonas, the only one with a traditional GSK3 target sequence was DRP3.”

      The representative images in Figure 4A do not really seem to match the quantifications.

      *The quantitative data suggest that these different treatments have increased dots, which we believe the representative images do show. LiCl and CHIR99021 have the most dots, while 6-BIO and Tideglusib also show increased dots, but fewer than LiCl and CHIR99021.*

      line 109: "of-targets" should be off-targets

      Fixed in the latest version, thanks for pointing this out.

      line 141: "delivery form the Golgi" should be FROM the Golgi

      Fixed in the latest version, thanks for pointing this out.

      line 160: "was DRPs" should be was DRP3

      Fixed in the latest version, thanks for pointing this out.

      line 204/205: the sentence starting "Thus, we phalloidin..." should be rephrased. It sounds not quite correct

      Fixed in the latest version, thanks for pointing this out.

      line 209: Figure 4A should refer to Figure 4B

      Fixed in the latest version, thanks for pointing this out.

      line 211: "times or rapid ciliary" should be of rapid ciliary...

      Fixed in the latest version, thanks for pointing this out.

      line 257: "in lithium." should be in lithium treated cells

      Fixed in the latest version, thanks for pointing this out.

      Reviewer #2 (Significance (Required)):

      This study sheds new light on an important process in ciliary functional regulation and also furthers our understanding of why GSK3-beta inhibition induces elongated cilia in many cell systems, but I am not convinced that the conclusions are actually supported by the data, as the two key points in question were not experimentally addressed at this point.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Chlamydomonas maintains a relatively regular length of cilia (flagella). However, when the cell is exposed to a high concentration of lithium ions, it elongates its cilia further. In this work, Bigge and Avasthi performed experiments to build a hypothesis for the molecular mechanism of this unusual cilia elongation. Their hypothesis is that (1) cilia elongation is triggered depending on a supply of extra membrane (not proteins), (2) membrane is supplied from the plasma membrane by clathrin-dependent endocytosis (not from the Golgi), (3) this endocytosis involves the Arp2/3 complex, (4) GSK3 downregulates Arp2/3-dependent endocytosis and (5) GSK3 is suppressed by lithium. They conducted well-organized experiments to test each step. While some of the evidence is indirect, their hypotheses were supported experimentally in outline.

      (1) is undoubted, since the authors demonstrated that inhibition of protein production by cycloheximide did not influence cilia elongation.

      (2) The authors clearly demonstrated that the source of ciliary membrane for elongation is the plasma membrane and not the Golgi by examining the effects of specific inhibitors. They also showed protein transfer from the plasma membrane to cilia by biotinylating surface proteins in the cell, deciliating and growing cilia, and detecting biotinylated proteins in cilia. This part characterizes the initial growth of cilia rather than elongation. Therefore this result must be properly described in the context of this work (which is elongation of cilia).

      This comment was particularly helpful as it also helps us address some of the comments from the other reviewers. We updated the description of this experiment in the context of this work in the latest version as follows:

      “Further, they rely on proteins typically thought to be involved in endocytosis including the Arp2/3 complex and clathrin, and they form at times when it makes sense for endocytosis to be occurring, like immediately following deciliation when membrane and protein must be recruited to cilia in a timeframe too short for new protein and membrane synthesis, sorting, and trafficking (Bigge et al. 2020). To provide additional evidence that these are endocytic puncta, we also showed that a corresponding increase in membrane internalization occurs during this same timeframe using a fluorescent membrane dye that is endocytosed in wild-type cells (Bigge et al. 2020).”

      For (3)-(4), they visualized Arp2/3 localization, showing highly condensed Arp2/3. They interpreted these particles as a sign of clathrin endocytosis. Since such an endocytosis particle has not so far been reported in Chlamydomonas, the authors confirmed that DRPs are a target of GSK3 to indirectly show that GSK3 influences the formation of endocytosis. This reviewer thinks the authors should be able to directly confirm endocytosis, for example by electron microscopy (of traditional epon-embedded and stained cells).

      *We visualized Arp2/3 complex-dependent filamentous actin localization. We provide DRP3 as a potential target of GSK3, but do not report that it is the target that results in increased endocytosis or increased ciliary length. We agree that electron microscopy would be ideal to visualize endocytosis in these cells. However, we feel this is outside the scope of the current work. We do plan to look at endocytosis in Chlamydomonas using electron microscopy in the future, and hope that the increased context from the previous data is sufficient at this time.*

      (5) was elegantly proved by multiple drugs (all known as inhibitors of GSK3), including lithium.

      After fixing these points, this manuscript will be ready for publication.

      Minor points: Line188-191: not clear. What are *** and ****?

      Fixed in the latest version, thanks for pointing this out.

      Line262-264: It would be helpful to describe the initial cilia growth of the arpc4 cells.

      We agree that this would be helpful information, and included more of a description of how ciliary growth is affected by loss of Arp2/3 complex function in the latest version: “Specifically, we found that the Arp2/3 complex is required for reclamation of membrane from a pool in the plasma membrane during the rapid growth that occurs during early ciliary assembly”.

      Line321: it should read as follows. Cang 2014; carlsson and Bayly 2014). While we...

      Fixed in the latest version, thanks for pointing this out.

      Line329: were -> where

      Fixed in the latest version, thanks for pointing this out.

      Line365-366: Lithium-treated cells are not motile. Any thought why? Maybe protein production is not necessary for apparent cilia elongation, but necessary for elongation of functional cilia.

      *This is an interesting idea. However, even when protein production is allowed to proceed, lithium-treated cells are not motile. This is a ciliary dysfunction, and in fact, after about 90 minutes of incubation with lithium, the cilia of these cells start to crash out or fall off, demonstrating that these are not healthy cells or healthy cilia.*

      Reviewer #3 (Significance (Required)):

      This work is an important step toward the understanding of cilia elongation and thus growth mechanism. It will attract a wide audience with an interest in cell biology and motility. My expertise is in motile cilia and their 3D structure.

    1. Author Response

      Joint Public Review:

      Strengths: The study represents a step forward in relating immune responses to infection outcomes that are of urgent interest to public health, especially the timing of shedding and frequency of supershedding events. Nguyen et al.'s model provides a useful framework for understanding the links between immune effectors and infection outcomes, and it can be expanded to encompass further biological complexity. The study system is a good choice, given the ubiquity of both helminth and bacterial infections, and experimental infections of rabbits provide a useful point of comparison for past work in mice.

      We appreciated these general comments.

      Limitations: The present study does not explicitly account for differences in helminth infection dynamics across the two species represented in the data, nor does it include feedbacks between the bacterial and helminth infections. Nguyen et al. therefore show the limits of what can be learned from focusing on the bacterial and immune dynamics alone, and this study should serve to motivate further work that can build on this modeling approach to produce a more comprehensive view of the interactions among species infecting the same host. Future studies examining the impact of helminth infection intensity would be tremendously useful for assessing the potential of anthelminthics to reduce the prevalence of bacterial respiratory diseases. Finally, subsequent studies may need to look beyond the factors examined here to understand why shedding varies so much through time for individual hosts.

      We agree that focusing only on the bacterial infection is a limitation in this study. We followed a parsimonious approach and decided to concentrate on B. bronchiseptica shedding in the four types of infection. While we do have data on the dynamics of infection of the two helminth species, adding these data would have been an enormous amount of work and too much to present in a single paper. Yet, we have already investigated some of these bi-directional effects using the BT group (Thakar et al. 2012 Plos Comp. Biol.) and plan to keep working on these rich datasets in the future.

      We also agree that it is important to understand the rapid variation in Bordetella shedding observed, which appears to be a common feature in many other host-pathogen systems. This requires a completely new set of experiments on infection and shedding at the local tissue level.

      Specific comments

      Definition of supershedding: A major stated goal of the MS is to investigate the effect of coinfection by helminths on supershedding. In order to compare animals with different coinfections, it is therefore necessary to have a common definition of supershedding. At present, the authors use a definition that depends on which arm of the experiment the animals belong to. This complicates the analysis and clouds its interpretation.

      We value this comment and see the implication of using different datasets to quantify supershedding. To overcome this problem, we now propose a slightly different approach where we pool the four infections together and calculate a common 99th or 95th percentile threshold. This common threshold is then used to calculate the number of hosts with at least one supershedding event above this cut-off, for every type of infection. Therefore, while the threshold is the same, the percentage of hosts with supershedding events varies among infection groups.
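
      The pooled-threshold calculation described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used in the study: the function name, the data layout, and the toy values are hypothetical, and the sketch assumes that zero observations are included in the pool and that the percentile is computed by linear interpolation.

```python
from statistics import quantiles


def supershedding_summary(shedding_by_group, percentile=99):
    """Return the pooled threshold and, per group, the fraction of hosts
    with at least one shedding value above it.

    shedding_by_group maps group label -> {host id -> list of CFU/s values}.
    The layout is hypothetical and chosen only for this illustration.
    """
    # Pool every observation from every host in every infection group.
    pooled = [
        value
        for hosts in shedding_by_group.values()
        for series in hosts.values()
        for value in series
    ]
    # quantiles(..., n=100) returns the 1st..99th percentile cut points.
    threshold = quantiles(pooled, n=100, method="inclusive")[percentile - 1]

    fractions = {}
    for group, hosts in shedding_by_group.items():
        # A host counts once, however many events exceed the threshold.
        flagged = sum(1 for series in hosts.values() if max(series) > threshold)
        fractions[group] = flagged / len(hosts)
    return threshold, fractions


# Toy data only (hypothetical values, not measurements from the study).
toy = {
    "B": {"b1": [0, 1, 2], "b2": [0, 0, 1]},
    "BT": {"t1": [0, 5, 100], "t2": [1, 2, 3]},
}
threshold, fractions = supershedding_summary(toy, percentile=95)
```

      Because the threshold is computed once from the pooled data, differences among groups in the returned fractions reflect the groups themselves rather than group-specific cut-offs.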

      Inconsistent approach: Within each experimental treatment, the data display variability on at least three levels: (i) within animals, day-to-day shedding displays variability on a fast timescale; (ii) within animals, infection status varies more slowly over the course of infection; (iii) between animals, there is variation in both (i) and (ii). The authors' model seems well-designed to handle this variability, but the authors are strangely inconsistent in their use of it. To be specific, to account for level (i), the authors very sensibly adopt a zero-inflated model for the shedding data, whereby the rate of shedding (colony-forming units per second, CFU/s) is assumed to arise from a mixture of a quantitative process (which we might think of as intensity of potential shedding) and an all-or-nothing process (which might arise, for example, if some discrete behavior of the animal is necessary for shedding to occur at all). The inclusion of the all-or-nothing process necessitates an additional parameter, but it allows the non-zero shedding data to inform the model. To account for level (ii), the authors use a four-dimensional deterministic dynamical system. Three of the four variables are related to the measured components of the immune response. The fourth is related to the aforementioned potential shedding. Level (iii) is accounted for using a hierarchical Bayesian approach, whereby the individual animals have parameters drawn from a common prior distribution. This approach seems very well designed to address the authors' questions using the data at hand. However, they fail to exploit this, in at least three ways. First, even though the model appears designed specifically to allow for non-shedding animals, the authors exclude animals on an ad hoc basis. Second, rather than display the shedding data in the form recommended by the model, they display log(1+CFU/sec), which is arbitrary and problematic. Its arbitrariness stems from the fact that this quantity is sensitive to the units used for shedding rate. Third, despite the fact that the model appears specifically designed to account for variability at each of the three levels, they do not give enough information to allow the reader to judge whether the model does in fact do a good job of partitioning this variability.
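
      The zero-inflated mixture described in this comment can be illustrated with a short simulation. This is a sketch under stated assumptions and not the authors' model: the lognormal observation noise, the function name, and all parameter values are hypothetical, chosen only to show how an all-or-nothing process combines with a quantitative "potential shedding" level.

```python
import random


def simulate_observed_shedding(potential, p_shed, sigma, rng):
    """Simulate zero-inflated shedding observations.

    potential: sequence of potential shedding levels (CFU/s), one per sample
    p_shed:    probability that shedding occurs at all on a given sample
    sigma:     spread of the (assumed) lognormal observation noise
    rng:       a random.Random instance, for reproducibility
    """
    observed = []
    for level in potential:
        if rng.random() < p_shed:
            # Shedding occurs: noisy observation around the potential level.
            observed.append(level * rng.lognormvariate(0.0, sigma))
        else:
            # All-or-nothing process returns a structural zero.
            observed.append(0.0)
    return observed


rng = random.Random(1)
potential = [2.0] * 10000  # constant hypothetical potential level
sample = simulate_observed_shedding(potential, p_shed=0.3, sigma=0.5, rng=rng)
share_positive = sum(1 for v in sample if v > 0) / len(sample)
```

      In this sketch the share of non-zero observations estimates the all-or-nothing probability, while the non-zero values alone carry the information about the potential shedding level.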

      Please see comments to each specific matter below.

      Exclusion of animals: In view of the fact that the model the authors describe can account for variability on all three levels, it is strange that they exclude animals that shed too little or not at all. It would be preferable were the authors to base their conclusions on all the data they collected rather than on a subset chosen a posteriori. It is true that the non-shedders will have no information about the time-course of shedding; on the other hand, including them does not complicate the analysis, and it does allow for estimation of the all-or-nothing probability in a coherent fashion. In particular, the fact that coinfection appears to have an impact on whether animals shed at all is itself directly related to the authors' central questions. More generally, ad hoc exclusion of data raises concerns about the repeatability of the experiments that, in this case, appear entirely avoidable.

      Rabbits that were infected but never shed were excluded from all our original analyses and continue to be excluded in our updated version. Our focus is on the dynamics of shedding, and including animals that do not shed is not informative to our objective. Moreover, these animals do not provide meaningful information on rabbits that are infected but do not shed, since this is a very small number (n=7) from which to draw meaningful conclusions across four types of infection. Rabbits with three or fewer shedding events larger than zero (i.e. CFU/s>0) were originally excluded from the modeling and continue to be excluded. This decision was motivated by technical reasons of model convergence and our commitment to generate meaningful results; in other words, it is difficult to fit a model, and provide robust results, on a time series with only three points larger than zero, irrespective of the number of zero points in the time series.
      In summary, our subset of animals was not chosen a posteriori but was based on clear objectives (i.e. pattern of shedding between and within types of infections), a rigorous approach and reliable results. We have further clarified our approach in the Results and Material and Methods.
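
      The inclusion rule stated above (more than three shedding events with CFU/s>0) can be written as a simple filter. The sketch below is illustrative only; the function name, data layout, and toy values are hypothetical and are not taken from the study.

```python
def hosts_for_modeling(shedding_by_host, max_excluded_positive=3):
    """Keep hosts with more than `max_excluded_positive` non-zero events.

    shedding_by_host maps host id -> list of CFU/s values (hypothetical layout).
    """
    kept = {}
    for host, series in shedding_by_host.items():
        n_positive = sum(1 for value in series if value > 0)
        if n_positive > max_excluded_positive:
            kept[host] = series
    return kept


# Toy time series (hypothetical values).
toy_hosts = {
    "r1": [0, 0, 0, 0, 0],  # never sheds: excluded
    "r2": [0, 2, 0, 1, 4],  # three non-zero events: excluded
    "r3": [3, 2, 0, 1, 4],  # four non-zero events: kept
}
kept = hosts_for_modeling(toy_hosts)
```

      Note that the rule depends only on the count of non-zero events, not on the number of zeros in the series.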

      Incomplete description of the analysis: The description of the statistical analysis will not be complete until sufficient information is provided to allow the interested reader to decide for him- or herself whether the conclusions are warranted and for the motivated reader to reproduce the analysis. In particular, it is necessary to specify all priors fully. At present, these are not described at all, except in vague, and even incoherent, ways. Also, it is necessary to provide details of the MCMC performed. Specifically, the authors should describe the MCMC sampler and show their MCMC convergence diagnostics. Finally, it is good practice to display both the priors and the posteriors: it is impossible to assess the posteriors without an understanding of the priors.

      We have carefully revised our approach and results and now provide a complete description of our analysis with additional/new details on Parameter calibration, Model fitting, Model validation and Model selection in Material and Methods, and Appendix (Appendix-3 and 4). Specifically, we have included all priors, along with all posteriors, for the four types of infection in Table 2. We have also explained how the MCMC simulations were performed and how model convergence was assessed (section ‘Parameter calibration and Model fitting’). In Appendix-3 we also show the parameter MCMC trace plots for the four types of infection.

      Second, rather than display the shedding data in the form recommended by the model, they display log(1+CFU/sec), which is arbitrary and problematic. Its arbitrariness stems from the fact that this quantity is sensitive to the units used for shedding rate.

      A clear feature of our shedding data is that there is large variation in the level of shedding both within and between hosts. Because of this, data were presented as log(1+CFU/s) to reduce the skewness of the datasets, and thus the variance, and facilitate the visualization of the experimental and simulated results. The use of data in the form of CFU/s would have made the visualization much harder, especially at low shedding where a large fraction of the data come from.

      The practice of displaying the data on a log-scale is appropriate when the underlying process is exponential or when the amount of relative variation is large, including when representing rates. This practice is widely used when modeling infectious diseases and describing biomedical results. Typical examples are the overdispersion of macroparasite infections in host populations and the large variation in the size of outbreaks caused by microparasite infections; these data are often described on a log-scale. An example closer to our case is the study on influenza-bacteria coinfection by Smith et al. 2013 Plos Pathogens. Given the nature of our data, we found that plotting the level of shedding on a log-scale was the most effective way to represent our results.
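
      For readers weighing the unit-sensitivity point raised in the comment above, the short sketch below compares how the gap between two shedding rates behaves under log(1+x) and under a plain log when the units change. The rates and the conversion factor (per second to per hour) are hypothetical values chosen for illustration; note also that a plain log is undefined at zero, which is what the +1 offset avoids.

```python
import math


def log1p_gap(low, high, unit_factor=1.0):
    """Difference between two rates on the log(1 + x) scale."""
    return math.log1p(high * unit_factor) - math.log1p(low * unit_factor)


def log_gap(low, high, unit_factor=1.0):
    """Difference between two (strictly positive) rates on the log scale."""
    return math.log(high * unit_factor) - math.log(low * unit_factor)


low, high = 0.5, 5.0  # hypothetical rates in CFU/s
per_second = (log1p_gap(low, high), log_gap(low, high))
per_hour = (log1p_gap(low, high, 3600.0), log_gap(low, high, 3600.0))
```

      Under a plain log the gap is identical in both units, whereas under log(1+x) it changes with the unit, most strongly for rates near or below 1.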

      Model adequacy: The authors' argument rests on the model's ability to adequately account for the data. The authors need to provide some evidence of this, in one form or another. Ultimately, the question is whether the data are a plausible realization of the model. The authors should show simulations from the model (including the measurement error and not merely the deterministic trajectories) and compare these simulations to the data. In particular, it seems worryingly possible that the fitted model is capable of capturing certain averages in the data while, at the same time, failing to describe the infection progression for any of the actual infected animals.

      As previously reported, we have now provided full details on model fitting and model convergence in the sections ‘Parameter calibration and Model fitting’ and ‘Model validation’ in Material and Methods, and ‘Model validation’ and ‘Model convergence’ in Appendix (Appendix-3 and 4).

      Regarding the evidence that the data are a plausible realization of the model, we have moved the original figure S1 into the main text (now figure 5). This figure shows the good fit of the model to neutrophil, IgA and IgG, both using individual and group data from every infection. We have also revised the quality of the plot to highlight individual simulations. To avoid too much crowding the 95% CIs for every individual are not reported; however, in Appendix-1 we provide the posterior parameter estimations and their 95% CIs, for every individual and as a group average, for the three co-infections (simulations for B rabbits were performed at the group level only).

      In the new figure 6 (original figure 5), we have now included the individual trajectories (without 95% CIs to avoid overcrowding), alongside the group trends, for the neutralization rates of neutrophils, IgA and IgG, which are the important parameters regulating infection and where the CIs are large enough to show the individual data. The other rates have CIs too narrow to single out individual trajectories and, thus, we only reported the group trends.

      In the revised figure 7 (original figure 6) we have revised the quality of the plots to highlight individual trajectories, in addition to the median trend, but have not included the individual 95% CIs, again to avoid overcrowding.

      Finally, the main text associated with these figures has been updated accordingly.

      Confusion of correlation and causation: At various points, the authors succumb to the temptation to interpret their model literally and to interpret the correlations they observe as evidence for a causal linkage between the three immune components they measure, bacterial shedding, and coinfection. They should be more careful and circumspect in the description of their results.

      We have thoroughly revised the presentation and discussion of the results to avoid the overinterpretation of the findings.

      Additional Issues:

      Eqs 1-4. These equations are not mechanistic in any meaningful sense. Essentially, they posit the existence of exponential time-lags between the three immunity variables, and a simple linear killing relationship between each of the variables and pathogen load. To interpret the equations literally risks making unwarranted conclusions. For example, any physiological variable correlated with any of the three variables in the model might equally well be credited with the influence on shedding attributed to IgA, IgG, or neutrophils.

      This work tests the hypothesis that neutrophils, IgA and IgG affect the dynamics of B. bronchiseptica infection and, in turn, bacterial shedding. Of course, there are many other immunological mechanisms that could contribute to the pattern observed and that can be tested, as there are many other variables correlated with these dynamics that do not play any role in these patterns, as noted by the reviewer. We follow a parsimonious approach by focusing on three immune variables previously identified as important in regulating Bordetella infection. To avoid excessive complexity and allow model tractability, our informed decision was to simplify the relationship between immunity and infection, without losing the important role of the immune variables selected. Finally, by referring to previous work by others and us we do note that the immune mechanisms described can be much more complex.

      l 456. Do the authors account for the variability in time spent with plates? Implicitly, the assumption is made that the amount of time a rabbit spends with a plate, i.e., the decision as to whether to engage in a behavior that will terminate the plate interaction, is independent of everything else. This raises the question: Does the time spent per plate correlate with anything?

      We always recorded the amount of time spent with the plate, and every rabbit had a maximum interaction time of 10 minutes. Rabbits are very inquisitive, and we rarely had animals that did not interact or had to remove the plate because they were chewing the media; usually animals used the entire 10 minutes. Analyses do account for the interaction time and are presented as Colony Forming Unit/second (CFU/s). As noted in the Material and Methods section ‘Observation model’: ‘The probability of having a shedding event is independent of time since inoculation, in that shedding can occur anytime during the experiment and anytime during the interaction with the petri dish’. This assumption is based on our observations of rabbit behavior during the trials.

    1. Reviewer #3 (Public Review):

      The authors present a systematic assessment of low complexity sequences (LCRs) and apply the dotplot matrix method for sequence comparison to identify low-complexity regions based on per-residue similarity. By taking the resulting self-comparison matrices and leveraging tools from image processing, the authors define LCRs based on similarity or non-similarity to one another. Taking the composition of these LCRs, the authors then compare how distinct regions of LCR sequence space compare across different proteomes.

      The paper is well-written and easy to follow, and the results are consistent with prior work. The figures and data are presented in an extremely accessible way and the conclusions seem logical and sound.

      My big picture concern stems from one that is perhaps challenging to evaluate, but it is not really clear to me exactly what we learn here. The authors do a fine job of cataloging LCRs and offer a number of anecdotal inferences and observations - perhaps this is sufficient in terms of novelty and interest, but if anyone takes a proteome and identifies sequences based on some set of features that sit in the tails of the feature distribution, they can similarly construct intriguing but somewhat speculative hypotheses regarding the possible origins or meaning of those features.

      The authors use the lysine-repeats as specific examples where they test a hypothesis, which is good, but the importance of lysine repeats in driving nucleolar localization is well established at this point - i.e. to me at least the bioinformatics analysis that precedes those results is unnecessary to have made the resulting prediction. Similarly, the authors find compositional biases in LCR proteins that are found in certain organelles, but those biases are also already established. These are not strictly criticisms, in that it's good that established patterns are found with this method, but I suppose my concern is that this is a lot of work that perhaps does not really push the needle particularly far.

      As an important caveat to this somewhat muted reception, I recognize that having worked on problems in this area for 10+ years I may also be displaying my own biases, and perhaps things that are "already established" warrant repeating with a new approach and a new light. As such, this particular criticism may well be one that can and should be ignored.

      That overall concern notwithstanding, I had several other questions that sprung to mind.

      Dotplot matrix approach: The authors do a fantastic job of explaining this, but I'm left wondering, if one used an algorithm like (say) SEG, defined LCRs, and then compared between LCRs based on composition, would we expect the results to be so different? i.e. the authors make a big deal about the dotplot matrix approach enabling comparison of LCR type, but, it's not clear to me that this is just because it combines a two-step operation into a one-step operation. It would be useful I think to perform a similar analysis as is done later on using SEG and ask if the same UMAP structure appears (and discuss if yes/no).
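
      For readers unfamiliar with the dotplot idea discussed here, the following is a minimal sketch of a self-comparison matrix based on per-residue identity. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the density summary and the two example sequences are hypothetical, and the image-processing step the authors apply to the matrices is not reproduced.

```python
def self_dotplot(seq):
    """Square 0/1 matrix: cell (i, j) is 1 when residues i and j are identical."""
    return [[1 if a == b else 0 for b in seq] for a in seq]

def off_diagonal_density(matrix):
    """Fraction of off-diagonal cells that are matches (hypothetical summary)."""
    n = len(matrix)
    if n < 2:
        return 0.0
    matches = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return matches / (n * (n - 1))

# Hypothetical sequences: a lysine-rich stretch and a stretch with no repeats.
lysine_tract = "KKKKEKKKKK"
mixed = "MAGTLQDVRN"

print(off_diagonal_density(self_dotplot(lysine_tract)))
print(off_diagonal_density(self_dotplot(mixed)))
```

      A low-complexity stretch fills the matrix with dense off-diagonal blocks, while a sequence with no repeated residue leaves only the main diagonal; composition-based tools such as SEG reach a similar call by a different route, which is the comparison the reviewer asks for.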

      LCRs from repeat expansions: I did not see any discussion on the role that repeat expansions can play in defining LCRs. This seems like an important area that should be considered, especially if we expect certain LCRs to appear more frequently due to a combination of slippy codons and minimal impact due to the biochemical properties of the resulting LCR. The authors pursue a (very reasonable) model in which LCRs are functional and important, but it seems the alternative (that LCRs are simply an unavoidable product of large proteomes and emerge through genetic events that are insufficiently deleterious to be selected against) should also be considered. Some discussion on this would be helpful. It also makes me wonder if the authors' null proteome model is the "right" model, although I would also say developing an accurate and reasonable null model that accounts for repeat expansions is beyond what I would consider the scope of this paper.

      Minor points: Early on the authors discuss the roles of LCRs in higher-order assemblies. They then make reference to the lysine tracts as having a valence of 2 or 3. It is possibly useful to mention that valence reflects the number of simultaneous partners that a protein can interact with - while it is certainly possible that a single lysine tract interacts with a single partner simultaneously (meaning the tract contributes a valence of 1) I don't think the authors can know that, so it may be wise to avoid specifying the specific valence.

      The authors make reference to Q/H LCRs. Recent work from Gutiérrez et al. eLife (2022) has argued that histidine-richness in some glutamine-rich LCRs is above the number expected based on codon bias, and may reflect a mode of pH sensing. This may be worth discussing.

      Eric Ross has a number of very nice papers on this topic, but sadly I don't think any of them are cited here. On the question of LCR composition and condensate recruitment, I would recommend Boncella et al. PNAS (2020). On the question of proteome-wide LCR analysis, see Cascarina et al. PLoS CompBio (2018) and Cascarina et al. PLoS CompBio (2020).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors sought to create a machine learning framework for analyzing video recordings of animal behavior, which is both efficient and runs in an unsupervised fashion. The authors construct Selfee from recent computational neural network codes. As the paper is methods-focused, the key metrics for success would be (1) whether Selfee performs similarly or more accurately than existing methods, and more importantly (2) whether Selfee uncovers new behavioral features or dynamics otherwise missed by those existing methods.

      Weaknesses:

      Although the basic schematics of Selfee are laid out, and the code itself is available, I feel that material in between these two levels of description is somewhat lacking. Details of what other previously published machine learning code makes up Selfee, and how those parts work would be helpful. Some of this is in the methods section, but an expanded version aimed at a more general readership would be helpful.

      Thanks for the suggestions. We expanded the paragraphs describing training objectives and AR-HMM analysis. We also revised Figure 2C for clarity, and we have added a new figure, Figure 6, to describe how our pipeline works in detail. We also added detailed instructions for Selfee usage on our GitHub page.

      *The paper highlights efficiency as an important aspect of machine learning analysis techniques in the introduction, but there is little follow up with this aspect.

      Our model only had a more efficient training process compared with other self-supervised learning methods. We also found our model could perform zero-shot domain transfer, so training may not even be necessary. However, we did not mean that our model was superior in terms of data efficiency or inference speed. We have revised some of the claims in the Discussion.

      *In comparing Selfee to other approaches, the paper uses DeepLabCut, but perhaps running other recent methods for more comprehensive comparison would be helpful as well.

      We compared Selfee feature extraction with features from FlyTracker and JAABA, two widely used software packages. We also visualized the tracking results of SLEAP and FlyTracker to complement the DeepLabCut experiment.

      *Using Selfee to investigate courtship behavior and other interactions was nicely demonstrated. Running it on simpler data (say, videos of individual animals walking around or exploring a confined space) might more broadly establish the method's usefulness.

      We used Selfee with the open field test (OFT) of mice after chronic immobilization stress (CIS) treatment. We demonstrated our pipeline, from data preprocessing to all the data mining algorithms, with this experiment, and the results were added to the last section of Results.

      Reviewer #2 (Public Review):

      Jia et al. present a CNN based tool named "Selfee" for unsupervised quantification of animal behavior that could be used for objectively analyzing animal behavior recorded in relatively simple setups commonly used by various neurobiology/ethology laboratories. This work is very relevant but has some serious unresolved issues for establishing credibility of the method.

      Overall Strengths: Jia et al have leveraged a recent development "Simple Siamese CNNs" to work for behavioral segmentation. This is a terrific effort and theoretically very attractive.

      Overall Weakness: Unfortunately, the data supporting the method is not as promising. It is also riddled with incomplete information and lack of rationale behind the experiments.

      Specific points of concern:

      1) No formal comparison with pre-existing methods like JAABA which would work on similar videos as Selfee.

      We added some comparisons with JAABA and FlyTracker extracted features, and also visualized FlyTracker and SLEAP tracking results aside from DeepLabCut. This result is now in the new Table 1. To avoid tracking inaccuracy during intensive interactions and potentially inappropriately tuned parameters, we used a peer-reviewed dataset focused on wing extension behavior only. Our results showed that Selfee performs competitively with the other methods.

      2) For all Drosophila behavior experiments, I'm concerned about the control and test genetic background. Several studies have reported that social behaviors like courtship and aggression are highly visual and sensitive to genetic background and presence of "white" gene. The authors use Canton S (CS) flies as control data. Whereas it is unclear if any or all of the test genotypes have been crossed into this background. It would be helpful if authors provide genotype information for test flies.

      We have added a detailed sheet about their genotypes in this version. The genetic information of all animals can also be found on the Bloomington fly center by the IDs provided. In brief, five fly lines used in this work are in the CS background: CCHa2-R-RAGal4, CCHa2-R-RBGal4, Dop2RKO, DopEcRGal4 and Tdc2RO54. We did not backcross other flies into the CS background for three reasons. First, most mutant lines are compared with their appropriate control lines. For example, in the original Figure 3B (the new Figure 4B), CCHa2-R-RBGal4 > Kir2.1 flies contained the wildtype white gene, so the comparison with CS flies would not cause any problem. TrhGal4 flies were in the white background, and so were other lines that had no phenotype. At the same time, in the original Figure 3G to J (the new Figure 4G to J), we used w1118 as controls for TrhGal4 flies, which were all in mutated white background. Second, in the original Figure 4F and G (the new Figure 5F and G), we admitted that the comparison between NorpA36, in mutated white background, and CS flies was not very convincing. Nevertheless, the delayed dynamic of NorpA mutants was reported before, and our experiment was just a demonstration of the DTW algorithm. Lastly, our work focused on the methodology of animal behavior analysis, and original videos were provided for research replications. Therefore, even if the behavioral difference was due to genetic backgrounds, it would not affect the conclusion that our method could detect the difference.

      3) Utility of "anomaly score" rests on Fig 3 data. Authors write they screened "neurotransmitter-related mutants or neuron silenced lines" (lines 251-252). Yet Figure 3B lacks some of the most commonly occurring neurotransmitter mutants/neuron labeling lines (e.g. Acetylcholine, GABA, Dopamine, instead there are some neurotransmitter receptor lines, but then again prominent ones are missing). This reduces the credibility of this data.

      First of all, this paper did not intend to conduct new screening assays; rather, we used pre-existing data in the lab to demonstrate the application of Selfee. Previous work in our lab focused on the homeostatic control of fly behaviors, so most lines listed here were originally used to test the roles of neuropeptides or neurons in nutrient and metabolism regulation, such as CCHa-related lines, a CNMa mutant, and Taotie neuron silenced flies. There were some other important genes that were not involved in this dataset. Some of the most common transmitters are not included for two reasons. First, common neurotransmitters usually have a very global and broad effect on animal behaviors, and even if there is any new discovery, it could be difficult to interpret the phenomenon due to a large number of disturbed neurons. Second, most mutants of those common neurotransmitters are not viable, for example, paleGal4 as a mutant for dopamine, Gad1A30 for GABA, and ChATl3 for acetylcholine. However, we did perform experiments on serotonin-related genes (SerT and Trh), octopamine-related genes (Tdc and Oamb), and some other viable dopamine receptor mutants.

      4) The utility of AR-HMM following "Selfee" analysis rests on the IR76b mutant experiment (Fig 4). This is the most perplexing experiment! There are so many receptors implicated in courtship and IR76b is definitely not among the most well-known. None of the citations for IR76b in this manuscript have anything to do with detection of female pheromones. IR76b is implicated in salt and amino acid sensation. The authors still call this "an extensively studies (co)receptor that is known to detect female pheromones" (lines 310-311). Unsurprisingly the AR-HMM analysis doesn't find any difference in modules related to courtship. Unless I'm mistaken the premise for this experiment is wrong and hence not much weight should be given to its results.

      We have removed the Ir76b results from the Results. The demonstration of AR-HMM was now done with a mouse open field assay.

      Reviewer #3 (Public Review):

      This paper describes a machine learning method applied to videos of animals. The method requires very little pre-processing (end-to-end) such as image segmentation or background subtraction. The input images have three channels, mapping temporal information (live-frames). The architecture is based on twin deep neural networks (Siamese network) and does not require human annotated labels (unsupervised learning). However, labels can still be used if they are produced, as in this case, by the algorithm itself - self-supervised learning. This flavor of machine learning is reflected in the name of the method: "Selfee." The authors convincingly apply Selfee to several challenging animal behavior tasks, which results in biologically relevant discoveries.

      A significant advantage of unsupervised and self-supervised learning is twofold: 1) it allows for discovering new behaviors, and 2) it doesn't require human-produced labels.

      In this case of self-supervised learning the features (meta-representations) are learned from two views of the same original image (live-frame), where one of the views is augmented in several different ways, with a hope to let the deep neural network (ResNet-50 architecture in this case) learn to ignore such augmentations, i.e. learn the meta-representations invariant to natural changes in the data similar to the augmentations. This is accomplished by utilizing a Siamese Convolutional Neural Network (CNN) with the ResNet-50 version as a backbone. Siamese networks are composed of twin deep nets, where each member of the pair is trying to predict the output of another. In applications such as face recognition they normally work in the supervised learning setting, by utilizing "triplets" containing "negative samples." These are the labels.

      However, in the self-supervised setting, which "Selfee" is implementing, the negative samples are not required. Instead the same image (a positive sample) is viewed twice, as described above. Here the authors use the SimSiam core architecture described by Chen, X. & He, K (reference 29 in the paper). They add Cross-Level Discrimination (CLD) to the SimSiam core. Together these two components provide two Loss functions (Loss 1 and Loss 2). Both are critical for the extraction of useful features. In fact, removing the CLD causes major deterioration of the classification performance (Figure 2-figure supplement 5).
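
      The SimSiam objective referred to above is commonly written as a symmetric negative cosine similarity between the predictor output of one view and the projection of the other view, with the projection treated as a constant (stop-gradient). The sketch below shows only that arithmetic on plain Python lists; it is an assumption-laden illustration, not Selfee's implementation, it omits the CLD term entirely, and the variable names `p1`, `z1`, `p2`, `z2` follow a common convention rather than anything stated in this review.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def simsiam_loss(p1, z1, p2, z2):
    """Symmetric negative cosine similarity.

    p1, p2: predictor outputs for view 1 and view 2.
    z1, z2: projections for view 1 and view 2; in training these would be
            detached from the gradient (stop-gradient), which plain Python
            numbers cannot express, so here they are simply constants.
    """
    return -0.5 * cosine(p1, z2) - 0.5 * cosine(p2, z1)

# Two views that agree perfectly reach the minimum of -1.
print(simsiam_loss([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]))
# Orthogonal views give a loss of 0.
print(simsiam_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]))
```

      No negative samples appear anywhere in this term, which is the property the reviewer highlights; the CLD term is what reintroduces a form of repulsion between clusters.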

      The authors demonstrate the utility of Selfee by using the learned features (meta-representations) for classification (supervised learning; with human annotation), discovering short-lasting new behaviors in flies by anomaly detection, and characterizing long time-scale dynamics by AR-HMM and Dynamic Time Warping (DTW).

      For the classification the authors use k-NN (flies) and LightGBM (mice) classifiers and they infer the labels from the Selfee embedding (for each frame), and the temporal context, using the time-windows of 21 frames and 81 frames, for k-NN classification and LightGBM classification, respectively. Accounting for the temporal context is especially important in mice (LightGBM classification) so the authors add additional windowed features, including frequency information. This is a neat approach. They quantify the classification performance by confusion matrices and compute the F1 for each.
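
      One simple way to fold temporal context into a per-frame classifier is sketched below. It assumes, purely for illustration, that the window is combined by averaging the embeddings of neighbouring frames before a k-NN majority vote; the review does not say how Selfee combines the 21-frame window, so the averaging step, the function names and the toy data are all hypothetical.

```python
from collections import Counter

def windowed_mean(embeddings, half_width=10):
    """Average each frame's embedding with its neighbours.

    The full window is 2 * half_width + 1 frames (21 by default) and is
    truncated at the start and end of the recording.
    """
    n = len(embeddings)
    out = []
    for t in range(n):
        lo, hi = max(0, t - half_width), min(n, t + half_width + 1)
        window = embeddings[lo:hi]
        dim = len(window[0])
        out.append([sum(vec[d] for vec in window) / len(window) for d in range(dim)])
    return out

def knn_predict(query, train_x, train_y, k=3):
    """Majority label among the k training vectors closest to the query."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    nearest = sorted(range(len(train_x)), key=lambda i: sq_dist(query, train_x[i]))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

# Toy one-dimensional embeddings: a single-frame spike is smoothed out.
smoothed = windowed_mean([[0.0], [0.0], [3.0], [0.0], [0.0]], half_width=1)
print(smoothed)
print(knn_predict([0.1], [[0.0], [0.2], [5.0]], ["rest", "rest", "wing"], k=3))
```

      Averaging suppresses single-frame noise at the cost of blurring very brief events, which is one reason a longer window with additional frequency features may suit mouse behavior better than fly behavior.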

      Overall, I find these classification results compelling, but one general concern is the criticality of the CLD component for achieving any meaningful classification. I would suggest that the authors discuss in more depth why this component is so critical for the extraction of features (used in supervised classification) and compare their SimSiam architecture to other methods where the CLD component is implemented. In other words, to what degree is the SimSiam implementation an overkill? Could a simpler (and thus faster) method be used - with the CLD component - instead to achieve similar end-to-end classification? The answer would help illuminate the importance of the SimSiam architecture in Selfee.

      We added more about the contribution of the CLD loss in the last paragraph of "Siamese convolutional neural networks capture discriminative representations of animal posture", the second section of Results. Further optimization of neural network architectures was discussed in the Discussion section. As for why CLD is that important, there are two main reasons. First of all, all behavior photos are so similar that it is not very easy to distinguish them from each other. In the field of so-called self-supervised learning without negative samples, researchers use either batch normalization or similar operations to implicitly utilize negative samples within a minibatch. However, when all samples are quite similar, it might not be enough. CLD uses explicit clusters to utilize negative samples within a minibatch (in the words of the CLD authors, “Our key insight is that grouping could result from not just attraction, but also common repulsion”), which provides more powerful discrimination. The second reason is what the authors argued in the CLD paper: CLD is very powerful in processing long-tailed datasets. As shown in the original Figure 2—figure supplement 5 (the new Figure 3—figure supplement 5), behavior data are highly unbalanced. As explained in the CLD paper, CLD fights against long-tailed distributions in two ways. One is that it scales up the importance of negative samples within a mini-batch from 1/B to 1/K by k-means; another is that the cluster operation could relieve the imbalance between the tail and head classes within a mini-batch. Here we quote: “While the distribution of instances in a random mini-batch is long-tailed, it would be more flattened across classes after clustering.” It was also visualized in Fig 5 of the CLD paper.

      To the best of our knowledge, SimSiam is the simplest method that would work with CLD. In the original CLD paper, they combined the CLD method with other popular frameworks including BYOL and MoCo v2. However, those popular frameworks are more complicated than SimSiam networks. We have attempted to combine CLD with BarlowTwins but failed. As the author of CLD suggested on GitHub: “Hi, good to know that you are trying to combine CLD with BarLowTwins! My concern is also on the high feature dimension, which may cause the low clustering quality. Maybe it is necessary to have a projection layer to project the high-dimensional feature space to a low-dimensional one.” In terms of speed, there are two major parts. For inference, only one branch is used, so the major contribution to efficiency comes from the CNN backbone. In theory, light backbones like MobileNet would work, but ResNet50 is already fast enough on a modern GPU. As for training, the major computational cost aside from the CNN backbone is from the Siamese branches: two branches require twice the computation. Nevertheless, CLD relies on this kind of structure, so even if the learning framework is simpler than SimSiam, it is not likely to achieve a faster training speed. As for other structures, I think this new instance learning framework (https://arxiv.org/abs/2201.10728) could achieve a similar result with fewer data and in a shorter time. However, this powerful method could be used with CLD. We might try it in the future.

      One potential issue with unsupervised/self-supervised learning is that it "discovers" new classes based, not on behavioral features but rather on some other, irrelevant, properties of the video, e.g. proximity to the edges, a particular camera angle, or a distortion. In supervised learning the algorithm learns the features that are invariant to such properties, because human-made labels are used and humans are great at finding these invariant features. The authors do mention a potential limitation, related to this issue, in the Discussion ("mode splitting"). One way of getting around this issue, other than providing negative samples, is to use a very homogeneous environment (so that only invariance to orientation, translation, etc, needs to be accomplished). This has worked nicely, for example, with posture embedding (Berman, G. J., et al; reference 19 in the manuscript). Looking at the t-SNE plots in Figure 2 one must wonder how many of the "clusters" present there are the result of such learning of irrelevant (for behavior) features, i.e. how good is the generalization of the meta-representations. The authors should explore the behaviors found in different parts of the t-SNE maps and evaluate the effect of the irrelevant features on their distributions. For example, they may ask: to what extent does the distance of an animal from the nearest wall affect the position in the t-SNE map? It would be nice to see how various simple pre-processing steps might affect the t-SNE maps, as well as the classification performance. Some form of segmentation, even very crude, or simply background subtraction, could go a very long way towards improving the features learned by Selfee.

      In the new Figure 3—figure supplement 1, the visualization demonstrates that our features contained a lot of physical information, including wing angles, animal distance and positions in the chamber. “Mode-split” can be partially explained by those features. We actually performed background subtraction and image crop for mice behaviors, where we found them useful.

      The anomaly detection is used to find unusual short-lasting events during male-male interaction behavior (Figure 3). The method is explained clearly. The results show how Selfee discovered a mutant line with a particularly high anomaly score. The authors managed to identify this behavior as "brief tussle behavior mixed with copulation attempts." The anomaly detection analyses were also applied to discover another unusual phenotype (close body contact) in another mutant line. Both results are significant when compared to the control groups.

      The authors then apply AR-HMM and DTW to study the time dynamics of courtship behavior. Here too, they discover two phenotypes with unusual courtship dynamics, one in an olfactory mutant, and another in flies where the mutation affects visual transduction. Both results are compelling.
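
      For readers who have not met DTW, the textbook dynamic-programming form is sketched below on scalar sequences. It is a generic illustration, not the authors' code: Selfee would compare sequences of high-dimensional features rather than single numbers, and the absolute-difference cost and the example sequences are assumptions made for this sketch.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    Uses the absolute difference as the local cost and allows the three
    standard moves (insertion, deletion, match) with no window constraint.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# A time-stretched copy of the same trajectory aligns at zero cost.
print(dtw_distance([0, 1, 2], [0, 0, 1, 1, 2, 2]))
# Trajectories that differ in value, not timing, keep a nonzero cost.
print(dtw_distance([0, 0, 0], [1, 1, 1]))
```

      Because stretching in time costs nothing, the warping path itself, rather than the final distance, is what reveals a delayed courtship dynamic of the kind described for the visual transduction mutant.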

      The authors explain their usage of DTW clearly, but they should expand the description of the AR-HMM so that the reader doesn't have to study the original sources.

      We expanded the section that talks about AR-HMM mechanisms.

    1. Author Response

      Reviewer #1 (Public Review):

      This work offers a simple explanation to a fundamental question in cell biology: what dictates the volume of a cell and of its nucleus, focusing on yeast cells. The central message is that all this can be explained by an osmotic equilibrium, using the classical Van't Hoff's Law. The novelty resides in an effort to provide actual numbers experimentally.

      In this work, Lemière and colleagues combine physical modeling and quantitative measures to establish the basic principles that dictate the volume of a cell and of its nucleus. By doing so, they also explain an observation reported many times and in many different types of cells, of a proportionality between the volume of the cell and of its nucleus. The central message is that all this can be explained by an osmotic equilibrium, using the classical Van't Hoff's Law. This is because, in yeast cells, while the cell has a wall that can contribute to the equilibrium, the nucleus does not have a lamina and there is thus no elastic contribution in the force balance for the nucleus, as the authors show very nicely experimentally, using both cells and protoplasts and measuring the cell and nucleus volume for various external osmotic pressures (the Boyle Van't Hoff Law for a perfect gas, also sometimes called the Ponder relation) - this was performed before for mammalian cells (Finan et al.), as cited and commented in the discussion by the authors, showing that mammalian cells have no significant elastic wall (linear relation) while the nucleus has one (non linear relation). This is well explained by the authors in the discussion. It is one of the clearer experimental results of the article. Together, the data and model presented in this article offer a simple explanation to a fundamental question in cell biology. In this matter, the principles are indeed seemingly simple, but what really counts are the actual numbers. While this article sheds some light on this aspect, it does not totally solve the question. The experiments are very well done and quantified, but some approximations made in the modeling are questionable and should at least be discussed at greater length.
Overall, this article is extremely valuable in the context of the recent effort of the cell biology and biophysics communities to understand the fundamental question of what dictates the size of cells and organelles. I have a few concerns detailed below. Importantly, there are many very interesting points of the article that I am not discussing below, simply because I completely agree with them.

      1) The main concern is about the assumption made by the authors that the small osmolytes do not count in establishing the volume of the nucleus. It was shown that small osmolytes such as ions make up the vast majority of the osmolytes in a cell (more than ten times more abundant than proteins for example, which represent about 10 mM, for a total of 500 mM of osmolytes). This means that just a small imbalance in the amount of these between the nucleus and cytoplasm might have a much larger effect than the number of proteins, which is the osmolyte that the authors choose to consider for the nuclear volume.
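
      The arithmetic behind this concern can be sketched in a few lines. The snippet below is only an illustration: it takes the two round figures quoted by the reviewer (about 10 mM of protein out of about 500 mM of total osmolytes), treats everything that is not protein as a small osmolyte, and asks how large a fractional imbalance in the small osmolytes would equal the entire protein pool. The function and constant names are hypothetical.

```python
# Round figures quoted in the review; treated here as assumptions.
TOTAL_OSMOLYTES_MM = 500.0   # total osmolyte concentration, mM
PROTEIN_MM = 10.0            # protein contribution, mM

# Simplifying assumption: everything that is not protein is a small osmolyte.
SMALL_OSMOLYTES_MM = TOTAL_OSMOLYTES_MM - PROTEIN_MM


def imbalance_in_mm(fraction):
    """Concentration difference (mM) produced by a fractional imbalance
    of the small osmolytes between two compartments."""
    return fraction * SMALL_OSMOLYTES_MM


def fraction_matching_protein_pool():
    """Fractional imbalance of small osmolytes that equals the whole
    protein pool."""
    return PROTEIN_MM / SMALL_OSMOLYTES_MM


if __name__ == "__main__":
    frac = fraction_matching_protein_pool()
    print(f"Imbalance equal to the protein pool: {100 * frac:.1f}% of small osmolytes")
    print(f"A 5% imbalance corresponds to {imbalance_in_mm(0.05):.1f} mM")
```

      Under these assumptions, an imbalance of only about 2% in the small osmolytes would already be comparable to the full protein contribution, which is why the actual numbers matter for this argument.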

      The point of the authors to disregard small osmolytes is that they can freely diffuse between the cytoplasm and the nucleus through the nuclear pores. They thus consider that the nuclear volume is established thanks to the barrier function of the nuclear envelope, which would retain larger osmolytes inside the nucleus and that the rest is balanced. This reasoning is not correct: for example, the volume of charged polymers depends on the concentration of ions in the polymer while there is no membrane at all to retain them. This is because of an important principle that the authors do not include in their reasoning, which is electro-neutrality.

      Because most large molecules in the cell are charged (proteins and also DNA for the nucleus), the number of counterions is large, and is probably much larger than the number of proteins. So it is hard to argue that this could be ignored in the number of osmotically active molecules in the nucleus. This is known as the Donnan equilibrium and the question is thus whether this is actually the principle which dictates the nuclear volume.

      The question then becomes whether the number of counterions differs between the cytoplasm and the nucleus, and more precisely whether the difference is larger than the difference considered by the authors in the number of proteins.

      How is it possible to estimate this number? One of the numbers found in the literature is the electric potential across the nuclear envelope (Mazzanti et al., Physiological Reviews 2001). The number is between 1 and 10 mV, with more cations in the nucleus than in the cytoplasm. This number could correspond to many more cations than the number of proteins, although the precise number is not so simple to compute and the precision of the measurement matters a lot, since there is an exponential relation between the concentrations and the potential.
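
      The exponential relation mentioned here can be illustrated with a Boltzmann (Nernst-type) estimate. This is a minimal sketch, not a calculation from the article: it assumes a single monovalent cation species at equilibrium across the nuclear envelope, a temperature of 25 °C, and a nucleus that is negative relative to the cytoplasm by the stated potential. The function name and its parameters are hypothetical.

```python
import math

R = 8.314462618    # gas constant, J/(mol*K)
F = 96485.33212    # Faraday constant, C/mol


def cation_ratio(delta_v_mv, valence=1, temp_k=298.15):
    """Nuclear/cytoplasmic concentration ratio of a cation at Boltzmann
    equilibrium, assuming the nucleus sits delta_v_mv millivolts below
    the cytoplasm (assumption: one ion species, ideal behaviour)."""
    thermal_voltage_mv = 1000.0 * R * temp_k / F   # about 25.7 mV at 25 C
    return math.exp(valence * delta_v_mv / thermal_voltage_mv)


if __name__ == "__main__":
    for dv in (1.0, 10.0):
        print(f"{dv:4.1f} mV -> nuclear/cytoplasmic cation ratio {cation_ratio(dv):.3f}")
```

      With these assumptions, the 1 to 10 mV range corresponds to a cation excess in the nucleus of roughly 4% at the low end and close to 50% at the high end, so the uncertainty in the measured potential dominates the estimate.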

      This point above is simply made to explain that the authors cannot rule out the contribution of small osmolytes to the nuclear volume and should at least leave this possibility open in the discussion of their article.

      As a conclusion, I totally agree with equation 3 which defines the N/C ratio, but I think that the Ns considered might not be the number of large macromolecules which cannot pass the nuclear envelope, but rather the small ones. Whether it is the case or not and what is actually the important species to consider depends on the actual numbers and these numbers are not established in this article. It is likely out of the scope of the article to establish them, but the point should at least be discussed and left open for future studies.

      We appreciate these excellent points made by the reviewer and their numerous consultants. We amend the discussion of colloid osmotic pressure in the text to reflect these points.

      2) The authors refer to the notion of colloidal pressure, discussed in the review by Mitchison et al. This term could be confusing and the authors should either explain it better or just not use it and call it perfect gas pressure or Van't Hoff pressure. Indeed, what is meant by colloidal pressure is simply the notion that all molecules could be considered as individual objects, independently of their size, and that it is then possible to apply the Van't Hoff Law just as if it were a perfect gas, hence the notion of 'colloidal' pressure, which would be the osmotic pressure of all the individual molecules. The authors might want to discuss, or at least mention, that it is a bit surprising that all these crowded large macromolecules would behave like a perfect osmometer and that the Van't Hoff law applies to them. Alternatively, it could be simpler to consider that what actually counts for the volume is mostly small freely diffusing osmolytes, to which this law applies well, and which are much more numerous.

      3) Very small point: on page 7 the authors refer to BVH's Law (Nobel, 1969). It is not clear what they mean. If they refer to the Nobel prize of Van't Hoff, it dates from 1901 (he died in 1911) and not 1969. I am not sure if there is something in one of the Nobel prizes delivered in 1969 which relates to this law. I checked but it does not seem to be the case, so it is probably a mistake in the date.

      The citation is correct. It's a JTB paper by Park S. Nobel describing the BVH relation in biology.

      4) On page 11, bottom, the result of the maintenance of the N/C ratio in protoplast is presented as an additional result, while it is a simple consequence of the previous results: both the cell and nuclear volume change linearly with the external osmotic pressure, so it is obvious that their ratio does not change when the external pressure is changed.

      This result was not trivial. Although both cells and nuclei volume change linearly with the inverse of the external osmotic concentration in protoplasts, it was not obvious whether the two volumes change with the same proportion (ie same slope on the BVH graph).
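
      The distinction drawn here can be made concrete with a small numerical sketch. It assumes the Boyle Van't Hoff form V = b + a / C for both compartments, where b is the non-osmotic volume, a the slope and C the external osmotic concentration. All parameter values below are hypothetical and in arbitrary units; they are chosen only to show that two linear responses give a constant N/C ratio when the nucleus and the cell have the same proportion of non-osmotic volume, and a changing ratio otherwise.

```python
def bvh_volume(nonosmotic, slope, conc):
    """Boyle Van't Hoff relation: volume = nonosmotic + slope / conc."""
    return nonosmotic + slope / conc


def nc_ratio(conc, nucleus, cell=(25.0, 75.0)):
    """Nuclear-to-cell volume ratio at external concentration conc.
    nucleus and cell are (nonosmotic volume, slope) pairs; the default
    values are hypothetical, in arbitrary units."""
    return bvh_volume(*nucleus, conc) / bvh_volume(*cell, conc)


# Nucleus with the same non-osmotic fraction as the cell (2/8 == 25/100).
MATCHED = (2.0, 6.0)
# Nucleus with the same volume at conc = 1 but a different slope.
UNMATCHED = (4.0, 4.0)

if __name__ == "__main__":
    for conc in (0.5, 1.0, 2.0):
        print(f"C = {conc}: matched {nc_ratio(conc, MATCHED):.3f}, "
              f"unmatched {nc_ratio(conc, UNMATCHED):.3f}")
```

      In the matched case the ratio stays at its isotonic value for every concentration, whereas in the unmatched case both volumes still respond linearly but the ratio drifts, which is the possibility the measurement had to exclude.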

      Another result, not commented by the authors, is that this should be true only in protoplasts, since in whole cells, the cell wall is affecting the response of the cell volume, but not the nucleus, so the ratio should change.

      In whole cells, the N/C ratio is in fact also maintained, consistent with the model. This result is now clarified in the manuscript (Figure 1C and D plus Figures 3D and S1C).

      5) The results in Figure 5, with the inhibition of export from the nucleus, are presented as supporting the model. It is not really clear that they do. First, the effect is very small, even if very clear. Again, the numbers matter here, so the interpretation of this result is not really direct and more calculations should be made to understand whether it can really be explained by a change in the number of proteins. The result in panel F is even more problematic. The authors try to argue that the nucleus transiently gets denser, based on the diffusion of the GEMs, and then adapts its density. It rather seems that it is overall quite constant in density, while it is the cell which has a decreasing density, maybe, as suggested by the authors, because there are fewer ribosomes in the cytoplasm, so protein production is reduced. This could have an indirect effect on the number of amino acids (which would then be less consumed). A recent article by Neurohr et al (Trends in cell biology, 2020) suggests that such an effect can lead to cell dilution, in yeast, because the number of amino acids increases. In this particular case, this increase would affect the nuclear volume rather than the cell volume because of the presence of the cell wall and the rather small change.

      We agree that there are different possible interpretations for these results. We have carefully reconsidered the interpretation and have rewritten the entire text for Figure 5.

      6) Page 16: it seems to me that the experiments presented in the chapter lines 360 to 376, on the ribosomal subunits, simply confirm that export is impaired, and they do not really contribute to confirm the hypothesis of the authors that it is the number of proteins in the nucleus which counts.

      We agree. We highlight the ribosomal subunit proteins as they are very abundant nuclear shuttling proteins that provide a good example for the dynamics of nuclear protein accumulation.

      The next paragraph with the estimation of the number of proteins in the nucleus and cytoplasm and how they change relatively upon export inhibition also appears to mostly demonstrate that export has been inhibited.

      The authors propose to use the number they find, 8%, to compare it to the change in the N/C ratio, which is of the same order. Given how small these numbers are, and the precision of such measures, it is very hard to believe that these 8% are really precise at a level which could allow such a comparison. The authors should really estimate the precision of their measures if they want to claim that. It is more likely that what they observe is a small but significant change in both cases; a small change means it is small compared to the total, so it is a fraction of it, and it is measurable, which means it is more than just a few percent, which is usually not possible to measure. So it means that it is in the order of 10%. This is the typical value of any small but measurable change given a method for the measure which can detect changes around 10%. In conclusion, these numbers might not prove anything.

      It could also be that the numbers match not just by chance, but that the osmolyte which matters is, for this type of experiment, changing in proportion to the amount of proteins (which would be possible for counter ions for example). But determining all that requires precise calculations and additional measures. It is thus more a matter of discussion and should be left more open by the authors.

      We agree that these measurements are not so precise. We have carefully reworded this section and removed these specific comparisons.

      Reviewer #2 (Public Review):

      The goal of the paper is to test the idea that colloidal osmotic pressure controls nuclear growth as suggested by Tim Mitchison in a recent review.

      In fleshing out the idea, Lemiere and colleagues develop a simple mathematical model that focuses on the forces generated by the movement of macromolecules across the nuclear-cytoplasmic boundary, ignoring any contribution of ions or small molecules which they assume equilibrate across the nuclear envelope. In testing this model, they focus their quantitative analysis on the response of cells that lack a wall (protoplasts) to osmotic shocks and to perturbations of nuclear export, protein synthesis and symmetric cell division. They also analyse the motion of small 40nm particles to test how diffusion is affected by these perturbations in both compartments.

      Their analysis leads them to make some important observations that suggest that the system is even simpler than they might have hoped, since under the conditions tested nuclei (which lack lamins) behave as ideal osmometers. That is, the nuclei and cytoplasm grow and shrink in concert following sudden osmotic shocks. This suggests that the tension in the nuclear envelope, which gives nuclei their spherical shape, plays no role in constraining nuclear size.

      While most of the paper's claims are well supported by their data under the assumptions of the model, there are a few claims that are less convincing.

      For example, while their data are consistent with the idea that cells regulate their nuclear/cytoplasmic size ratio using an adder-type mechanism, in which a fixed ratio of nuclear and cytoplasmic proteins is synthesised per unit time as cells grow, this has not been rigorously put to the test. In addition, while the diffusion analysis is very interesting, it does not fully support the authors' simple model linking diffusion, molecular crowding and colloidal osmotic pressure, something that could be more thoroughly discussed in the manuscript.

      We added new data showing that slowing growth rate leads to a proportionate decrease in N/C ratio correction. This strengthens this portion of the paper.

      We have added an improved discussion of the GEMs data and its limitations.

      Reviewer #3 (Public Review):

      This manuscript by Lemière and colleagues presents a view on how nuclear size is set by simple physical principles. The first part of the work describes a theoretical framework with the nucleus and the cell as two nested osmometers. Using fission yeast as a model, the authors then show that protoplasts and nuclei behave as ideal osmometers, i.e. show linear changes in volume upon change in external osmotic pressure. Consequently, the nuclear to cell volume ratio remains constant upon osmotic changes, but increases upon block of nuclear export, which leads to higher nuclear protein contents. Measurements of diffusion in the cytoplasm and nucleoplasm back these data. Finally, in the last part of the manuscript, the authors show that nuclear growth through a passive osmotic model can explain the previously described homeostasis of nuclear volume.

      The manuscript is clearly written, and the data are clean and overall solid. I very much liked the simple view on the phenomenon of constant nuclear to cytosol ratio and the mix of modelling and experiments supporting the model that nuclear size is set passively by osmotic principles.

      There are however a few points that are slightly at odds with the model and/or require further explanation to make the model compelling and discuss it in view of previous findings.

      1) Isn't the finding that diffusion rates are faster in the nucleus (line 298, Fig S4C), indicating lower crowding in the nucleus, at odds with the finding that the non-osmotic volumes are similar in the two compartments? If the nucleus is less crowded, does this not suggest a lower pressure than the cytosol? I would also like to see this finding appear in Figure 4, which only reports on the normalized diffusion rates in both nuclei and cytosol.

      We have added this figure to the main Figure 4, as requested. We agree that this raises some interesting questions. Our current interpretation is that composition of the nucleoplasm and cytoplasm are different and therefore affect GEMs diffusion and colloid osmotic pressure slightly differently.

      2) Similarly, I don't understand the observed change in diffusion rates of GEMs upon LMB treatment (Fig 5F). If the nucleus behaves as an ideal osmometer, then any change in protein density between the nucleus and the cytosol, leading to change in osmotic pressure, will lead to a change in nuclear size that should re-equilibrate the osmotic pressures between the two compartments. The prediction would thus be that, if LMB treatment does not change overall protein concentration, at equilibrium there is no change in either osmotic pressure or density as measured by GEM diffusion rates. This is indeed illustrated by the constant normalized non-osmotic volume of the nucleus after LMB treatment. Is the change in diffusion rates perhaps only transient until a new steady state is reached? Or is there a change upon total protein content in the cell after LMB treatment?

      3) In the experiments labelling proteins with FITC, are the reported values really those of protein concentrations or rather protein amounts? Isn't the enlargement of the nucleus upon LMB treatment compensating for this increase in amounts, returning the nucleus to a similar concentration as before treatment? A change in concentration is not in agreement with the reported constant non-osmotic volume of the nucleus.

      These measurements of intensity are of concentrations. We add in the text this prediction that changes in concentration will be compensated for by swelling in nuclear volume and now interpret the data in light of this prediction. We add new data that total FITC staining for protein and RNA shows no change in concentration in compartments, consistent with this model.

      4) The authors state that "a previous paper proposed a model for N/C ratio homeostasis based upon an active feedback mechanism (Cantwell and Nurse, 2019)" (lines 471-472). My understanding of this previous study is that nuclear size was proposed to be set by a limiting component, itself proportional to cell volume. No feedback was postulated. This previous model is in fact not too different from what the authors propose here, with the previously proposed limiting component now corresponding to the nuclear macromolecules that produce colloid osmotic pressure and thus set nuclear size. Though the present study goes significantly further in presenting the passive role of osmosis in setting nuclear size, it is a misrepresentation to portray this previous model as fundamentally different. Furthermore, it is not clear whether the new osmotic pressure-based model produces a better fit than the previous 'limiting component model'. Figure 7E here is very similar to Fig 4I in Cantwell and Nurse 2019, but it is difficult to judge the similarity of the fits.

      The Cantwell and Nurse paper tested two models. The first was based upon nuclear growth being a fraction of cell growth. This model is qualitatively similar to ours. However, they discarded this initial model because it fitted poorly with their data. They then went to propose a second model, which contains a critical equation in which nuclear growth rate is a function of the N/C ratio, i.e. the system is sensing the N/C ratio and adjusting nuclear growth rate as a function of the N/C ratio. In other words, this is a feedback mechanism. The Cantwell paper does not describe this "feedback" term explicitly in the text, but it is clearly present in the equations. Therefore, our model which lacks any feedback term is fundamentally different from the Cantwell limiting component model.

      We show that our model fits our data much better than the Cantwell model. We believe that the different views in these studies arise from differences in the experimental data. These differences may arise from two technical differences: 1) Their use of binning could be responsible for flattening the nuclear growth rate as a function of the nuclear volume at start. 2) Their estimates of cell and nuclear volumes using a 2D image and geometric assumptions may be less accurate than our automated 3D volume method.

      5) If nuclear size is set purely by osmotic regulation, how do you explain that mutants in membrane regulation (such as nem1 and spo7, see Kume et al 2017; or lem2, see Kume et al 2019) previously shown to have an enlarged nucleus, display increased nuclear size?

      This is an interesting question that we are currently pursuing. It is likely that these mutants affect multiple processes besides nuclear envelope expansion. For example, at least some of these mutants have altered chromatin organization, which could cause an increase in colloid osmotic pressure. There may also be significant defects in chromosome segregation, which lead to the production of different-sized nuclei with an abnormal number of chromosomes. Some of the N/C ratio defects reported in these papers may arise from their 2D measurement methods, which are not accurate for misshapen nuclei. In our preliminary results, lem2 mutants do not have N/C ratio defects.

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses:

      The data presented in the first part of the study are convincing. However, it is unclear whether each step of cell elongation and alignment, cell migration, cell dedifferentiation and regenerative response, is required for fin regeneration following amputation. As indicated in the discussion, the authors cannot provide evidence for the requirement of migration or dedifferentiation for the overall success of fin regeneration. Such limitations should be more clearly stated.

      We have modified the title and abstract to avoid overstating the requirement of the particular responses to successful regeneration. Furthermore, we have stated the limitations of our study more clearly in the discussion.

      We have removed the word “requires” from the title, it now reads: Zebrafish fin regeneration involves generic and regeneration-specific osteoblast injury responses

      In the discussion we state the limitations on page 21 as follows:

      “Unfortunately, currently existing tools to block dedifferentiation are either mosaic (activation of NF-κB signalling using the Cre-lox system) or cannot be targeted to osteoblasts alone (treatment with retinoic acid). Due to these limitations in our assays, we can currently not test what consequences specific, unmitigated perturbation of osteoblast dedifferentiation has for overall fin / bone regeneration. Conversely, the interventions presented here that specifically perturb osteoblast migration are limited as they act only transiently, that is they can severely delay, but not fully block migration. Furthermore, while interference with actomyosin dynamics reduces regenerative growth, we cannot distinguish whether this is caused by the inhibition of osteoblast migration or due to other more direct effects on cell proliferation and tissue growth. Thus, an unequivocal test of the importance of osteoblast migration for bone regeneration requires different tools.”

      In the second part of the study, the term trauma needs to be clarified or reconsidered. A trauma model would imply that healing is impaired. Evidence for a non-healing phenotype is lacking and is expected in support of a trauma model.

      We apologize if our use of the term trauma has caused confusion. We have simply used it interchangeably with “injury”. We have now removed all references to “trauma” in the text.

      The authors describe the process of fin regeneration that may share common features with bone regeneration in other species. In the absence of direct evidence of common mechanisms between fin regeneration and bone regeneration in other systems, the authors should remain focused on "fin regeneration" in their conclusions rather than referring to "bone regeneration" and "bone formation" in more general terms.

      We have rephrased the conclusion to have it more centred on bone regeneration in the fin. The relevant parts of the discussion now read on page 25 as follows:

      In conclusion, our findings support a model in which zebrafish fin bone regeneration involves both generic and regeneration-specific injury responses of osteoblasts. Morphology changes and directed migration towards the injury site as well as dedifferentiation represent generic responses that occur at all injuries even if they are not followed by regenerative bone formation. While migration and dedifferentiation can be uncoupled and are (at least partially) independently regulated, they appear to be triggered by signals that emanate from all bone injuries. In contrast, migration off the bone matrix into the bone defect, formation of a population of (pre-) osteoblasts and regenerative bone formation represent regeneration-specific responses that require additional signals that are only present at distal-facing injuries. The identification of molecular determinants of the generic vs regenerative responses will be an interesting avenue for future research.

      Reviewer #2 (Public Review):

      The study by Sehring et al. depends on an extensive and thoroughly acquired collection of data points in combination with a robust and rigorous statistical analysis. I see that the authors have spent a lot of effort into this and I am overwhelmed by the number of analyzed data points that again depend on careful measurements at the cellular level in a more or less intact tissue. However, since just a fraction of cells has been chosen to be incorporated into the statistical analysis, there is a certain risk of a biased selection. I think the reader of the paper would appreciate a somewhat clearer picture of how the authors get to their final numbers, starting from the original image data. This appears of particular importance when it comes to determining the elongation of cells and the angular deviations from the proximo-distal axis. In many cases (e.g. Fig.2 A, B, D and E), the reader has to take those numbers without seeing any primary image data. A practicable solution to that issue would be to complement the accompanying Excel sheets of raw data with corresponding image material. This should show an overview of a representative sample for the dedicated experiment, together with some appropriate magnifications of analyzed cells including the axes along which those measurements have been performed. Also, it would be important to state within the methods section of the paper whether the measurements have been done manually using Fiji or whether a certain automated Fiji plug-in has been used for this part of the analysis.

      Osteoblasts line the bony hemirays on the inner and outer surface (see Figure 1A), and for quantifications of osteoblast morphology, we analysed the osteoblasts of the outer layer of one hemiray (the hemiray facing the objective in whole mount imaging). While we have no direct evidence for this, we think it is reasonable to assume that osteoblasts in the other “sister” hemiray behave the same, and we have anecdotal evidence that osteoblasts on the inner surface of the hemirays also migrate and dedifferentiate. Thus, we don’t think that restriction of the analysis to one hemiray and the outer surface introduces bias.

      For measurement of morphology, we used a transgenic line expressing a fluorescent protein (FP) in osteoblasts in combination with Zns5 antibody labelling. Zns5 is a pan-osteoblastic marker which localizes to the cell membrane. Therefore, combination of a cytosolic FP labelling with the membrane labelling by Zns5 provides solid definition of single cell outlines. For general morphology studies and drug intervention studies, we used bglap:GFP transgenics. In the transgenic intervention studies (manipulation of NF-kB signalling), mCherry is expressed together with CreERT2 under the osterix promoter and used as cytosolic labelling of osteoblasts. Our analyses are always based on segments, e.g. we present data for segments 0, -1, 2. Within these segments all FP+ Zns5+ cells were included in the analysis, and cells along the whole proximodistal axis of a segment were measured. Measurements were performed manually, and the analyst was blinded. With these set-ups, not just a fraction but all FP+ Zns5+ osteoblasts present in the segments that we analysed were included in the analysis, and thus no selection was necessary that could have introduced bias. As suggested by Reviewer #2, we have added representative sample images to the accompanying Excel sheets of raw data for the dedicated experiments. Within these, the axes along which the measurements have been performed are indicated.

      We have expanded the description of the analysis in the method section. It now reads on page 36 as follows:

      “To quantify osteoblast cell shape and orientation, the transgenic line bglap:GFP in combination with Zns5 AB labelling was used. Osteoblasts of the outer layer of one hemiray (facing the objective in whole fin mounting) were imaged and analysed. As Zns5 localizes to the plasma membrane of all osteoblasts, the combination of both markers provides solid definition of single cell outlines. All GFP+ Zns5+ cells with such a defined outline within an analysed segment were included into the analysis, and cells along the whole proximodistal axis of a segment were measured. In the transgenic intervention studies, mCherry is expressed under the osx promoter and was used as cytosolic labelling of osteoblasts. Using Fiji (Schindelin et al., 2012), the longest axis of a FP+ Zns5+ cell was measured as maximum length, the short axis as maximum width, and the ratio calculated. Simultaneously, the angle of the maximum length towards the proximodistal ray axis was measured for angular deviation. All measurements were performed manually, with the analyst being blinded.”
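
      To make the two readouts described in this methods passage concrete, the sketch below computes a width/length ratio and an angular deviation from manually placed coordinates. It is an illustration only and not the authors' Fiji procedure: the function name, its arguments and the choice of a width/length (rather than length/width) ratio are assumptions, and the proximodistal ray axis is assumed to be supplied as a direction vector in image coordinates.

```python
import math


def shape_metrics(long_axis_start, long_axis_end, max_width, ray_axis=(1.0, 0.0)):
    """Return (width/length ratio, angular deviation in degrees).

    long_axis_start, long_axis_end: (x, y) end points of the longest cell axis.
    max_width: length of the short axis, in the same units.
    ray_axis: direction of the proximodistal ray axis (hypothetical input).
    """
    dx = long_axis_end[0] - long_axis_start[0]
    dy = long_axis_end[1] - long_axis_start[1]
    max_length = math.hypot(dx, dy)

    # Angle between the long axis and the ray axis. An axis has no
    # direction, so the absolute dot product folds the result into 0-90.
    dot = abs(dx * ray_axis[0] + dy * ray_axis[1])
    cos_angle = min(1.0, dot / (max_length * math.hypot(*ray_axis)))
    deviation_deg = math.degrees(math.acos(cos_angle))

    return max_width / max_length, deviation_deg


if __name__ == "__main__":
    # An elongated cell lying along the ray axis.
    print(shape_metrics((0.0, 0.0), (10.0, 0.0), 2.5))
    # A rounder cell oriented 45 degrees away from the ray axis.
    print(shape_metrics((0.0, 0.0), (5.0, 5.0), 6.0))
```

      Folding the angle into the 0 to 90 degree range reflects the fact that a cell's long axis has no head or tail; whether the original measurements used the same convention is not stated in the text.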

      Along the same line, it would strengthen the statement provided by the statistical diagram in Fig.3A if the authors could show images of cells from segment -1 and -2 for all three experimental conditions. In particular, since the depicted segment -1 osteoblasts look rather roundish than elongated (compare with Fig.1 C and D, images and width/length ratio).

      As suggested by the reviewer, we have added representative sample images of cells in segment -1 to the figure; the images that were already there in the previous version of the figure were from segment -2 (new data in Figure 4A). As can be seen from the graphs, there is a certain range of morphology within each segment / assay, with an obvious overlap between the segments. This can make it difficult to recognize the difference between the segments by looking at the images alone, and we have therefore added arrowheads to highlight examples of roundish and elongated cells. Yet, as mentioned above, all cells were included in the analysis.

      In regards to the biology itself, Sehring and colleagues claim that the complement system is required for injury-induced directed osteoblast migration. To strengthen this point it would be beneficial if the authors could show that the central complement components C3 and C5 are indeed expressed at the amputation site where the dedifferentiated pre-osteoblasts migrate to. It would be interesting to learn about the localization of C3 and C5 expression in the conventional amputation as well as the double-injury condition. Apparently, the RNAscope-based in situ hybridization seems to work quite well in the Weidinger lab.

      Complement precursor proteins are thought to be mainly expressed in the liver and distributed throughout the body via the circulation. Injury would then result in local production of the activated C3a and C5a peptides via a cascade of proteolytic processing. Unfortunately, we lack the tools to detect the C3 and C5 precursor proteins or the mature cleavage products of the complement factors, which mediate the biological function of the cascade (e.g. antibodies against the zebrafish proteins / peptides). We have also attempted RNAScope for c5a and c3a.1 in fins, but these turned out to not produce any specific stainings, thus the results of these experiments remained inconclusive and we have not included them in the manuscript.

      However, we analysed expression of the RNA coding for the precursors of the complement factors c5 and the six zebrafish paralogs of c3 using qRT-PCR on liver, non-injured fins and fins at 6 hpa (samples derived from segment -1 plus segment 0). These new data can be found in Figure 5B. Compared to the expression levels in the liver, expression in non-injured fins could hardly be detected. Interestingly, c5 and c3a.5 levels were upregulated in injured fins, but compared to the expression in the liver still only slightly, e.g. c5 is about 17 Ct values (2 to the power of 17 = 130000 times) more highly expressed in the liver than in the injured fin. These results are consistent with the idea that the majority of complement factors that are activated after injury is derived from precursors that are expressed in the liver and are distributed via the circulation to the fin, as is considered standard for the complement system. Interestingly, however, local production might contribute as well.
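
      The conversion from a Ct difference to a fold difference used in this paragraph can be written out as a short sketch. It assumes ideal amplification, in which the amount of product doubles with every PCR cycle; real primer efficiencies are usually somewhat lower, which would reduce the fold difference. The function name and the `efficiency` parameter are hypothetical.

```python
def fold_difference(delta_ct, efficiency=2.0):
    """Fold difference in template abundance implied by a Ct difference,
    assuming the product multiplies by `efficiency` in every cycle
    (2.0 corresponds to perfect doubling)."""
    return efficiency ** delta_ct


if __name__ == "__main__":
    print(fold_difference(17))        # 131072.0, i.e. about 130000-fold
    print(fold_difference(17, 1.9))   # lower if amplification is below ideal
```

      With perfect doubling, 17 cycles correspond to 2^17 = 131072, which matches the approximate figure of 130000 quoted above.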

      Overall our new data support our conclusion that the complement system is an important regulator of osteoblast migration in vivo, since the receptors are present in osteoblasts (see also response to the next issue), while systemic and local expression can provide the precursors for injury-induced production of the activated factors that might act as guidance cues.

      To judge whether this osteoblast's migratory response is cell-type specific and cell-autonomous it would be good to know if c5ar1 and c3ar are solely expressed in osteoblasts, or rather broadly within tissue lining the hemirays.

      While we had already shown that c5aR1 is expressed in osteoblasts, we have now added additional RNAscope in situ analysis for c5aR1 showing that the receptor is also expressed in other cell types (new data in Figure 5 – figure supplement 1A). We have also attempted RNAScope for c3aR in fins, which, however, did not produce specific staining and thus remained inconclusive; we have not added these data to the manuscript. However, we established fluorescence-activated cell sorting from bglap:GFP transgenic fins, which gives us an additional tool to analyse to what extent expression is specific to osteoblasts. By qRT-PCR analysis we found that c5aR1 and c3aR are expressed in both GFP+ osteoblasts and other cells that are GFP– (these will mainly represent epidermis and fibroblasts, to a lesser extent endothelial and other cell types). These new data can be found in Figure 5 – figure supplement 1B.

      While our qRT-PCR data and the c5aR1 RNAScope results show that the complement receptors are not specifically expressed in osteoblasts, we do not consider this result to be in conflict with our model that the complement system regulates osteoblast migration. Other cell types migrate after fin amputation as well, which is best described for epidermal cells (Chen et al., Dev Cell 2016, 10.1016/j.devcel.2016.02.017), but likely also occurs for fibroblasts (Poleo et al., DevDyn 2001, doi: 10.1002/dvdy.1152), and it is conceivable that the complement system plays a role in regulating these events as well.

      Reviewer #3 (Public Review):

      Weaknesses:

      1) The major conclusions on osteoblast dedifferentiation and migration are solely based on a bglap:GFP strain, which does not allow a pulse-chase approach in injury responses. Specificity of this strain to osteoblasts is also doubtful because as many as 20% of GFP+ cells are in proliferation. Specificity of bglap:GFP to mature osteoblasts is a major concern. Important caveats associated with this reporter strain are not carefully considered.

      To address these comments, we have performed several additional experiments as described below. In addition, we would like to refer the reviewer to our previous papers, where we have analysed the process of osteoblast dedifferentiation (Knopf et al., Dev Cell 2011, doi: 10.1016/j.devcel.2011.04.014; Geurtzen et al., Development 2014, doi: 10.1242/dev.105817; Mishra et al. Dev Cell 2020, doi: 10.1016/j.devcel.2019.11.016). Using transgenic reporters and immunofluorescence we have shown in these previous papers that osteoblasts in the non-injured fin express Bglap but not the pre-osteoblast marker Runx2 (and are thus by our definition differentiated). We apologize if we failed to explain the logic of our approach in this manuscript; we have restructured the results to clarify this, as indicated below.

      We have also performed the following additional experiments.

      1) To confirm the specificity of the bglap:GFP line for mature osteoblasts, we have performed three experiments:

      a) immunofluorescence against Runx2 on 7 dpa regenerates, at a stage where blastema proliferation at the distal tip of the regenerate produces new osteoblast progenitors, while in more proximal (older) regions osteoblasts have already started to differentiate and new bone matrix has formed. We found that Runx2 is expressed in distal regions in pre-osteoblasts, while bglap:GFP is only expressed in proximal regions in osteoblasts which do not express Runx2. Thus, during the formation of new bony segments in regenerative growth, bglap:GFP is activated in mature osteoblasts and the labelled population does not include osteoblast precursor cells. These new data are found in Figure 2 – figure supplement 2B.

      b) we have refined and expanded our methods and are now able to determine the expression patterns of markers of the osteoblast differentiation status with single cell resolution using RNAScope in situ hybridization. Using this, we can now show that at 1 day post amputation, in segment -2 of the fin stump, which represents a segment equivalent to the non-injured state, since no dedifferentiation occurs here, bglap:GFP+ cells do not express endogenous runx2a. These new data are found in Figure 1 – figure supplement 1A.

      c) Using RNAScope, we can show that cyp26b1, a gene associated with dedifferentiated osteoblasts, is likewise not detected in bglap:GFP+ cells in segment -2 at 1 dpa (new data in Figure 1 – figure supplement 1B).

      Together, these data confirm that the bglap:GFP line is specific for differentiated osteoblasts, and does not label osteoblast progenitors. See the response to issue 2 below for how we describe these new data in the revised version of the manuscript.

      2) Regarding the proliferation of bglap:GFP osteoblasts: In the experiment the reviewer refers to (now Figure 5 – figure supplement 3A), we make use of the persistence of the GFP protein in the bglap:GFP line to detect dedifferentiated osteoblasts. Thus, at the time of analysis, when these GFP+ cells proliferate, they are not differentiated anymore. We can show this as follows:

      Although bglap expression is downregulated during osteoblast dedifferentiation and thus also GFP levels eventually drop in the transgenic line, we can nevertheless use this line to trace osteoblasts, since GFP protein persists for up to three days in cells that shut down endogenous bglap and also bglap:GFP transgene transcription. While we have already shown this previously (Knopf et al., Dev Cell 2011, doi: 10.1016/j.devcel.2011.04.014; Geurtzen et al., Development 2014, doi: 10.1242/dev.105817; Mishra et al. Dev Cell 2020, doi: 10.1016/j.devcel.2019.11.016), we have now also used RNAScope to confirm this. We analysed the expression of GFP on protein and RNA level in the bglap:GFP line. In bglap:GFP fish, in a mature segment in non-injured fins the regions close to the joints are devoid of cells expressing GFP (Figure 1G). Yet after amputation, we observe GFP+ cells in this distal part of segment -1 (Figure 1G, D). RNAscope in situ shows that these GFP+ cells are negative for gfp RNA (new data in Figure 1D). Thus, the observed fluorescence is due to the persistence of the GFP protein and not due to a potential upregulation of the transgene (Figure 1E).

      Importantly, we have now also added data describing the proliferative state of bglap:GFP+ osteoblasts. First, in the non-injured fin, bglap:GFP+ cells are non-proliferative (new data in Figure 5 – figure supplement 2B). After amputation, proliferation can be detected in GFP+ cells at 2 dpa (Figure 5 – figure supplement 2B), and proliferation is restricted to segment -1 and segment 0 (new data in Figure 5 – figure supplement 2C). As we show in Figure 1B, at 2 dpa, dedifferentiation as defined by bglap downregulation is not complete in segment -1, rather here a mixture of cells with different bglap levels are found. We have thus combined EdU labelling with RNAscope against bglap in segment -1 to analyse to which extent bglap and EdU anticorrelate. These data show that EdU is hardly ever incorporated into cells expressing high levels of bglap, while the majority of the proliferating osteoblasts are dedifferentiated, as they express only low levels of bglap (new data in Figure 5 – figure supplement 2D). Together, these data show that mature osteoblasts are non-proliferative, and upon amputation, when they are dedifferentiated, they become proliferative. Thus, the absence of proliferation in bglap:GFP+ cells in the non-injured fin adds to the evidence that this line is specific for mature osteoblasts, but due to the persistence of the GFP protein it can be used to analyse dedifferentiated osteoblasts.

      These data are described on page 14 of the manuscript as follows:

      “In the non-injured fin, bglap:GFP+ osteoblasts are non-proliferative, but upon amputation osteoblasts proliferate at 2 dpa (Figure 5 – figure supplement 2A, B). Proliferation is restricted to segment -1 and segment 0 (Figure 5 – figure supplement 2C), and RNAscope in situ analysis of bglap expression revealed that the majority of EdU+ osteoblasts have strongly downregulated bglap (Figure 5 – figure supplement 2D). Inhibition of C5aR1 with PMX205 had no effect on osteoblast proliferation in segment -1 at 2 dpa (Figure 5 – figure supplement 3A). Furthermore, upregulation of Runx2 was not changed by PMX205 treatment (Figure 5 – figure supplement 3B), and regenerative growth was not affected in fish treated with either W54011, PMX205 or SB290157 (Figure 5 – figure supplement 3C). We conclude that the complement system specifically regulates injury-induced osteoblast migration, but not osteoblast dedifferentiation or proliferation in zebrafish.”

      3) To support our conclusion that osteoblasts migrate, we performed time-lapse imaging using a transgenic line expressing the photoconvertible protein kaede in osteoblasts (entpd5:kaede). Local photoconversion of only the proximal half of a segment allowed us to trace these photoconverted osteoblasts. This revealed that converted cells appear in the distal part of the segment within 1 dpa, which can only be explained by relocation of the cells. These new data can be found in Figure 1F and they are described on page 7 of the revised manuscript as follows: To trace osteoblasts, we used the transgenic line entpd5:kaede (Geurtzen et al., 2014), in which Kaede fluorescence can be converted from green to red by UV light (Ando et al., 2002). We photoconverted osteoblasts in the proximal half of segment -1, while osteoblasts in the distal half remained green (Fig. 1F). At 1 dpa, red osteoblasts were found in the distal half (Fig. 1F), showing that photoconverted osteoblasts had relocated distally.

      2) The authors poorly define dedifferentiation. They use reduced bglap:GFP or bglap mRNA expression as a sole criterion for dedifferentiation. The authors state that NF-kB and retinoic acid can inhibit osteoblast dedifferentiation. However, this simply reflects the well-described fact that these signals promote osteoblast differentiation.

      We define dedifferentiation as the reversion of a mature cell into an undifferentiated progenitor-like status. This involves the following characteristics: 1) markers of the differentiated state are downregulated; 2) early lineage markers are re-expressed; 3) the cells become proliferative; and 4) they have the ability to re-differentiate into mature cells. Based on this definition, the downregulation of an osteoblast-specific marker can be used as a read-out for osteoblast dedifferentiation. Bglap is an established marker for mature osteoblasts (Kaneto et al., 2016 doi.org/10.1186/s12881-016-0301-7; Yoshioka et al., 2021 doi: 10.1002/jbm4.10496; Kannan et al., 2020 doi: 10.1242/bio.053280; Sojan et al., 2022 doi.org/10.3389/fnut.2022.868805; Valenti et al., 2020 doi.org/10.3390/cells9081911). While we use downregulation of bglap expression as our main read-out for osteoblast dedifferentiation in our experimental interventions (actomyosin inhibition, retinoic acid treatment, complement inhibition), we have expanded our methods to characterize osteoblast dedifferentiation, and have re-arranged our manuscript to show these data in the beginning of the results.

      Already in the previous version of the manuscript we have shown that endogenous bglap is strongly expressed in segment -2 (the segment that does not respond to fin amputation and thus represents the non-injured state), while it is downregulated in a graded manner in segment -1 and segment 0 (the segments where dedifferentiation happens). We have now moved these data to the re-designed Figure 1B. In addition to bglap, we can now show that entpd5, a gene required for bone mineralization, is strongly expressed in osteoblasts of segment -2, while it is massively downregulated in segment -1 and segment 0. These new data can be found in Figure 1C. Thus, entpd5 is another differentiation marker whose loss characterizes osteoblast dedifferentiation. Importantly, we can confirm by RNAScope that the pre-osteoblast marker runx2a is absent in mature segments but is upregulated in segment 0 and segment -1 at 1 dpa (new data in Figure 1 – figure supplement 1A). Similarly, cyp26b1, an enzyme shown to regulate dedifferentiation, is upregulated in segment 0 and segment -1, but not expressed in segment -2 (new data in Figure 1 – figure supplement 1B). Furthermore, we have repeated all experiments where we have previously quantified dedifferentiation upon experimental interventions using downregulation of bglap:GFP (actomyosin inhibition, retinoic acid treatment, complement inhibition). We can now fully confirm the previous conclusions using the more rigorous quantification of dedifferentiation by RNAScope analysis of endogenous bglap levels. We have replaced all bglap:GFP data with the new bglap RNAScope data. These new data are found in Figure 3F, Figure 3 – figure supplement 1A, Figure 4B and Figure 5F.

      Overall, we support our conclusion that osteoblasts dedifferentiate by the loss of the two differentiation markers bglap and entpd5, the upregulation of the pre-osteoblast marker runx2a and the dedifferentiation-associated gene cyp26b1, and the fact that osteoblasts become proliferative. We hope that the reviewer considers this sufficient evidence.

      In mammals, the available literature relatively convincingly concludes that NF-kB signaling negatively regulates osteoblast differentiation (Yao et al., 2014, doi: 10.1002/jbmr.2108; Swarnkar et al., 2014 doi.org/10.1371/journal.pone.0091421, Chang et al., 2009, doi.org/10.1038/nm.1954). Yet in zebrafish osteoblasts, we have previously shown that NF-kB signaling is active in mature osteoblasts and needs to be downregulated for dedifferentiation to occur (Mishra et al., 2020, 10.1016/j.devcel.2019.11.016). Importantly, in our previous work we showed that at least during fin regeneration, NF-kB signalling is not involved in osteoblast differentiation (Mishra et al., 2020, 10.1016/j.devcel.2019.11.016). Specifically, osteoblasts in which Nf-kappaB signaling is enhanced or inhibited differentiate completely normally during the later stages of fin regeneration in the fin regenerate. Hence, our findings with the Nf-kappaB intervention studies done in this manuscript, where we look at osteoblasts in the stump within 1 dpa, cannot be explained by them affecting osteoblast differentiation.

      For retinoic acid signalling, multiple roles in bone development and repair have been described in mammals. For zebrafish osteoblasts, it was shown that during the outgrowth phase of bone regeneration, retinoic acid negatively regulates osteoblast differentiation in the blastema (Blum & Begemann, 2015, 10.1242/dev.120204). Yet importantly, it also negatively controls the dedifferentiation of osteoblasts in the stump right after amputation (Blum & Begemann, 2015, 10.1242/dev.120204). Thus, the effect we observe at the early timepoints we analyse in our intervention studies (retinoic acid treatment) are due to the effect on osteoblast dedifferentiation.

      We have added a short definition of dedifferentiation to the results section (page 6). There it reads as follows:

      “We have previously shown that osteoblasts dedifferentiate in response to fin amputation, that is they revert from a mature, non-proliferative state into an undifferentiated progenitor-like state, which includes loss of bglap expression and upregulation of the pre-osteoblast marker runx2 (Knopf et al., 2011; Geurtzen et al., 2014).”

      In addition, we have restructured the results to describe our use of tools and the new data on page 6 of the revised manuscript as follows:

      Using RNAScope in situ hybridization, we can now show that downregulation of bglap occurs in a graded manner and that entpd5 expression is similarly downregulated during dedifferentiation (Figure 1B, C). At 1 day post amputation (1 dpa), expression of entpd5 and bglap remains high in segment -2, but gradually decreases towards the amputation plane and is almost entirely absent from segment 0, with entpd5 downregulation being more pronounced (Figure 1B, C). While RNA expression of these genes is downregulated within hours after injury, GFP or Kaede fluorescent proteins (FPs) expressed in bglap or entpd5 reporter transgenic lines persist for up to three days, even though transgene transcription is shut down rapidly as well (Knopf et al., 2011). We can confirm these earlier findings using the more sensitive RNAScope in situs. In bglap:GFP transgenics at 2 dpa, gfp RNA and GFP protein colocalized to the same cells in segment -2, where osteoblasts do not dedifferentiate (Fig. 1D). In contrast, in the distal segment -1 GFP protein was present, but barely any gfp transcript could be detected (Fig. 1D). Thus, persistence of FPs in reporter lines can be used for short-term tracing of dedifferentiated osteoblasts (Fig. 1E). At 1 dpa, bglap:GFP+ cells upregulated expression of the pre-osteoblast marker runx2a and of cyp26b1, an enzyme involved in retinoic acid signalling (Blum and Begemann, 2015), which regulates dedifferentiation (Figure 1 – figure supplement 1A, B). Both markers were exclusively upregulated in segment -1 and segment 0 at 1 dpa, but were absent in segment -2. Together, these data show that osteoblasts in segment -1 and segment 0 lose expression of mature markers and gain expression of dedifferentiation markers.

      3) The authors do not rigorously demonstrate that mature osteoblasts indeed migrate. What they showed in this study is simply cell shape changes.

      We have the following evidence for osteoblast migration:

      1) bglap:GFP+ cells relocate from the centre of segments towards the amputation plane (after fin amputations) or towards both injuries in the hemiray model. In this revised manuscript we show that transgene expression is not upregulated in these regions, but that GFP fluorescence there must be due to relocation of cells in which GFP protein persists (new data in Figure 1D, E; see also response to “Weaknesses, issue 1” above)

      2) Using the entpd5:kaede transgenic line, which is expressed in mature osteoblasts throughout segments, we have photoconverted only the proximal half of a segment, which allowed us to trace these photoconverted osteoblasts. This revealed that converted cells appear in the distal part of the segment within 1 dpa, which can only be explained by relocation of the cells. These new data can be found in Figure 1F.

      3) Already in the previous version of the manuscript, we have performed live imaging to track single cell behaviour. Using double transgenic fish expressing both GFP and kaede in osteoblasts, we deliberately only partly converted kaedeGreen to kaedeRed, which resulted in different hues for each osteoblast. This distinct colouring facilitates observing single cells. Video 1 shows the directed movement of cell bodies relative to their surroundings within 2 hours (see also Figure 2 – figure supplement 1A).

      4) Osteoblasts display the typical cell shape changes associated with active migration (elongation along the axis of migration, extension of dynamic protrusions), data in Figure 2.

      Together, we think these are convincing data supporting the conclusion that osteoblasts actively migrate.

      4) The hemiray removal model is highly innovative, but this part of the study is not very well connected to the rest of the study.

      We have rephrased the first sentence of the hemiray paragraph to make the connection more perceptible. It now reads as follows:

      In response to fin amputation, all osteoblast injury responses are directed towards the amputation plane: dedifferentiation is more pronounced distally, osteoblasts migrate distally and the proliferative pre-osteoblast population forms distal to the amputation plane. We wondered how osteoblasts respond to injuries that occur proximal to their location. To test this, we established a fin ray injury model featuring internal bone defects.

    1. We are not sorry for him—we learn that, not to be sorry for the dead. But for ourselves? This terror is always so fresh, so unexampled.

      This is quite a bold statement, especially for an opening paragraph. It makes the reader stop and think, potentially reflecting on their own life. It also allows us to connect with the narrator as they think about the terror they may have experienced in their own lives.

    1. Author Response

      Reviewer #1 (Public Review):

      Using the Tet-off system, Kir2.1 was expressed (or not) during the key time of callosal development from E15 to P15. Restoring activity either by adding Dox during a critical period from P6 to P15 or using DREADDs from P10-14 could rescue the callosal projection to the cortex, whereas later restoration of activity (with Dox) was not successful. Did this successful rescue lead to normal activity? Calcium imaging in animals with Kir2.1 had low levels of any kind of activity, both highly correlated and low correlation, but P6-13 dox treatment partially restored only low-correlation activity and not high correlation activity at P13. The effects of DREADDs on activity were not similarly measured though it was effective for at least partially restoring the callosal projection.

      Overall this study builds on earlier findings regarding the importance of neuronal activity in the formation of a normal callosal projection, using in utero electroporation which is particularly well suited for this subject. It makes the case very compellingly that near-normal callosal connectivity can be produced if activity is permitted during a critical period window from P6 or P10 to P15, though the exact timing of this window is imprecise because the elimination of Kir expression was not systematically quantified. For transmembrane proteins like channels it can often take many days for protein expression to completely abate.

      We thank the reviewer for their positive evaluation and the constructive comments. Based on the comment on Kir expression, we conducted new experiments using pTRE-Tight2Kir2.1EGFP, with which EGFP signals reflect localization of over-expressed Kir2.1, and examined when the expression of Kir2.1EGFP went down after Dox treatment at P6. At P6 (before Dox treatment), the signals of Kir2.1EGFP (stained with anti-GFP antibody) were observed in the periphery of the soma and along dendrites, implying that Kir2.1EGFP was transported to the cellular membrane. At P10 and P15 (4 days and 9 days after Dox treatment), Kir2.1EGFP signals were not observed in the periphery of the soma and along dendrites. We noted that low-level green signals were observed in the central part of the cell body. These may stem from low-level expression of Kir2.1EGFP in nuclei or cytosol even after Dox treatment. Alternatively, and more likely, these may reflect bleed-through of RFP signals into GFP channel. Overall, we confirmed that Kir2.1 proteins that were localized to the cellular membrane were largely down-regulated. We described these observations in detail in the figure legend of Figure 1-figure supplement 3, and added the result as Figure 1-figure supplement 3.

      I found the quantification of the callosal projection to be rather minimal and the normalization approach not entirely transparent. For example does activity from P10-15 restore the full normal PATTERN of callosal connectivity or merely the density of input overall?

      We thank the reviewer for this comment. Based on the comment, we added analyses of the pattern of callosal projections: the width of the callosal axon innervation zone in layers 2/3 and 5, and densitometric line scans across all cortical layers. Our original quantification showed that the density of callosal axons reaching their target layer (i.e. cortical layer 2/3) is almost recovered in the P6-P15 DOX condition (Figure 1B-D), but the new analyses suggest that some aspects of callosal axon projections might be only partially recovered, namely the width of the innervation zone in layers 2/3 and 5 (Figure 1-figure supplement 4A,B) and the lamina-specific innervation pattern (Figure 1-figure supplement 4C). We have added these new results as Figure 1-figure supplement 4. In future studies, we would like to assess the effect of the manipulations at finer resolution by 3D morphological reconstruction of axons of individual neurons.
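
      As a rough illustration of what a densitometric line scan across cortical layers computes, the sketch below averages pixel intensity across the width of an image strip for each depth position. It is a generic example and not the authors' pipeline: the row-equals-depth layout, the function names and the peak normalization are all assumptions made for illustration.

```python
def line_scan(image):
    """Mean intensity per row of a 2D intensity grid.

    Assumes each row is one depth bin (pia at the top) and each column is
    one position along the cortical surface; this layout is hypothetical.
    """
    return [sum(row) / len(row) for row in image]


def normalize_to_peak(profile):
    """Scale a profile so its maximum is 1 (one possible normalization)."""
    peak = max(profile)
    if peak <= 0:
        return list(profile)
    return [value / peak for value in profile]


if __name__ == "__main__":
    # Toy 3 x 2 strip of axon-label intensities, invented for the example.
    strip = [[0, 2], [4, 4], [1, 3]]
    profile = line_scan(strip)
    print(profile, normalize_to_peak(profile))
```

      Comparing such depth profiles between conditions is what distinguishes a change in overall axon density from a change in the laminar pattern, which is the distinction the reviewer asked about.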

      Also in the discussion it would be nice to more clearly establish whether activity is thought to be maintaining a projection already formed by P10 or permitting the emergence of such a pattern.

      Thank you for the suggestion. We have added thorough discussions about this point as follows. Page 7, lines 198-208:

      “In the previous study, we showed that callosal axons could reach the innervation area almost normally under activity-reduction, and that the effects of activity-reduction became apparent afterwards (Mizuno et al., 2007). Callosal axons elaborate their branches extensively in P10-P15 (Mizuno et al., 2010), and axon branching is regulated by neuronal activity (Matsumoto and Yamamoto, 2016). It is likely that activity is required for the processes of formation, rather than the maintenance of the connections already formed by P10, but the current study employed massive labeling of callosal axons which is not suited to clarify this. In addition, the restoration of activity in the Tet-off (Figure 1) or DREADD (Figure 2) experiment may not completely rescue the ramification pattern of individual axons. Single axon tracing experiments (Mizuno et al., 2010; Dhande et al., 2011) would be required to clarify this. Nonetheless, our findings suggest that callosal axons retain the ability, or are permitted, to grow and make region- and lamina-specific projections in the cortex during a limited period of postnatal cortical development under an activity-dependent mechanism.”

      The calcium imaging is a valuable validation of the Kir expression approach, but the study here appears to overinterpret what may simply be an intermediate level of activity restoration rather than a specific restoration of L events, as it seems that L events would be the most likely to occur under conditions of reduced overall activity. One possibility is that the absence of H events at P13 in the calcium imaging is due to residual Kir expression creating a drag on high level network activation rather than any more complicated change in patterned spontaneous activity/connectivity. The conclusions from this study regarding the permissive role of activity during a critical window and the lack of a requirement for highly correlated activity are valuable, even if somewhat imprecise on both counts. The authors should probably refrain from use of the term patterned activity given that this was measured but not systematically compared to unpatterned spontaneous activity.

      We thank the reviewer for this constructive comment. Based on this comment, we removed the term “patterned activity” throughout the manuscript and revised the title, abstract, introduction, results, and discussion extensively. For example, in the Discussion, we revised as follows.

      “We have shown that the projections could be established even without fully restoring highly synchronous activity (Figure 4). L events, but not H events, were present in P13 cortex after Dox treatment at P6. L events may be sufficient for the formation of callosal projections. Alternatively, any form of activity with certain level(s) (i.e., “sufficiently” high activity with no specific pattern) could be permissive for the formation of callosal connections.”

      Reviewer #2 (Public Review):

      Tezuka et al. use in vivo manipulations of spontaneous activity to identify the activity-dependent mechanisms of callosal projection development. Previous research of the authors' and other labs had shown that overexpressing the potassium channel Kir2.1, which reduces activity levels in the developing cortical network, blocks the formation of callosal connections almost entirely.

      The current manuscript corroborates and extends these previous discoveries by:

      1) Demonstrating that the effect of Kir overexpression can be rescued by pharmacogenetic network activation using DREADDs.
      2) Revealing the requirement of network activity for the development of callosal projections during a particular developmental time window and by
      3) Directly relating perturbed callosal development to the actual changes in activity patterns caused by the experimental manipulations.

      Thus, this paper is important for our understanding of the role of neuronal activity in the development of long-range connections in the brain. In addition it provides strong evidence for a role of specific activity patterns in this process.

      In general, the approach is very straightforward and the results clearly interpreted. Nevertheless, there are a few points to consider.

      We thank the reviewer for these positive and supportive comments.

      1) It is not clear in which cortical area(s) the in vivo 2-photon recordings were performed and in how far cortical areas that actually receive/send callosal projections were included or not in the analysis.

      In response to this comment, we revised the text in the method section as follows.

      “We aimed to record spontaneous neuronal activity in putative binocular zones in V1 (2.5 mm lateral of midline and 1 mm anterior of the posterior suture). Since the boundaries between V1 and higher visual areas, AL/LM are not as obvious as those in adult, our recordings likely contained juxtaposed lateral monocular V1 and AL/LM as well.”

      Based on our colleagues' unpublished observations, V1 and AL/LM can be distinguished solely by spontaneous activity patterns even before eye-opening. They also found that frequencies of spontaneous activity are similar across mono/binocular regions of V1 and AL/LM (Murakami, Ohki, et al. unpublished). Thus, our results should hold even with the variability in recording sites.

      2) It is not discussed what the duration of the CNO effect is. Do daily injections rescue activity patterns for 24 hours or a significant proportion of this period?

      In response to this critical comment, we revised the text in the method section as follows.

      “A previous study showed that an intraperitoneally injected CNO was effective (in terms of increasing activity) for about 9hrs (Alexander et al., 2009). The “partial rescue” effect we observed (Figure 2) may suggest that activity was not fully restored during 24hrs by our daily CNO injections.”

      Reviewer #3 (Public Review):

      The manuscript by Tezuka adds to an emerging story about the role of activity in the formation of callosal connections across the brain. Here, the authors show that they can use a TET system to switch off the activity of an exogenous potassium channel, in order to probe when activity might be necessary or sufficient for the formation of callosal connections. The authors find that artificial restoration of activity with DREADDs is sufficient to rescue the formation of callosal connections, and that there is a critical period (somewhere between P5-P15) where activity must occur in order for the connections to form within the cortex. Finally, the authors show that when the potassium channel is removed during the critical period, the cortex exhibits activity, but few highly synchronous events. These results indicate that it is activity in general and not specifically highly synchronous activity that is necessary for the final innervation of the callosal cortex.

      In general, the study is well done, and the writeup is polished and well summarized. The figures are solid. There are only a few criticisms/suggestions.

      We thank the reviewer for the positive evaluation.

      Major issue: Have the authors demonstrated a requirement for "patterned spontaneous activity"?

      The authors claim variously in the abstract ("a distinct pattern of spontaneous activity") and in the results (pg 6, "our observations indicate that patterned spontaneous activity") and discussion (pg 6, "we demonstrated that patterned spontaneous activity") that it is "patterned" spontaneous activity that is key for the formation of callosal connections. However, when I was reading the paper, I came to the opposite conclusion: that any sufficiently high spontaneous activity is sufficient for the formation of these connections.

      The authors showed that relieving the KIR expression from P5-15 allows the connections to form; however, in Figure 4, the authors show that the nature of the activity produced in the cortex (in terms of mixtures of H and L events) is very different. Nevertheless, the connections can form. Further, the authors showed that increasing activity when KIR is expressed using DREADDs restores the connections. The pattern of activity produced by this DREADDs + KIR expression is likely to be very different from the pattern of activity of a typically-developing animal. In total, I thought that the authors demonstrated, quite nicely, that it is just the presence of sufficient activity that is key to the innervation of the contralateral cortex. (It's not cell autonomous, as the authors showed before; there seems to be a "sufficient activity" requirement).

      Therefore, I think the authors should remove references to the requirement of patterned activity and instead say something about sufficiently high activity (or some characterization that the authors choose). I think they've shown quite nicely that a specific pattern of the spontaneous activity is not important.

      We thank the reviewer for this very important insight and interpretation. After considering all the currently presented data again, we have come to agree with the interpretation stated by the reviewer. We removed the term “patterned activity” throughout the manuscript and revised the title, abstract, introduction, results, and discussion extensively. Nevertheless, we would not completely discard the possibility that specific patterns of spontaneous activity, such as L-events, could make some active contribution to the development of projection circuits, and we would like to address this further in a future study.

      For example, in the Discussion, we revised the text as follows.

      “We have shown that the projections could be established even without fully restoring highly synchronous activity (Figure 4). L events, but not H events, were present in P13 cortex after Dox treatment at P6. L events may be sufficient for the formation of callosal projections. Alternatively, any form of activity with certain level(s) (i.e., “sufficiently” high activity with no specific pattern) could be permissive for the formation of callosal connections.”

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Sasaki et al. titled "Conditional GWAS of non-CG transposon methylation in Arabidopsis thaliana reveals major polymorphisms in five genes" employed conditional GWAS to identify trans-regulators of mCHG levels in Arabidopsis natural accessions, after controlling for mCHH. Using loss-of-function mutants for a couple of these genes, the authors also tested their effects on mCHG levels.

      Overall, this manuscript makes a nice contribution. I suggest the following improvements to enhance the quality of this manuscript.

      Comments:

      1. MSI1 has been shown to copurify with TCX5, a component of the DREAM complex. The DREAM complex transcriptionally regulates CMT3, MET1, and DDM1 in a cell-cycle-dependent manner (ref: Yong-Qiang Ning, 2020, Nature Plants). Tcx5/6 double mutants have ectopic gain of TE and genic mCHG. It would be nice to refer to this paper and add to the MSI1 part accordingly.

      Absolutely: thanks for suggesting this!

      Multifaceted regulation of mCHG levels seems to be evident from this and previous studies. Why would such complex pathways evolve to regulate mCHG? Bewick et al. 2016 and Wendte et al. 2019 showed that lack of CMT3 or ectopic expression of CMT3 can influence CG gene body methylation (gbM). One possibility is that these five factors regulate CHG to maintain it at a level that is just enough to target TEs. Irrespective of the functional relevance of gbM, differences in the levels of these five factors might result in erroneous gbM. It would be interesting to look at the rates of gbM and the number of gbM genes in the natural accessions carrying one to four mCHG-decreasing alleles, and also in the one line from the Iberian peninsula carrying polymorphisms in all five genes.

      Yes, the connection between CHG and gbM is very interesting and deserves more attention. We looked for an effect of cumulative mCHG-decreasing alleles on gbM but found no association; none is really expected, given the stable epigenetic inheritance of gbM. The Iberian peninsula line carrying all decreasing alleles did show slightly lower gbM levels, but it is impossible to exclude the effects of population structure. Since we have nothing to add beyond speculation, we prefer not to go into this topic.

      The authors mentioned that a significant peak for mCHG|mCHH on RdDM-targeted transposons was located 196 bp downstream of MIR823a and not on the mature miRNA. Therefore, this cannot directly impair miR823 base pairing with CMT3 mRNA transcripts and their cleavage. Moreover, natural accessions carrying the alternative MIR823A allele show reduced CMT3 and mCHG levels; does this mean higher miR823 levels? Does the region 196 bp downstream contain any regulatory feature that affects miR823 transcription? Or does this region still fall within the primary miRNA hairpin region? A single nucleotide change in a pri-miRNA can have a significant impact on its secondary structure, which can impede DICER processivity and, in effect, the levels of mature miR823 molecules. It would be beyond the scope of this paper to pin down the exact mechanism, but a simple stem-loop RT-PCR for miR823 levels in reference and alternative accessions would be informative (on accessions that grow at the same speed). Perhaps the authors can at least model SNP-induced pri-miRNA secondary structure variations using Vienna RNAfold (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi) and present MFE (minimum free energy) values for representative accessions.

      Stem-loop qRT-PCR for MIR823a expression would indeed be helpful to confirm allelic effects. However, comparing lines with wildly different genetic backgrounds is fraught with difficulty due to trans-effects. Furthermore, MIR823a is expressed specifically during embryogenesis, and the expression quickly decreases after the early heart stage (Papareddy et al., 2021). Thus, we would need to extract microRNA from embryos at exactly the same developmental stage, from lines that may develop at different speeds. Most likely, time-series data would be required, and generating such data is a massive undertaking. As noted in the paper, we did measure MIR823a expression by stem-loop qRT-PCR for several lines carrying reference and alternative alleles, but the results were inconclusive. A proper study of this is beyond the scope of this paper.

      Testing predicted effects on RNA secondary structure, on the other hand, is eminently feasible. As suggested, we used Vienna RNAFold for the region, including the GWAS peak. Since the SNP is linked to a 35 bp deletion (shown in S4A), it is closer to the MIR823A coding region than 196 bp. However, the results indicate that the SNP (Chr3:4496626) is not within the stem-loop. It remains possible that this SNP tags multiple SNPs in the annotated stem regions. This is now mentioned.

      Figure 1A can be made more reader friendly. Perhaps this can be broken down into correlation plots for individual conditions or tissue types. In addition, it might be good to add individual r-square values for each of them instead of compound r-square.

      We respectfully disagree, since the main point of the figure is the overall correlation and heterogeneity, rather than the correlation within sub-sets. Instead of splitting the plot, we changed color contrasts to make it easier to read.

      Page 3, Paragraph 1, from line 3 to the end of the paragraph. The authors wrote "Much of this variation is due to differences in the environment (including tissue, which can be viewed as a cellular environment)". A possible explanation is that these two tissues have different mitotic indices (fraction of dividing and non-dividing cells; flowers have more dividing cells, leaves have more non-dividing and endoreduplicated cells), which explains non-CG variation. I would suggest that the authors change the text to this and refer to the Filipe Borges et al. 2021 Current Biology paper.

      This is certainly a possibility, although higher mCHG levels in flower buds presumably also reflect higher CMT3 expression during embryogenesis (Feng et al. 2020; Gutzat et al. 2020; Papareddy et al. 2021). We now mention both explanations and cite Borges et al. (2021).

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      Sasaki et al. carried out a conditional GWAS analysis of TE-CHG methylation in Arabidopsis thaliana natural accessions. They revealed multiple associations with SNPs in known DNA methylation genes. A new finding is the association found proximal to JMJ26, which had no previously described role in the maintenance/establishment of RdDM-targeted transposons. The authors validate the JMJ26 association using a loss-of-function mutant of JMJ26, which essentially recapitulates the GWAS effect, suggesting that JMJ26 is likely causal. An important point of the study is that the associations detected with conditional GWAS have not been seen in previous univariate (i.e. unconditional) GWAS, probably due to a lack of power. At the sub-genome-wide threshold the authors discovered further, albeit weaker, associations that were also highly enriched for known DNA methylation genes.

      Overall impression:

      The manuscript is clearly written, and the functional validation of the JMJ26 GWAS signal is commendable and certainly goes beyond the typical GWA study. Beyond this validated association however, the GWAS results are mainly confirmatory. They essentially highlight that methylation genes previously identified by way of mutant screens are variable in natural populations, and (probably) causative of non-CG methylation variation in TEs. What I personally found very distracting throughout the manuscript was the strong emphasis on the methodological aspect; that is, the conditional GWAS, which is really not new. Furthermore, the conceptual/philosophical discussion about what is a complex trait or what can be called polygenic was slightly pedantic and distracted from the biological message.

      There are three points here. First, we disagree that the GWAS results are confirmatory. Sure, only one of our associations is connected with a novel gene, but the fact that the four other genes apparently harbor major polymorphism is a new finding that contributes to our understanding of the function of this trait (and, possibly, these genes). Second, while it is possible that we emphasize statistical methodology too much, we do this for clarity, not to claim that what we are doing is novel. Third, we are similarly not interested in defining what is polygenic and what isn’t, but rather put the results in the context of other studies. We have changed the writing in various places to make it clearer (and hopefully less distracting/pedantic).

      A conceptual comment:

      • The conditional GWAS presented here is conceptually very similar to conditional QTL mapping approaches where candidate loci are included, a priori, as covariates in the model, and a scan is performed to search for additional modifiers. It is known that this approach increases power because the scan is performed on the residual trait variation (having accounted for the effect of candidate loci). This is also the idea behind MQM mapping, although in the latter the inclusion is not restricted to candidate loci. Instead of including candidate SNPs as covariates, the authors include TE-CHH methylation levels as a covariate, as it is highly correlated with TE-CHG methylation. By doing this, the authors essentially "control" for any SNP affecting the covariance between CHG and CHH, even if these SNPs (and their genetic architecture) remain unknown. Hence, the conditional scan is mainly on the residual variation in TE-CHG methylation that is unique to this context (i.e. independent of CHH). That additional TE-CHG associated loci pop up in this scan is perhaps not so surprising.

      We agree, and have even written papers on this very subject. We were surprised by this comment as we felt we had included lengthy sections (see also comment above) about methodology, emphasizing that multi-trait analysis is a good idea in principle. One of our purposes here is to provide a beautiful example demonstrating this. We have tried to make these points clearer.
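
      The contrast between a univariate and a conditional scan discussed here can be illustrated with a minimal sketch on simulated data. This is an illustration under loud assumptions, not the model used in the paper: it uses ordinary least squares on a single simulated SNP and ignores population structure (a real analysis would use a mixed model), and every name and value in it (`shared`, `mchh`, `mchg`, the SNP effect of 0.3, the noise levels) is hypothetical.

```python
import random

def residuals(y, x):
    """Residuals of a least-squares fit of y on x (with intercept)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]

def corr(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5

def conditional_corr(snp, trait, covariate):
    """Partial correlation of SNP and trait after regressing both on the covariate."""
    return corr(residuals(snp, covariate), residuals(trait, covariate))

# Hypothetical simulation: a shared factor drives both methylation contexts,
# while the SNP affects mCHG only.
rng = random.Random(1)
n = 2000
shared = [rng.gauss(0, 1) for _ in range(n)]
snp = [rng.choice([0, 1]) for _ in range(n)]
mchh = [s + rng.gauss(0, 0.3) for s in shared]
mchg = [s + 0.3 * g + rng.gauss(0, 0.3) for s, g in zip(shared, snp)]

r_uni = corr(snp, mchg)                     # univariate scan: SNP vs mCHG
r_cond = conditional_corr(snp, mchg, mchh)  # conditional scan: SNP vs mCHG | mCHH
print(f"univariate r = {r_uni:.3f}, conditional r = {r_cond:.3f}")
```

      In this toy setting the SNP signal is stronger in the conditional analysis because the shared variation, which is noise from the SNP's point of view, has been moved to the covariate. Whatever drives that shared variation remains unmapped, which is the reviewer's point.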

      The finding that this conditional GWAS again yields a handful of loci that explain a considerable part of the trait (now residual trait) variation leads the authors to suggest that the genetic architecture underlying non-CG methylation of TEs is not "polygenic". I think this is semantics. All the authors have done is relegate any causal SNPs underlying the covariance between TE-CHG and TE-CHH to the right hand side of the equation of their GWAS model, and subsumed it under the predictor "TE-CHH methylation levels". That is, the genetic architecture underlying this covariance is still unknown, difficult to identify and probably highly polygenic.

      Again we agree, and fail to see why the reviewer thinks we do not. Nowhere do we claim that the overall covariance has a simple basis, and we explicitly state that it is the conditional mCHG variation that has an oligogenic basis. We did write that “univariate GWAS of mCHG variation failed to detect any significant associations, leading us to conclude, erroneously, that the trait was simply too polygenic”, which was imprecise, and arguably erroneous. The word “erroneously” has been removed in the revision.

      The authors essentially decompose a complex trait into parts and map genetic architectures for each part. Although each part seems less complex and more oligogenic than polygenic, when putting all the parts back together, I would argue we are getting close to a complex trait with a polygenic architecture. The study by Hüther et al., which the authors also cite, is another example of how a complex trait can be decomposed into parts. In reference to one of the authors' GWAS associations, they say "...this association was also recently found by Hüther et al. (2022) using GWAS for unconditional mCHG levels of individual transposons. The MIR823A polymorphism appears to almost exclusively affect mCHG (Figs. S2, S3), primarily targeting the same transposons as a CMT3 knock-out...". In the case of Hüther et al., the complex TE-CHG methylation trait is simplified by selecting specific TEs, a priori, that are differentially methylated in CMT3 knock-out lines. One could go on like this, and continue to peel away this complex trait. But, again, this does not mean that the overall TE-CHG methylation trait is not complex or polygenic. It spirals down into a discussion of what is actually meant by "complex" or "polygenic", which is an interesting discussion but, in the case of this manuscript, takes away from the biological message. My point is perhaps best reflected in the following statement from the discussion section: "Despite high heritability, univariate GWAS of mCHG variation failed to detect any significant associations, leading us to conclude, erroneously, that the trait was simply too polygenic (Kawakatsu et al., 2016)." But a few lines below the authors seem to realize what they have actually done: "We believe that, by controlling for mCHH, we have effectively simplified the trait, revealing genetic factors affecting mCHG only, perhaps by affecting the maintenance of this type of DNA methylation."

      The phrase “seem to realize” is unwarranted and unnecessary sarcasm. Given that we cite the two century-old papers that first demonstrated that it was possible to decompose complex traits into Mendelian ones, it should be obvious that we understand what we have done. That our writing could have been better is another matter. As noted above, the word “erroneously” has been dropped, and we have also changed the second sentence to make it obvious that this is obvious. We suspect that whether one finds this part of the Discussion “distracting” or not depends on training and background — our objective was to explain our results to readers who (unlike us and the reviewer) are not well-versed in quantitative genetics.

      Specific comments

      1. A large part of the manuscript focuses on SNPs that are enriched for a priori genes and fall below the genome-wide significance threshold. While I see the reasons for doing this in this particular manuscript, I do not see how this is useful in general (again this approach is partly "sold" on methodological grounds). The approach can obviously not be extended to study traits where a priori gene sets are unavailable or incomplete. Moreover, the "FDR" approach based on the a priori gene set labels GWAS hits that are not within the a priori set as "false discoveries", which may or may not be true. Moreover, there is no "natural" stopping point for going below GWAS thresholds. An alternative to this would be to perform a targeted GWAS for a priori genes (+ an LD window around them). Since this alleviates the multiple testing burden, I would be curious to see what this yields in terms of both conditional and unconditional analysis. Candidates that show a signal could be included as covariates and a conditional scan for unknown genes could then be performed.

      The FDR analysis using a set of a priori genes should be explained in detail in this ms. It is cumbersome to go to another manuscript to see what was done exactly, especially since this information is also difficult to dig up in the Atwell 2010 study. Although I understand the idea behind this approach, I would be concerned that this type of "FDR" analysis assumes that all methylation genes are known. A novel candidate that was perhaps never identified in mutant screens before would be classified as a false discovery. Similarly, known candidates that carry no functional polymorphisms in nature, perhaps because they are highly constrained, will never become a discovery.

      Comments 1 and 9 largely overlap, and so we moved 9 here for clarity and respond to both at the same time. We agree that the enrichment analysis should be explained in this article as well, so as to save the reader from finding the supplement to an old paper. A new section has been added to Methods. In this section, we also try to preempt some of the misunderstandings in the reviewer's comments.

      First, our approach is indeed generally applicable. Whether it is useful depends on what you want to do, and yes, the utility will depend on the quality of the independent data, but note that the a priori gene set does not have to be genes: you could use this approach to compare coding vs non-coding regions of the genome, for example.

      Second, we are not trying to “sell” our approach (or anything else for that matter).

      Third, the approach does not label GWAS hits that are not within the a priori set as false discoveries: it says nothing about these hits.

      Fourth, we are not sure what is meant by a ‘“natural” stopping point for going below GWAS thresholds’, but our approach does provide a simple way to explore how FDR (in the a priori set!) depends on the threshold used.

      Fifth, the proposed alternative of “targeted GWAS” (non-genomewide association, as it were) is not equivalent, because our approach was not designed to increase power by alleviating the multiple testing burden, but rather to rigorously demonstrate that there is a signal in the data when faced with uncalibrated p-values. That it can also be used to explore sub-significant associations is a nice side-effect that we exploit here.

      Sixth, we do not assume that all methylation genes are known, nor is our goal to find them all.
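
      The logic behind the enrichment analysis discussed in the points above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure described in the new Methods section: the function name, the toy hits, and the background fraction of 0.1 are all hypothetical. The bound rests on one assumption: if every selected SNP were a false positive, a priori SNPs would appear among them at the background rate.

```python
def enrichment_by_threshold(hits, thresholds, background_fraction):
    """Summarise enrichment of a priori candidates among GWAS hits.

    hits: list of (score, in_a_priori_set) pairs, where score = -log10(p).
    background_fraction: share of all tested SNPs that lie in the a priori set.
    """
    rows = []
    for t in thresholds:
        selected = [in_set for score, in_set in hits if score >= t]
        if not selected:
            continue  # nothing passes this threshold
        observed = sum(selected) / len(selected)
        enrichment = observed / background_fraction
        # If all selected SNPs were false, a fraction `background_fraction`
        # of them would still fall in the a priori set, so background/observed
        # is an upper bound on the FDR among the a priori hits only.
        fdr_bound = min(1.0, background_fraction / observed) if observed else 1.0
        rows.append({"threshold": t, "n_hits": len(selected),
                     "enrichment": enrichment, "fdr_bound": fdr_bound})
    return rows

# Hypothetical toy data: (-log10 p, SNP is near an a priori gene)
toy_hits = [(8.0, True), (7.5, True), (6.0, False), (5.5, True),
            (5.0, False), (4.5, False), (4.2, False), (4.0, False)]
rows = enrichment_by_threshold(toy_hits, thresholds=[7, 4], background_fraction=0.1)
for row in rows:
    print(row)
```

      Note that the bound is computed only for hits inside the a priori set; hits outside it are simply not classified, in line with the third point above. Lowering the threshold trades a larger number of hits for a weaker bound, which is one way to explore how the FDR depends on the threshold.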

      With regards to the CMT2 signals (particularly section "Further evidence for allelic heterogeneity at CMT2") it would have been useful/clearer to break down CHH into CWA and non-CWA.

      While this is a sensible suggestion, the focus of this paper is on mCHG, and refining the mCHH measurement would essentially amount to re-doing all analyses.

      I understand that the authors set out to do this conditional analysis because previously no hits could be found for CHG TE methylation. However, have the authors considered going the other way around and performing a CHH|CHG analysis to find additional QTL affecting CHH methylation, partly independently of CHG?

      Yes, this was in the paper, but we only mention it in the Discussion (and Fig S13) as the results were only of methodological interest (as expected, they were very similar).

      The authors write: "While both mCHG and mCG showed high heritability, GWAS yielded little in terms of significant associations. This might be because these "traits" are highly polygenic, or because they are at least partly transgenerationally inherited, and hence do not behave like standard phenotypes." Please clarify what they mean by "not behave like standard phenotypes".

      Done.

      The authors write: "Our starting point is the observation that mCHG and mCHH levels on transposons are strongly correlated in the 1001 Epigenomes data set (Kawakatsu et al., 2016), especially for RdDM-targeted transposons (Fig. 1A; see Methods). Much of this variation ....". What is meant by "this variation"?

      The sentence has been changed to make this clearer.

      A few lines below, they write "...huge". Please rephrase.

      Done.

      The authors write: "sample data set ("Leaf SALK ambient temperature"; n=846). Interestingly, the covariance between mCHH and mCHG showed the same pattern in data generated by knocking out known or potential DNA methylation regulators in the same genetic background (Fig. 1B) (Stroud et al., 2013). This demonstrates strong co-regulation of these types of methylation, in particular for RdDM-targeted transposons." It is noticeable that many double mutants are off the diagonal. To me this indicates that they affect one context more than the other (i.e. they break covariance). Second, it suggests that they are probably interacting non-additively. It would be great if the authors could comment on this observation; perhaps also later in the ms, where they make a case for additivity.

      We are not convinced that the double or triple mutants show non-additivity. Adding up effects in Figure 1 works pretty well. As for our GWAS results, it is clear that small effects (like the ones in our GWAS) will always tend to look additive for simple mathematical reasons. This does not mean that no interactions exist, and we emphasize this in the paper. We also have an example of non-linearity when it comes to TE activity. This is now also emphasized.

      The authors write: " it is difficult to say what fraction of these factors is genetic and what is environmental, but, regardless of this, we hypothesized that the substantial covariance could reduce power of GWAS for either mCHH or mCHG (when using a standard univariate model), and that an analysis accounting for this covariance might perform better...". The arguments given thus far are not sufficient to understand why a "substantial covariance" between traits would reduce the power to map individual traits. I think more needs to be done here to motivate this.

      The sentence following the one quoted is “In essence, we sought to simplify a complex trait by breaking it into constituent parts”, which is very much part of the motivation. As the reviewer noted above, it is not surprising that a conditional analysis turns out to be more powerful. The comment may have arisen from the statement “This insight is the basis for this paper”, which is misleading: there is no insight here, just a very obvious hypothesis, which turned out to be correct. We have changed the writing to make this clearer.

      The authors write: "However, MSI1 is required to control DNA methylation via repression of MET1, and a loss of FAS2 in CAF-1 induces mCHG hypermethylation (Fig 1B) (Stroud et al., 2013; Jullien et al., 2008)...". Where is the "FAS2 in CAF-1" result visible in Fig. 1B?

      fas2 induces mCHG hypermethylation in CMT2-targeted TEs, presumably via a complex that also involves MSI1. It is marked in Fig. 1B. We have rephrased the sentence to make this clearer.

      The results presented in "A jmjC gene is a novel modifier of mCHG in RdDM-targeted transposons" could have been showcased better. Only after reading the methods part did I realize that the authors generated CRISPR mutants. It reads as if the authors just picked up some available loss-of-function mutants and profiled them. But, clearly, much more work was involved here and the authors could have brought that out more. Perhaps more generally, I think all the new functional analysis the authors perform is largely "under-sold" in this manuscript at the expense of unnecessary methodological/conceptual discussion (see point above).

      We actually generated CRISPR/Cas9 mutants only for MIR823A (Table S5). For JMJ26, a T-DNA insertion line was available, and results based on this and rescue lines provided sufficient evidence. To clarify this, we corrected the subsection titles.

      In the section "The power and complexity of conditional GWAS", the authors write "The performance of GWAS relies on using the right model for the relation between genotype and phenotype. As with other statistical methods, using the wrong model may lead to unpredictable results." This seems like too obvious a statement.

      Indeed: it is meant ironically. It is obvious, yet people do it.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      Sasaki et al. carried out a conditional GWAS analysis of TE-CHG methylation in Arabidopsis thaliana natural accessions. They revealed multiple associations with SNPs in known DNA methylation genes. A new finding is the association found proximal to JMJ26, which had no previously described role in the maintenance/establishment of RdDM-targeted transposons. The authors validate the JMJ26 association using a loss-of-function mutant of JMJ26, which essentially recapitulates the GWAS effect, suggesting that JMJ26 is likely causal. An important point of the study is that the associations detected with conditional GWAS have not been seen in previous univariate (i.e. unconditional) GWAS, probably due to a lack of power. At the sub-genome-wide threshold the authors discovered further, albeit weaker, associations that were also highly enriched for known DNA methylation genes.

      Overall impression:

      The manuscript is clearly written, and the functional validation of the JMJ26 GWAS signal is commendable and certainly goes beyond the typical GWA study. Beyond this validated association however, the GWAS results are mainly confirmatory. They essentially highlight that methylation genes previously identified by way of mutant screens are variable in natural populations, and (probably) causative of non-CG methylation variation in TEs. What I personally found very distracting throughout the manuscript was the strong emphasis on the methodological aspect; that is, the conditional GWAS, which is really not new. Furthermore, the conceptual/philosophical discussion about what is a complex trait or what can be called polygenic was slightly pedantic and distracted from the biological message.

      A conceptual comment:

      • The conditional GWAS presented here is conceptually very similar to conditional QTL mapping approaches where candidate loci are included, a priori, as covariates in the model, and a scan is performed to search for additional modifiers. It is known that this approach increases power because the scan is performed on the residual trait variation (having accounted for the effect of candidate loci). This is also the idea behind MQM mapping, although in the latter the inclusion is not restricted to candidate loci. Instead of including candidate SNPs as covariates, the authors include TE-CHH methylation levels as a covariate, as it is highly correlated with TE-CHG methylation. By doing this, the authors essentially "control" for any SNP affecting the covariance between CHG and CHH, even if these SNPs (and their genetic architecture) remain unknown. Hence, the conditional scan is mainly on the residual variation in TE-CHG methylation that is unique to this context (i.e. independent of CHH). That additional TE-CHG associated loci pop up in this scan is perhaps not so surprising.

      The finding that this conditional GWAS again yields a handful of loci that explain a considerable part of the trait (now residual trait) variation leads the authors to suggest that the genetic architecture underlying non-CG methylation of TEs is not "polygenic". I think this is semantics. All the authors have done is relegate any causal SNPs underlying the covariance between TE-CHG and TE-CHH to the right hand side of the equation of their GWAS model, and subsumed it under the predictor "TE-CHH methylation levels". That is, the genetic architecture underlying this covariance is still unknown, difficult to identify and probably highly polygenic.

      The authors essentially decompose a complex trait into parts and map genetic architectures for each part. Although each part seems less complex and more oligogenic than polygenic, when putting all the parts back together, I would argue we are getting close to a complex trait with a polygenic architecture. The study by Hüther et al., which the authors also cite, is another example of how a complex trait can be decomposed into parts. In reference to one of the authors' GWAS associations, they say "...this association was also recently found by Hüther et al. (2022) using GWAS for unconditional mCHG levels of individual transposons. The MIR823A polymorphism appears to almost exclusively affect mCHG (Figs. S2, S3), primarily targeting the same transposons as a CMT3 knock-out...". In the case of Hüther et al., the complex TE-CHG methylation trait is simplified by selecting specific TEs, a priori, that are differentially methylated in CMT3 knock-out lines. One could go on like this, and continue to peel away this complex trait. But, again, this does not mean that the overall TE-CHG methylation trait is not complex or polygenic. It spirals down into a discussion of what is actually meant by "complex" or "polygenic", which is an interesting discussion but, in the case of this manuscript, takes away from the biological message. My point is perhaps best reflected in the following statement from the discussion section: "Despite high heritability, univariate GWAS of mCHG variation failed to detect any significant associations, leading us to conclude, erroneously, that the trait was simply too polygenic (Kawakatsu et al., 2016)." But a few lines below the authors seem to realize what they have actually done: "We believe that, by controlling for mCHH, we have effectively simplified the trait, revealing genetic factors affecting mCHG only, perhaps by affecting the maintenance of this type of DNA methylation."

      Specific comments

      • A large part of the manuscript focuses on SNPs that are enriched for a priori genes and that fall below the genome-wide significance threshold. While I see the reasons for doing this in this particular manuscript, I do not see how this is useful in general (again this approach is partly "sold" on methodological grounds). The approach can obviously not be extended to study traits where a priori gene sets are unavailable or incomplete. Moreover, the "FDR" approach based on the a priori gene set labels GWAS hits that are not within the a priori set "false discoveries", which may or may not be true. In addition, there is no "natural" stopping point for going below GWAS thresholds. An alternative to this would be to perform a targeted GWAS for a priori genes (plus an LD window around them). Since this alleviates the multiple testing burden, I would be curious to see what this yields both in terms of conditional and unconditional analysis. Candidates that show a signal could be included as covariates and a conditional scan for unknown genes could then be performed.
      • With regards to the CMT2 signals (particularly section "Further evidence for allelic heterogeneity at CMT2") it would have been useful/clearer to break down CHH into CWA and non-CWA.
      • I understand that the authors set out to do this conditional analysis because previously no hits could be found for CHG TE methylation. However, have the authors considered going the other way around and performing a CHH|CHG analysis to find additional QTL affecting CHH methylation, partly independently of CHG?
      • The authors write: "While both mCHG and mCG showed high heritability, GWAS yielded little in terms of significant associations. This might be because these "traits" are highly polygenic, or because they are at least partly transgenerationally inherited, and hence do not behave like standard phenotypes." Please clarify what they mean by "not behave like standard phenotypes".
      • The authors write: "Our starting point is the observation that mCHG and mCHH levels on transposons are strongly correlated in the 1001 Epigenomes data set (Kawakatsu et al., 2016), especially for RdDM- targeted transposons (Fig. 1A; see Methods). Much of this variation ....". What is meant by "this variation"?
      • A few lines below, they write "...huge". Please rephrase.
      • The authors write: "sample data set ("Leaf SALK ambient temperature"; n=846). Interestingly, the covariance between mCHH and mCHG showed the same pattern in data generated by knocking out known or potential DNA methylation regulators in the same genetic background (Fig. 1B) (Stroud et al., 2013). This demonstrates strong co-regulation of these types of methylation, in particular for RdDM-targeted transposons." It is noticeable that many double mutants are off the diagonal. To me this indicates that they affect one context more than the other (i.e. they break covariance). Second, it suggests that they are probably interacting non-additively. It would be great if the authors could comment on this observation; perhaps also later in the ms, where they make a case for additivity.
      • The authors write: " it is difficult to say what fraction of these factors is genetic and what is environmental, but, regardless of this, we hypothesized that the substantial covariance could reduce power of GWAS for either mCHH or mCHG (when using a standard univariate model), and that an analysis accounting for this covariance might perform better...". The arguments given thus far are not sufficient to understand why a "substantial covariance" between traits would reduce the power to map individual traits. I think more needs to be done here to motivate this.
      • The FDR analysis using a set of a priori genes should be explained in detail in this ms. It is cumbersome to go to another manuscript to see what was done exactly, especially since this information is also difficult to dig up in the Atwell 2010 study. Although I understand the idea behind this approach, I would be concerned that this type of "FDR" analysis assumes that all methylation genes are known. A novel candidate that was perhaps never identified in mutant screens before would be classified as a false discovery. Similarly, known candidates that carry no functional polymorphisms in nature, perhaps because they are highly constrained, will never become a discovery.
      • The authors write: "However, MSI1 is required to control DNA methylation via repression of MET1, and a loss of FAS2 in CAF-1 induces mCHG hypermethylation (Fig 1B) (Stroud et al., 2013; Jullien et al., 2008)...". Where is the "FAS2 in CAF-1" result visible in Fig. 1B?
      • The results presented in "A jmjC gene is a novel modifier of mCHG in RdDM-targeted transposons" could have been showcased better. Only after reading the methods part did I realize that the authors generated CRISPR mutants. It reads as if the authors just picked up some available loss-of-function mutants and profiled them. But, clearly, much more work was involved here and the authors could have brought that out more. Perhaps more generally, I think all the new functional analysis the authors perform is largely "under-sold" in this manuscript at the expense of unnecessary methodological/conceptual discussion (see point above).
      • In section "The power and complexity of conditional GWAS", the authors write "The performance of GWAS relies on using the right model for the relation between genotype and phenotype. As with other statistical methods, using the wrong model may lead to unpredictable results." This seems like too obvious a statement.

      Significance

      The manuscript is clearly written, and the functional validation of the JMJ26 GWAS signal is commendable and certainly goes beyond the typical GWA study. Beyond this validated association however, the GWAS results are mainly confirmatory. They essentially highlight that methylation genes previously identified by way of mutant screens are variable in natural populations, and (probably) causative of non-CG methylation variation in TEs.

  4. May 2022
    1. scanned for solutions to long-standing problems in his reading, conversations, and everyday life. When he found one, he could make a connection that looked to others like a flash of unparalleled brilliance

      Feynman’s approach encouraged him to follow his interests wherever they might lead. He posed questions and constantly

      Creating strong and clever connections between disparate areas of knowledge can appear to others to be a flash of genius, in part because they didn't have the prior knowledge, nor did they put in the work of collecting, remembering, or juxtaposing.

      This method may be one of the primary (only) underpinnings supporting the lone genius myth. This is particularly the case when the underlying ideas were not ones fully developed by the originator. As an example if Einstein had fully developed the ideas of space and time by himself and then put the two together as spacetime, then he's independently built two separate layers, but in reality, he's cleverly juxtaposed two broadly pre-existing ideas and combined them in an intriguing new framing to come up with something new. Because he did this a few times over his life, he's viewed as an even bigger genius, but when we think about what he's done and how, is it really genius or simply an underlying method that may have shaken out anyway by means of statistical thermodynamics of people thinking, reading, communicating, and writing?

      Are there other techniques that also masquerade as genius like this, or is this one of the few/only?

      Link this to Feynman's mention that his writing is the actual thinking that appears on the pages of his notes. "It's the actual thinking."

    2. You may find this book in the “self-improvement” category, but in a deeper sense it is the opposite of self-improvement. It is about optimizing a system outside yourself, a system not subject to your

      limitations and constraints, leaving you happily unoptimized and free to roam, to wonder, to wander toward whatever makes you feel alive here and now in each moment.

      Some may categorize handbooks on note taking within the productivity space as "self-help" or "self-improvement", but still view it as something that happens outside of oneself. Doesn't improving one's environment as a means of improving things for oneself count as self-improvement?

      Marie Kondo's minimalism techniques are all external to the body, but are wholly geared towards creating internal happiness.

      Because your external circumstances are important to your internal mental state, external environment and decoration can be considered self-improvement.


      Could note taking be considered exbodied cognition? Vannevar Bush framed the Memex as a means of showing associative trails. (Let's be honest, As We May Think used the word trail far too much.)

      How does this relate to orality vs. literacy?

      Orality requires the immediate mental work for storage while literacy removes some of the work by making the effort external and potentially giving it additional longevity.

    1. Joint Public Review:

      The present manuscript compares the connectomes of a large range of mammal species using diffusion MRI data. The manuscript reports two main findings: (1) connectomes of more related species are generally more similar, as assessed using Laplacian eigenspectra, than of unrelated species; (2) differences between species' connectomes are generally driven by local regional connectivity profiles, whereas global features are generally preserved.

      The first finding is comforting, but in a way not extremely surprising. It would be extremely surprising if more closely related species did not show more similarity in their connectomes. Indeed, this is the reason many phylogenetic analyses use statistical techniques that take the relatedness of species explicitly into account. I find the statement that connectome organization recapitulates traditional taxonomies a bit over the top, as this suggests that a phylogenetic tree constructed based on connectomes would be similar to a tree based on other measures, such as morphology or genetics. This will probably be the case, but is not what the authors have tested here.

      The second result is in my opinion the key result of the paper. The main novelty of the paper is that it finally, for the field, bridges the approaches taken by researchers searching for differences across species (usually researchers interested in anatomy) and by researchers searching for conserved principles across species (usually researchers approaching connectivity from a network or graph theory perspective). By showing which aspects of a connectome are generally conserved and which are changed, this paper starts unifying the two views, and this is an important contribution.

      It would, however, have been nice if the authors had explored this notion a bit further. Now, they just state that taking certain features into account means the connectomes look more different, but they do not zoom into the specific brains to see what this means at a biological level. Some of the authors have published, for instance, on the unique connectivity profiles of parts of the human brain and it would have been nice to show that these fall under the local regional connectivity profile aspects of the connectomes. This is a missed opportunity to even further unify the different research traditions.

      The manuscript suggests that white matter connectivity in mammals is more similar between species within one taxonomic group than across different groups, proposing that the brain's connectome reflects phylogenetic relationships. The manuscript further details which features of the network organisation are associated with larger differences across groups and hence may drive speciation; and which features seem to be a common principle across mammals.

      The authors present evidence based on the analysis of diffusion-weighted brain imaging data across 124 species, 111 of which were included in the comparison. The dataset is a great resource to address their research question.

      The paper is clear and the evidence compelling. The manuscript adds valuable insights into the connectome architecture across species, potentially opening a new perspective on the link between genetics and behaviour. I would like to point out the great open science practice of the authors - code is available with a great ReadMe to guide potential users, connectivity matrices are available, and all software packages used in the analyses have been cited.

      The figures are clear and complement the manuscript.

      Technical Comments:

      - Spectral approach / Interpretation
      It would be good to have more insight into the meaning of the spectral distance results. My understanding is this: the eigenvalues of the normalised Laplacian obviously have a mean of 1 (because their sum equals the trace of the Laplacian, which is equal to N [number of nodes]). Therefore, the distances between the spectra essentially amount to comparing higher moments, and in particular the variance (as the histograms look quite Gaussian, I am guessing the distances are dominated by differences in the variance). But what does it mean that bats have a higher variance in these eigenvalues than primates? I know that the authors try to give *some* insight, e.g. that when the distribution is peaky around 1, it means there are more stereotypical local patterns of connectivity. I understand that. But what are these patterns?
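
      The trace argument above can be checked numerically. The following is a minimal sketch on toy binary graphs (a ring plus random edges, chosen only so that no node is isolated); the graph generator and all names are illustrative assumptions, not the connectomes or code of the study.

```python
import numpy as np

def normalized_laplacian_eigenvalues(adjacency):
    """Eigenvalues of L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix."""
    degree = adjacency.sum(axis=1)
    if np.any(degree == 0):
        raise ValueError("every node needs at least one connection")
    inv_sqrt_degree = 1.0 / np.sqrt(degree)
    scaled = inv_sqrt_degree[:, None] * adjacency * inv_sqrt_degree[None, :]
    return np.linalg.eigvalsh(np.eye(len(adjacency)) - scaled)

def ring_plus_random_edges(n_nodes, edge_probability, rng):
    """Toy binary graph: a ring (so no node is isolated) plus random extra edges."""
    upper = np.triu(rng.random((n_nodes, n_nodes)) < edge_probability, k=1)
    adjacency = (upper | upper.T).astype(float)
    ring = np.arange(n_nodes)
    adjacency[ring, (ring + 1) % n_nodes] = 1.0
    adjacency[(ring + 1) % n_nodes, ring] = 1.0
    return adjacency

rng = np.random.default_rng(0)
sparse_eigs = normalized_laplacian_eigenvalues(ring_plus_random_edges(200, 0.02, rng))
dense_eigs = normalized_laplacian_eigenvalues(ring_plus_random_edges(200, 0.30, rng))

# Both means equal trace(L) / N = 1; only the spread around 1 differs.
print(sparse_eigs.mean(), sparse_eigs.var())
print(dense_eigs.mean(), dense_eigs.var())
```

      Because the mean is pinned at 1 for any graph without isolated nodes or self-loops, two spectra can only differ in their higher moments, which is the point being made about variance.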

      - Effect Size / Null Distribution
      I like the idea and the ambition of this paper. My main concern is that the differences are very small. Pretty much all the measures (Laplacian eigenspectra and network-theoretic measures) are very similar between animals. This can be interpreted in two ways. (1) It may mean that the brain organisation is preserved, which is the interpretation of the authors. But it could also mean that (2) the metrics are not very informative. How do we know if we are in situation (1) or (2)? There is no comparison to a good null model (except in Fig. 4, but I don't think a random network is a good null). One possible null is two random networks connected to each other with a few random connections (to mimic left-right brains)?

      * The authors use cosine similarity to compare the eigenspectra distributions. I think this does them a disservice. Cosine similarity normalises the distributions quadratically instead of linearly. But the main thing that is changing is the variance. So normalising quadratically diminishes the dissimilarities between distributions. I have looked at their data (thanks for sharing!) and using multidimensional scaling with Euclidean distance looks much better than with cosine distance. I would suggest using Euclidean distance.
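
      The two normalisations can be contrasted in a toy calculation. This is a sketch only: the Gaussian-shaped histograms below are hypothetical stand-ins for eigenvalue distributions, not the shared data, and the function names are illustrative.

```python
import numpy as np

def gaussian_histogram(bin_centres, spread):
    """Histogram-like density centred at 1, normalised linearly (sums to 1)."""
    weights = np.exp(-0.5 * ((bin_centres - 1.0) / spread) ** 2)
    return weights / weights.sum()

def cosine_distance(p, q):
    # Each vector is divided by its L2 norm (square root of summed squares).
    return 1.0 - np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))

def euclidean_distance(p, q):
    # No rescaling beyond the linear normalisation already applied.
    return np.linalg.norm(p - q)

bin_centres = np.linspace(0.0, 2.0, 101)
narrow = gaussian_histogram(bin_centres, spread=0.10)
wide = gaussian_histogram(bin_centres, spread=0.20)

# Distances between a peaky and a broad distribution with the same mean.
print(cosine_distance(narrow, wide), euclidean_distance(narrow, wide))
# Cosine distance ignores overall scale; Euclidean distance does not.
print(cosine_distance(narrow, 3.0 * narrow), euclidean_distance(narrow, 3.0 * narrow))
```

      Which metric separates the groups better on the real spectra is an empirical question; the sketch only shows that the two distances rescale the histograms differently.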

      * The authors use a bootstrapping method to calculate an average distance which they claim is useful because they don't have the same number of animals in each category. I don't think this bootstrapping is useful at all. If anything, it just adds noise. Averaging 10,000 samples with replacement does not change the outcome compared to simply averaging the matrices without the sampling. To test this: vary n and it should converge to the average of the original non-sampled data. (I've tried it!)
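
      The convergence test suggested above can be sketched as follows. The random matrices are hypothetical stand-ins for one group's distance matrices, and `bootstrap_mean` is an illustrative helper, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical stand-in: 20 distance matrices (5 x 5) belonging to one group.
distance_matrices = rng.random((20, 5, 5))

plain_mean = distance_matrices.mean(axis=0)

def bootstrap_mean(matrices, n_resamples, rng):
    """Average of the means of n_resamples resamples drawn with replacement."""
    n = len(matrices)
    total = np.zeros(matrices.shape[1:])
    for _ in range(n_resamples):
        picks = rng.integers(0, n, size=n)
        total += matrices[picks].mean(axis=0)
    return total / n_resamples

# The largest entry-wise gap to the plain average shrinks as resamples increase.
for n_resamples in (10, 100, 10_000):
    gap = np.abs(bootstrap_mean(distance_matrices, n_resamples, rng) - plain_mean).max()
    print(n_resamples, gap)
```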

      * The authors should clarify whether they are using the weighted or binarised connectivity matrices in the spectral approach (and also what threshold). I suspect that they are using binarised matrices, which probably explains why the spectral results fit better with the graph topology results when the latter uses binarised matrices.

      - Parcellation
      One main issue is the way in which the connectomes are divided up into 200 regions each, independent of brain size. This seems to me a confound. I know it's rather standard practice in the field, but I have yet to see a validation that this does not influence the results. Given the enormity of the dataset here, I would ask the authors to run their analyses in a way that makes the number of regions a function of the size of the brain. This is a much more realistic assumption, as we know that a shrew-sized brain has about 20 cortical areas, whereas the human brain has about 180 according to Glasser et al.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper presents an approach to estimate Rt for 170 countries. While it is an impressive amount of work, I think the pipeline is similar to many currently available frameworks. The paper claims the following novelties over current frameworks, but more effort is needed to make these claims convincing.

      1) Obtain stable estimate from multiple types of data:

      It turns out that the stable estimates just repeatedly apply the same approach to different time series (Figure 3A middle). From the wording, I expected some method that combines these time series into a single estimate of Rt. Overall, the Rt and the time series of infection should be unique. Separate estimates would be suboptimal: for example, if there are big differences between the results from the death time series and the reported cases time series, which one should I trust?

      We think it is a strength to compute different Rt values based on different data, as this allows researchers, policy makers and the public alike to compare the information from different observation types directly. Any discrepancy between two Re trajectories (e.g. between the Re based on cases, Rcc(t), and that based on hospitalisations Rh(t)) is an indication to investigate which external variables (e.g. testing strategy) have changed. We have found it a great advantage when communicating and sharing our results outside of academia that we could point to these separately obtained Re estimates: if the estimates all agreed, more confidence could be given to them.

      If one wanted to obtain a single estimate, this would require adopting a fundamentally different framework to estimate Re, which exceeds the scope of this work. One could use heuristics (weights representing the trustworthiness of a given source at a given time) to combine the various Re estimates into a single ensemble estimate. Alternatively, one could model the full underlying population dynamics (e.g. with a compartmental model including hospitalization and death) and adopt a fully Bayesian approach to fitting such a model. However, both options require heuristics or priors that will vary substantially through time and per country (as discussed in the Supplementary Discussion), and thus limit how widely the pipeline can be applied.

      We have revised the manuscript to make it more clear (early on) that we estimate multiple Re values from separate types of data (see also the response to reviewer 3, item #5). In addition, we now discuss more explicitly what the advantages and disadvantages are of showing these estimates separately (lines 281-290).

      2) Adequate representation of uncertainty:

      This is the result in Figure 2, suggesting the CI from EpiEstim is too narrow. This would be expected given that EpiEstim assumes the input infection time series is observed and fixed. It would be expected that the proposed approach would provide a wider CI and hence that the proportion covered would be higher. However, I think that to validate that the wider CI is the correct one, simulation studies are required. I think the most related one would be Figure 1B. The results suggest that the approach works when the Rt is not rapidly changing. However, I have concerns about the methods for simulation (details below).

      Indeed, the difference in coverage between our method and EpiEstim is due to observation noise. We agree the CI from EpiEstim should be correct assuming that the infection incidence time series can be observed perfectly. However, in reality quite a bit of variability is introduced between infection and case observation: not only due to the delay from infection to observation, but also due to e.g. reduced testing capacity on weekends or reporting errors. To accurately assess the coverage of our method (and whether the CIs are too narrow or too wide) we need to include realistic amounts of observation noise in the simulations. This is why we add autocorrelated noise to our simulated observations, where this noise mimics observed residuals in Switzerland and other countries (Figs. S3, S4, S15, S17).

      We have now added explicit comparison to the EpiEstim confidence intervals to supplementary Fig. S4. In addition, we extended the corresponding method section to describe more extensively why and how we added observation noise to our simulations (lines 498-518; see also the detailed response to comment 4 below).

      3) Real-time of the Rt

      There is no simulation about the real-time property of the Rt. The most related one is still Figure 1B. However, looking at the right tail of the figure (the real-time performance), the proportion of intervals covering the true value is decreasing, and more effort is needed to show that the framework can be accurate in real time. For example, how is the real-time performance when Rt is increasing, or when Rt decreases sharply due to a lockdown?

      As suggested, we included an additional simulation study to investigate the accuracy and stability of the last possible Re estimate. We present this analysis in a new results paragraph (subsection "Stability of Re estimates in an outbreak monitoring context"; line 121) and Figure S10. Using this analysis, we highlight the trade-off that exists between the timeliness of the Re estimates and their stability.

      4) simulation methods to estimate Rt

      Both 2) and 3) need simulations to support the results, and hence the simulation approach is critical. The first part, which uses a Poisson distribution to generate an infection time series, is OK. However, the issue is the second part, about how the authors obtained the time series for death/hospitalization/reported cases. To me, after generating the infection time series, we could obtain those time series based on the delay distribution from infection to death/hospitalization/report. I am not sure whether the authors' approach of using smoothing and fitting ARIMA to get those time series is correct.

      We believe there may have been some confusion about how our simulation set-up works, and we provided insufficient detail on the design decisions behind this set-up. We have added more explanation for both points to the paper (lines 503-518; additional supplementary Figs. S15-S17). In brief, our simulation process consists of three parts. We first conduct the two steps the reviewer also mentioned: (i) simulating the infection time series, and (ii) simulating the observed time series by using the delay distribution from infection to death/hospitalisation/case report.

      However, we find that the observations simulated this way are too smooth compared to real data (see Figure S17). Possible reasons for this are that the delay distribution does not account for weekend and holiday effects, the random and occasional delay in recording confirmed cases, nor irregular components such as confirmed cases that are imported from abroad. We therefore added a noise term in our simulations, resulting in a third step: (iii) adding noise generated from an ARIMA model.

      To obtain a realistic ARIMA model for this third step, we fitted a model based on the confirmed case data for SARS-CoV-2 in Switzerland. Specifically, we first obtained the additive residuals based on the log-transformed confirmed cases. We then fitted ARIMA models of various orders and assessed the resulting ACF and PACF plots of their residuals. Based on this, we chose an ARIMA(2,0,1)(0,1,1) model. We refer to Figure S16 to support this: The first row shows the ACF and PACF plots of the original residuals, showing strong autocorrelation. The second row shows the ACF and PACF plots of the residuals after fitting the ARIMA model. We see that there is little autocorrelation left, indicating that this model is reasonable.

      In Figure S17, we present simulated observations based on all three steps, and one can see that they look more realistic than the simulated observations after step (ii).

      We would also like to point out that the ARIMA model is only used to obtain simulated observations. Our main method to estimate Re and obtain the related confidence intervals does not require fitting an ARIMA model.
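
      The three simulation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not our pipeline code: the Re trajectory, the generation-time and delay distributions, and the noise parameters are all made up for the example, and a simple AR(1) process on the log scale stands in for the fitted ARIMA(2,0,1)(0,1,1) model.

```python
import numpy as np

rng = np.random.default_rng(2)
n_days = 120

# Assumed inputs (illustrative only): a piecewise-constant Re, a generation-time
# distribution over lags of 1..5 days, and a reporting delay over lags of 0..7 days.
re_true = np.where(np.arange(n_days) < 60, 1.5, 0.8)
generation_pmf = np.array([0.1, 0.3, 0.3, 0.2, 0.1])
delay_pmf = np.array([0.0, 0.05, 0.15, 0.25, 0.25, 0.15, 0.1, 0.05])

# Step (i): infections from a Poisson renewal process, seeded with 20 per day.
infections = np.zeros(n_days)
infections[:5] = 20
for t in range(5, n_days):
    recent = infections[t - 5:t][::-1]  # infections 1..5 days ago
    infections[t] = rng.poisson(re_true[t] * np.dot(generation_pmf, recent))

# Step (ii): expected observations = infections convolved with the delay distribution.
smooth_observations = np.convolve(infections, delay_pmf)[:n_days]

# Step (iii): autocorrelated noise, additive on the log scale (multiplicative on
# the original scale). AR(1) is a stand-in for the seasonal ARIMA model.
log_noise = np.zeros(n_days)
for t in range(1, n_days):
    log_noise[t] = 0.5 * log_noise[t - 1] + rng.normal(scale=0.1)
noisy_observations = smooth_observations * np.exp(log_noise)
```

      Plotting `smooth_observations` against `noisy_observations` gives a rough picture of the contrast between steps (ii) and (iii).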

      Minor comments:

      1) What does near real-time mean? The estimates of Rt are delayed for a few days like other approaches?

      Indeed, the estimates of Rt are delayed by the time it takes from infection to a case to be observed. We have replaced the term “near real-time” by “timely” throughout the manuscript, and added this explanation of the delay more explicitly to the text (line 86).

      2) For the results in Table 1, I think if there are some results suggesting that other approaches (like EpiEstim) perform worse than the proposed approach, it would be better to illustrate the value of the proposed approach.

      We have improved and extended the comparison of our method against others in two ways: (i) we added further comparison of the coverage of our method vs. that of EpiEstim to Fig. S4 (see also the response to major comment 2), and (ii) we added comparison against different commonly used pipelines (see minor comment 3 below). Instead of comparing to other approaches, the analysis in Table 1 was meant to illustrate the use of the Re estimates resulting from our method alone.

      3) I think more discussion is needed of the similarities and differences with current approaches. For example, Abbott et al (https://wellcomeopenresearch.org/articles/5-112) used a similar pipeline.

      We added a section to the results (paragraph starting line 182; Fig. 3), dedicated to comparing our approach with relevant alternatives. We compared some of our empirical results with the estimates published on epiforecasts.io (based on EpiNow2 package from Abbott et al.), as well as official COVID-19 Re estimates for Austria (by AGES) and Germany (by RKI). We find that estimates published by the RKI and AGES health authorities are likely to be overconfident and to suffer from previously-identified biases (notably in Gostic et al., 2020, PLOS Computational Biology). We provide a detailed comparison of the features and approaches of these methods (EpiNow2, AGES, RKI), with the addition of the epidemia R-package (Supp File S2). This comparison highlights the unique features of the method developed: its ability to account for time-varying delay distributions and to combine symptom onset data with case data.

      4) Figure S11 is about accounting for known imports. If the local cases are dominant, imported cases would not have a big impact on estimates of Rt. However, the impact of imported cases on estimates of Rt could be complicated, as suggested in Tsang et al. (https://pubmed.ncbi.nlm.nih.gov/34086944/). In addition to assuming that imported cases and 'exported' cases cancel out, it is also assumed that the imported cases have similar transmissibility to the local cases, which may not be true if there is border control.

      We thank the reviewer for this interesting comment and reference. We added a brief discussion in the result section of the manuscript to address this limitation (lines 174-177).

      Reviewer #2 (Public Review):

      This manuscript describes an algorithm of estimating real time effective reproductive number R_e (t). This algorithm combines several methods in a reasonable way: deconvolution of time series of reported case into time series of infection, a Poisson model for generation of infections, and block-bootstrap of residuals to assess uncertainty. Each component is not necessarily novel, but the performance of this algorithm has been validated using comprehensive simulation studies. The algorithm was applied to COVID-19 surveillance data in selected countries across continents, revealing a great deal of heterogeneity in the association of R_e (t) with nonpharmaceutical interventions. Overall, the conclusions seem reliable.

      I have several moderate critiques and suggestions:

      1) From a statistical point of view, it seems much more natural to integrate the infection generation process and the delay from infection to reporting, possibly with reporting errors, into the same model, with which you will avoid combining the bootstrap and the credible intervals in a somewhat awkward way. I understand you can take advantage of EpiEstim package, but the likelihood is very simple and easy to program up. Nevertheless, I'm not strongly against the current paradigm.

      We agree that such an integrated approach is useful, and makes the uncertainty interval estimation more coherent. However, in such an integrated approach one cannot use the analytical solution for the likelihood, and methods that choose this approach (like EpiNow2 and epidemia) tend to pay for it in computational complexity. It also makes it harder to include time-varying delay distributions in the model, one aspect that sets our pipeline apart from existing alternatives.

      An additional advantage of our method is that estimates for the infection incidence are not influenced by priors on Re. In case of a bad model fit this allows us to separate more easily which part of the model may be misbehaving; and as such can help as a sanity check.

      Lastly, our framework has the advantage of modularity: pieces of the pipeline can be (and were) continuously refined or replaced with better pieces. This continuous improvement process allowed a flexible response to the pressing circumstances (the COVID-19 pandemic), and allowed us to extend it to entirely new types of proxy data (e.g., wastewater viral loads - https://ehp.niehs.nih.gov/doi/10.1289/EHP10050 ).

      2) Is there a strong reason to believe the residuals are autocorrelated? The block sampling with block size 10 seems arbitrary. The authors fitted an ARIMA model to the residuals for some countries, how good was the fitting? If the block size doesn't matter, then probably the stronger but simpler assumption of independent residuals may not compromise the estimation of R_e (t) much.

      Yes, there is reason to believe the residuals are autocorrelated. New supplementary Figure S15 shows the ACF and PACF of the residuals based on the confirmed cases of Switzerland, China, New Zealand, France and the US, and one can see that for most countries, the obtained residuals are clearly autocorrelated. We added this point to the simulations method section in the paper (lines 503-518). Please also see our response to Reviewer 1, major point 4 above.

      Choosing an optimal block size for the block bootstrap method is generally difficult. To capture weekly patterns, we need a block size of at least 7. We tried different sizes and found that 10 tended to work well in a variety of simulation settings (an example is given in Fig. S19).
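
      For readers unfamiliar with the technique, a generic moving-block bootstrap can be sketched as below. This is an illustration of the general idea with a block size of 10, assuming overlapping blocks drawn uniformly at random; the function name and the toy weekly series are hypothetical and the details may differ from our implementation.

```python
import numpy as np

def block_bootstrap(residuals, block_size, rng):
    """Resample a residual series by concatenating randomly chosen contiguous blocks."""
    residuals = np.asarray(residuals)
    n = len(residuals)
    n_blocks = -(-n // block_size)  # ceiling division
    starts = rng.integers(0, n - block_size + 1, size=n_blocks)
    blocks = [residuals[s:s + block_size] for s in starts]
    return np.concatenate(blocks)[:n]  # trim to the original length

rng = np.random.default_rng(3)
residuals = np.sin(2 * np.pi * np.arange(70) / 7)  # toy series with a weekly pattern
resampled = block_bootstrap(residuals, block_size=10, rng=rng)
```

      Sampling whole blocks keeps the short-range dependence (here, the weekly pattern) intact within each block, which independent resampling of single residuals would destroy.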

      3) I don't see the necessity of using segmented R_e (t) instead of a smooth curve in the simulation studies. The inferential performance, especially the coverage of the CI's, is much less satisfactory when a segment has a steep slope. The authors may consider constructing splines based on the segments or using basis functions directly.

      We started using a segmented Re(t) trajectory to allow for simple parametric generation of different scenarios (e.g. in new Fig. S10), and to specifically study our ability to estimate sudden transitions in Re (discussed with respect to Table 1, Fig. S2). We agree this approach makes our method look worse than necessary, since it is generally difficult to estimate such abrupt changes in Re. However, we thought this would be the more stringent test of our method, as we will perform better on any smoother trajectory.

      4) The authors smoothed the log-transformed observed incidences to come up with the residuals. For Poisson data, the variance-stabilizing transformation is the square root, not the logarithm. In addition, as you already have bootstrap estimates, why not use quantiles directly for the CIs instead of a normal (asymptotic) approximation? When incidence is low, the normal approximation may be much less satisfactory. Also, when using a normal approximation for the CI, it's much safer to calculate the standard deviation and construct the CI on the log scale, i.e., log(θ̂*(t)), and then exponentiate back.

      Our goal of transforming the original case observations is to stabilize the variance of the residuals. Indeed, the square root transformation is generally recommended if the data to be transformed is Poisson distributed. In our case, however, the original case observations are not quite Poisson. Specifically, the infection incidence at time t given the past incidence is modelled with a Poisson process (see Section 4.4), but the case observations are modelled with an additional convolution step of the infection incidence with a delay distribution, and there is additional variation due to e.g. weekday effects. It is thus not clear a priori which transformation works best for our data, and we therefore investigated various possible transformations (including the square root transformation). We found that no transformation was uniformly the best for data of different countries, but that the log-transformation tended to perform best overall. This is why we chose the log-transformation. Please see the new supplementary Figure S14, where we show the residuals after the square root transformation and the log transformations for various countries.

      Regarding the bootstrap confidence intervals, we also investigated different options. Again it is not clear a priori which bootstrap confidence interval performs best for our data, so we compared common choices like quantile, reversed quantile and normal-based in a simulation study. Specifically, we assessed their coverage and found that the normal-based confidence intervals performed best overall (see Fig. S4).
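
      The three interval types compared above can be sketched as follows. This is an illustrative sketch using textbook definitions; the function names, the linear-interpolation quantile, and the 95% default are assumptions made here, not the authors' implementation.

```python
import statistics

def _quantile(sorted_vals, q):
    """Empirical quantile with linear interpolation between order statistics."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def bootstrap_intervals(estimate, replicates, level=0.95):
    """Return quantile, reversed-quantile and normal-based bootstrap CIs."""
    alpha = 1 - level
    vals = sorted(replicates)
    q_lo = _quantile(vals, alpha / 2)
    q_hi = _quantile(vals, 1 - alpha / 2)
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    sd = statistics.stdev(vals)  # bootstrap standard error
    return {
        # Percentile interval: quantiles of the replicates themselves.
        "quantile": (q_lo, q_hi),
        # Reversed (basic) interval: quantiles reflected around the estimate.
        "reversed_quantile": (2 * estimate - q_hi, 2 * estimate - q_lo),
        # Normal-based interval: estimate +/- z * bootstrap SD.
        "normal": (estimate - z * sd, estimate + z * sd),
    }
```

      When the replicates are symmetric around the estimate, the quantile and reversed-quantile intervals coincide; they diverge when the bootstrap distribution is skewed, which is one reason their coverage can differ in simulation.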

      For low incidence settings, none of the bootstrap methods perform very well (as bootstrap consistency does not apply). We now mention this consideration in the paper (line 442).

      Finally, regarding the suggestion to compute exp(SD(log(X))): this quantity is generally different from SD(X), which we need for the confidence intervals. We also refer to the coverage in the various supplementary figures (e.g. S2, S4, S5) to support that our approach works well.
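
      A small numerical check makes the distinction concrete. The values below are made up for illustration, and `sd_comparison` is a hypothetical helper name.

```python
import math
import statistics

def sd_comparison(values):
    """Return (SD(X), exp(SD(log X))) for a list of positive values."""
    sd_x = statistics.stdev(values)
    sd_log = statistics.stdev([math.log(v) for v in values])
    return sd_x, math.exp(sd_log)

# Illustrative values only: each is double the previous one.
sd_x, exp_sd_log = sd_comparison([50.0, 100.0, 200.0])
print(sd_x, exp_sd_log)
```

      For these values SD(X) is roughly 76.4 in the units of X, whereas exp(SD(log X)) is 2, a dimensionless multiplicative factor, so the two cannot be used interchangeably.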

      5) The stringency index is a convenient metric for intervention intensity. However, it doesn't reflect actual compliance as the authors admitted. Another likely more pertinent metric is human movement (could be multiple movement indices). Human movement indices may not be available in all countries, but they are available in some, e.g., the US, and first wave in China. In some states of the US, it was clear that human movement decreased substantially even before initiation of lockdown. Lack of human movement metrics most likely has contributed to the difficulty in the interpretation of Figure 4.

      We have added mobility data (from Apple and Google location data) to our general dashboard, and to the analysis shown in Fig. 5. The mobility traces give more detailed insight into the behavior that may have led to decreases in Re. However, we find similar patterns wrt. decreases in Re as with the stringency index. A more extensive analysis that focuses on different phases of the pandemic may allow for more detailed insights, but we believe this is beyond the scope of our manuscript.

    2. Reviewer #1 (Public Review):

      This paper presents an approach to estimate Rt for 170 countries. While it is an impressive amount of work, I think the pipeline is similar to many currently available frameworks. The paper claims the following novelties over current framework, but more efforts are needed to be done to make it convincing.

      1) Obtain stable estimates from multiple types of data:
      It turns out the stable estimates simply come from repeatedly applying the same approach to different time series (Figure 3A middle). From the wording, I think there should be some method to combine these time series into a single estimate of Rt. Overall, the Rt and the time series of infection should be unique. It would be suboptimal if, for example, there were big differences between the results from the death time series and the reported-case time series: which one should I trust?

      2) Adequate representation of uncertainty:
      This is the result in Figure 2, suggesting the CI from EpiEstim is too narrow. This would be expected given that EpiEstim assumes the input infection time series is observed and fixed. It would be expected that the proposed approach provides wider CIs and hence covers the true value more often. However, to validate that the wider CI is the correct one, simulation studies are required. I think the most relevant one is Figure 1B. The results suggest that the approach works when Rt is not changing rapidly. However, I have concerns about the simulation methods (details below).

      3) Real-time estimation of Rt
      There is no simulation of the real-time properties of Rt. The most relevant one is still Figure 1B. However, looking at the right tail of the figure (the real-time performance), the proportion of intervals covering the true value is decreasing, and more effort is needed to show that the framework can be accurate in real time. For example, how is the real-time performance when Rt is increasing, or when Rt decreases sharply due to lockdown?

      4) Simulation methods to estimate Rt
      Both 2) and 3) need simulations to support the results, and hence the simulation approach is critical. The first part, which uses a Poisson distribution to generate an infection time series, is OK. However, the issue is the second part, about how the authors obtained the time series for deaths/hospitalizations/reported cases. To me, after generating the infection time series, we could obtain those time series based on the delay distribution from infection to death/hospitalization/report. It is not clear to me whether the authors' approach of using smoothing and fitting ARIMA to get those time series is correct.
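
      The direct route the reviewer describes, mapping an infection series to an observation series through a delay distribution, can be sketched as a discrete convolution. This is a minimal illustration; `convolve_with_delay`, the example delay probabilities, and the decision to drop observations that fall beyond the end of the series are all assumptions made here.

```python
def convolve_with_delay(infections, delay_pmf):
    """Expected observations per day, given daily infections and a delay pmf.

    delay_pmf[d] is the probability that an infection is observed d days
    later. Observations falling after the last day are dropped, which
    mimics right truncation at the end of the series.
    """
    n = len(infections)
    observed = [0.0] * n
    for t, count in enumerate(infections):
        for d, p in enumerate(delay_pmf):
            if t + d < n:
                observed[t + d] += count * p
    return observed

# 100 infections on day 0, observed 0, 1 or 2 days later.
print(convolve_with_delay([100, 0, 0, 0], [0.2, 0.5, 0.3]))
```

      Observation noise (for example, weekday effects) would then be added on top of this expected series; how best to do that is the point under discussion between the reviewer and the authors.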

      Minor comments:
      1) What does near real-time mean? Are the estimates of Rt delayed by a few days, as in other approaches?
      2) For the results in Table 1, I think results showing that other approaches (like EpiEstim) perform worse than the proposed approach would better illustrate the value of the proposed approach.
      3) I think more discussion is needed of the similarities to and differences from current approaches. For example, Abbott et al (https://wellcomeopenresearch.org/articles/5-112) used a similar pipeline.
      4) Figure S11 is about accounting for known imports. If local cases dominate, imported cases would not have a big impact on estimates of Rt. The impact of imported cases on estimates of Rt could be complicated, as suggested in Tsang et al. (https://pubmed.ncbi.nlm.nih.gov/34086944/). In addition to assuming that imported and 'exported' cases cancel, it is also assumed that the imported cases have similar transmissibility to the local cases, which may not be true if there is border control.

    1. Author Response

      Reviewer #1 (Public Review):

      Xiong and colleagues use an elegant combination of theory development, simulations, and empirical population genomics to interrogate a largely unexplored phenomenon in speciation/ hybridization genomics: the consequences and implications of admixture between species with differing substitution rates. The work presented in this well-written manuscript is thorough, thought provoking, and represents an important advancement for the field. However, there are a few instances where I feel the strength of the conclusions drawn is not fully supported.

      Thank you for the positive comments!

      The authors begin by presenting evidence based on whole genome sequencing that the two focal species, P. syfanius and P. maackii, are highly diverged despite ongoing hybridization. However, the discussion of the remarkable mitochondrial sequence similarity is underdeveloped. I do not understand how such a pattern is not most likely the result of introgression from one species to the other, given the relatively high FST across much of the nuclear genome coupled with the generally higher mitochondrial mutation rate in animals.

      That’s a very good point. We have included this likely explanation of mitochondrial genome similarity in Line 84-86.

      Next, they posit that barrier loci are likely to exist. To support this assertion, the authors use a combination of parental population genetic diversity and divergence comparisons and ancestry pattern analysis in hybrid populations. They show that there is a strong correlation between divergence across pure species and within-species diversity across the autosomes. Then, using four hybrid individuals, they show that low ancestry randomness, as quantified by estimates of between-group and within-group entropy, is associated with genomic regions of reduced within-group diversity and elevated between-group divergence. The use of entropy estimates as a stand-in for admixture proportions and ancestry block analysis when sample size is severely limited is particularly clever. Though I must admit I do not fully understand the derivations of the two entropy measures, it seems to me that relatedness might have a strong effect on the interpretability of between-individual entropy estimates (Sb). With very small population sizes this may be a real issue.

      Yes, genetic relatedness will play a big role in between-individual entropy (Sb). A group of highly correlated individuals will produce highly predictable ancestry (knowing one individual’s local ancestry gives much information on the local ancestries of others), and Sb will be small because entropy is a measure of uncertainty. If inbreeding is very severe, Sb will no longer be a useful measure because it will be too small across the entire genome. In our hybrid samples, although some genomic regions imply the possibility of inbreeding (see local ancestry of Z chromosomes in Figure 3–Figure supplement 1), there is still considerable variation of Sb across the genome which allows us to test for its correlation with DXY and π.
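
      The intuition that correlated individuals yield low entropy can be illustrated with plain Shannon entropy. This is only a sketch of the general idea: it assumes entropy is computed over the empirical distribution of local-ancestry states across individuals at one locus, which is not necessarily how the manuscript defines Sb, and the state labels are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(states):
    """Shannon entropy (in bits) of the empirical distribution of states."""
    counts = Counter(states)
    n = len(states)
    # Equivalent to -sum(p * log2(p)) with p = c / n.
    return sum((c / n) * math.log2(n / c) for c in counts.values())

# Hypothetical local-ancestry calls for four hybrids at two loci.
related = ["syf", "syf", "syf", "syf"]  # fully predictable ancestry
mixed = ["syf", "syf", "maa", "maa"]    # maximally uncertain for two states
print(shannon_entropy(related), shannon_entropy(mixed))
```

      If individuals are closely related, most loci resemble the first case, so the measure is near zero genome-wide and carries little signal, which is the caveat raised above.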

      A brief discussion of potential caveats in using the new method developed here seems warranted given its potential usefulness to the population genomics field more broadly. One plausible but less likely alternative interpretation of these patterns is briefly discussed.

      We have now devoted the first subsection of Discussion to the caveats and various motivation for entropy metrics. The appendix also contains further explanation of our intuition (section “Appendix-The entropy of ancestry”).

      The authors then move on to evidence for divergent substitution rates. Analysis of both D3 and D4 statistics using several different outgroups and a series of progressively stringent FST thresholds shows that site patterns between the two species are highly asymmetrical with P. maackii lineage harboring more substitutions than P. syfanius. The authors offer two possible explanations for this finding and then test both hypotheses. First, they use a comparative tree-based method to show that there is little phylogenetic evidence for lineage biased hybridization from outgroups into either of the focal lineages. Further, the range overlaps of the study species do not correspond with the inferred direction of allele sharing from the Dstat analysis. This is a good argument against contemporary gene flow between the outgroups and P. syfanius, but I am not convinced that ancient gene flow that could have occurred when, say, species distributions may have been different, can be ruled out using this analysis.

      Yes, we also felt that our original wording was overly strong. Now we say that our argument is based on current geographic distributions, but that archaic gene flow cannot be totally ruled out. However, we also point out that archaic gene flow with outgroups should still leave some detectable fractions of paraphyletic local gene trees after phylogenetic reconstruction. (Line 192-194).

      To test whether this asymmetry can be explained by a difference in substitution rate between the two species, the authors show that observed D3 increases and D4 decreases with increasingly divergent outgroups, as predicted by theory developed here. The authors take this as evidence supporting divergent substitution rates, though they claim only that the existence of such rate divergence is likely. The unfortunately limited sample sizes seem to preclude attaining more certainty than this. Interestingly, as a byproduct of using D4 as an extended measure of site pattern asymmetry, the authors highlight one way in which the ABBA-BABA test can give false positives for introgression. This is an important contribution to the field.

      We agree with the reviewer that, for our data type – a handful of unphased genomes, it will be difficult to obtain more direct evidence for substitution rate differences. In line 182-187, we show using maximum-likelihood gene tree reconstruction that P. maackii samples often inherit more derived mutations than P. syfanius. This could be viewed as a separate test utilizing more accurate substitution models in phylogenetic software, while our theoretical calculation provides a coarse but testable signature of D3 and D4.
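
      For readers unfamiliar with the ABBA-BABA test mentioned in this exchange, the classic four-taxon D statistic can be sketched as below. This uses the standard textbook definition, D = (ABBA - BABA) / (ABBA + BABA); the manuscript's D3 and D4 may be defined or polarized differently, and the site counts are invented for illustration.

```python
from collections import Counter

def d_statistic(site_patterns):
    """Classic ABBA-BABA D from per-site patterns.

    Each pattern is a 4-character string for taxa (P1, P2, P3, outgroup),
    with 'A' the ancestral and 'B' the derived allele.
    """
    counts = Counter(site_patterns)
    abba, baba = counts["ABBA"], counts["BABA"]
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)

# Invented counts: 30 ABBA sites, 10 BABA sites, 5 uninformative sites.
sites = ["ABBA"] * 30 + ["BABA"] * 10 + ["BBAA"] * 5
print(d_statistic(sites))
```

      A nonzero D is usually read as evidence of introgression, but as discussed above, unequal substitution rates between lineages can also skew site-pattern counts, so the statistic alone does not distinguish the two explanations.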

      To provide more direct evidence, we believe one ought to measure spontaneous mutation rates in both species under their native habitats, and obtain better knowledge of generation times and population sizes. The limitation of sampling and rearing these rare species are major barriers for incorporating this kind of evidence into this study.

      Finally, the authors observe a monotonic relationship between substitution rate ratio and relative genetic divergence across the genome, which is in line with their theoretical predictions for differential substitution rates in the face of gene flow. From this they infer an 80% increase in substitution rate from P. syfanius to P. maackii. It is remarkable to be able to extract these substitution rates from genomic regions with the least gene flow. However, the veracity of these estimates relies on the assumptions I have highlighted above, and they should be presented with appropriate caution.

      We have included the limitations of our conclusions in the final subsection of the Discussion. Because high FST regions are relatively rare, estimates of observed rate ratio “r” have larger errors in those regions. This problem is partially resolved by using the entire monotonic relationship between r and FST to estimate the true rate ratio, so we rely not only on regions with the least gene flow but the full dataset.

      However, we do agree with the reviewer that ours is still a coarse theoretical framework since we do not impose a realistic substitution model (e.g., we don’t allow reverse mutations). We have now emphasized this weakness in the Discussion (Line 348-350).

      Reviewer #2 (Public Review):

      In their manuscript ("Admixture of evolutionary rates across a hybrid zone"), Xiong et al. use whole genome resequencing data to assess rates of genome evolution between two species of butterflies and determine whether putative barrier loci between the species are also those that evolve at asymmetric rates between them. This work presents a novel hypothesis and rigorously tests these ideas using a combination of empirical and theoretical work. I think the authors could more formally link loci that are evolving at highly asymmetric rates with those that are most likely to be barrier loci by evaluating the relationship between ancestry entropy and ratios of substitution rates between species. Additionally, clarifying the relationship between barrier loci and asymmetric evolution would be beneficial (i.e. are loci that we typically envision to be barrier loci, such as loci involved in reproductive isolation, evolving at asymmetric rates or do asymmetrically evolving loci represent a new type of barrier loci?).

      Many thanks for these comments! For the second point (clarifying the relationship between barrier loci and asymmetric evolution), we specifically mean that barrier loci (which specifically are of interest to those who study speciation) cause asymmetric rates of evolution to be preserved between hybridizing species. Asymmetric rates themselves are caused by other factors (spontaneous mutation rate differences, generation times, environmental effects) specific to each species, and barrier loci merely prevent the mixing of asymmetric rates. For the first point (evaluating the relationship between entropy and ratios of substitution rates).

    1. Gyuri Lajos 2 minutes ago https://youtu.be/5IfgBX1EW00?t=887 Listen to Frank Herbert for 3 minutes. What he says there is in perfect harmony with what you say. Thank you for saying it.

      Top quotes from the Frank Herbert interview:

      "remember that there's nothing at all wrong with saying that the Protestant ethic is full of it that it's all right to enjoy your work you don't have to fight your way out of bed every morning you can get up every morning eager to go do whatever it is you do have a love affair with your with your world and remember that you're not going to be able to predict every consequence of what you do"

      Fiduciary roots of science:

      "question things I have the most fun that I'm writing questioning things that people do not question the assumptions that everybody knows are true I'm going to declare a heresy for you all science if you go back into its ruts saying why do I believe this well I believe this because of these tests and this this proof well why do I believe this why did I set up this test why did I believe that proof all science goes back to something that we believe because we believe it we believe it because we believe it and we have no proof for it it's like a religion so"

      And the message: being comfortable with the unknown, as a finite human being:

      "when you dig into the roots of science a gray area at the bottom but it's like a balloon and the surfaces word the computer science has given us I love this language the surface of the balloon is their face with what we do not know inside the balloon as we blow into it is what we have proved okay but as we increase what we think we know we increase our exposure to what we do not know this is one of the inevitable laws of our universe"

      No dead end, on and on and on:

      "but isn't it more interesting to live in a universe where there are unknowns to discover new lands to explore than to live in an absolute box where when you find the edge that's it baby no place to go from there I like the fact that we cannot predict everything I like the fact that we live in a universe where anything may happen because the alternative to me is a constricting dead end"

      No End is the Ending, never Ending! Thank you Quinn. You've got it. Creating a space where I can share the same learnings. Anybody who got as far as Chapter House, maybe on the second time of reading it all, will be sure to get THIS. I believe that

      Gyuri Lajos 42 minutes ago Thank you for articulating what I felt back then, when I read it when it came out. I have learned recently that the message is "being comfortable with the unknown", nay, delighting in it with pious awe towards the dignity of being reflected in the human being

      never ending is the ending

      being comfortable with the unknown

      Frank Herbert Dune

    1. Author Response

      Reviewer #1 (Public Review):

      Redman and colleagues employed microprisms and two-photon optical imaging to track separately the structure of dorsal CA1 pyramidal neurons or the activity patterns of dorsal Dentate Gyrus, CA3, CA2 and CA1 pyramidal neurons, longitudinally in live mice. First, they carried out a characterization of the optical properties of their system. Second, they performed an example tracking of dendritic spines in the apical aspect of dorsal CA1 pyramidal neurons. Finally, they characterized differences in spatial coding along the tri-synaptic pathway, in the same animals. The main focus of the manuscript is technological and the authors show interesting data to support their technique, which I believe will be of relevance to neuroscientists interested in the hippocampal formation.

      Strengths.

      While using microprisms to achieve a "side" view of neurons in specific brain areas is not new per se [see Chia et al., J. Neurophysiol. (2009), Andermann et al., Neuron (2013), Low et al., PNAS (2014) etc.] the authors were able to visualize activity of a large neuronal circuit such as the hippocampal trisynaptic pathway - for the first time - in the same animal exploring an environment. This is not only a technical feat but it opens new scientific avenues to study how information is transformed at different stages within the hippocampus, as such I think this will be of broad interest for people in the field. In addition, the authors demonstrated imaging of dendritic spines in the apical aspect of pyramidal neurons but limited to dorsal CA1 due to the labelling density of the transgenic mouse line they decided to use. Despite the fact that imaging apical dendritic spines in dorsal CA1 has been shown earlier [see Schmid et al., Neuron (2016) and Ulivi et al., JoVE (2019)], the use of the micro periscope greatly increases the flexibility of these sort of experiments by enabling tracking of large portion (both apically and basally) of the dendritic arbors of dorsal CA1 pyramidal neurons.

      Thank you for the positive comments. We have clarified that apical CA1 dendrites have been imaged in previous work as you point out, just not along the somatodendritic axis (lines 127-130). We have also clarified that we were able to image CA2 and CA3 spines as well (only DG exhibited the increased labeling density in Thy1-GFP-M mice; lines 130-132).

      Weaknesses.

      While the data are sufficient to demonstrate the technique, the conceptual advance of the paper is very narrow. The findings on spatial coding differences in different hippocampal subregions - namely a nonuniform distribution of spatial information in the different hippocampal subregions - do not add new knowledge but largely confirm the literature. The results on the dynamics of apical dendritic spines of pyramidal neurons in dorsal CA1 seem to confirm previous work, but the interpretation of these results differs fundamentally. In fact both papers cited by the authors (Attardo et al., and Pfeiffer et al.,) come to the conclusion that dendritic spines on basal dendrites of CA1 pyramidal neurons are highly unstable, at least by comparison to other neocortical areas. The authors seem to ignore this discrepancy. However, this discrepancy has importance also to the characterization of the technique the authors developed. In fact, the optical resolution of the system strongly affects the ability to resolve neighboring spines - especially at the high density of dorsal CA1 - and thus it has a direct effect on the measures of synaptic stability [Attardo et al., Nature, (2015)]. The authors duly report lateral and axial resolutions for their micro periscopes and both are lower than the ones of Attardo and Pfeiffer, thus the authors should consider the effects of this difference on the interpretation of their data.

      We agree that the advance described in this manuscript is more methodological than conceptual. We do have other studies in progress that will be of greater conceptual interest. However, we believe the technique is of sufficient interest to the field that it is worth publishing the methodological approach and characterization as soon as possible.

      We have also addressed the comparison with Attardo et al. and Pfeiffer et al. mentioned by the reviewer. We actually agree with the previous work that dendritic spines in CA1 show a high degree of instability compared to cortex, finding ~15% spine addition and ~13% spine subtraction between consecutive days (Fig. 3H, I), similar to single-day turnover rates observed in Attardo et al. and other papers. Despite the high turnover rate, the fraction of experimentally observed spines that persist across 8-10 days plateaus around 75-80%, indicating that there is a substantial fraction of apical spines that remain stable in the face of ongoing daily turnover. This was also observed in basal dendrites by Attardo et al. (with similar survival fractions) and Pfeiffer et al. (albeit with lower survival fractions), so we would not necessarily characterize this as a discrepancy. We have clarified these points in the manuscript (lines 157, 162-168, 331-332).
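
      The two quantities contrasted above, day-to-day turnover and the survival fraction of an initial spine population, can be sketched as set operations on tracked spine identifiers. This is a minimal illustration under assumed definitions (turnover normalized by the previous day's spine count; survival as the fraction of day-0 spines present on each later day); the manuscript may normalize differently, and the spine IDs are made up.

```python
def daily_turnover(prev_spines, curr_spines):
    """Return (fraction added, fraction lost) relative to the previous day."""
    prev, curr = set(prev_spines), set(curr_spines)
    added = len(curr - prev) / len(prev)
    lost = len(prev - curr) / len(prev)
    return added, lost

def survival_fraction(sessions):
    """Fraction of first-session spines still present in each session."""
    baseline = set(sessions[0])
    return [len(baseline & set(s)) / len(baseline) for s in sessions]

# Made-up spine IDs tracked over three imaging sessions.
sessions = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 5, 6}]
print(daily_turnover(sessions[0], sessions[1]))
print(survival_fraction(sessions))
```

      Because turnover is measured between consecutive days while survival is measured against a fixed baseline, a population can show substantial daily turnover and still have a survival curve that plateaus, if the turnover is concentrated in a transient subset of spines.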

      The reviewer pointed out that some previous studies used super-resolution microscopy to detect smaller structures and reduce optical merging. This would be an excellent extension of our work, as in principle super-resolution microscopy could be used with the implanted microperiscopes. Although the survival fractions we observed were similar to Attardo et al., they were higher than Pfeiffer et al., possibly due to the predicted effects of optical merging. We have updated the text to note that our results may inflate the degree of stability due to resolution limitations (lines 165-68, 335-340).

      Reviewer #2 (Public Review):

      Strengths

      The Hippocampus is a key brain region for episodic and spatial memory. The major Hippocampal subregions: Dentate Gyrus (DG), CA3, and CA1 have predominantly been investigated independently due to technical limitations that only allow one subregion to be recorded from at a time. In this paper the authors developed a new method that allows DG, CA3, and CA1 to be imaged simultaneously in the same mouse during behavior with a 2-photon microscope. This method will allow investigation of the interactions between Hippocampal subregions during memory processes - a critical yet unexplored area of Hippocampal research. This method therefore provides a new tool that will help provide insight into the complex functions of the Hippocampus during behavior.

      This method also provides high resolution optical access to deep dendritic structures that have been out of reach with existing methods. The authors demonstrate they can measure the structure of single spines on distal apical dendrites of CA1 cells. They track populations of spines and quantify spine changes, spines loss, and spine appearance. Spine turnover is thought to be a key process in how the Hippocampus encodes and consolidates memories, and this method provides a means to quantify spine dynamics over very long time periods (months) and can be used to study spine dynamics in CA3 and DG.

      We appreciate the comments.

      Weaknesses

      This method requires the implantation of a relatively large glass microperiscope that cuts through part of the Septal end of the Hippocampus. This is a necessary step to image transversally and observe all the major subregions simultaneously. This is an unfortunate limitation as it damages the very circuits being investigated. The authors attempt to address this by measuring the functional properties of Hippocampal cells, such as their place field features, and claim they are similar to those measured with other methods that do not damage the Hippocampus. However, it is very likely the implant-induced damage is affecting the imaged cells in some way, so caution should be taken when using this method. The authors are very aware of this and briefly discuss the issue. In addition, the authors observe damage adjacent to the face of the glass microperiscope that extends to ~300 um from the face. This area should therefore be avoided when imaging the Hippocampus through the microperiscope.

      We agree. This will be important for the interpretation of experiments using the microperiscope approach. For many experiments, electrophysiology or traditional CA1 imaging approaches might be preferable to avoid damage to the hippocampal structure. We have tried to be straightforward about these caveats in our discussion. However, we believe the capability of imaging the transverse hippocampal circuit will allow a number of experiments that are currently intractable, and that the benefits will outweigh the caveats in these cases.

      Reviewer #3 (Public Review):

      Redman et al. describe a novel approach for long-term cellular and sub-cellular resolution functional and structural imaging of the transverse hippocampal circuit in mice. The authors discuss their procedure for implanting a glass microperiscope and show data that clearly support their ability to simultaneously record from neurons within the DG, CA3, and CA1 subregions of the hippocampus. They offer optical characterization demonstrating sufficient resolution to image at the cellular and subcellular level, which is further supported by experimental data characterizing changes in morphology of CA1 apical dendritic spines. Finally, neurons are recorded from as mice engage in navigation behavior, allowing authors to characterize spatial properties of hippocampal cells and relate findings to prior work in the field.

      The ability to image from multiple hippocampal subregions simultaneously is a great technical achievement, sure to advance study of the hippocampal circuit. In particular, this approach will likely have tremendous application for addressing the question of how neural representations dynamically change across the hippocampal subfields during initial encoding of novel contexts or later during retrieval of familiar ones. While the feasibility and utility of this preparation are supported by the data, further characterization of recorded cells will aid the comparison of data collected using this imaging approach to data previously collected with other methodologies.

      Thank you for the comments, we have addressed the specific concerns below.

      1) Further measures could be taken to more thoroughly evaluate the impact of the implant on cell health. While authors evaluate glial markers, it is not obvious how long after implant these measurements were taken. Additionally, authors could characterize cell responses of neurons recorded proximal to and more distal to their implant to further evaluate implant effect on cell health.

      Good points. We have added the date post implantation for the histology samples (Figure 1F caption). To address the second point, we added additional experiments characterizing functional response properties as a function of depth (Figure S7). We did not find systematic changes in place field width or place cell spatial information as a function of imaging depth (lines 220-224; Figure S7A, B). We did, however, find a significant relationship between the decay constant for the fitted transients and depth, with cells close (<130 um) to the surface of the microperiscope face exhibiting slower decay (Figure S7C). This appeared to be due to a small fraction of cells exhibiting longer decay times closer to the microperiscope face. As a result, we advise only imaging neurons >150 um from the microperiscope face (lines 224-226).

      2) More in-depth analysis of place cells will aid the comparison of data collected using this novel approach to previously published data. For instance, trial-by-trial data and clearer descriptions of inclusion criteria will allow readers a more detailed understanding of observed place cells.

      We have included example place cells with individual trial data (Figure 5C) and have added additional discussion and detail on our selection process for identifying place cells (lines 207-209, 663-666, 674-676). In the revised manuscript, we further increased the stringency of our place cell criteria so that none of the cells with time shuffled responses pass the criteria. It should be noted that our place cells were not as reliable as those recorded in the presence of reward (Go et al, 2021). We chose to forgo reward to help ensure that the neurons were responding to spatial location and not to other task variables, but this likely reduced response reliability (see Krishnan et al, bioRxiv; Pettit et al, 2022). We have added discussion of this issue to the manuscript (lines 307-318).

    1. Author Response

      Reviewer #1 (Public Review):

      The goal of the work was to test for direct and indirect fitness costs associated with specific types of constructs that could be used for gene drive. The authors conclude that there are no direct fitness costs associated with the presence and expression of either Cas9 or the guide RNAs but that the Cas9 is causing off-target cuts that result in loss of fitness. They also conclude that a newer form of Cas9 doesn't cause these off-target cuts. While the goal of this study is important, there are many caveats associated with the work as reported, and these limit interpretation of the results. Many of the caveats are pointed out in the discussion.

      1.a) I am specifically concerned by the fact that from what I read, a company made the transgenic lines and that there was only one transgenic line per treatment. Unless the fly line used for the insertion was completely homozygous for the chromosome where the insertion was made, the lines could have differed in fitness, due to somewhat deleterious recessives captured in one G1 but not another. This cost could have persisted for a number of generations after the crosses were made, especially in the high frequency "releases". This may not have been a real problem, but without any replication it is difficult to know.

      We apologize that this was unclear in our initial submission. We did in fact generate several transgenic lines of each construct and used independently obtained lines for each of our population cages, except for the Cas9_gRNAs construct, where four lines were used in seven population cages (replicates 1 to 4 were founded with the same line). All of these were also crossed to w1118 flies before we obtained homozygous lines, so the impact of deleterious alleles would have been minimized. We have edited the section “Generation of transgenic lines” in the Methods to clarify this.

      We also examined the possibility of fitness effects being caused by such alleles in our maximum likelihood analysis (assuming they are unlinked from the construct — otherwise they should have appeared as direct fitness effects). This model was not a good match for the data, nor was the model with direct fitness effects. Based on these results, we consider it unlikely that such deleterious alleles had a major impact on the observed frequency trajectories in our cage populations.

      1.b) My concern is reinforced by the fact that the no-Cas9, no-gRNA line goes up in frequency for the first 5 generations and then becomes stable in frequency. The loss of the fitness advantage is consistent with a fitness effect partially linked to the insertion site in that one cross but not others.

      Both of these cages were made with independent lines. We agree with the reviewer that the increase in frequency of the no-Cas9_no-gRNAs construct at the beginning of the experiment seems surprising at first. However, if an initial fitness advantage was truly driving the dynamics of this construct, we would expect that the “initial off-target model” (where fitness costs originated before the experiment) should have yielded the highest model quality in our maximum likelihood analysis, since we also allowed advantageous cut off-target alleles (i.e., fitness estimates > 1) in this model. While the maximum likelihood fitness estimate in the “initial off-target model” indeed exceeded the reference value of 1, its 95% confidence interval still included a fitness value of 1, and a neutral model actually yielded the lowest AICc value (i.e., best model quality, Table 3). We think that one possible explanation for this apparent initial frequency increase is that population cages tend to undergo larger than average fluctuations in the first one or two generations due to the smaller initial population size and potential health differences between founding fly lines (which can persist for a generation or two). We briefly note this in the manuscript methods section.
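
      As an illustration of the model comparison described above (not the analysis code used in the study), the following minimal sketch computes the corrected Akaike information criterion, AICc, and selects the model with the lowest value. The log-likelihoods, parameter counts, and number of observations are hypothetical placeholders and are not taken from Table 3.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Corrected Akaike information criterion; lower values indicate better model quality."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc requires n_obs > n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    # Small-sample correction term added to the plain AIC
    return aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


# Hypothetical values for illustration only (not the study's estimates).
models = {
    "neutral": {"log_likelihood": -100.0, "n_params": 1},
    "initial_off_target": {"log_likelihood": -99.5, "n_params": 2},
}
n_obs = 30  # hypothetical number of generation transitions

scores = {
    name: aicc(m["log_likelihood"], m["n_params"], n_obs)
    for name, m in models.items()
}
best = min(scores, key=scores.get)
print(best, scores)
```

      In this toy comparison the extra parameter improves the log-likelihood only slightly, so the penalty outweighs the gain and the simpler neutral model is preferred.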

      1.c) It is important to note that the starting points are cages with separate vials of the control and experimental strain. Even a small difference in development time of the two strains in the first generation could lead to an excess of homozygotes in the next generation.

      We agree. In our maximum likelihood framework, such differences in development time should show up as a viability difference (fraction of offspring that made it to adulthood in the time window of our experiment). We now note in our revised manuscript that fitness differences between genotypes could be due to longer development time rather than an increase in the juvenile death rate in Cas9_gRNAs carriers. In the “Phenotypic fitness assays” section of our revised manuscript, we additionally state that “longer development time of individuals carrying the Cas9_gRNAs construct would also have appeared as a viability cost in our cage study but not in these fitness assays.”
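
      To illustrate how a viability difference translates into a change in construct frequency, here is a deliberately simplified sketch. It assumes random mating, discrete generations, an infinite population, and multiplicative viability per construct copy; these are illustrative assumptions, and the function `next_construct_frequency` is hypothetical rather than part of our maximum likelihood framework.

```python
def next_construct_frequency(p, viability):
    """Expected construct allele frequency after one generation of viability selection.

    p         -- current construct allele frequency (0..1)
    viability -- relative viability per construct copy (1.0 = neutral);
                 heterozygotes have `viability`, homozygotes `viability ** 2`
    """
    q = 1.0 - p
    w_het = viability
    w_hom = viability ** 2
    # Genotype frequencies among zygotes under random mating: p^2, 2pq, q^2
    mean_fitness = p * p * w_hom + 2 * p * q * w_het + q * q
    # Construct alleles surviving to adulthood, divided by all surviving alleles
    return (p * p * w_hom + p * q * w_het) / mean_fitness


neutral = next_construct_frequency(0.5, 1.0)
costly = next_construct_frequency(0.5, 0.8)
print(neutral, costly)
```

      A slower-developing genotype that misses the collection window would enter such a calculation in the same way as a genotype with a higher juvenile death rate, which is why the two cannot be separated from cage frequency data alone.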

      1.d) I am also concerned by the fact that the main conclusion is that the decline in frequency in the Cas9-gRNA line is due to off-target cuts, but there was no sequencing to back up that conclusion. In the discussion, this problem is mentioned but dismissed. I don't see how it can be dismissed when this is a major conclusion that remains based on very indirect evidence.

      We thank the reviewer for raising this important concern, which touches on the issue of how our approach differs from previous approaches that sought to directly detect off-target cleavage through sequencing. Our approach, by contrast, seeks to provide a “direct” measurement of the fitness of an allele. While this allows us to avoid the challenging task of detecting off-target mutations in vivo through whole-genome, population-level sequencing (and then predicting their potential effects), it comes at the price that inferences about the molecular nature of these fitness effects will rely on indirect evidence. However, we want to point out that our conclusion of these fitness effects being primarily due to off-target cleavage is based on three independent lines of evidence: (i) The maximum likelihood analysis of the frequency trajectory of the Cas9_gRNAs construct, where statistical model comparison ranked the off-target effect model higher than the direct fitness costs model; (ii) The fact that we inferred fitness costs only for the Cas9_gRNAs construct but not the construct in which Cas9 was replaced with the high-fidelity Cas9HF1 endonuclease (which should have similar expression and thus, similar direct fitness costs); and (iii) The heterogeneity we observed in the frequency trajectories of the Cas9_gRNAs construct in our cages, which is consistent with a model where off-target sites accumulate over the course of the experiment yet more difficult to reconcile with a model of direct fitness costs.

      Inspired by the reviewer’s recommendation, we wondered whether we may in fact be able to directly detect cuts at a few computationally predicted off-target sites. To this end, we performed Sanger sequencing at six sites that were computationally predicted for our Cas9_gRNAs construct by CRISPR Optimal Target Finder, which unfortunately revealed only wild-type sequences (this analysis is described in the new section “Evaluation of computationally predicted off-target sites”). However, we believe that this does not rule out off-target cutting as the primary driver of fitness costs for the Cas9_gRNAs construct due to the following arguments we state in the discussion section of our revised manuscript:

      “For example, our sequencing approach would not have allowed us to detect larger insertion/deletion events, which are frequently observed at on-target sites (48, 49). More likely though, we suspect that cleavage events occurred at other sites than the six computationally predicted ones. Indeed, the predictions by CRISPR Optimal Target Finder are based on cleavage specificity in cell lines, where off-target cutting is known to occur more frequently than in animals (47). All but one of the predicted off-target sites carry combinations of single nucleotide mismatches in the PAM-proximal and the distal region, which could make in-vivo cleavage less likely at these sites. Generally, our results are consistent with other studies that found off-target cleavage to frequently occur at sites which would have been difficult to predict computationally (50).”

      In a sense, our inability to detect any mutated alleles at this small set of computationally predicted off-target sites might actually highlight a key benefit of our approach: It can estimate the potential fitness costs of a construct without having to rely on accurate computational predictions of putative off-target sites or requiring the very costly approach of whole-genome, population-scale sequencing.

      Additionally, we would like to point out that while we found off-target effects to explain the empirical data best, we would probably consider our estimation of the overall magnitude of the fitness costs of the Cas9_gRNAs construct as one of the main conclusions of our manuscript, together with the fact that these were avoided when using the high-fidelity Cas9HF1 endonuclease instead. Thus, even if some readers may remain skeptical about the role of off-target cleavage (and we made sure to qualify our claims on this in the Discussion section accordingly), our systematic analysis of the overall fitness effects is more robust and should be of broad interest.

      1.e) When releasing homing gene drives, the initial frequency of the transgenic line is very low, and as in the Garrood et al paper cited, it is possible for the gene drive to outpace the non-target cutting. The modeling does not address what the impact of the presumed fitness costs in this experiment would be for a replacement/suppression drive released at low frequency.

      We thank the reviewer for raising this point. It has led us to add a completely new analysis on the “Effect of off-target fitness costs on gene drive performance”, in which we now show simulation results to illustrate the effect of direct and off-target fitness effects on both modification and suppression homing drives. We have also added more discussion on how these different types of fitness costs may affect other frequency-dependent CRISPR based gene drives.

      Reviewer #2 (Public Review):

      This paper reports a set of Drosophila population cage experiments aimed at quantifying fitness effects associated with the expression of Cas9 gene drive constructs in the absence of homing. The study attempts to deconvolve fitness effects due to the presence of the active nuclease at a genomic location from those that arise from off-target effects elsewhere in the genome: an important issue when considering gene drive strategies in the wild. To distinguish effects due to cleavage at the target site from activity elsewhere in the genome, a construct where Cas9 was replaced with a high fidelity nuclease (Cas9HF1) was employed. The experimental design compares the active nuclease-gRNA constructs targeting a site on another chromosome with no gRNA and reporter only controls, all inserted in the same locus. The Cas9 construct was assayed in 7 replicates with Cas9HF1 and controls assessed as duplicates with cages running for between 8 and 19 generations.

      2.a) There is a lack of clarity in terms of the cage set up design, the description in the supplementary methods could clarify if all the replicates came from a single founder and the difference in set-ups that necessitated ignoring some 1st generations.

      Thank you for pointing this out. We have thoroughly revised and extended our Methods section on “Generation of transgenic lines” to clarify this point. We now explicitly mention that we generated several transgenic lines of each construct and used independently obtained lines for each of our population cages, except for the Cas9_gRNAs construct, where we used four lines in seven population cages (replicates 1 to 4 were founded with the same line).

      For the cage start conditions, we now note that “To avoid potentially confounding maternal fitness effects on the construct frequency dynamics (which could arise based on minor differences in health or age between the initial batches of flies mixed together), we excluded the first generation of five cage populations…” In general, it is quite common for this to happen in insect population cage studies (please see some examples below) and is always a very short-term effect.

      2.b) The main finding reported from this part of the work is that with the control populations the frequency of the construct remained fairly constant across the generations, but the active nuclease tended to decline. I am somewhat confused by some of the claims here. First, the authors report a "bottoming out" effect where construct frequency declines then levels off: I am not entirely convinced that Figure 2 shows this. For example, comparing replicates 4 and 5 (8 and 16 generations respectively), it looks to me that there is a steady decline at the same rate with no evidence for a plateau. Perhaps replicates 2 and 3 show "some" evidence of leveling. In addition, replicates 4, 5, 6 and 7 have similar construct starting frequencies (particularly 5 and 7, which are only a few % different) yet the former show a steady decline whereas the latter maintain the construct at a steady level. This does not appear to be consistent with the author's explanation of higher off-target effects in populations carrying high frequencies of the construct. It would be helpful if the authors could more clearly explain the trajectories presented in Figure 2.

      We agree with the reviewer that our initial description of the raw construct frequency dynamics solely based on visual clues was making too strong claims (e.g., “different frequency dynamics between single replicates”) without providing more quantitative statistical support. This was originally intended as some basic introduction, with our maximum likelihood analysis then providing a more rigorous assessment in the next section. To improve clarity, we have completely restructured this in our revised manuscript. We removed the comparison of Cas9_gRNAs replicates solely based on visual clues, highlighted the general heterogeneity in trajectories among replicates (without making any specific claims), and instead of the vaguely defined “bottoming out” interpretation, we now only mention the average construct frequency change for the Cas9_gRNAs construct. In addition, we now present our more rigorous maximum likelihood analysis of the construct frequency trajectories and statistical model comparison earlier on in the Results section, so that all of our conclusions are now based on this statistical analysis, rather than an initial visual inspection of the curves. Please see also our comments to point 3.a) below, as reviewer 3 made very similar comments and suggestions.

      2.c) Utilising the allele frequencies obtained from the cages, 2 locus ML models were applied with the construct insertion site and an idealised off target site. They argue, correctly in my view, that fitness effects can be attributed to off target activity and not cleavage at the 3L target since the Cas9HF1 construct shows no substantive effect. In the models they assume that the presence of Cas9 in the germline (or maternally contributed) will invariably lead to cleavage at the idealised site. The model indicates that the construct insertion per se has no direct fitness costs but that off-target effects may have fitness consequences of approximately 30%, and seek to support this conclusion with simulations. I found this section difficult to follow but I feel that the conclusions are supported.

      We agree with the reviewer that the “Maximum likelihood analysis” section was too dense and therefore challenging to follow, especially for non-expert readers who may not be very familiar with such methods. We have revised and extended this section. In particular, we now also provide a brief summary of the modeling approach at the beginning of the section and have added subsection titles aiming to better guide the reader through the various steps of the analysis. Furthermore, we added a table with an overview of all tested models and highlighted the best-fitting models in tables 2 and 3. We hope that this has improved the clarity of our revised manuscript.

      2.d) Direct phenotypic assays with the active Cas9 nuclease were performed, looking at viability, mating preference and fecundity. Relegating these data to the supplements is not useful. While significant effects are attributed to the Cas9-gRNA construct, the authors cannot rule out a DsRed effect and it is a shame they did not assay at least one of the control constructs. In addition, in their modelling they assume that Cas9 activity will always cleave but see no evidence for this in the heterozygote viability assay. Whether this is due to the difference in rearing conditions that the authors claim is debatable.

      We thank the reviewer for this valuable feedback. As suggested, we have moved the phenotypic assays (Methods & Results) of the Cas9_gRNAs construct to the main part of the revised manuscript. We decided to conduct phenotypic assays only for the Cas9_gRNAs construct, because it was the only one that displayed some fitness costs in our maximum likelihood analysis (in particular, the DsRed construct did not display any fitness costs in the cages). However, given more time and capacity, we agree that additional phenotypic assays would have been desirable (e.g., a larger sample size per construct and additional constructs). Regarding our choice of model for the maximum likelihood analysis, we used a highly simplified off-target approach, which was necessary given the available information.

      2.e) Finally, since the initial cage experiments suggest that the Cas9HF1 enzyme reduces off-target effects they assay this enzyme in a model homing drive, indicating that this enzyme performs as well as the regular Cas9. Again, relegation of these data to supplementary datasets is unhelpful and it would improve the manuscript if these results could be simply summarised in a figure.

      We added an additional figure at the end of the “Cas9HF1 homing drive” section in the Results showing the gene drive inheritance rate and resistance allele formation rate in early embryos for the Cas9HF1 and Cas9 homing drive respectively. The gene drive inheritance rate is the percentage of offspring with DsRed fluorescence when crossing individual gene drive heterozygotes with “wildtype” homozygotes (i.e., not carrying any gene drive allele) and is used to calculate the gene drive conversion rate (i.e., the rate at which wildtype alleles are converted to drive alleles) mentioned in the main text. We hope that this has improved the clarity of our revised manuscript.
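
      The relationship between the two rates can be sketched as follows. The sketch assumes that a drive heterozygote would transmit the drive allele to half of its offspring without conversion, so that a conversion rate c raises the inheritance rate to 0.5 + 0.5c; the offspring counts are hypothetical and the function name is illustrative only.

```python
def drive_conversion_rate(n_drive_offspring, n_total_offspring):
    """Return (inheritance_rate, conversion_rate) from offspring counts.

    Assumes Mendelian inheritance (0.5) in the absence of conversion, so
    inheritance = 0.5 + 0.5 * conversion.
    """
    if n_total_offspring <= 0:
        raise ValueError("need at least one scored offspring")
    inheritance = n_drive_offspring / n_total_offspring
    conversion = 2.0 * (inheritance - 0.5)
    return inheritance, conversion


# Hypothetical counts: 90 of 100 offspring show DsRed fluorescence.
inheritance, conversion = drive_conversion_rate(90, 100)
print(inheritance, conversion)
```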

      2.f) Taken together, I think this is a useful study but is presented in a way that is at times impenetrable to the non expert. More clarity in presenting the cage and modelling data, as well as promotion of figures from supplementary material to the main manuscript would considerably aid the non expert and provide greater confidence in the interpretations. If these issues could be clarified I feel the work provides a useful addition to the gene drive field and will help those thinking about developing such strategies, particularly relevant are the findings related to the Cas9HF1 enzyme.

      We thank the reviewer for the valuable feedback. We have significantly revised the Results as well as the Discussion, provided additional information on the modeling approach, and shifted supplementary material to the main text of the manuscript. We hope this has improved the overall clarity of the manuscript.

      Reviewer #3 (Public Review):

      The manuscript by Langmuller, Champer and colleagues reports a set of experiments and models investigating the fitness effects of transgenes in Drosophila melanogaster carrying CRISPR components to determine how useful such transgenes may be for population control. This study benefits from well-designed transgene constructs that allow the investigators to distinguish the effects of on-target and off-target Cas9 endonuclease activity, and a sophisticated maximum likelihood modeling framework that allows estimation of the fitness effects of the transgene constructs. The manuscript's major shortcoming is the absence of statistical analysis of the allele frequency data and some potentially unrealistic assumptions that went into the model.

      3.a) My first recommendation is that a statistical analysis of the allele frequency data should be included in the manuscript, rather than inferring patterns solely from visual inspection of the data. Specifically, the manuscript claims that (lines 176-180): "We found Cas9_gRNAs to be the only construct that systematically decreased in frequency across all replicate cages (Figure 2). Interestingly, the allele frequency change was not consistent with fixed direct fitness costs. Instead, the construct frequency "bottomed out" in most replicates, and this occurred more quickly when the starting frequency was higher (Figure 2)." These conclusions regarding allele frequency changes should be supported by statistical analyses. What is the uncertainty surrounding the allele frequency estimates? Some indication of this uncertainty (such as error bars) could be added to Figure 2. Which of the trajectories in Figure 2 show a statistically significant change in allele frequency over the course of the experiment? Is the increase in the frequency of the no-Cas9_no-gRNA replicates significant? What support is there for the claim that the allele frequency changes "bottomed out"? Does a non-linear model fit these data significantly better than a linear trend? What is the evidence that allele frequency decreases slowed earlier "when the starting frequency was higher"? What is the evidence that "replicates 3 and 4 ... had very different frequency dynamics"? While they started at different frequencies, the slope of those two trajectories could be statistically indistinguishable. What is the authors' interpretation of the Cas9_gRNAs replicates 6 & 7 whose trajectories did not decrease?

      We thank the reviewer for this detailed recommendation. We agree that our description of construct frequency dynamics solely from visual clues was indeed making too strong claims (e.g., regarding “different frequency dynamics”) without providing enough statistical support for these specific statements. We had originally thought that some readers would prefer we first provide such a qualitative description of the allele frequency trajectories, prior to going into the mathematically more rigorous (but therefore also more complicated) maximum likelihood inference of fitness costs and statistical model comparison of different selection scenarios (“full inference model” vs. “construct model” vs. “off-target model”, etc.)

      In response to the reviewer’s comments, we decided to completely restructure this first part of the Results section. Specifically, we have removed our comparison of Cas9_gRNAs replicates solely based on visual clues, and also any mention of the admittedly vaguely defined “bottoming out” behavior. Instead, we now only mention the average frequency change for the Cas9_gRNAs construct across all replicates, while highlighting the heterogeneity among replicates. The maximum likelihood analysis is now introduced right after this and has also been revised extensively to improve clarity. We believe that this analysis provides a very powerful framework for the systematic inference of fitness costs and for assessing which of the different selection scenarios best explains our empirical data. This is because it combines the data from all replicates while fully accounting for the heterogeneity among them. For example, it could well be that construct frequency trajectories in individual replicates may not be statistically distinguishable from neutral evolution, yet in aggregate, an inferred fitness cost of the construct becomes highly significant. Note that the maximum likelihood framework also provides confidence intervals for its estimates, based on the entirety of the data. So the question of whether a departure from a neutral model is significant comes down to whether the 95% confidence interval surrounding the fitness estimate of the given construct still includes a value of 1 (which it does for the “direct fitness” estimate of the full model, but not for the “off-target fitness” estimate, see Table 2).
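
      The significance criterion described above can be sketched with a likelihood-ratio confidence interval. This is one common construction and is shown here as an assumption, not as a description of the exact procedure in our framework; the quadratic log-likelihood curve is a toy stand-in for a likelihood evaluated on real cage data.

```python
def profile_ci(grid, log_liks, threshold=1.92):
    """Return (mle, lower, upper) from log-likelihoods evaluated on a grid.

    Values are kept if their log-likelihood lies within `threshold` of the
    maximum; 1.92 is half the 95% quantile of a chi-square with 1 d.f.
    """
    best = max(log_liks)
    mle = grid[log_liks.index(best)]
    inside = [x for x, ll in zip(grid, log_liks) if best - ll <= threshold]
    return mle, min(inside), max(inside)


# Toy log-likelihood curve peaking at a fitness of 0.85 (hypothetical values).
grid = [round(0.50 + 0.01 * i, 2) for i in range(71)]  # 0.50 .. 1.20
log_liks = [-0.5 * ((w - 0.85) / 0.05) ** 2 for w in grid]

mle, lower, upper = profile_ci(grid, log_liks)
departs_from_neutral = not (lower <= 1.0 <= upper)
print(mle, lower, upper, departs_from_neutral)
```

      In this toy case the interval excludes 1, so the estimate would count as a significant departure from neutrality; an interval that includes 1 would not.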

      Regarding the comment about error bars for the allele frequency trajectories in Figure 2, we want to point out that our construct frequency estimates are actually based on the genotype counts of all adult flies present in the given cage experiment at the specific time point. We therefore did not include uncertainty estimates in Figure 2, nor did we include sampling noise in the maximum likelihood analysis. We have now clarified this in the caption of Figure 2 and in the Methods section (“Maximum Likelihood framework for fitness cost estimation”). We also acknowledge that we still cannot rule out sampling noise completely (for example through escaped flies, phenotyping errors, or loss of frozen flies due to destruction or other issues). However, we expect that the relative contribution of these errors should be negligible compared to drift.

      The reviewer raises an interesting question: Why did the Cas9_gRNAs construct frequency not decrease in the two replicates with the highest construct starting frequency (replicate 6 and 7)? A possible explanation could be that — given a limited set of off-target sites — cut off-target alleles that impose a fitness cost will accumulate and start to independently segregate from the construct alleles very quickly in populations where the construct has a high starting frequency (and thus a higher overall rate of cleavage events). We now state this possible explanation in the section on “Construct frequency dynamics suggest moderate off-target fitness costs” of our revised manuscript.

      3.b) My second recommendation involves the assumptions that went into the maximum likelihood modeling. In particular, it strikes me as unrealistic to assume that 1) the genome contains only a single off-target site that is entirely responsible for the decrease in fitness due to Cas9 activity; and 2) that the rate of off-target mutation is as high as it is assumed to be ("In individuals that carry a construct, all uncut off-target alleles are assumed to be cut in the germline, which are then passed on to offspring that could suffer negative fitness consequences."). Regarding point 1), isn't a more realistic scenario that there are multiple off-target sites, each with a potentially different fitness consequence resulting from Cas9-induced mutations? If so, doesn't the likelihood that all off-target sites have been cut depend on the number of such sites, as multiple off-target sites should reduce the mutation rate at any single site? This possibility also suggests that there may be multiple loci with potentially deleterious Cas9-induced alleles segregating within the experimental populations. Regarding point 2), even assuming only a few potential off-target sites per genome, it seems like the rate of off-target cutting would have to be unrealistically high to approach mutating all off-target sites in the population. The conversion efficiency of the constructs used here is reported as ~80% and 60% in females and males, respectively; it seems likely that the rate of Cas9 mutation at off-target sites is lower than this efficiency for the target site. These assumptions should be justified or relaxed before claiming that mutational saturation of off-target sites is responsible for a decreasing fitness loss over the course of the experiments (after confirming that there is statistical support for the claim that the allele frequency trajectories bottom out).

      The reviewer raises a very important point: modeling only one off-target site that represents the net fitness effect of Cas9 cleavage outside the target region as well as a cut rate of 100 % (i.e., the off-target site is always cut in the presence of Cas9) is highly idealized.

      (1) We agree with the reviewer that in reality, the experimental populations might have a polygenic off-target landscape, where the fitness of cleavage alleles could differ vastly within as well as between loci. However, given the limited number of data points (e.g., n=87 generation transitions for experimental populations with the Cas9_gRNAs construct), it would be extremely difficult if not impossible to disentangle the numerous parameters that would be necessary to describe such a more complex off-target scenario with our modeling approach. We have now highlighted our model choices, potential caveats, and resulting limitations in both the Discussion section and also the section “Construct frequency dynamics suggest moderate off-target fitness costs” in the Results.

      (2) Similar to the single off-target locus, our cut rate of 100% is an idealized assumption that was chosen with the aim of reducing model complexity. As outlined above, it would be extremely hard to disentangle the cut rate from other parameters (such as the number of target sites if fitness effects are multiplicative across loci). Additionally, we would like to point out that the reported conversion efficiencies (~80% in males, ~60% in females) are not the conversion efficiencies of the constructs in the experimental populations shown in Figure 2, but of separate homing drives with a single gRNA. All constructs in the experimental populations are designed in a way that no homing can occur, and they have four gRNAs if any. We apologize for the confusion. Our revised manuscript now contains a paragraph in the “Cas9HF1 homing drive” section in the Results that highlights the differences between the constructs in the cage populations and the homing drives assessed in this study. Furthermore, we have added an additional figure that displays the individual results of the homing drive (Figure 5). We hope this improves clarity.

      3.c) My third suggestion involves the correspondence between the results of the likelihood modeling and the phenotypic assays. The best fit model inferred a viability loss of 26% and no detectable effects on female choice (or male attractiveness) or fecundity. In contrast, the phenotypic assays inferred no detectable effect on viability, but a 50% reduction in male attractiveness and 25% reduction in female fecundity. I think that the authors' conclusion that "[t]hese assays broadly confirmed our previous findings" needs some context or explanation as to how these numerically discrepant findings are broadly confirming, beyond the speculation that the discrepancy in viability may be due to rearing in vials vs. population cages.

      We thank the reviewer for pointing this out. We removed the claim that the phenotypic assays “broadly confirmed our previous findings” and now highlight the differences in estimated fitness costs for males and females in the phenotypic assays as well as the discrepancy with our maximum likelihood estimates. Furthermore, we now provide additional explanations for what might be causing this phenomenon (i.e., single crosses vs. large populations, vial vs. cage, interactions between individual genotypes and the environment, delayed development of construct homozygotes being interpreted as reduced viability in the maximum likelihood analysis). We also point to the discrepancies in the Discussion of our revised manuscript and recap potential explanations.

      3.d) My fourth suggestion involves the comparison between the Cas9_gRNAs and Cas9HF1_gRNAs transgenes. The inference that off-target cuts are the major source of fitness loss for the Cas9_gRNAs construct relies heavily on the observation that there was no decrease in allele frequency for the two Cas9HF1_gRNAs replicates. It therefore seems critical to be confident in this observation, and to rule out alternative explanations as much as possible. For example, did the authors confirm that the Cas9HF1_gRNAs construct has on-target Cas9 activity levels as high as the Cas9_gRNAs construct? Although I am not certain about this (see comments in the next paragraph on this point), I think the transgene constructs used to estimate drive conversion rates are different from the constructs used for the population cage experiments; if this is correct, I think it would be helpful to provide the on-target mutation rates for the actual constructs used in the population cages.

      The reviewer is correct: the constructs in the population cages are different from the homing gene drives for which we estimated the gene drive conversion rates. However, we were able to confirm at least one mutated gRNA target site in every PCR-genotyped offspring of individuals carrying either the Cas9_gRNAs or the Cas9HF1_gRNAs construct (this is now specified in the manuscript). Thus, we did not expect a systematic difference in on-target mutation rates between the Cas9_gRNAs and Cas9HF1_gRNAs constructs. We acknowledge in the Discussion that construct performance might vary substantially with genomic sites and even organisms.

      3.e) Relatedly, I was confused about the portion of the manuscript that reports the drive conversion efficiency. The manuscript states, "As a proof-of-principle that Cas9HF1 is indeed a feasible alternative, we designed a homing drive that is identical to a previous drive (45), except that it uses Cas9HF1 instead of standard Cas9. This drive targets an artificial EGFP target locus with a single gRNA (see Methods)." Given that the rate of drive conversion was estimated by the loss of GFP, these homing drive constructs must be different from the constructs used in the population cage experiments, as those constructs targeted a site on chromosome 3L which does not contain GFP. I could not find a description of these homing constructs in the Methods - while a reader might be able to puzzle this out by reading reference #45, I think it would be helpful to explicitly describe these details in this manuscript.

      We apologize for the confusion. We have highlighted the similarities (e.g., nanos promoter, DsRed) as well as the differences (e.g., number of gRNAs) between the homing drives and the constructs in the cage populations at the beginning of the section “Cas9HF1 homing drive” in the Results. We hope this makes it clearer.

    2. Reviewer #3 (Public Review):

      The manuscript by Langmuller, Champer and colleagues reports a set of experiments and models investigating the fitness effects of transgenes in Drosophila melanogaster carrying CRISPR components to determine how useful such transgenes may be for population control. This study benefits from well-designed transgene constructs that allow the investigators to distinguish the effects of on-target and off-target Cas9 endonuclease activity, and a sophisticated maximum likelihood modeling framework that allows estimation of the fitness effects of the transgene constructs. The manuscript's major shortcoming is the absence of statistical analysis of the allele frequency data and some potentially unrealistic assumptions that went into the model.

      My first recommendation is that a statistical analysis of the allele frequency data should be included in the manuscript, rather than inferring patterns solely from visual inspection of the data. Specifically, the manuscript claims that (lines 176-180): "We found Cas9_gRNAs to be the only construct that systematically decreased in frequency across all replicate cages (Figure 2). Interestingly, the allele frequency change was not consistent with fixed direct fitness costs. Instead, the construct frequency "bottomed out" in most replicates, and this occurred more quickly when the starting frequency was higher (Figure 2)." These conclusions regarding allele frequency changes should be supported by statistical analyses. What is the uncertainty surrounding the allele frequency estimates? Some indication of this uncertainty (such as error bars) could be added to Figure 2. Which of the trajectories in Figure 2 show a statistically significant change in allele frequency over the course of the experiment? Is the *increase* in the frequency of the no-Cas9_no-gRNA replicates significant? What support is there for the claim that the allele frequency changes "bottomed out"? Does a non-linear model fit these data significantly better than a linear trend? What is the evidence that allele frequency decreases slowed earlier "when the starting frequency was higher"? What is the evidence that "replicates 3 and 4 ... had very different frequency dynamics"? While they started at different frequencies, the slope of those two trajectories could be statistically indistinguishable. What is the authors' interpretation of the Cas9_gRNAs replicates 6 & 7 whose trajectories did not decrease?

      My second recommendation involves the assumptions that went into the maximum likelihood modeling. In particular, it strikes me as unrealistic to assume that 1) the genome contains only a single off-target site that is entirely responsible for the decrease in fitness due to Cas9 activity; and 2) that the rate of off-target mutation is as high as it is assumed to be ("In individuals that carry a construct, all uncut off-target alleles are assumed to be cut in the germline, which are then passed on to offspring that could suffer negative fitness consequences."). Regarding point 1), isn't a more realistic scenario that there are multiple off-target sites, each with a potentially different fitness consequence resulting from Cas9-induced mutations? If so, doesn't the likelihood that all off-target sites have been cut depend on the number of such sites, as multiple off-target sites should reduce the mutation rate at any single site? This possibility also suggests that there may be multiple loci with potentially deleterious Cas9-induced alleles segregating within the experimental populations. Regarding point 2), even assuming only a few potential off-target sites per genome, it seems like the rate of off-target cutting would have to be unrealistically high to approach mutating all off-target sites in the population. The conversion efficiency of the constructs used here is reported as ~80% and 60% in females and males, respectively; it seems likely that the rate of Cas9 mutation at off-target sites is lower than this efficiency for the target site. These assumptions should be justified or relaxed before claiming that mutational saturation of off-target sites is responsible for a decreasing fitness loss over the course of the experiments (after confirming that there is statistical support for the claim that the allele frequency trajectories bottom out).
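
      The arithmetic behind this concern can be sketched in a few lines. This is only an illustration under assumptions that are not taken from the manuscript: off-target sites are cut independently, each with the same per-germline probability, and an uncut allele passes through a construct-carrying germline every generation. The function names and the example cut rates are hypothetical.

```python
def prob_all_sites_cut(cut_rate: float, n_sites: int) -> float:
    """Probability that all n_sites independent off-target sites are cut
    in a single germline, given the same per-site cut probability."""
    return cut_rate ** n_sites


def uncut_fraction(cut_rate: float, generations: int) -> float:
    """Fraction of alleles at one site still uncut after the given number
    of generations, assuming every allele is exposed to Cas9 each generation
    (an upper bound on exposure, assumed here for simplicity)."""
    return (1.0 - cut_rate) ** generations


# Hypothetical cut rates; 1.0 corresponds to the idealized model assumption.
for rate in (1.0, 0.8, 0.3):
    for sites in (1, 5):
        print(rate, sites, round(prob_all_sites_cut(rate, sites), 4))
```

      Under these assumptions, a per-site cut rate below 100% makes simultaneous saturation of several sites progressively less likely, which is why the cut rate and the number of sites are hard to separate from frequency data alone.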

      My third suggestion involves the correspondence between the results of the likelihood modeling and the phenotypic assays. The best fit model inferred a viability loss of 26% and no detectable effects on female choice (or male attractiveness) or fecundity. In contrast, the phenotypic assays inferred no detectable effect on viability, but a 50% reduction in male attractiveness and 25% reduction in female fecundity. I think that the authors' conclusion that "[t]hese assays broadly confirmed our previous findings" needs some context or explanation as to how these numerically discrepant findings are broadly confirming, beyond the speculation that the discrepancy in viability may be due to rearing in vials vs. population cages.

      My fourth suggestion involves the comparison between the Cas9_gRNAs and Cas9HF1_gRNAs transgenes. The inference that off-target cuts are the major source of fitness loss for the Cas9_gRNAs construct relies heavily on the observation that there was no decrease in allele frequency for the two Cas9HF1_gRNAs replicates. It therefore seems critical to be confident in this observation, and to rule out alternative explanations as much as possible. For example, did the authors confirm that the Cas9HF1_gRNAs construct has on-target Cas9 activity levels as high as the Cas9_gRNAs construct? Although I am not certain about this (see comments in the next paragraph on this point), I think the transgene constructs used to estimate drive conversion rates are different from the constructs used for the population cage experiments; if this is correct, I think it would be helpful to provide the on-target mutation rates for the actual constructs used in the population cages.

      Relatedly, I was confused about the portion of the manuscript that reports the drive conversion efficiency. The manuscript states, "As a proof-of-principle that Cas9HF1 is indeed a feasible alternative, we designed a homing drive that is identical to a previous drive (45), except that it uses Cas9HF1 instead of standard Cas9. This drive targets an artificial EGFP target locus with a single gRNA (see Methods)." Given that the rate of drive conversion was estimated by the loss of GFP, these homing drive constructs must be different from the constructs used in the population cage experiments, as those constructs targeted a site on chromosome 3L which does not contain GFP. I could not find a description of these homing constructs in the Methods - while a reader might be able to puzzle this out by reading reference #45, I think it would be helpful to explicitly describe these details in this manuscript.

    1. Author Response

      Reviewer #3 (Public Review):

      The primary strength of this study is in establishing the N999S heterozygous mouse as a useful model system for debilitating paroxysmal non-kinesigenic dyskinesia (PNKD), with or without epilepsy. This outcome was hard-won following a comprehensive analysis of biophysical, neurophysiological, and behavioral tests. Ultimately the convincing evidence was demonstrated through a clever application of a stress-related behavioral test (quite in alignment with triggers in patients) to elicit the hypo-motility associated with PNKD. Like patients who exhibit variable penetrance, even highly inbred mice exhibit much variability, and uncovering a robust phenotype took a nuanced approach and perseverance.

      To reach this point, several experiments provided mechanistic insights into the mutant channel behavior. First, whole-cell patch clamp experiments revealed shifts in the G-V consistent with gain-of-function behavior previously characterized using the N999S and D434G mutants expressed heterologously. Novel observations of H444Q revealed a loss-of-function (LOF) behavior with the G-V shifted to positive potentials but to a lesser degree. These electrophysiological phenotypes establish the rank of predicted severity as N999S>D434G>H444Q.

      This prediction was tested in brain slices of heterozygous animals where the mutant channels would be normally spliced and associate with WT subunits and other components such as beta subunits. The investigators evaluated BK currents by patch clamp from hippocampal neurons where BK channels are known to play key functional roles. Both N999S and D434G showed the predicted increase in current magnitude, though interestingly the differences between them apparent in heterologous expression were lost in the native setting. Curiously, no differences in BK current magnitude were observed in neurons of heterozygotes carrying the putatively LOF mutation H444Q.

      In terms of seizure susceptibility, D434G mutants differed from WT and were less severe than N999S mutants with respect to time to evoked seizure, although differences in "EEG power" were not statistically significant between D434G and WT. These observations support the conclusion that D434G represents an intermediate disease phenotype.

      The behavioral studies were the most effective in revealing differences among the variants and in defining GOF N999S heterozygotes as a compelling animal model for PNKD and providing evidence that the LOF mutation conferred the opposite effect of hyperkinetic mobility. The findings provide the new insight that KCNMA1 is the target of heritable, monogenic disease, a conclusion that was previously not forthcoming because known human mutations have arisen de novo. The dyskinetic phenotypes in response to stress induction are wholly consistent with patient symptoms.

      With respect to rigor and reproducibility, it is commendable that the investigators were blinded to genotype during data collection and analysis. Moreover, the study provides an important confirmation of previous findings from another lab regarding the cellular phenotype of the N999S mutant. WT controls were compared to transgenic littermates within individual transgenic lines. In some cases, the sample sizes were rather low (see below), but otherwise the study seems rigorous.

      The strengths of the manuscript far outweighed the weaknesses. The experiments interpreted to suggest a gene dosage effect with D434G were not compelling to this reviewer and might be better documented in the supplement with the conclusion that further work is required.

      Due to pandemic-related animal and lab issues, we were unable to generate and surgically implant full Kcnma1D434G/D434G homozygous cohorts for the EEG/seizure portion of the study. We focused instead on using the limited mice of this genotype for the novel PNKD3 assays (n=7), leaving the seizure dataset at n=3.

      To address the concern, the Kcnma1D434G/D434G data was removed from Figure 4 to avoid overinterpretation of a gene dosage effect. However, we did retain the individual measurements within the Results text (lines 383 and 385), on the basis of facilitating direct comparisons between our study and other D434G studies. For example, even with only three measurements, the trend toward the shortest seizure latencies in Kcnma1D434G/D434G mice is similar to the result obtained with an independently generated D434G mouse model (Dong et al, 2022). Yet seizure power and the presence of spontaneous seizures do not show a similar trend, suggesting our results differ from theirs in these important aspects. This is now stated more clearly in the revised conclusion for that paragraph, ‘While not conclusive and requiring substantiation in a larger cohort, the Kcnma1D434G/D434G seizure data raise the possibility of a gene dosage effect with D434G that qualitatively differs from an independently-generated D434G mouse model (Dong et al., 2022),’ (lines 388-390).

      In contrast to the seizure part of the study, the increased severity of Kcnma1D434G/D434G PNKD-immobility is fully supported by the data with sufficient statistical power (Figure 5D). However, the idea that the increased severity with homozygous D434G in PNKD-immobility was consistent with gene dosage observations for seizure was removed for consistency (lines 549-550).

      As a side note, we also added additional clinical descriptors (akinesia) and colloquial descriptions for PNKD3 (‘drop attack’) to disambiguate how a PNKD3 episode appears different from other types of motor dysfunction. This was to facilitate comparison with the two other KCNMA1-D434G models (mouse and fly; Dong et al, 2022; Kratschmer et al., 2021), which report aspects of dyskinesia in the setting of baseline locomotor dysfunction. To our knowledge, these models have not been evaluated for the striking ‘drop attack’ immobility presenting in patients (lines 84-85).

      The consequences of the altered BK current levels were assessed on the voltage dependence of firing frequency in the hippocampal neurons, but it was not very clear how increased BK current would enhance neuronal excitability. Also, how might it lead to the PNKD phenotype? A paragraph even speculating on these mechanistic links in the Discussion would be welcome.

      The mechanism by which BK currents increase action potential firing is not fully identified in this study (see also response to reviewer #2). In the Results, a new paragraph was added at the end of the action potential section to summarize the AHP changes in more detail and to speculate on an indirect mechanism of action for the increase in BK current, predicted from a similar ‘GOF’ BK current type, where β4 regulation of BK channels is lost (lines 294-304). Additional details have also been added to the Discussion regarding the factors contributing to lower seizure threshold (lines 675-680).

      Additional re-organization of the Discussion text addresses the basis for PNKD. We added a direct statement that it is not yet clear which neurons/circuits are the most critical for PNKD-like symptomology, and which of these express BK channels (lines 680-700). We follow with a succinct summary of phenotypically relevant PNKD models. While there is a lot to unpack with respect to similarities and differences between different paroxysmal dyskinesia models in the literature, they ultimately shed little light on the question of KCNMA1 PNKD3-related dysfunction. With the addition of the d-amp rescue control, we focus mainly on the amphetamine response predicting a CNS locus (lines 692-693). The d-amp response may even suggest dopaminergic pathways (some of which express BK channels) as a plausible target to investigate in future studies, but due to the complex interplay of d-amp dosage and the novel motor assay, we don’t think speculating on a specific circuit is supported with enough actual data to add in the Discussion.

    1. Author Response

      Reviewer #1 (Public Review):

      The 2019, Johnson et al., Science study (referred to as "2019 study" or "prior study" in the rest of the comments) measured mutational robustness in F1 segregants derived from a yeast cross between a laboratory and a wine strain, which differ at >35,000 loci. To realize this, the authors developed a pipeline 1) to create the same set of transposon insertion mutations in each yeast strain via transformation; and 2) to measure the fitness effects of these specific insertion mutations.

      In this manuscript, the authors applied the same pipeline to laboratory evolved yeast strains that differ in only tens or hundreds of loci and thus are much less divergent than those used in the prior study. Both studies aim to characterize how the fitness of the sets of insertion mutations (mostly deleterious) vary depending on the existing mutations (mostly beneficial) in those yeast strains. However, the current manuscript, especially when compared to the prior study, suffers from several major weaknesses.

      First, only 91 genes out of >6,000 genes in the yeast genome are perturbed in the manuscript. The small set of disruption mutations is unlikely to faithfully capture the pattern of epistasis in the selected clones. By comparison, >1,000 insertion mutations were evaluated in the 2019 study. Because the majority of the >1,000 tested mutations were neutral, the authors focused on 91 insertions that had significant fitness effects. The same 91 insertion mutations are used in the current study. However, as evident in both studies, epistasis plays an important role in how insertion mutations interact with different genetic backgrounds. Considering the vastly different genetic backgrounds between clones used in the prior and current studies, the insertion mutations of interest in the current study are unlikely to be the same as those in the prior study. I suggest that the large-scale insertion screen used in the prior study also be conducted in the current study.

      This concern is summarized in Essential Revision 1 above; see our comments there for our detailed response. Briefly, we have added an additional Figure Supplement (Fig. 1 – Supplement 8; see above) demonstrating that the 91 insertion mutants have a similar range of effects in this study as in the previous one (which may be expected since the genetic backgrounds here are as closely related to those in the 2019 study as the backgrounds in the 2019 study are to each other).

      Second, the statistical power in the current manuscript is insufficient to support the conclusions. Fitness errors were not considered when several main conclusions were drawn (fitness errors on the y-axis of Figure 1B are not available; fitness errors on the x-axis of Figure 2 are not available). The current conclusions are invalid without knowing the magnitude of fitness error. Fitness of each clone should be measured in at least two replicates in order to infer errors of fitness measurements. Additionally, the authors isolated two clones from the same timepoint of each population and treated them as biological replicates based on the fitness correlation between the two clones. However, this practice can be problematic because the extent of fitness correlation varies across populations and it is less likely to capture the patterns of epistasis when clones are isolated from more heterogeneous populations. Similarly, the authors could avoid this bias by measuring the fitness of each clone in multiple replicates and treat the two clones from the same timepoint/population separately.

      We agree that details about statistical methods, most of which are taken from Johnson et al. (2019), were not clear in our text. As we also describe in our response to the Essential Revisions above, we have rewritten a large part of the methods text to provide more details about statistical methods and have calculated and reported errors more broadly:

      Errors on fitness effects: We have expanded our methods text describing how the fitness effect of a mutation is determined for a single clone / condition. This text now emphasizes the internal replication provided by redundant barcodes, which allows us to calculate a standard error for the effect of a mutation in a single clone / condition. These errors are shown in Figure 1 – Figure Supplements 1-3. We have also added details on how errors are calculated for a mutation for a population-timepoint, and these errors are now included in Figure 2.

      Errors on the DFE mean: We discuss this below.

      Considering clones separately: As we also describe in the essential revisions above, Johnson et al. (2021) show that the mutational dynamics in these evolving populations are dominated by successive selective sweeps, so we expect clones isolated from the same population-timepoint to rarely differ by many mutations. However, we agree that there are likely some cases in which the two clones have important genetic differences. To address this concern, we have reanalyzed our data as you suggest, considering each clone separately. The results of this analysis are included for every main text figure in the form of figure supplements (Figure 1 - figure supplement 7, Figure 2 - figure supplement 5, Figure 3 - figure supplement 5, and Figure 4 - figure supplement 1), which show that our qualitative conclusions are unchanged.

      Reviewer #2 (Public Review):

      Johnson and Desai have developed a yeast experimental-evolution system where they can insert barcoded disruptive mutations into the genome and measure their individual effect on fitness. In 2019 they published a study in Science where they did this in a set of yeast variants derived by crossing two highly diverged yeast. They found a pattern that they termed "increasing cost epistasis": insertion mutations tended to have more deleterious effects on higher fitness backgrounds. The term "increasing cost epistasis" was coined to play off the converse pattern commonly observed in experimental evolution of "diminishing returns epistasis" wherein beneficial mutations tend to have smaller effects on more fit backgrounds. Another way to think about fitness effects is in terms of robustness: when mutations tend to have little effect on phenotype, the system is said to be robust. Thus, when increasing costs epistasis is observed, it suggests that higher fitness backgrounds are less robust.

      In this paper, Johnson and Desai use this same barcoded-insertions system in yeast, but here the backgrounds receiving insertions are adapting populations. More specifically, they took 6 replicate populations that evolved for 8-10k generations and inserted a panel of 91 mutations at 6 timepoints along the evolutions. They then did this entire experiment in two different environments: one in rich media at permissive temperature (YPD 30) and one in a defined media at high temperature (SC 37). Importantly, the mutations accumulating in a population over time here are driven by selection, and thus the patterns of epistasis observed here are probably more relevant to "real" evolution than the backgrounds from the 2019 paper. The overarching question, then, is whether patterns of epistasis similar to those previously observed are found in these long-term adaptations and across conditions.

      The first major finding in this work is that at YPD 30 (where the yeast are presumably "happy"), the mean fitness effect does decline in most (but not all) populations as they adapt. Since the population is becoming more fit over time (relative to a constant reference type), this is consistent with the previously observed pattern of increasing cost epistasis. The strength of the effect is, however, weaker than in that previous work. The authors speculate that this may reflect the fact that far fewer mutations are involved here than in the previous study, giving far fewer opportunities for (mostly negative) epistatic effects. I find this explanation likely correct, although speculative.

      The second major, and far more surprising, result is that in the other condition (SC 37), the insertion mutations do not show a consistent trend: the mean fitness effect of the insertion mutations does not change as adaptation proceeds. This is despite the fact that fitness increases in these populations over time just as it did in the YPD populations. Toward the end of the paper, the authors speculate as to why this is the case. They argue that in the YPD 30 environment, selection is mainly on pure growth rate. They suggest that the growth rate depends on different components such as DNA synthesis, production of translation machinery, and cell wall synthesis. Critically, these components are non-redundant and can't "fill in" for each other. So, for example, rapid DNA synthesis is of little value if cell-wall synthesis is slow. As adaptation fixes mutations that increase the function of one of these growth components, they shift the "control coefficient" to other components. This, they argue, may be the major explanation behind increasing cost epistasis. I find the logic of their argument compelling and potentially providing great insight into developing a richer view of epistasis. Future experiments will be needed to test how well the hypothesis holds up. They then flip the argument around and suggest that in the SC 37 environment, the targets of selection are fundamentally different from those in growth-centric YPD 30 conditions. Instead, they argue, there is likely more redundancy in the components that mutations are affecting. I again find their arguments compelling.

      After establishing these observed patterns for mean effects, they examine individual mutations and look at the relationship of fitness effects as a function of background fitness. The upshot of this analysis is that there are more negative correlations than positive ones (especially in the YPD 30 conditions), but also that there is a lot of variation: there are many mutations that show no correlation and a small number with a positive correlation. This casts substantial doubt on the simplistic view that for the vast majority of mutations, fitness itself causes mutations to have greater costs.

      We thank the reviewer for these positive comments and the nice summary of our work.

      As a minor point of criticism, a lot of statistical tests are being done here and there is no attempt to address the issue of multiple testing. I would like the authors to address this. I say minor because I don't think the overarching patterns are being affected by a few false positive tests.

      Related points were also raised by the other reviewer. To address this, we have added multiple-hypothesis-corrected p-values for these least-squares Wald Tests (using the Benjamini-Hochberg method) to our dataset (Supplementary File 1). As you suggest, for this particular analysis in which we compare the overall number of mutations following each pattern, we are willing to accept the possibility of false positives, so we still use the original p-values to categorize the mutations in Figure 2. We address this point in the main text and provide the numbers of mutations falling in each category after we perform this correction:

      “Because we are primarily focused on comparing the frequency of each pattern across environments, we report these values before multiple-hypothesis-testing correction here and in Figure 2; after a Benjamini-Hochberg multiple-hypothesis correction these values fall to 24/77 (~31%), 15/74 (~20%), 9/77 (~12%), and 11/74 (~15%), respectively.”
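
      For readers unfamiliar with the correction named in this response, the standard Benjamini-Hochberg step-up adjustment can be sketched as follows. This is a generic textbook version, not the authors' analysis code; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest), and a running
    minimum taken from the largest rank downward keeps the adjusted values
    monotone in the original p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four per-mutation regressions.
example = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(example))
```

      A mutation would then be counted as significant at a chosen false discovery rate (for example 0.05) if its adjusted p-value falls below that threshold.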

      From here the authors turn to formal modeling to understand epistasis better. For each mutation, they fit the fitness data to three models: fitness-mediated model = fitness effects are explained by background fitness, idiosyncratic model = fitness effects can change at any point in an evolution when a new mutation fixes, and full model = fitness effects depend on both fitness and idiosyncratic effects.

      My major criticism of the work lies here: the authors don't explain how the models work carefully and thoroughly, leaving the reader to question downstream conclusions. Typically, when models are nested (as the fitness-mediated and idiosyncratic models appear to be nested within the full model), the full model will, by definition, fit the data better than the nested models. But that is not the case here: for many mutations the idiosyncratic model explains more of the variance than the full model (e.g. Figure 3A). (Note, the fitness-mediated model never fits better than the full model). Further, when dealing with nested models in general, one should ask whether the more complex model fits the data enough better to justify accepting it over simpler model(s). There are clearly details and constraints in the models used here (and likely in the fitting process) that matter, but these are not discussed in any detail. Another frustrating part of the model fitting is that each model is fit to each mutation individually, but there is no attempt to justify this approach over one where each model is expected to explain all mutations. I'm not saying I think the authors have chosen a poor strategy in what they have done, but they have made a set of decisions about how to model the problem that carries consequences for interpretation, and they don't justify or discuss those decisions. I think this needs to be added to the paper. I think this should include both a high level, philosophy-of-our-approach section and, probably elsewhere, the details.
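
      The point about nested models can be made concrete with a small sketch. For ordinary least squares, adding a predictor can never increase the residual sum of squares, and an F statistic measures whether the improvement is large relative to the remaining noise. The code below is an assumption-laden illustration, not the models fitted in the manuscript: the data are invented, the "step" column is a hypothetical stand-in for an idiosyncratic effect, and all function names are made up for this example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on columns of X."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def f_statistic(rss_nested, rss_full, p_nested, p_full, n):
    """F statistic for comparing a nested model against a fuller model."""
    return ((rss_nested - rss_full) / (p_full - p_nested)) / (rss_full / (n - p_full))


# Invented data: background fitness, a 0/1 step after a hypothetical fixation,
# and the measured effect of one insertion mutation at six timepoints.
fitness = [0.00, 0.02, 0.05, 0.07, 0.09, 0.12]
step = [0, 0, 0, 1, 1, 1]
effect = [-0.010, -0.012, -0.015, -0.030, -0.031, -0.036]

X_nested = [[1.0, f] for f in fitness]                    # intercept + fitness
X_full = [[1.0, f, float(s)] for f, s in zip(fitness, step)]  # adds the step

rss_nested = rss(X_nested, effect)
rss_full = rss(X_full, effect)
f_stat = f_statistic(rss_nested, rss_full, 2, 3, len(effect))
print(rss_nested, rss_full, f_stat)
```

      If a fuller model explains less variance than a model nested inside it, as described for some mutations above, then the fits cannot be plain unconstrained least squares on nested designs; some constraint, penalty, or cross-validation step must be involved, which is the detail the reviewer asks to see spelled out.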

      The reason this matters is that the authors move on to use the fitted models and the estimated coefficients from the models in discussing and interpreting the structure of epistasis. For example, they say "We find that the fitness model often explains a large amount of variance, in agreement with our earlier analysis, but the idiosyncratic model and the full model usually offer more explanatory power." Looking at Figure 3A, this certainly appears to be the case, yet that type of statement is squarely in the domain of model comparison/selection, and as explained above, this issue is not addressed. Relatedly, the authors go on to argue that "Positive and negative coefficients in the idiosyncratic model represent positive and negative epistasis between mutations that fix during evolution and our insertion mutations." I'm left wondering whether the details of the model fitting process matter. How would the idiosyncratic model perform on data that arose, for example, under the fitness-mediated model? Or how would it perform on data originating under the full model? Is it true that when data arise under a different model (say the full model) but are fit under the idiosyncratic model, negative coefficients always represent negative epistasis and positive coefficients always represent positive epistasis, and that model misspecification does not introduce any bias? Another thing I am left wondering about concerns the number of observed coefficients in the idiosyncratic model: if one mutation shows similar effects across backgrounds, it might generate one coefficient during model fitting, while another mutation that has different effects on different backgrounds could give rise to several coefficients. Is there some type of weighting that addresses the fact that individual mutations can generate different numbers of coefficients? One can imagine bias arising here if this isn't treated carefully.

      One of the main conclusions that the authors reach is that the pattern of increasing cost epistasis (observed previously and here in the YPD 30 environment) may not arise from the effect of background fitness itself, but instead arise because epistatic effects tend to be negative, and the more interactions there are (with mutations accumulating over time), the more they tend to have a negative cumulative effect. I find it very likely that the authors have this major conclusion correct. By contrast, they find that at SC 37, the distribution of fitness effects is less negatively skewed, with a considerable number of coefficients estimated to be > 0. They close with a really interesting discussion exploring how these patterns likely arise from underlying biology of the cell, metabolic flux, redundancy, and selection for loss-of-function vs gain-of-function. I find a lot of this interesting and insightful. But because some of their conclusions rest squarely on the modeling, I encourage the authors to be more thorough and convincing in how they execute this aspect of the work.

      Thanks for these detailed comments about the modeling approach and analysis, which raise points that were also described in the Essential Revisions and by Reviewer 1. We agree that these details were not presented sufficiently clearly in the original manuscript. In the revised manuscript, we have added a much more in-depth section on the details of the modeling procedures in the Materials and Methods, including formulas for each model and a discussion of how noise could affect our modeling results (see responses to essential revisions and reviewer 1 above for more information). This includes an analysis of shuffled and simulated datasets, which will give readers a better sense of how to interpret these modeling results. We have also included a new paragraph in the results that compares the models for each mutation and for the entire dataset using the Bayesian Information Criterion (BIC):

      “We can also ask which model best explains the data using the BIC, which penalizes models based on the number of parameters. The small squares below the bars in Figure 3A indicate which model has the lowest BIC for each mutation. In YPD 30°C, the full model has the lowest BIC for 40/77 (~52%) mutations and the idiosyncratic model has the lowest BIC for 37/77 (~48%). In SC 37°C, the full model has the lowest BIC for 49/73 (~67%) mutations and the idiosyncratic model has the lowest BIC for 24/73 (~33%). When we assess how well each model fits the entire dataset in each environment, the full model has a lower BIC than the idiosyncratic model in both environments.”
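
      As a rough illustration of how a BIC comparison penalizes extra parameters, here is a minimal sketch. It assumes the common least-squares form of the BIC under Gaussian errors; the exact formula used in the manuscript is not given in this excerpt, and the residual sums of squares, observation count, and parameter counts below are hypothetical.

```python
import math


def bic_least_squares(rss, n_obs, n_params):
    """BIC for a least-squares fit, assuming Gaussian errors (up to an additive constant)."""
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)


# Hypothetical fits for one mutation (illustration only).
n_obs = 36
bic_idiosyncratic = bic_least_squares(rss=0.020, n_obs=n_obs, n_params=5)

# Case 1: the extra parameter of the full model buys a large drop in RSS.
bic_full_large_gain = bic_least_squares(rss=0.015, n_obs=n_obs, n_params=6)

# Case 2: the extra parameter barely improves the fit.
bic_full_small_gain = bic_least_squares(rss=0.0195, n_obs=n_obs, n_params=6)

# The model with the lower BIC is preferred.
print(bic_full_large_gain < bic_idiosyncratic)  # full model preferred in case 1
print(bic_full_small_gain < bic_idiosyncratic)  # full model not preferred in case 2
```

      The point of the sketch is only that a model with more parameters wins on BIC when its improvement in fit outweighs the `n_params * log(n_obs)` penalty.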

      We also appreciate the suggestion to look at how coefficients are spread among mutations. We have made a new supplemental figure (Figure 3 - Figure supplement 3) that clearly shows the coefficients broken down by mutation for each condition. This figure shows that coefficients are often clustered for one mutation. That is, multiple populations often have similar coefficients / patterns of epistasis for a particular mutation. We don’t view this as a source of bias in our data, but as an indication that the mutations fixing in these populations sometimes exhibit similar patterns of epistasis with these insertion mutations. We now reference this supplemental figure in the main text (“see Figure 3 – figure supplement 3 for a breakdown of coefficients by individual mutations”) as a better representation of the coefficients that result from our modeling.

    2. Reviewer #2 (Public Review):

      Johnson and Desai have developed a yeast experimental-evolution system where they can insert barcoded disruptive mutations into the genome and measure their individual effect on fitness. In 2019 they published a study in Science where they did this in a set of yeast variants derived by crossing two highly diverged yeast. They found a pattern that they termed "increasing cost epistasis": insertion mutations tended to have more deleterious effects on higher fitness backgrounds. The term "increasing cost epistasis" was coined to play off the converse pattern commonly observed in experimental evolution of "diminishing returns epistasis" wherein beneficial mutations tend to have smaller effects on more fit backgrounds. Another way to think about fitness effects is in terms of robustness: when mutations tend to have little effect on phenotype, the system is said to be robust. Thus, when increasing cost epistasis is observed, it suggests that higher fitness backgrounds are less robust.

      In this paper, Johnson and Desai use this same barcoded-insertions system in yeast, but here the backgrounds receiving insertions are adapting populations. More specifically, they took 6 replicate populations that evolved for 8-10k generations and inserted a panel of 91 mutations at 6 timepoints along the evolutions. They then did this entire experiment in two different environments: one in rich media at permissive temperature (YPD 30) and one in a defined media at high temperature (SC 37). Importantly, the mutations accumulating in a population over time here are driven by selection, and thus the patterns of epistasis observed here are probably more relevant to "real" evolution than the backgrounds from the 2019 paper. The overarching question, then, is whether similar patterns of epistasis are found in these long-term adaptations and across conditions as were previously observed.

      The first major finding in this work is that at YPD 30 (where the yeast are presumably "happy"), the mean fitness effect does decline in most (but not all) populations as they adapt. Since the population is becoming more fit over time (relative to a constant reference type), this is consistent with the previously observed pattern of increasing cost epistasis. The strength of the effect is, however, weaker than in that previous work. The authors speculate that this may reflect the fact that far fewer mutations are involved here than in the previous study, giving far fewer opportunities for (mostly negative) epistatic effects. I find this explanation likely correct, although speculative.

      The second major, and far more surprising, result is that in the other condition (SC 37), the insertion mutations do not show a consistent trend: mean fitness effect of the insertion mutations does not change as adaptation proceeds. This is despite the fact that fitness increases in these populations over time just as it did in the YPD populations. Toward the end of the paper, the authors speculate as to why this is the case. They argue that in the YPD 30 environment, selection is mainly on pure growth rate. They suggest that the growth rate depends on different components such as DNA synthesis, production of translation machinery, and cell wall synthesis. Critically, these components are non-redundant and can't "fill in" for each other. So, for example, rapid DNA synthesis is of little value if cell-wall synthesis is slow. As adaptation fixes mutations that increase the function of one of these growth components, they shift the "control coefficient" to other components. This, they argue, may be the major explanation behind increasing cost epistasis. I find the logic of their argument compelling and potentially providing great insight into developing a richer view of epistasis. Future experiments will be needed to test how well the hypothesis holds up. They then flip the argument around and suggest that in the SC 37 environment, the targets of selection are fundamentally different from those in growth-centric YPD 30 conditions. Instead, they argue, there is likely more redundancy in the components that mutations are affecting. I again find their arguments compelling.

      After establishing these observed patterns for mean effects, they examine individual mutations and look at the relationship of fitness effects as a function of background fitness. The upshot of this analysis is that there are more negative correlations than positive ones (especially in the YPD 30 conditions), but also that there is a lot of variation: there are many mutations that show no correlation and a small number with a positive correlation. This casts substantial doubt on the simplistic view that for the vast majority of mutations, fitness itself causes mutations to have greater costs. As a minor point of criticism, a lot of statistical tests are being done here and there is no attempt to address the issue of multiple testing. I would like the authors to address this. I say minor because I don't think the overarching patterns are being affected by a few false positive tests.

      From here the authors turn to formal modeling to understand epistasis better. For each mutation, they fit the fitness data to three models: fitness-mediated model = fitness effects are explained by background fitness, idiosyncratic model = fitness effects can change at any point in an evolution when a new mutation fixes, and full model = fitness effects depend on both fitness and idiosyncratic effects.

      My major criticism of the work lies here: the authors don't explain how the models work carefully and thoroughly, leaving the reader to question downstream conclusions. Typically, when models are nested (as the fitness-mediated and idiosyncratic models appear to be nested within the full model), the full model will, by definition, fit the data better than the nested models. But that is not the case here: for many mutations the idiosyncratic model explains more of the variance than the full model (e.g. Figure 3A). (Note, the fitness-mediated model never fits better than the full model). Further, when dealing with nested models in general, one should ask whether the more complex model fits the data enough better to justify accepting it over simpler model(s). There are clearly details and constraints in the models used here (and likely in the fitting process) that matter, but these are not discussed in any detail. Another frustrating part of the model fitting is that each model is fit to each mutation individually, but there is no attempt to justify this approach over one where each model is expected to explain all mutations. I'm not saying I think the authors have chosen a poor strategy in what they have done, but they have made a set of decisions about how to model the problem that carries consequences for interpretation, and they don't justify or discuss those decisions. I think this needs to be added to the paper. I think this should include both a high level, philosophy-of-our-approach section and, probably elsewhere, the details.
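
      The point about nested models can be made concrete with a minimal sketch. It uses ordinary least squares with an intercept-only model nested inside a straight-line model, which is a stand-in and not the models fitted in the manuscript; the data and function names are hypothetical. Under unconstrained least squares the larger model can never have a larger residual sum of squares, and an F-statistic is one conventional way to ask whether the improvement is large enough to justify the extra parameter.

```python
def rss_intercept_only(y):
    """Residual sum of squares for the nested model y = b0."""
    mean_y = sum(y) / len(y)
    return sum((v - mean_y) ** 2 for v in y)


def rss_linear(x, y):
    """Residual sum of squares for the larger model y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


def f_statistic(rss_small, rss_big, k_small, k_big, n_obs):
    """F-statistic comparing a nested (small) model with a larger model."""
    numerator = (rss_small - rss_big) / (k_big - k_small)
    denominator = rss_big / (n_obs - k_big)
    return numerator / denominator


# Hypothetical fitness effects measured on six backgrounds (illustration only).
background_fitness = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
fitness_effect = [0.1, -0.2, 0.05, -0.3, -0.1, -0.4]

rss_small = rss_intercept_only(fitness_effect)
rss_big = rss_linear(background_fitness, fitness_effect)
f_value = f_statistic(rss_small, rss_big, k_small=1, k_big=2, n_obs=len(fitness_effect))
print(rss_small, rss_big, f_value)
```

      If a nominally larger model fits worse than a model nested within it, as described above for some mutations, that suggests constraints or penalties in the fitting procedure that break this simple least-squares guarantee.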

      The reason this matters is that the authors move on to use the fitted models and the estimated coefficients from the models in discussing and interpreting the structure of epistasis. For example, they say "We find that the fitness model often explains a large amount of variance, in agreement with our earlier analysis, but the idiosyncratic model and the full model usually offer more explanatory power." Looking at Figure 3A, this certainly appears to be the case, yet that type of statement is squarely in the domain of model comparison/selection, and as explained above, this issue is not addressed. Relatedly, the authors go on to argue that "Positive and negative coefficients in the idiosyncratic model represent positive and negative epistasis between mutations that fix during evolution and our insertion mutations." I'm left wondering whether the details of the model fitting process matter. How would the idiosyncratic model perform on data that arose, for example, under the fitness-mediated model? Or how would it perform on data originating under the full model? Is it true that when data arise under a different model (say the full model) but are fit under the idiosyncratic model, negative coefficients always represent negative epistasis and positive coefficients always represent positive epistasis, and that model misspecification does not introduce any bias? Another thing I am left wondering about concerns the number of observed coefficients in the idiosyncratic model: if one mutation shows similar effects across backgrounds, it might generate one coefficient during model fitting, while another mutation that has different effects on different backgrounds could give rise to several coefficients. Is there some type of weighting that addresses the fact that individual mutations can generate different numbers of coefficients? One can imagine bias arising here if this isn't treated carefully.

      One of the main conclusions that the authors reach is that the pattern of increasing cost epistasis (observed previously and here in the YPD 30 environment) may not arise from the effect of background fitness itself, but instead arise because epistatic effects tend to be negative, and the more interactions there are (with mutations accumulating over time), the more they tend to have a negative cumulative effect. I find it very likely that the authors have this major conclusion correct. By contrast, they find that at SC 37, the distribution of fitness effects is less negatively skewed, with a considerable number of coefficients estimated to be > 0. They close with a really interesting discussion exploring how these patterns likely arise from underlying biology of the cell, metabolic flux, redundancy, and selection for loss-of-function vs gain-of-function. I find a lot of this interesting and insightful. But because some of their conclusions rest squarely on the modeling, I encourage the authors to be more thorough and convincing in how they execute this aspect of the work.

    1. Reviewer #3 (Public Review):

      Punishment is a key form of learning and behavior change, yet its core behavioural and brain mechanisms remain poorly understood and certainly less well understood than reward learning. This manuscript by Jacobs et al from the Moghaddam laboratory uses dual fibre photometry for calcium transients to make an important advance in understanding how punishment is learned by studying how punishment changes action and punisher coding in the PFC and VTA of rats. This work builds on the elegant single unit work from this group reported previously. The authors use a single action, probabilistic task whereby rats are first trained to nosepoke for sugar pellets on an FR1, with a 5 sec DS signalling reinforcement. Then, in blocks of 30 trials each, the nosepoke is punished on a probabilistic contingency of 0%, 6%, 10%. The authors used dual fibre photometry to concurrently record calcium transients in "dmPFC" and VTA, with a focus on transients related to action emission and punisher as well as reward delivery.

      There are quite a few key findings here: 1) action transients in dmPFC change across punishment from modest inhibitory transients in 0% risk to no change (i.e., possible loss of inhibitory transient in PFC) or modest positive transients (in VTA) as risk increased from 6-10%; 2) comparison with past single-unit data suggested similarity between photometry and single unit measures for the action but not DS; 3) there was no change in punisher transients in these regions; 4) diazepam, which had modest behavioral effects to alleviate punishment, had no effects on PFC transient to the action or punisher but did reveal peri-action ramping-like transients in VTA; 5) diazepam increased correlated activity between VTA and PFC at 0% and 6% risk.

      Overall, I enjoyed reading this manuscript and I learned much from it. The work builds neatly and clearly on the past work of this group in this task, providing new information on how punishment shapes action coding in the prefrontal cortex and VTA, how it shapes correlated activity between these regions, and how benzodiazepines may affect these to achieve their anxiolytic effects. The critical conclusions are that these regions are important for action, but not punisher, encoding, and that peri-action ramping in VTA neurons and VTA-PFC correlated activity contribute to the anxiolytic effects of benzodiazepines in this task.

      Comments

      1. I think it is worth drawing the distinction between punishment (i.e. learning and performance) versus the punisher (footshock). For example, the title (and across the manuscript) refers to "punishment coding" to mean transients to the punisher itself. I would suggest using "punisher" when referring to the outcome used (footshock) or its associated transients and "punishment" when referring to learning. So, learning punishment involves changes in action but not punisher encoding in these regions.

      2. "dmPFC". Different researchers mean different things by this term. Would it be possible to state exactly where the fibres were instead (e.g., Laubach et al., eNeuro, 2018)?

      3. I did struggle to understand the functional significance of the PFC transients. I am convinced they are real and robust because we see precisely the same in our own unpublished work. But, I am still puzzled as to what a loss of an 'inhibitory' transient around the punished action in PFC means. This is not really addressed but it is the main effect of punishment on action coding in the PFC and I think some readers would appreciate the authors' interpretation of this.

      4. Related to 3, it was also not clear why these PFC transients differed only at 6% risk and not also 10% risk. Again, I think this is worth discussing.

      5. Re: analyses. I thought these were generally well done. There are two questions one might be interested in. The first is whether the transients are different from 0%. The second is whether transients differ across sessions. The figures do a good job at answering the second question (which to me is the most important question) by using coloured bars above transients to show when session differences are present as assessed by a robust analysis. However, I do think some readers would also appreciate knowing whether and when transients themselves were significantly < or > 0%. Perhaps these figures could be presented as supplementary data.

      6. The comparison with previously published single-unit data was very interesting. Here I was persuaded that these correlations were meaningful because of the difference between these correlations for cue and action. I am not suggesting the authors do the following, I only offer it for their consideration in future work. Kriegeskorte has developed ways of assessing dissimilarity in different data types from the same behavioural designs that could prove very helpful and persuasive here (e.g., Front. Syst. Neurosci., 24 November 2008; https://doi.org/10.3389/neuro.06.004.2008).

      7. The authors comment on the overgeneralisation of punishment learning. That is, in session 1 there is a broad suppression of behavior by punishment that was not obviously present in the remaining sessions. I am not sure overgeneralisation is the best term because this implies punishment learning generalised. More likely is that Pavlovian fear was present in session 1 to generally suppress nosepoking and this fear was reduced in the remaining sessions as the instrumental punishment contingency was learned. Bolles made this point some years ago and it may be worth citing Bolles et al. Learning and Motivation Volume 11, Issue 1, February 1980, Pages 78-96, on this point.

    1. Author Response

      Reviewer #2 (Public Review):

      This manuscript from Shi, Ballesta, and Padoa-Schioppa examines the relationship between neural activity in the monkey orbitofrontal cortex (OFC) and various choice patterns that arise in sequential (versus simultaneous) choice. This approach addresses a central question in the study of decision-making: how can one identify value-dependent versus value-independent effects on choice behavior when value is defined from that behavior itself? Here, the authors document three behavioral differences in sequential choice: choosers are noisier, show an order bias, and show a preference bias. Leveraging a conceptual computational framework for OFC activity that the authors have developed over many years, the authors link reduced accuracy to changes in neural valuation in the OFC, order effects to post-valuation decision activity in the OFC, and preference effects to extra-OFC processes. For decision neuroscientists, these findings show specific differences between sequential and simultaneous choice, and suggest the integration of multiple stages (valuation, decision, and post-decision) in the selection process. More broadly, this work shows how an examination of neural activity can shed light on aspects of the decision process that cannot be distinguished by an examination of behavior alone.

      Strengths:

      Overall, this paper presents a novel and thoughtful task design that allows comparison of neural and behavioral value and choice effects. In concert with an established circuit-based framework for parsing different types of OFC response patterns, the authors test and validate a number of hypotheses on the link between neural activity and choice.

      (1) Comparing sequential and simultaneous choice tasks in an interleaved manner is a clever approach to separate valuation and comparison processes in time. While not entirely novel (e.g. see work from the Hayden group), the combination of this approach with the OFC response pattern (offer value, chosen value, chosen juice) framework allows a distinction between valuation and comparison-related effects.

      (2) This paper is the latest in a significant series of related papers on orbitofrontal activity from this group, and cleverly utilizes their expertise in characterizing, analyzing, and conceptualizing different patterns of OFC activity. In addition to the long-established offer value/chosen value/chosen juice categorization, recent papers from this group have established the causal contribution of OFC offer value activity to economic choice and established similar OFC neural contributions to sequential and simultaneous choice tasks.

      (3) Apart from a causal test (e.g. cell type specific stimulation) of the contribution of different neural responses to different choice effects, the next strongest evidence is a demonstration of a consistent relationship across sessions. The authors show such a relationship between offer value coding strength and choice accuracy, between chosen value sequence effects and behavioral order bias, and between chosen juice inhibition and order bias. At the least, these relatively strong effects show a strong correlation between different OFC responses and behavior.

      Thank you for emphasizing these points.

      Weaknesses:

      While the experimental approach and rigor of the analyses are strengths, there are issues of interpretation and generality of analytical approaches that should be clarified.

      (1) The abstract, introduction, and discussion touch on canonical behavioral economic choice effects as a prelude to the behavioral effects documented here, but it's not clear they are so closely related. [A] Many of the effects in the cited literature (framing effects in risky choice, preference reversals, etc.) are robust across different task paradigms, whereas the effects shown here arise specifically from a comparison of choice across different task paradigms (sequential vs. simultaneous). Furthermore, [B] it's not clear that the term "bias" adequately captures the array of effects in the behavioral economic literature (for that matter, [C] one of the main effects in this paper is reduced choice accuracy rather than a bias). [D] The paper would benefit from a clearer conceptual linkage between documented behavioral biases (particularly in humans) and the effects shown here.

      [B] We beg to differ. In our reading of the literature, the term “bias” is very general and it is invoked practically every time choices present some effect that seems idiosyncratic or “irrational”. The list of documented biases is very long – a good reference is the Wikipedia page on cognitive biases (for more scholarly references, see (Gilovich et al., 2002; Kahneman et al., 1982)).

      [A] As for whether biases documented in behavioral economics are robust across task paradigms, that’s really a matter of perspective. For example, we all understand the phenomenon of loss aversion (a.k.a. “status quo bias”) to be very robust and almost intuitive. But before the prospect theory paper of Kahneman and Tversky (1979), that was not at all the case. In the 15 years following that paper, much of what Kahneman and Tversky did was to show how loss aversion affected choices in different domains (Kahneman and Tversky, 2000). Other biases are much less reliable. For example, there is an extensive literature on decoy effects – i.e., violations of the axiom of “independence of irrelevant alternatives”. However, it turns out that the strength and even the direction of decoy effects depend on seemingly minor details (Spektor et al., 2021). In other words, decoy effects are not as robust as one might think. As for the biases discussed here, our hunch is that the order bias is quite ubiquitous. Indeed, it was already documented using different tasks in different species (Krajbich et al., 2010; Rustichini et al., 2021). The preference bias might also be the manifestation of a rather general phenomenon. After all, there is a common intuition that when a decision is difficult we sometimes fail to finalize it, and eventually choose some default option. In conclusion, we think of the two biases discussed here as conceptually very comparable to biases described in behavioral economics.

      [C] We agree that the drop in accuracy is (strictly speaking) not a choice bias, and we carefully chose the title and wrote the whole manuscript to keep that point clear. However, let us note that the drop in accuracy observed under sequential offers could easily be construed as a choice bias – specifically, a bias favoring in any situation the lesser option (lower value). As we conclude the present study, this phenomenon continues to fascinate us. Indeed, while it is clear that the behavioral effect arises at the valuation stage, we still don’t understand why the activity range of offer value cells is reduced under sequential offers. Naively, one might have guessed the opposite – i.e., that when only one offer is on display, the lack of competition translates to stronger offer value signals. We plan to give this issue more thought in the future. One possibility is that the system modulates the activity range of offer value cells depending on the task and/or the behavioral context. If so, differences in choice accuracy measured under sequential versus simultaneous offers would be a manifestation of a more general phenomenon. Of course, this matter remains open for future research.

      [D] The link between the biases discussed here and other biases described in the literature is conceptual. The main point we want to make is this: Over the past 20 years, we have gained some understanding of the neural circuit and mechanisms underlying simple economic choices. While our understanding remains incomplete and the object of ongoing research, notions acquired for simple choices can be used to make sense of a broader class of choices. Thus, in principle at least, it is possible to shed light on a variety of traits and biases by observing the activity of particular cell groups. The last paragraph of the ms conveys this point.

      (2) The analyses rely on a particular quantification of choice behavior (probit regression), which interprets choice effects (e.g. relative valuation of the two juices, sigmoid steepness) via specific parameter combinations and relies on specific assumptions about the construction of choice (e.g. cumulative normal distribution, constant sigmoid slope across order effects). This method of quantifying choice behavior is well-documented in previous studies, allowing a comparison to past work. However, given the importance of this approach to both quantifying choice effects and comparing choice to OFC responses, the paper would benefit from directly addressing two issues: (1) how well does probit regression actually capture stochastic choice behavior (in both Task 1 and Task 2), and (2) do the findings rely on specific choice modeling assumptions? The second issue is most important for the order bias effects, which assume a constant sigmoid across conditions - do the authors reach similar conclusions if this assumption is relaxed?

      Thanks for raising this question. We address it more thoroughly below (under “Recommendations for the authors”, point (2)). In a nutshell, when we designed the behavioral analysis, we chose the probit function and the log value ratio model (as opposed to the value difference model) based on general considerations and for consistency with our previous studies. We now conducted a series of control analyses using logit instead of probit and value difference instead of log value ratio. We also repeated all the analyses of neuronal activity using measures for relative value, choice accuracy and order bias derived from these behavioral models. The upshot is that all of our results hold true independently of the regression model used to analyze choices. Thus we kept the results as in the original ms, and we included a new section in the Methods to describe our control analyses (p.16-17).
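
      For readers less familiar with these behavioral models, the following is a minimal sketch of a log value ratio choice model with either a probit or a logit link. The parametrization (a linear predictor `a0 + a1 * log(q_b / q_a)`, with relative value defined as `exp(-a0 / a1)`) and all function names and coefficients are assumptions made for illustration; the exact model specification is in the Methods of the manuscript, not in this excerpt.

```python
import math


def probit_link(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def logit_link(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def p_choose_b(q_a, q_b, a0, a1, link=probit_link):
    """Probability of choosing juice B under an assumed log value ratio model."""
    return link(a0 + a1 * math.log(q_b / q_a))


def relative_value(a0, a1):
    """Quantity ratio q_b / q_a at which the two offers are chosen equally often."""
    return math.exp(-a0 / a1)


# Hypothetical fitted coefficients (illustration only).
a0, a1 = -1.0, 2.0
rho = relative_value(a0, a1)

# At the indifference point both links give a choice probability of 0.5,
# so the estimated relative value does not depend on the choice of link.
p_probit = p_choose_b(1.0, rho, a0, a1, link=probit_link)
p_logit = p_choose_b(1.0, rho, a0, a1, link=logit_link)
print(rho, p_probit, p_logit)
```

      In this sketch the two links differ only in the shape of the sigmoid away from the indifference point, which is one intuition for why conclusions about relative value could be insensitive to the choice between probit and logit.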

      (3) There are some issues with the strength and interpretation of the preference bias that need to be addressed. Re: strength and significance of the preference bias, the text seems to overemphasize the dependence of the effect on relative value (rotation of the rho-2 vs rho-1 ellipse) at the cost of the simple task difference (shift in the ellipse above the identity line). Conceptually, a preference bias (a shift in relative value towards the favored item) requires only the task difference, not the dependence on relative value. It would be clearer for example if the main text (pg. 6) presented the statistics (t-test, Wilcoxon) supporting the difference in relative values (rhos) between Tasks 1 and 2. Furthermore, the rotation does not seem as robust: the text states that the result is significant in both animals (p<0.04) but the ANCOVA results (Fig 3C and 3F) suggest that the effect is only significant in Monkey J. Is the preference effect significant only in one animal, and if so, is the effect significant across the combined data?

      Let us refer to Fig.3C. There is no question that the separation between the red and blue lines is statistically significant (order bias). In addition, the two lines appear (a) displaced upwards and (b) rotated counterclockwise compared to the identity line. In our understanding, the question raised by R2 is whether the two effects – displacement (a) and rotation (b) – are both present and both necessary to define the preference bias. We actually gave this issue extensive thought early on, and we concluded that displacement and rotation are not easily dissociable, at least in our data set. The reason is simple: to dissociate them, we would have to make some assumption about the center of rotation. For example, if we assume that the center of rotation is [0, 0], then there clearly is a rotation but the displacement is close to zero. Conversely, if the center of rotation is [1, 1] (which, in some ways, is a more logical assumption), the rotation is still there but the displacement is >0. When we considered these elements, we realized that any choice of a center of rotation would be somewhat arbitrary. Further complicating things, once a center of rotation is chosen, rotation and displacement are non-commutative operations. Importantly, this issue only affects the displacement, meaning that the rotation angle (and its statistical significance) does not depend on choosing any particular center of rotation. In this light, we chose to define the preference bias in a way that is more tightly tied to the rotation than to the displacement, while noting that the net effect of the phenomenon was to bias choices in favor of the preferred juice (hence, the phrase “preference bias”). The only problem with this definition is that it doesn’t do full justice to the phenomenon in monkey G (Fig.3F), where the displacement is more clearly evident than the rotation (indeed, the latter only trends towards statistical significance (p=0.07)). Still, we don’t see a better way to design our analyses. Thus we kept the ms unchanged in this respect.

      (4) On a related note, the authors present and view the effects as detrimental for the animals, but I think they have to more explicitly state how they are defining outcomes. For example, the abstract states "By neuronal measures, each phenomenon reduced the value obtained on average in each trial and was thus costly to the monkey". Does this mean that outcomes are less valuable, with value defined by (offer value cell) firing rates? A clarification is particularly important for the preference bias, where animals show a stronger bias for the preferred option compared to simultaneous choice. At the behavioral level, this effect seems to only be a poorer outcome if one assumes that simultaneous choice demonstrates true values - can it not be assumed that sequential choice demonstrates true preference, and the preference bias reduces performance in simultaneous choice? The authors may have an explanation in mind based on OFC value coding, and it would be helpful to be explicit here.

      Thank you for raising this question. The revised ms includes a new section (Discussion; ‘The cost of choice biases’; p.13) that discusses this important issue. In a nutshell, if in two conditions subjective values are the same but choices are different, in one or both conditions the subject fails to choose the higher value. In that sense, the choice bias is detrimental. Our analyses of neuronal activity indicated that subjective offer values were (a) the same in the two tasks and (b) independent of the presentation order in Task 2. Hence, both the preference bias and the order bias were detrimental to the animal.

      (5) Finally, at a broad level, the authors rigorously define and test hypotheses about how the different behavioral effects relate to OFC activity within the context of their neurocomputational framework (offer value, chosen value, chosen juice cells arranged in a competitive inhibition network; Fig. 1). However, it should be acknowledged that the primary conclusions - about how the different behavioral effects arise during valuation, comparison, or post-comparison - rely on the assumption that the different OFC response patterns reflect these specific circuit functions, and that OFC is causally related to choice. It would be more balanced if the authors could acknowledge this point in the discussion, and discuss any relevant potential alternative explanations for their findings.

      This issue is addressed above (Essential revision, point 1). In essence, R2 is correct: all our analyses were designed, and all our results are interpreted, under a series of assumptions. Most of these are backed by empirical evidence (e.g., showing that the encoding of decision variables in OFC is categorical in nature). However, one assumption remains a working hypothesis. Specifically, we assume that the cell groups identified in OFC constitute the building blocks of a decision circuit. If so, the activity of different cell groups may be associated with different computational stages. We edited the Discussion to clarify this point (p.11-12). As for possible alternative explanations, we agree that it is a very reasonable question to ask, but we honestly are at a loss addressing it. Indeed, one would never conduct the analyses presented in this ms if not in the framework of Fig.1. Consequently, it is hard to come up with any interpretation for the results without embracing that computational framework. If R2 can propose some alternative interpretation for the results presented in the ms, we would be more than happy to think about it, and possibly revise our thinking.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      General Statements

      We are grateful for the very kind, thoughtful, and detailed comments of the reviewers, which we have strived to fully integrate into the revised manuscript.

      Of note are the concerns with the data from stages S21 and S22, which we acknowledge do appear to be qualitatively and quantitatively distinct from the other samples. While we are unable to completely disambiguate meaningful biological variation from technical or experimental noise using our data, we hope a few additional analyses and visualization tools we have included can provide greater confidence in the reliability of our findings.

      Additionally, while attempting to evaluate Reviewer #2’s suggestions about examining the distribution of intergenic peaks along the genome, we discovered an error in our code that resulted in the improper assignment of peak categories. The error resulted in the improper assignment of intronic and exonic peaks as intergenic peaks. While the largest group of peaks in our dataset remains distal intergenic peaks (30.2%), and distal intergenic peaks remain a larger proportion of our intergenic peaks than proximal intergenic peaks, many of the peaks originally assigned to the intergenic categories have been reclassified as exonic or intronic peaks. We have updated our code and figures upon reanalysis of our data and have revised our findings and discussion accordingly.

      Description of the planned revisions

      Reviewer #3, Comment #3 of 11

      “In general, I thought that the bioinformatic methods (i.e., the code or the options used for each program) would have been helpful for my understanding in some cases. The authors say that these will be published on an accompanying GitHub repository, which should be fine if this is sufficient for journal policy.”

      We are still at work compiling the code for our analyses into a more reader-friendly form and setting up a GitHub repository to enable easy access to more detailed methods for interested readers. Some of the most important settings have been included in the Methods and Supplementary Methods sections, but we hope to include more thorough detailing of our pipelines in the GitHub repository. The raw data for portions of the RNA-Seq and all of the ATAC-Seq data have been uploaded to the Sequence Read Archive, and we are finalizing additional raw data submission. We are also in the process of determining what data to include in our Gene Expression Omnibus submission, in which we hope to include all pertinent final data analysis files as well as any intermediate or accompanying datasets which would facilitate downstream analyses. The large size and number of our final analysis files have resulted in some challenges with data transfer and storage, which have delayed the upload and submission process.

      We are also collating several of the data visualization scripts built for this manuscript into a Jupyter notebook. This tool will enable the visualization of ImpulseDE2 models and peak classifications for arbitrary genes and genome regions of a user’s choice, alongside additional functions which are discussed in this revision plan.

      Description of the revisions that have already been incorporated in the transferred manuscript

      We have addressed the following substantive concerns with the manuscript:

      Reviewer #2, Comment #2 of 3:

      “Authors have repeatedly used S21 and S22 throughout the manuscript to support their claims with clustering etc. May authors shed some light on the differences in replicates for these timepoints. Furthermore, I could not find Fig 3J, perhaps author would like to point out Fig 3H.”

      Reviewer #3, Cross-comment #2 of 3:

      “Focus on stages S21/S22: This might indeed be somewhat problematic. The libraries from these two stages (particularly S21) seem to be very different from those from the other stages. In the PCA (Fig. 1C), S21 doesn't cluster well with anything, and the difference between the two replicates is massive compared to other stages. The accessibility pattern (Fig. 1D) also looks odd. The libraries also have the lowest scores for % of mapped reads (Fig. S2B), fragment size distribution (S2E), and Spearman correlation (S2I). All this could be biologically sound and be due to a major developmental transition at this point, but maybe it justifies revisiting the data and testing whether leaving out S21 (and/or S22) makes a big difference for the clustering analyses.”

      1. Reviewers #2 and #3 discussed concerns with the outlying nature of libraries S21 and S22. We had also previously held concerns about these samples and had performed some analyses to examine whether the global properties of our dataset are dramatically changed upon removing those samples. We did not observe dramatic changes to the structure of our data in the absence of the S21/S22 samples.

        • a. Samples S21 and S22 appear to be highly separated from the rest of our data using Principal Components Analysis. We had also previously believed that this suggested that these samples might be problematic. However, a colleague indicated to us that researchers in microbiome ecology had observed similar phenomena, often caused by strong single axes of variation (or “linear gradients”) in the datasets. In “Uncovering the Horseshoe Effect in Microbial Analyses” (mSystems, 2017) by Morton et al., the authors describe how a strong linear gradient can create a “horseshoe effect” or “Guttman effect”, where PCA results in the two ends of a linear gradient appearing to come together in ordination space. The authors also describe a similar “arch effect” which strongly resembles the general shape of our PCA curve. We suggest that the strong apparent “outlier” appearance of S21 and S22 may be exaggerated or induced by the technical “arch effect” phenomenon, and may be caused by a strong single biological gradient – a developmental timecourse – which our data aimed to capture.
        • b. We also performed PCA on our dataset with the S21 and S22 time points removed prior to performing the analysis (see right panel, bottom). When we did so, we observed that the relative positions of the remaining libraries remain largely similar, with time points closer to the middle of development showing a positive loading in PC2, and time points closer to the beginning and end of development showing a negative loading. This suggests that the second major axis of variation in our dataset would remain a contrast between middle vs. terminal timepoints, even without the S21/S22 data, and that the relative positioning of the remaining data within PC-space is not entirely driven by S21/S22.
        • c. To further assess the degree of the S21/S22 samples’ outlying effects, we also performed ImpulseDE2 analysis to generate model fits without S21/S22 data. Doing so allowed us to determine to what degree the S21/S22 stages are necessary for driving the accessibility trajectory of individual peaks, and of the data more broadly. We performed IDE2 with either all data, or the S21/S22 data removed prior to input into IDE2. This generated two sets of model fits to the “cloud” of accessibility vs. time measurements: one that included the S21/S22 data, and one without. We evaluated, for each peak in our dataset, the time point at which the IDE2 model achieved maximum accessibility (the “IDE2 max fit”), and plotted both the “all” and “noS21S22” data as a histogram (see right panel, top graph). The presence of peaks that achieve predicted maximum accessibility in the S21/S22 stages in the “no S21/S22” data is a result of how we calculate “max fit”, which does not require that there is a known accessibility value at a given timepoint; only that the time point during which the model fit is maximum is closest to the timing of that developmental stage. Overall, we still observed early, middle, and late enrichment of IDE2 max fit even when the S21/S22 data are removed. We do see a rightward shift in the middle timepoint histogram in the direction of later stages, although this may be expected given the absence of concrete accessibility values at S21/S22 in the “no S21/S22” data. This indicates that our data globally retain the general trends of early, middle, and late enrichment of accessibility in the absence of the S21/S22 data. Moreover, this suggests that, even without the S21/S22 data, the remaining data from early and late stages result in a model fit that still predicts maximum accessibility at middle developmental stages for many peaks.
        • d. To further measure the influence of the S21/S22 data in IDE2 model fit, we also evaluated the degree of change in the global behavior of a peak when the S21/S22 stages were removed. This analysis aimed to assess whether removing S21/S22 data resulted in an IDE2 model with the same general trajectory as with all data, as opposed to the more stringent requirement of evaluating whether the exact developmental stage of the peak was changed. To perform this analysis, we grouped developmental stages into five quintiles, each representing three stages of development. We asked, for each peak in our dataset, whether that peak’s IDE2 max fit was “stable” when the S21/S22 data were removed; that is, if the quintile of the IDE2 max fit was altered when the S21/S22 data were removed (i.e. if a peak moved more than 3 developmental stages away from its original position), a peak was considered “unstable”. We observed that over 80% of peaks in each quintile remained “stable” after removing the S21/S22 data, suggesting that the vast majority of peaks show the same general trajectory of accessibility even without the S21/S22 data. Peaks within the middle time points appeared to be more unstable than peaks at the terminal timepoints, which could be expected given that the S21/S22 timepoints constituted the middle-most timepoints in our dataset.
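
      The quintile stability comparison in points (c) and (d) can be sketched as follows. This is a simplified illustration rather than our analysis code: it assumes 15 stages indexed 0–14 and grouped three per bin, it takes the “max fit” as the stage index at which a fitted trajectory is highest, and all function names are hypothetical.

```python
def max_fit_stage(fitted_values):
    """Index of the stage at which a fitted trajectory reaches its maximum."""
    return max(range(len(fitted_values)), key=lambda i: fitted_values[i])


def quintile(stage_index, stages_per_bin=3):
    """Map a stage index (0-14) to one of five bins of three stages each."""
    return stage_index // stages_per_bin


def stable_fraction_by_quintile(max_all, max_reduced, n_bins=5):
    """Fraction of peaks per bin whose max-fit bin is unchanged.

    max_all:     max-fit stage per peak, model fitted on all stages
    max_reduced: max-fit stage per peak, model fitted without S21/S22
    Bins containing no peaks are reported as None.
    """
    totals = [0] * n_bins
    stable = [0] * n_bins
    for stage_all, stage_reduced in zip(max_all, max_reduced):
        q = quintile(stage_all)
        totals[q] += 1
        if quintile(stage_reduced) == q:
            stable[q] += 1
    return [s / t if t else None for s, t in zip(stable, totals)]


# Toy example with five peaks (values are made up for illustration).
fractions = stable_fraction_by_quintile([0, 1, 7, 8, 14], [2, 4, 9, 6, 13])
```

      Here a peak counts as “stable” only if its bin is unchanged, which is the quintile-based definition given in point (d).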

      We acknowledge that the S21/S22 timepoints still appear to be qualitatively different in other ways. Moreover, we acknowledge that some of the peaks in our dataset are “dependent” on the S21/S22 stages, given that their accessibility trajectory changes when these stages are removed. It is difficult to determine whether a change in accessibility trajectory for a given peak caused by the removal of S21/S22 data reflects technical differences in sample preparation, such as batch effects; biological variation, such as a potentially unknown mutant or sick embryo; or genuine wild-type biological processes that occur at the S21/S22 stages.

      These caveats acknowledged, a comparative analysis of the data in the absence of the S21/S22 stages suggests that much of the global picture of development remains the same. In the interest of providing the data we generated as a resource, we decided to include the S21/S22 data in the final manuscript we have prepared for submission.

      We have included an additional supplementary figure (Supp. Fig. 2.2) highlighting these further analyses, which we hope future readers will consider when performing their own analyses with these timepoints, as well as a summary of the ways we evaluated this potential concern in the Supplementary Methods. To facilitate future users of this dataset, we will include the model parameters calculated from IDE2 using both the full dataset and the data with S21/S22 removed in the GEO accession data, as well as a Jupyter notebook (ParhyaleATACExplorer.ipynb) that allows users to plot the raw accessibility data and IDE2 model fits for individual peaks of interest (C, example on right panel), so that downstream experiments can consider the potential differences with the S21/S22 samples.

      Reviewer #2, Comment #3:

      “The majority of ATAC-seq peaks in the distal intergenic regions is a very surprising result. Authors defend this result by suggesting that this organism has big genome. May author perform a short analysis that shows that these peaks are indeed represent nearby genes or may point towards 3D genome organisation. For example, I see that this genome might have regions in the genomes that are densely organised in gene clusters, in those cases does the pattern remains same i.e. the majority of the genes are very distant from each other and hence use vital regulatory elements?”

      Reviewer #3, Cross-comment #3 of 3:

      Peaks in distal intergenic regions: I agree that this could be elaborated on. It might also be that >10 kb is not actually that distal for Parhyale. I would suggest to split the "distal peaks" further (e.g., in 10 kb or 2-log steps, or whatever makes most sense) and try to understand if >10 kb is mostly <20 kb, or if most of them are hundreds of kb from the nearest gene?

      2. Reviewers #2 and #3 expressed interest in understanding the absolute distribution of distal intergenic peak distances from nearby genes in our dataset. In generating the analyses to address this question, we stumbled upon an error in our code that reveals that the true number of intergenic peaks is much lower than we had originally reported. We discuss the nature of the error below. Moreover, we address the previous question using the new data, which overall still indicates that distal intergenic peaks remain a large portion of the peaks identified in the Parhyale genome.
        • a. To address Reviewer #2’s comments with respect to the presence of potential clusters of intergenic regions, we built a Python tool (included in ParhyaleATACExplorer.ipynb) enabling the visualization of different cis-regulatory element categories along a genomic coordinate. Upon plotting our data with this tool, we observed problems with the categorization of the peaks – namely, that intronic and exonic peaks were erroneously classified as intergenic peaks (see right panel, top). We analyzed our script for classifying annotations more carefully and realized that we had erroneously used “bedtools closest” instead of “bedtools intersect” to try to identify all peaks overlapping with gene annotations in our genome. We corrected this error and observed the expected distribution and categories of peaks in our data (right panel, bottom).
        • b. The revised peak categories have been added to the updated manuscript in Fig. 3H and Fig. 5C. The categories of peaks we observed differ substantially from our previous results, in that we observe a much higher representation of exonic and intronic peaks in our dataset, with intronic peaks now representing 28.2% of all peaks (increased from <1%), and distal intergenic peaks representing 30.2% (decreased from 51.2%). While distal intergenic peaks remain the largest category over time, their proportion is close to the fraction of intronic peaks. Intergenic peaks (distal and proximal combined) now make up only a slightly larger fraction of peaks (37.2%) than gene body peaks (exon, intron; total 34.4%). This updated result is a significant departure from our previous report, and we have updated the text of the manuscript to correct this mistake.
        • c. While intergenic and distal intergenic peaks constitute a much smaller portion of our data, we still wanted to address Reviewer #2 and #3’s questions about the distribution of distances between intergenic peaks and nearby genes. We generated a plot to illustrate the number of intergenic peaks at variable distances to the nearest gene (B, right panel). As illustrated in the plot, there are a very large number of distal intergenic peaks, including many peaks >100kb away from the nearest gene. The average distance of intergenic peaks from the nearest gene was 73,351bp. We neglected to mention in the original manuscript that one of the rationales for choosing a 10kb cutoff as “distal intergenic” was that peaks beyond this distance would be considerably more difficult to isolate as single fragments combined with a proximal promoter using PCR, agnostic of their orientation with respect to the promoter element. Such peaks could not have been easily identified using previous transgenic approaches, and are thus distinguished from “proximal” peaks by their necessary identification using techniques such as ATAC-Seq. We have updated the text to reflect this distinction.
        • d. Given that both intergenic and gene body peaks appeared to comprise large fractions of our revised data, we also examined the relative enrichment of intergenic and gene body peaks with respect to time (after normalizing for the fraction of “unknown” peaks, as suggested by Reviewer #3). We observed that the proportion of peaks belonging to intergenic and promoter regions declined slightly as development progressed, while the proportion of gene body peaks increased (E, below). There appeared to be slightly more intergenic peaks than gene body peaks at all developmental time points, and the ratio of intergenic peaks to gene body peaks declined very slightly over time (F, below). These data indicate that intergenic and gene body peaks have different enrichment trajectories over time. As development progresses, gene body peaks are increasingly enriched, and may have a greater impact on gene regulation. We have added these additional observations to the text and to a new Supplementary Figure 2.3.
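
      The distinction between testing for overlap with a gene annotation and finding the nearest gene, together with the 10 kb “distal” cutoff, can be sketched in plain Python as follows. This is a simplified illustration and not our pipeline, which uses bedtools: it assumes all intervals lie on one scaffold with half-open coordinates, collapses exonic and intronic peaks into a single “gene body” class, and ignores promoter and “unknown” categories. All function names are hypothetical.

```python
def overlaps(peak, gene):
    """True if two half-open intervals (start, end) share at least one base."""
    return peak[0] < gene[1] and gene[0] < peak[1]


def gap_to_gene(peak, gene):
    """Number of bases separating a peak from a gene (0 if they overlap)."""
    return max(gene[0] - peak[1], peak[0] - gene[1], 0)


def nearest_gene_distance(peak, genes):
    """Distance from a peak to the closest gene on the same scaffold."""
    return min(gap_to_gene(peak, gene) for gene in genes)


def classify_peak(peak, genes, distal_cutoff=10_000):
    """Classify a peak by overlap first, and only then by distance.

    Checking overlap before distance keeps gene body peaks from being
    labelled as intergenic.
    """
    if any(overlaps(peak, gene) for gene in genes):
        return "gene body"
    if nearest_gene_distance(peak, genes) > distal_cutoff:
        return "distal intergenic"
    return "proximal intergenic"


# Toy annotation with two genes (coordinates are made up for illustration).
genes = [(1_000, 5_000), (200_000, 210_000)]
labels = [classify_peak(p, genes)
          for p in [(1_500, 1_800), (6_000, 6_500), (50_000, 50_500)]]
```

      Binning the output of `nearest_gene_distance` for all intergenic peaks would give a distance distribution of the kind described in point (c).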

      We have also addressed the following textual and conceptual concerns with the manuscript:

      Reviewer #3, Comment #1 of 11

      I felt that the first paragraph of the introduction is not necessary.

      1. We believe the introductory paragraph helps frame the paper in the context of the broader scope of advances in technologies for emerging research organisms – currently, it has become straightforward to both generate a genome sequence and to identify and manipulate coding genes of interest across diverse taxa, but the identification of gene regulatory mechanisms remains more difficult. We have edited the introduction to better reflect this perspective and to link the first paragraph to the rest of the paper.

      Reviewer #2, Comment #1 of 3

      “In Introductory paragraph 2, sentence one, authors suggest that gene regulation plays more important role in evolutionary process than genes. Although a significant amount of research has been dedicated to gene regulation based evolution still this field is in nascent form. For example evidence of inheritance of the gene regulation pattern across generation is scarce and requires more evidence. I suggest authors to modulate the claim that still gene based evolution is the main paradigm instead otherwise.”

      Reviewer #3, Cross-comment #1 of 3

      Evolution via gene regulation vs. coding sequence: While (to my understanding) it is largely accepted in the field that changes to the CDS will often have more deleterious effects than changes to the expression of a gene, I agree that this could be elaborated on a bit.

      1. As requested by Reviewers #2 and #3, we have clarified the language surrounding the debate between gene functional and gene regulatory evolution to indicate that both mechanisms appear to be important for evolutionary processes, with the importance of the latter more recently revealed.

      Reviewer #3, Comment #2 of 11

      Use of Genrich: I presume this was run on both duplicates simultaneously? This is not clear from the methods section. It might have implications for downstream analyses (e.g., differential accessibility between time points) because running on both sequencing library replicates simultaneously leads to a single "replicate" of peaks per time point, while running it individually leads to two. However, I have never tested if this actually does make a difference. Maybe the authors have and can comment on this?

      1. In response to Reviewer #3’s inquiry about Genrich, we have added additional clarifying information into the Methods section. “Genrich analysis was run on both duplicate libraries simultaneously; Genrich performs peak calling on each replicate individually, and then merges the p-values of the replicates using Fisher’s method to generate a q-value, obviating the need to calculate an Irreproducible Discovery Rate (IDR).” We did not test running Genrich on individual libraries, opting for the more conservative approach of using the combined q-value as a filtering score for peak quality. For further information, the reviewer can see the relevant section of the Genrich GitHub repository: https://github.com/jsh58/Genrich#multiple-replicates
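
      For readers unfamiliar with Fisher’s method, the combination step can be sketched as below. This illustrates the statistical technique only and is not Genrich’s implementation; it does not show the subsequent conversion of combined p-values into q-values, and the function name is hypothetical. It relies on the fact that, for k independent p-values, the statistic -2 * sum(ln p) follows a chi-square distribution with 2k degrees of freedom, whose upper tail has a closed form when the degrees of freedom are even.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values (each > 0) using Fisher's method.

    The test statistic x = -2 * sum(ln p) is chi-square distributed with
    2k degrees of freedom. For even degrees of freedom the upper tail is
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is x / 2
    term = 1.0   # (x/2)^i / i! for i = 0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


# Two replicates that each give p = 0.05 at the same position.
combined = fisher_combined_p([0.05, 0.05])
```

      With a single replicate the function returns the input p-value unchanged, and two concordant replicates give a combined p-value smaller than either one alone.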

      Reviewer #3, Comment #4 of 11

      The section on the IDE2 models (the paragraph at the end of page 4/beginning of page 5) was unclear to me but appears sound. (The only instance where I didn't quite understand what the program actually does.) Maybe this can be explained a bit easier?

      1. As requested by Reviewer #3, we have attempted to explain the methods and logic of using ImpulseDE2 a bit more clearly:

      “To identify regions of dynamically accessible chromatin, we used the ImpulseDE2 (IDE2) pipeline (Fischer et al., 2018). IDE2 differs from other software for differential expression analysis in that it allows the investigation of trajectories of dynamic expression over large numbers of timepoints. It does so by modeling a gene expression trajectory as an “impulse” function that is the product of two sigmoid functions (Chechik and Koller, 2009; Yosef and Regev, 2011). This approach enables the modeling of a trajectory of gene expression in three parts: an initial value, a peak value, and a steady state value, thus summarizing an expression trajectory using a fixed number of parameters. With the ability to capture the differences between early, middle, and late expression values for each gene in a dataset, IDE2 also enables the detection of transient changes in gene expression or accessibility during a time course. Identifying differential expression over large numbers of timepoints is difficult for more categorical differential expression software such as edgeR and DESeq2, which generally use pairwise comparisons between timepoints to assess change over time (Love et al., 2014; Robinson et al., 2010).”
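
      As an illustration of the “impulse” function described above, the following sketch writes the product of two sigmoids in the form given by Chechik and Koller (2009). It is not the ImpulseDE2 source code, and the parameter names (`h0`, `h1`, `h2` for the initial, peak, and steady state values; `t1`, `t2` for the two transition times; `beta` for the slope) are assumptions made for this example.

```python
import math


def sigmoid(t, beta, t_mid):
    """Sigmoid with slope beta centred at t_mid."""
    return 1.0 / (1.0 + math.exp(-beta * (t - t_mid)))


def impulse(t, h0, h1, h2, t1, t2, beta):
    """Impulse model: product of a rising and a falling sigmoid.

    The value starts near h0, moves to h1 around time t1, and settles
    at h2 after time t2 (assuming t1 < t2 and beta > 0).
    """
    first = h0 + (h1 - h0) * sigmoid(t, beta, t1)
    second = h2 + (h1 - h2) * sigmoid(t, -beta, t2)
    return first * second / h1


# Example trajectory with a transient peak between t1 = 5 and t2 = 15.
params = dict(h0=1.0, h1=5.0, h2=2.0, t1=5.0, t2=15.0, beta=2.0)
early = impulse(-50.0, **params)   # close to the initial value h0
middle = impulse(10.0, **params)   # close to the peak value h1
late = impulse(80.0, **params)     # close to the steady state value h2
```

      Because the whole trajectory is summarized by six parameters, a transient rise or dip between `t1` and `t2` can be captured without comparing every pair of timepoints.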

      Reviewer #2, Comment #2 of 3

      2-2) Authors have repeatedly used S21 and S22 throughout the manuscript to support their claims with clustering etc. May authors shed some light on the differences in replicates for these timepoints. Furthermore, I could not find Fig 3J, perhaps author would like to point out Fig 3H.

      Reviewer #3, Comment #5 of 11

      On page 7, Fig.3J needs changing to 3H. This figure should, in my opinion, also contain the absolute number of peaks for each time point to set the individual proportions into context.

      1. As requested by Reviewer #3, we have added bar charts representing the number of peaks found at each time point (Fig. 3H) and the number of peaks found in each cluster (Fig. 5C) to the peak type proportion plots. We have also fixed references to Fig. 3J to instead refer to Fig. 3H – we apologize for the confusion.

      Reviewer #3, Comment #6 of 11

      Last paragraph of the "Improving the Parhyale genome annotation" section: I think this needs to focus on those regions of the genome for which the location is known - after all, the "unknown" regions could all be "distal transgenic", which would significantly change the relative proportions.

      1. We have revised our analysis of this topic with our updated peak type proportions, as described in point 2d above under “substantive concerns”.

      Reviewer #3, Comment #7 of 11

      “On page 9, t-SNE is mentioned but doesn't seem to be cited.”

      1. As requested by Reviewer #3, we have added citations for the t-SNE method, as well as scikit-learn, the software we used for t-SNE visualization.

      Reviewer #3, Comment #8 of 11

      “The third paragraph on page 9 ("We evaluated the differences...") should mention the fact that clusters 1 and 2 are the only ones with significant proportions of exonic and intronic peaks. In the accompanying figure (5C), the total number of peaks would again be helpful.”

      1. After identifying the error in our peak category classification pipeline, this observation was no longer true. However, upon examining the new distributions by cluster, we observed that in Clusters 3–7, for which we observed GO enrichment for developmental processes, there appeared to be slightly higher enrichment of intronic regulatory elements than distal intergenic regulatory elements. These results resemble the observation from recent work showing that tissue-specific enhancers are enriched in intronic regions in various human cell types (e.g. Borsari et al. 2021, Genome Research). We have noted this new observation in the text.

      Reviewer #3, Comment #9 of 11

      In figure 5D, I can't quite make out at which stage the dip in the peak of Cluster 8 occurs. This is quite an unusual pattern of accessibility change, and I can't help but wonder if it has something to do with the quality of one of the libraries? Also, the fact that half of the peaks fall into unmapped regions of the genome is unusual, and I feel this deserves more discussion.

      1. In Figure 5D, Reviewer #3 asks about a dip in accessibility for Cluster 8 peaks. The dip in accessibility was actually observed for Cluster 9 peaks and is marked by the asterisk in that panel. We have updated the figure legend to clarify the significance of the asterisk and have referred readers to examine Supp. Fig. 5.1B, where the IDE2 model fits more clearly show a collective dip in accessibility for Cluster 9 peaks. Upon examining the size distribution of the clusters, we have also noticed that Cluster 8 is the smallest cluster. We have noted the small cluster size and high “unknown” peak enrichment for Cluster 8 in the text.

      Reviewer #3, Comment #10 of 11

      “On page 10, the abbreviation PFM appears, but it is only explained in the legend of Fig.4. This should appear in the text.”

      1. Reviewer #3 mentions that on page 10, we use the abbreviation for position frequency matrices (PFMs) without previous reference. We first introduce the abbreviation on page 8, but given the repeated use of “PFM” on page 10, we have added an additional explanation of the abbreviation on page 10, for ease of reading.

      Reviewer #3, Comment #11 of 11

      “The section on "Concordant and discordant expression and accessibility" is the one I disagree most with. The authors seem to suggest that a repressive cis-regulatory module should become less accessible when the gene is activated. However, they leave trans-acting factors completely out of their conceptualisation here. It is in general likely the availability of transcription factors that leads to repression, while the "silencer" can be well accessible in all cells. Moreover, it has become clear in recent years that CRMs are not just repressors or enhancers per se but can act as either depending on the availability of transcription factors. I think these facts could partially explain the weak correlation and should be discussed.”

      1. We appreciate the comments from Reviewer #3, which alerted us to the more recent literature around the bifunctional potential of regulatory elements. We have revised our claims to clarify that concordance and discordance analysis cannot be used to directly assign “enhancer” or “silencer” identity to given regulatory elements. Instead, we suggest that evaluating concordance and discordance can be useful for downstream users of our data, such as those aiming to build reporter constructs for a given gene of interest. To facilitate such tool development, we have built additional functions into a Jupyter notebook to enable the visualization of accessibility, gene expression, fold change of accessibility and gene expression, significance of fold change, and concordance/discordance assignment for arbitrary peak-gene pairs. An example of this visualization is shown on the following page. Panel A shows the region around the Engrailed-1 and Engrailed-2 loci in Parhyale (text labels within the plot region were added manually in Illustrator). Panel B shows visualization of the En1 promoter peak alongside En1 expression. Significant log fold changes (DESeq2 padj < 0.05) are marked by asterisks in the bar plots, and concordance/discordance assignment at each time point is indicated by the color of the comparison text (red = concordant, blue = discordant). Panels C and D show accessibility and expression visualization for a single peak (En1 peak5) compared to two nearby genes (En1 and En2). We hope to include sufficient documentation in our GitHub repository such that using these tools is accessible for most researchers, even with limited programming knowledge.
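      The concordance/discordance assignment described above can be sketched roughly as follows. This is only an illustration, not the authors' notebook code: the function name `classify_pair` and the rule itself (both fold changes significant at padj < 0.05, concordant when accessibility and expression change in the same direction) are assumptions inferred from the description in the response.

```python
import math


def classify_pair(lfc_accessibility, lfc_expression,
                  padj_accessibility, padj_expression, alpha=0.05):
    """Label one peak-gene pair for one time-point comparison.

    Assumed rule (hypothetical, not confirmed by the source): a pair is
    only classified when both log fold changes are significant; it is
    "concordant" when both change in the same direction and "discordant"
    when they change in opposite directions.
    """
    padj_values = (padj_accessibility, padj_expression)
    # DESeq2 can report NA for padj; treat missing values as not significant.
    if any(math.isnan(p) or p >= alpha for p in padj_values):
        return "not significant"
    if lfc_accessibility == 0 or lfc_expression == 0:
        return "not significant"
    same_direction = (lfc_accessibility > 0) == (lfc_expression > 0)
    return "concordant" if same_direction else "discordant"


# Made-up numbers for a single peak-gene pair at one comparison.
label = classify_pair(1.2, 0.8, 0.01, 0.001)
```

      As the revised response notes, such a label describes co-variation only; it does not by itself assign "enhancer" or "silencer" identity to a regulatory element.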

      Description of analyses that authors prefer not to carry out

      We were unable to easily visualize the distribution of regulatory elements across the whole genome as suggested by Reviewer #2. One challenge of working with the Parhyale genome is the lack of complete chromosomes. The genome is distributed across ~290,000 contigs of variable size. We were unable to find any software that could be easily and quickly set up to visualize our data, although we will provide in a Jupyter notebook the tools for local visualization of peak types that we developed.

    1. publication has been extended far beyond our present ability to make real use of the record. The summation of human experience is being expanded at a prodigious rate, and the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships.

      That is the key issue

    1. A lot of us may have felt pressure at times to find our purpose — to find our one true cause, our personal mission, what we personally should be doing and where we fit in.

      I think everyone rushes to find out their purpose in life but I think it's fine not to know. You'll get there eventually in life. My purpose in life has always been to be a good person and I realized that purpose a long time ago while I was in a bad place in life. Your purpose doesn't have to be the same as anyone else's. It's simply yours and you choose what to make of it.

    1. The new lines you mention really are present in the text content of the element. HTML tags are not being replaced by new lines, they just get omitted entirely. If you look at the textContent property of the <p> element you selected in the browser console, you'll see the same new lines. Also, if you select the text and run window.getSelection().getRangeAt(0).toString() in the browser console, you'll see the same new lines. In summary, this is working as it is currently expected to. What I think may have been surprising here is that the captured text is not the same as what would be copied to the clipboard. When copying to the clipboard, new lines in the source get replaced with spaces, and <br> tags get converted to new lines. Browser specifications distinguish the original text content of HTML "in the source" as returned by element.textContent from the text content "as rendered" returned by element.innerText. Hypothesis has always captured quotes from and searched for quotes in the "source" text content rather than the "rendered" text. This behavior causes issues with line breaks as well. It might make sense for us to look at capturing the rendered text (as copied to the clipboard) rather than the source text in future. We'd need to be careful to handle all the places where this distinction comes up, and also make sure that all existing annotations anchor properly. Also we should talk to other parties interested in the Web Annotations specifications to discuss how this impacts interoperability.
      What I think may have been surprising here is that the captured text is not the same as what would be copied to the clipboard. When copying to the clipboard, new lines in the source get replaced with spaces, and <br> tags get converted to new lines. Browser specifications distinguish the original text content of HTML "in the source" as returned by element.textContent from the text content "as rendered" returned by element.innerText. Hypothesis has always captured quotes from and searched for quotes in the "source" text content rather than the "rendered" text.
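      The difference between "source" and "rendered" text can be illustrated with a small sketch. It is a deliberately simplified stand-in written with Python's standard `html.parser`, not the browser algorithms behind `textContent` or `innerText`, and the helper name `source_and_rendered` is hypothetical. The "source" string keeps text nodes exactly as written and drops tags; the "rendered" string collapses whitespace runs to a single space and turns `<br>` into a new line.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects two simplified views of the text in an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.source_parts = []    # text nodes kept verbatim (textContent-like)
        self.rendered_parts = []  # rough approximation of rendered text

    def handle_starttag(self, tag, attrs):
        # Tags never appear in the source view; only <br> affects the
        # rendered view, where it becomes a line break.
        if tag == "br":
            self.rendered_parts.append("\n")

    def handle_data(self, data):
        self.source_parts.append(data)
        # Rendered text collapses runs of whitespace, including new lines.
        self.rendered_parts.append(re.sub(r"\s+", " ", data))


def source_and_rendered(html):
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return "".join(parser.source_parts), "".join(parser.rendered_parts)


source, rendered = source_and_rendered(
    "<p>first line\nsecond line<br>third <b>line</b></p>"
)
```

      Under these assumptions, a quote copied from the rendered text, such as "first line second", does not occur verbatim in the source text, which is the kind of mismatch the comment describes.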
    1. "As We May Think" predicted (to some extent) many kinds of technology invented after its publication, including hypertext, personal computers, the Internet, the World Wide Web, speech recognition, and online encyclopedias such as Wikipedia:

      An advanced device for its time, it managed to predict in general terms how the web works today. Even so, that level of thinking has still not been matched, since the Memex proposed a way of imitating complex neural processes of organization and association.

    1. Herald: Nay, ill it were to mar with sorrow's tale The day of blissful news. The gods demand Thanksgiving sundered from solicitude. If one as herald came with rueful face To say, "The curse has fallen, and the host Gone down to death; and one wide wound has reached The city's heart, and out of many homes Many are cast and consecrate to death, Beneath the double scourge, that Ares loves, The bloody pair, the fire and sword of doom"-- If such sore burden weighed upon my tongue, 'Twere fit to speak such words as gladden fiends. But--coming as he comes who bringeth news Of safe return from toil, and issues fair, To men rejoicing in a weal restored-- Dare I to dash good words with ill, and say How the gods' anger smote the Greeks in storm? For fire and sea, that erst held bitter feud, Now swore conspiracy and pledged their faith, Wasting the Argives worn with toil and war. Night and great horror of the rising wave Came o'er us, and the blasts that blow from Thrace Clashed ship with ship, and some with plunging prow Thro' scudding drifts of spray and raving storm Vanished, as strays by some ill shepherd driven. And when at length the sun rose bright, we saw Th' Aegaean sea-field flecked with flowers of death, Corpses of Grecian men and shattered hulls. For us indeed, some god, as well I deem, No human power, laid hand upon our helm, Snatched us or prayed us from the powers of air, And brought our bark thro' all, unharmed in hull: And saving Fortune sat and steered us fair, So that no surge should gulf us deep in brine, Nor grind our keel upon a rocky shore. So 'scaped we death that lurks beneath the sea, But, under day's white light, mistrustful all Of fortune's smile, we sat and brooded deep, Shepherds forlorn of thoughts that wandered wild, O'er this new woe; for smitten was our host, And lost as ashes scattered from the pyre. Of whom if any draw his life-breath yet, Be well assured, he deems of us as dead, As we of him no other fate forebode. 
But heaven save all! If Menelaus live, He will not tarry, but will surely come: Therefore if anywhere the high sun's ray Descries him upon earth, preserved by Zeus, Who wills not yet to wipe his race away, Hope still there is that homeward he may wend. Enough--thou hast the truth unto the end.

      Herald: Menelaus has disappeared; don't make me taint good news with bad.

      There was a storm and boats crashed, but we were spared. They may be alive, but they will think we are dead, just as we think they are dead.

      Wait for Menelaus's return, because Zeus favors him.

    2. Think you--this very morn--the Greeks in Troy, And loud therein the voice of utter wail! Within one cup pour vinegar and oil, And look! unblent, unreconciled, they war. So in the twofold issue of the strife Mingle the victor's shout, the captives' moan. For all the conquered whom the sword has spared Cling weeping--some unto a brother slain, Some childlike to a nursing father's form, And wail the loved and lost, the while their neck Bows down already 'neath the captive's chain. And lo! the victors, now the fight is done, Goaded by restless hunger, far and wide Range all disordered thro' the town, to snatch Such victual and such rest as chance may give Within the captive halls that once were Troy-- Joyful to rid them of the frost and dew, Wherein they couched upon the plain of old-- Joyful to sleep the gracious night all through, Unsummoned of the watching sentinel. Yet let them reverence well the city's gods, The lords of Troy, tho' fallen, and her shrines; So shall the spoilers not in turn be spoiled. Yea, let no craving for forbidden gain Bid conquerors yield before the darts of greed. For we need yet, before the race be won, Homewards, unharmed, to round the course once more. For should the host wax wanton ere it come, Then, tho' the sudden blow of fate be spared, Yet in the sight of gods shall rise once more The great wrong of the slain, to claim revenge. Now, hearing from this woman's mouth of mine, The tale and eke its warning, pray with me, "Luck sway the scale, with no uncertain poise. For my fair hopes are changed to fairer joys."

      We won. Troy's triumphant and subdued are like oil and water: the triumphant revel in it, the subdued weep and toil.

      If we don't desecrate Troy's shrines we'll be fine, but if our people do, it'll be bad.

      We all have cause to celebrate.

    1. Tucker Carlson: Biden Giving WHO Power to 'Deploy Proactive Countermeasures Against Misinformation and Social Media Attacks' By Craig Bannister | May 20, 2022 | 10:39am EDT Pres. Biden has found a new way to censor free speech – by giving the World Health Organization (WHO) control of Americans’ speech – Fox News Host Tucker Carlson warned on Thursday. After dissolving his “Disinformation Governance Board,” due to public outcry, Biden is preparing to sign WHO’s new World Pandemic Treaty, giving a global operational control and power – through ‘proactive countermeasures’ - to combat what it deems “disinformation,” Carlson explained, citing a WHO working group's draft text: “So, what would this ‘operational control’ mean? “Let’s be specific. Right off the bat, the treaty demands ‘National and global coordinated actions to address the misinformation, disinformation, and stigmatization that undermines public health.’ “Oh! Here we go! Right to censorship: ‘People are criticizing us, and for public health reasons, that can't be allowed. If you criticize us, people will die.’ “So, you saw yesterday that the Biden administration, in the face of universal laughter and derision, had to fire the head of its new Ministry of Truth - but they found another way to do it: ‘W.H.O. Secretariat to build capacity to deploy proactive countermeasures against misinformation and social media attacks.’” “So, they are going to get to censor anybody who doesn't agree with what they do, as they control the intimate details of your life,” Carlson explained: “And they will control those details. 
Under this treaty, the World Health Organization will get to establish vaccine passports and regulate travel. The World Health Organization will ‘Develop standards for producing a digital version of the international certificate of vaccination and prophylactics.’ “Okay. “So you may think, ‘Well, it is just about COVID and I went along with mandatory vaccines and vaccine passports at the time, how bad could it be?’ [Laughs] First of all, if you went along with that, you should be repenting right about now. But, it is not just about COVID because the W.H.O. will be in charge of ‘The digitalization of all health forms.’ The World Health Organization will also ‘Share real-time information about travel measures.’ “So you are going to find out exactly when you are allowed to get on a bus or train or airplane, or how about your bicycle, will they regulate that too? Maybe. Now the World Health Organization has sought this authority for years. Of course. Who doesn't want more power?” Carlson then played a foreboding comment by W.H.O. Director-General Tedros Adhanom Ghebreyesus. “Here’s Tedros back in April of 2020: “People in countries with stay-at-home orders are understandably frustrated with being confined to their homes for weeks on end. But the world will not and cannot go back to the way things were. There must be a new normal. 
A world that is healthier, safer, and better prepared.” Americans should question relinquishing control over their lives to an unelected person and global authority they had no say in choosing, Carlson said: “Okay, so there’s a guy with a long and documented history of subverting public health, who is clearly a liar, who is acting as an agent for the Chinese government, and you have to ask yourself, ‘Did I vote for that guy? Is he one of my elected representatives in this democracy? How did he get power over where I can travel and when?’ “Good question.”

      Summary of Tucker's televised evening talk show.

    1. Author Response

      Reviewer #3 (Public Review):

      The import of soluble precursor proteins into the mitochondrial matrix is a complex process that involves two membranes, multiple protein interactions with the translocating substrate, and distinct forms of energetic input. The traditional approaches for in vitro measurement of protein translocation across membranes typically involve radiography or immunodetection-based assays. These end-point approaches, however, often lack optimal resolution to analyze the sequential processes of protein transport. Therefore, the development of techniques to dissect the kinetic steps of this process will be of great interest to the field of protein trafficking.

      This study by Ford et al. employs a novel bioluminescence-based technique to analyze the import of presequence-containing precursors (PCPs) into the mitochondrial matrix in real time. As a follow-up study to previous work from the Collinson group (Pereira et al. 2019), this approach makes use of the split NanoLuc luciferase enzyme strategy, whereby mitochondria are isolated from yeast expressing matrix localized 'LgBiT' (encoded by the mt-S11 gene) and used for import experiments with purified PCPs containing 'SmBiT' (the 11-residue pep86 sequence). The light intensity that results from the high-affinity interaction of pep86 with mt-S11 is convincingly shown in this study to be a reliable reporter of protein import into the matrix space. Therefore, from a technical stance, this appears to be a very promising approach for making high-resolution measurements of the different kinetic steps of protein translocation.

      The authors leverage this technology to seek insights into several features of mitochondrial protein import, with some observations challenging key longstanding paradigms in the field. Using a series of PCP constructs differing in length and placement of the pep86 peptide, the authors perform luminescence-based import tests with varying protein concentration, energetic input, and presequence charge distribution. Fits to the time course data suggest two main kinetic steps that govern matrix-directed import: transit of the PCP across the TOM complex into the IMS and association of the PCP with the TIM23 motor complex. The results support some very interesting insights into TIM23-mediated protein import, including: that precursor accumulation is strongly dependent on length; that the kinetically limiting step of IM transport is engagement with the TIM23 complex, not transmembrane transport itself; and that presequence charge distribution differently affects import rate and matrix accumulation. The results of this study appear repeatable among samples and the mathematical fits to time courses are well explained. However, there remain some questions about the nature of the experimental approach and the interpretation of the kinetics data in terms of the underlying biological processes. These questions are as follows:

      Major points

      Overall system characterization and mathematical analysis

      1) The Western-based characterization of the amount of matrix-localized 11S (shown in Figure 1 - figure supplement 1) shows that the concentration of 11S varies significantly (> twofold concentration difference, quantified as a ratio to Tom40) among yeast/mitochondria preps. Is there a particular reason for this large variability? Perhaps more significantly, the import efficiency (judged by luminescence amplitude) shows high batch variability as well (> twofold efficiency difference). While this series of experiments makes the case that the luminescence readout of import is not limited by matrix-localized 11S, it does raise a potential concern of batch-to-batch variation in import competence. Could this have any implications for the reproducibility of results by this assay, particularly regarding the kinetic parameters reported?

      It is very difficult to know what causes this variability as it can be seen even between triplicate preparations carried out on the same day. It could be due to slight differences in the flasks used to grow cells (such as the size of the baffles). However, we have determined that the variability in 11S concentration does not correlate with import competence (Figure 1 – figure supplement 1C), and that the kinetics of import are not affected (Figure 1 – figure supplement 2C).

      2) My understanding from the Pereira 2019 JMB paper is that the yeast expressing the matrix-targeted 11S were engineered so that the 11S construct contained a 35 residue presequence from ATP1. In Figure 1 - figure supplement 1, panel A, it looks like the mitochondria-derived 11S constructs are significantly larger than the purified 11S constructs used to calibrate concentration. If the added residues on the mitochondrial 11S constitute a presequence, then they should be cleaved upon import to yield the mature sized protein. Why are the mitochondrial 11S constructs so much larger than the purified ones? Explicit labeling of MW markers would be useful here.

      We noted that it seemed likely that the presequence was not getting cleaved off. There may also be some kind of SDS-PAGE mobility issue for 11S (common for beta-barrels), such that the purified version has a different mobility to the matrix localised version. Therefore, the possibility remains that the MTS is cleaved off, but the mature product migrates anomalously on gels. For this reason we carried out experiments to show that 11S is matrix localised, which turned out to be the case (Figure 1 – figure supplement 1D). So irrespective of non-MTS cleavage, or unexpected gel mobility of correctly processed 11S, the reporter is where it should be – in the matrix. These points are elaborated in the text.

      Labels have been added to molecular weight markers, as requested.

      3) From Figure 1D, given that the amplitude linearly increases with added Acp1-pep86 up to ~45 nM, this suggests that matrix-localized 11S is in stoichiometric excess of imported peptide within this range of added substrate. Given a matrix [11S] of 2.8 µM, a stoichiometrically equivalent amount of Acp1-pep86 would be equivalent to an import of <0.5% of added substrate, and it is suggested that import efficiency is actually much lower than that. How can this very low import efficiency be explained?

      Import is single turnover under our assay conditions and is therefore limited by the number of import sites rather than matrix [11S]. Under standard conditions, we intentionally add substrate in vast excess and only anticipate that a very small proportion will be imported.

      4) Apropos of point #3 above: Given the low efficiency of import observed for the purified PCP substrates in this study, one wonders if this is due to the formation of off-pathway (translocation incompetent) precursors established during the import reaction, before substrates have a chance to engage OM receptors (e.g., due to aggregation, etc.). In this case, the interpretation of single-turnover conditions may instead be caused by a vast majority of PCP losing translocation competence, rather than the requirement for energetic resetting that is suggested. Might this be a possibility?

      We anticipate that some PCP will aggregate and add substrate in excess to allow for that. Our interpretation of the reaction as single turnover was drawn from a comparison of PCP-pep86-DHFR import amplitude in the presence versus absence of MTX, rather than amplitudes from absolute amounts of PCP. We cannot think of a reason why MTX would affect protein solubility.

      5) Import time courses in many cases show a progressive drop in luminescence at later time points after a maximum value has been reached. This reduction in signal cannot be accounted for by the two rate constants in the equation used in the two-step kinetic model. How were such luminescence deviations accounted for when fitting data to obtain these kinetics parameters? What might be the reason for this downward drift in signal once maximum amplitude has been reached?

      We almost always see this gradual drop in luminescence in both the mitochondrial and bacterial systems. The data points acquired after the maximum amplitude are excluded from the fitting. The assay is based on an enzymatic reaction and we think that the downward drift is due to a combination of substrate depletion and accumulation of reaction products.
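      The fitting strategy described here can be sketched as follows. The exact equation used in the manuscript is not reproduced in this response, so the function below assumes a generic irreversible two-step sequence (A → B → C, with the signal proportional to C). The names `two_step_signal` and `truncate_at_peak` are hypothetical, and the numbers are made up for illustration.

```python
import math


def two_step_signal(t, amplitude, k1, k2):
    """Signal from an assumed irreversible two-step process A -> B -> C.

    The signal is taken to be proportional to the amount of C formed,
    which rises with a lag and plateaus at `amplitude`. This is a textbook
    form, not necessarily the equation used by the authors.
    """
    if math.isclose(k1, k2):
        # Limiting form when the two rate constants coincide.
        return amplitude * (1.0 - math.exp(-k1 * t) * (1.0 + k1 * t))
    decay = (k2 * math.exp(-k1 * t) - k1 * math.exp(-k2 * t)) / (k2 - k1)
    return amplitude * (1.0 - decay)


def truncate_at_peak(times, signal):
    """Keep only the points up to and including the maximum signal.

    Mirrors the statement that points acquired after the maximum amplitude
    are excluded from the fit, so the later downward drift is ignored.
    """
    peak_index = max(range(len(signal)), key=signal.__getitem__)
    return times[: peak_index + 1], signal[: peak_index + 1]


# Made-up time course with a late downward drift.
times = [0, 1, 2, 3, 4]
signal = [0.0, 2.0, 5.0, 4.0, 3.0]
fit_times, fit_signal = truncate_at_peak(times, signal)
```

      Because the model plateaus and never decreases, truncating at the peak is what keeps the late drift from distorting the fitted rate constants.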

      Import kinetics: dependence on total protein size

      6) In Figure 3 - figure supplement 1, some of the kinetic parameters from the PCP concentration-dependent responses are quite noisy. For instance, responses for the shortest constructs (L and DL) show a lot of variability in the k1 and k2 parameters. Is this (partly) due to difficulty in resolving these two parameters during the nonlinear least-squares fitting protocol for these particular constructs?

      It is difficult to resolve k1 and k2 perfectly, so the numbers are only estimates.

      7) The data in Figure 3, panels E and F (derived from Figure 3 - figure supplement 1) in some cases show non-linear dependence of kinetic parameters on the 'N to pep86 distance' for the length (panel E) and position (panel F) variants. For instance, from the length series, the k1 mean goes from 132 to 385 to 237 nM for the DL, DDL, and DDDL constructs, respectively. The variances suggest that these differences are real. Is there a reason that kinetic parameters would have such non-monotonic dependence on length?

      We don’t know the reason for this variance, but it could be investigated in future studies.

      Import kinetics: dependence on energetic input

      8) The data of Figure 4A show the results of partial dissipation of the membrane potential by 10 nM valinomycin. Most studies designed to cause a gradual dissipation of membrane potential do so by protonophore (e.g., CCCP) titration. Given that matrix-directed import is completely blocked by low micromolar amounts of this potent ionophore, it would be useful to have an independent readout (e.g., TMRM measurements) of the residual membrane potential that exists upon treatment with the lower concentrations of valinomycin used here.

      We have now included data that shows the partial effect of 10 nM valinomycin on membrane potential (TMRM measurements) and protein import (Figure 4 – figure supplement 1A-B).

      9) The step associated with k1, designated as transport across the TOM complex, is suggested to go to completion before starting the step associated with k2, engagement of the TIM23 complex. The k1 step shows a strong dependence on membrane potential (Fig. 4A, middle), particularly for the length series. Why would this be, given that no part of translocation across the OM should be associated with a valinomycin-sensitive electric potential?

      This effect is relatively small and mainly affects shorter PCPs. Our interpretation is that passage of the PCP through TOM is reversible, and committing PCP to import across the IMM (which requires ∆ψ) prevents this reversibility. However, it is also possible that transport through TOM and TIM23 are partially coupled. Both of these possibilities are addressed in the Discussion.

      Working model

      10) One of the most surprising outcomes of this study is that passive transport of substrates across the TOM complex and energy-coupled transport via the TIM23 complex are kinetically separable and independent events. As the authors note in the Discussion, the current paradigm of the field is that matrix-targeted substrates concurrently traverse the OM and IM via the TOM-TIM23 supercomplex, and this model is supported by quite a bit of experimental evidence. Even in this study, the fact that the PCP-pep86-DHFR construct exposes the pep86 sequence to the matrix in the presence of MTX (Figure 2) is evidence of a two membrane-spanning intermediate. Key mechanistic questions arise regarding the model proposed in this study. For example, if PCPs traverse the TOM complex as a stand-alone step, what is the driving force (e.g., a simple pathway of protein interactions with increasing affinity)? And would soluble, matrix-directed substrates be expected to accumulate in the very restricted space of the IMS? If so, how would TIM23-directed membrane proteins keep from aggregating in the aqueous IMS? These questions would be worth addressing in the discussion of the model.

      We have included a discussion of the experimental evidence for TOM-TIM23 supercomplexes. The acid chain hypothesis has been proposed as the driving force for PCP transport through TOM ‒ an interaction between positive charges of the presequence and negatively charged residues within the TOM40 channel. Proteins that are targeted to the IMS are imported through TOM without the participation of TIM23 and we think that matrix-targeted proteins can do the same. This could explain why TOM is in excess over TIM23. We also think that some matrix-targeted PCPs can accumulate in the IMS, although this may not be true of membrane proteins.

      Import kinetics: dependence on MTS charge distribution

      11) The fact that import rates are increased with a more electropositive presequence makes sense in terms of the electrophoretic pull exerted on the PCP (matrix, negative). However, the greater accumulation of precursors containing more electronegative presequences remains puzzling. In the manuscript, this is explained based on the concept that accumulation of positive charges will cause partial collapse of the membrane potential. However, I am still uncertain about this explanation for a few reasons. First, for each PCP, the presequence will constitute just a small fraction of the total length of the precursor, and therefore contribute a small fraction of the total charge density of imported protein. Would such a small change in total PCP charge be expected to have the dramatic effect observed among samples?

      The majority of the total PCP charge is from the mature region, and whilst the positive charges in the presequence undoubtedly deplete ∆ψ, the differences in extent of ∆ψ depletion that we see between PCPs that vary in charge are due to the difference in charge of the mature regions (as their presequences are identical).

      Second, given the small amount of protein imported under these conditions, would the total charge of imported PCPs be expected to affect transmembrane ion distribution so significantly? For instance, as I recall, it takes up to micromolar amounts of mitochondria-targeted lipophilic cations (e.g., TPP+) to cause a major change in the TMRM-detected membrane potential.

      The effect was indeed unexpected. Despite the seemingly small number of PCPs that are imported, the total number of charged residues will be much greater.

      Finally, I would expect isolated mitochondria to be capable of respiratory control. It is well known, for example, that isolated mitochondria can respond to temporary draw-down of the membrane potential (e.g., by ADP/Pi addition) by going into state 3 respiration and restoring membrane gradients. Why would that not be the case here (Figure 5D)?

      The isolated mitochondria that we used for the import assays demonstrate increased O2 consumption in response to ADP addition, as expected (Figure 5 – figure supplement 1A-B). In addition to this new figure, we have now included TMRM data (Figure 6 – figure supplement 2B) that shows a depletion of ∆ψ in response to ADP addition that is temporary and dependent on the amount of ADP added. We are therefore confident that our isolated mitochondria are capable of respiratory control as expected. We think that the lack of restoration of ∆ψ, following import-induced dissipation, is a consequence of the import process in vitro. Perhaps the import process compromises the channel, resulting in concomitant ion/charge dissipation during the active process. Moreover, this is likely to be exacerbated in vitro upon acute exposure to PCP, causing a sudden saturation of the import sites – thereby compromising the ∆ψ and the mitochondria’s ability to rapidly recover (this possibility has been noted in the MS).

      General

      12) Although the spectral approach in this study is developed as an alternative to the more traditional import assays, it would be useful to have some control import tests (done with Westerns or autoradiography) as complements to the luminescence-based imports. For example, control tests to accompany Figure 1 that show import efficiency or tests that accompany Figure 3 to show import of the different length and position series constructs. Perhaps this could be done with immunodetection of Acp1 or the pep86 epitope, showing protease-protected, processed import substrates that appear in a membrane potential/ATP-dependent manner. Even if the results from the more traditional techniques ran contrary to the results using the NanoLuc system, this would still allow the authors to compare which effects are consistent and which are dissimilar between different approaches.

      We have now included a Western blot import assay for the PCP-pep86-DHFR substrate and show that import is ∆ψ-dependent (Figure 2 ‒ figure supplement 1).

      13) The authors might also consider conducting imports with mitoplasts as a way to test the kinetic model that includes the TIM23-mediated step alone.

      We conducted import assays with mitoplasts and have now included this as a main Figure 5.

      14) It is difficult to follow the logic in the Discussion regarding the number of TIM23 sites limiting the number of 11S imported into mitochondria in live cells (page 15, lines 23-27). Are the authors suggesting that in vivo, one TIM23 complex serves to transport a single protein? This needs to be clarified.

      This has been removed, and this section of the discussion has been clarified.

    1. Author Response

      Reviewer #1 (Public Review):

      The paper is very well written, the question is interesting, and the analyses are innovative. However, I do have concerns about the overall approach. My main concern is about looking at asymmetries in the low dimensional representation of connectivity. A secondary concern has to do with looking at the parcellated connectome. I explain these concerns in succession below.

      We thank the Reviewer for the appreciation of our work and the insightful comments, which we have addressed below. The page numbers correspond to the clean version of the manuscript.

      The first concern is to me quite a fundamental issue: looking at connectivity in a low dimensional space, that of the laplacian eigenvectors. There are two issues with this. The first one, which is less important than the second, is that the authors have a reference embedding to which they align other embeddings using a procrustes method with no scaling. While the 3D embedding is still optimally representing the connectivity (because distances don't change under rotations), we can no longer look at one axis at a time, which is what the authors do when they look at G1. In this case, G1 is representative of the connectivity of the reference matrix (LL), but not the others.

      But even if the authors only projected their matrices onto a single G1 dimension with no procrustes (and only sign flipping if necessary), there is still a major issue. One implicit assumption of this whole approach is that if there is a change in connectivity somewhere in the original matrix, the same "nodes" of the matrix will change in the embedding. This is not the case. Any change in the original matrix, even if it is a single edge, will affect the positions of all the nodes in the embedding. That is because the embedding optimises a global loss function, not a local one.

      To make this point clear, consider the following toy example. Say we have 4 brain regions A,B,C,D. Let us say that we have the following connectivity:

      In the Left Hemisphere: A-B-C-D

      In the Right Hemisphere: A-B=C-D

      So the connection between B and C is twice as strong in the right hemi, and everything else remains the same.

      The low dimensional embedding of both will look like this:

      Left: ... A ... B ....... C ... D ...

      Right: A ... ... ... B ... C ... ... ... D

      Note how B,C are closer to each other in the RIGHT, but also that A,D have moved away from each other because the eigenvector has to have norm 1.

      So if we were to calculate an asymmetry index, we would say that:

      A is higher on the LEFT

      B is higher on the RIGHT

      C is higher on the LEFT

      D is higher on the RIGHT

      So we have found asymmetry in all of our regions. But in fact the only thing that has changed is the connection between B and C.

      This illustrates the danger of using a global optimisation procedure (like low-dim embedding) to analyse and interpret local changes. One has to be very careful.
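      The toy example above can be checked numerically. The following is a minimal sketch, assuming an unnormalized graph Laplacian on a four-node chain as a stand-in for the embedding (the manuscript's actual embedding pipeline may differ); the function name `fiedler_embedding` and the edge weights are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fiedler_embedding(weights):
    """1-D Laplacian embedding of a chain graph with the given edge weights."""
    n = len(weights) + 1
    W = np.zeros((n, n))
    for i, w in enumerate(weights):
        W[i, i + 1] = W[i + 1, i] = w
    L = np.diag(W.sum(axis=1)) - W   # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)      # eigenvalues come back in ascending order
    v = vecs[:, 1]                   # first non-trivial eigenvector, unit norm
    return v if v[0] > 0 else -v     # fix the arbitrary sign so node A is positive

left = fiedler_embedding([1.0, 1.0, 1.0])    # A-B-C-D
right = fiedler_embedding([1.0, 2.0, 1.0])   # A-B=C-D, only the B-C edge differs

print("left :", np.round(left, 3))
print("right:", np.round(right, 3))
print("diff :", np.round(left - right, 3))
```

      Although only the B-C edge changes, all four coordinates shift: B and C move closer together, and A and D move further apart because the eigenvector keeps unit norm.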

      We thank the Reviewer for the detailed description of the first concern. We agree that low-dimensional embeddings describe global embedding of local features, rather than local phenomena. Moreover, we indeed assume that the connectivity embedding of a given node gives us information about its position along ‘gradients’ relative to other nodes and their respective embedding. Thus, indeed, when a single node (node X) has a different connectivity profile in the right hemisphere relative to the left, this will also have some impact on the embeddings of all nodes showing a relevant (i.e., top 10%) connection to node X.

      To evaluate whether asymmetry could also be observed in average connectivity within functional networks, we took an alternative approach and computed the average connectivity within each functional network. We then compared the within-network connectivity between the left and right hemispheres. We have now added this analysis to the robustness analysis section of our Results. In short, we observed that transmodal networks (DMN, FPN, and language network) showed higher connectivity in the left hemisphere, whereas other networks showed higher connectivity in the right hemisphere. This indicates that observations made with respect to asymmetry of functional gradients are similar to those observed for within-network functional asymmetry between the left and right hemispheres. We have now detailed the outcome of this analysis in our Results section and Supplementary Materials.

      Results, p.14.: “As low-dimensional embedding is a global approach to summarize functional connectivity we reiterated our analysis by evaluating asymmetry of within network functional connectivity in the current sample. Observations made with respect to asymmetry of functional gradients are similar to those observed for within-network functional asymmetry between the left and right hemispheres.”

      “To further explore functional connectivity asymmetry between left and right hemispheres, we calculated the LL within network FC and RR within network FC (Figure 2-figure supplement 5). It showed that connections in the left hemisphere and right hemisphere were relatively equal in the global scale. However, for the local differences, networks showed significant subtle leftward or rightward asymmetry (vis1: t = -5.203, P < 0.001; vis2: t = -22.593, P < 0.001; SMN: t = -8.262, P < 0.001; CON: t = -32.715, P < 0.001; DAN: t = -11.272, P < 0.001; Lan.: t = 33.827, P < 0.001; FPN: t = 24.439, P < 0.001; Aud.: t = 0.191, P = 0.849; DMN: t = 11.303, P < 0.001; PMN: t = -35.719, P < 0.001; VMN: t = -11.056, P < 0.001; OAN: t = 0.311, P = 0.756).”
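      The within-network comparison described above can be sketched as follows. This is only an illustration on synthetic data: it assumes a paired t-test across subjects (the response does not state which test was used), and the 12-parcel atlas, the network labels, and the helper names `within_network_fc`, `paired_t`, and `random_fc` are hypothetical.

```python
import numpy as np

def within_network_fc(fc, labels):
    """Mean off-diagonal FC among parcels that share a network label."""
    means = {}
    for net in np.unique(labels):
        idx = np.where(labels == net)[0]
        block = fc[np.ix_(idx, idx)]
        means[str(net)] = block[~np.eye(len(idx), dtype=bool)].mean()
    return means

def paired_t(a, b):
    """Paired t statistic for two equal-length samples."""
    d = np.asarray(a, float) - np.asarray(b, float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

def random_fc(rng, n):
    """Symmetric synthetic FC matrix (illustration only, not real data)."""
    noise = rng.normal(0.0, 0.1, size=(n, n))
    return (noise + noise.T) / 2

rng = np.random.default_rng(0)
labels = np.repeat(["DMN", "SMN", "vis"], 4)   # hypothetical 12-parcel atlas
dmn = np.where(labels == "DMN")[0]

ll_means, rr_means = [], []
for _ in range(30):                            # 30 synthetic subjects
    ll, rr = random_fc(rng, 12), random_fc(rng, 12)
    ll[np.ix_(dmn, dmn)] += 0.3                # build in a leftward DMN effect
    ll_means.append(within_network_fc(ll, labels))
    rr_means.append(within_network_fc(rr, labels))

# Positive t means LL > RR (leftward), negative t means rightward.
t_stats = {net: paired_t([m[net] for m in ll_means], [m[net] for m in rr_means])
           for net in ["DMN", "SMN", "vis"]}
print(t_stats)
```

      With this sign convention, a positive t corresponds to stronger within-network connectivity in the left hemisphere, which matches the direction reported for the language, frontoparietal, and default mode networks.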

      Irrespectively, we have further highlighted that such a global interpretation for asymmetry of areas is still meaningful, given that a node is always placed in a global context. We have now further explained that our metrics give insights in local embedding of global phenomena in the introduction, p. 3.

      Introduction, p. 3: “These low-dimensional gradient embeddings describe global embedding of local features, rather than local phenomena. Thus, interpretation for asymmetry of areas is under a global context.”

      My second concern is about interpreting the brain asymmetry as differences in connectivity, as opposed to differences in other things like regional size. The authors use a parcellated approach, where presumably the parcels are left-right symmetric. If one area is actually larger in one hemisphere than in the other, this will manifest itself in the connectivity values. To mitigate this, it may be necessary to align the two hemispheres to each other (maybe using spherical registration) using connectivity prior to applying the parcellation.

      Thanks for this nice idea. We have now computed the differences of the mean rsfMRI connectome along the first gradient at the vertex level using 100 random subjects, as we have the data mapped to a symmetric template (fs_LR_32k), indicating that each vertex has a symmetric counterpart in the right hemisphere. Our results show left-right asymmetry as language/default mode-visual-frontoparietal vertices, which is consistent with the main results of the parcel-based approach. We have also added this response to the Supplementary materials.

      Though overall findings are consistent, spherical registration may introduce new issues. Full anatomical spatial symmetry may not provide functional comparability at the vertex level between the left and right hemispheres. For example, during language tasks in the current sample, the activated frontal region in the left hemisphere is larger than the activated contralateral region in the right hemisphere. In the current study, we aimed to evaluate asymmetry between functionally and structurally homologous regions, as described by the Glasser atlas. For the resting-state fMRI data, we used the region-wise symmetric multimodal parcellation (Glasser et al., 2016), which provides functionally corresponding contralateral regions in both hemispheres. A previous study (Williams et al., 2021) investigated structural and functional asymmetry in newborn infants. They used spherical registration (making fs_LR symmetric) for structural asymmetry but not for functional asymmetry. Because such spherical registration may hide functional information, we think it may be more suitable for structural studies.

      To address the concern regarding the alignment of hemispheres, we used joint alignment for LL and RR and compared the results with those of the Procrustes alignment technique (Pearson r=0.930, P_spin<0.001). The figure of asymmetry along the principal gradient (upper: joint alignment; lower: Procrustes alignment) indicates convergence between both approaches. We have reported this information in the Supplementary Materials.

      Lastly, we do agree that parcel size might be an important issue influencing the asymmetry pattern. To test for such an effect, we performed the correlation between the rank of parcel size (left-right)/(left+right) and rank of asymmetry index. It suggests only a small insignificant correlation along G1 (Spearman r_intra=0.130, P_spin=0.105; Spearman r_inter=0.130, P_spin=0.084). Of note, there is a systematic difference in parcel size as a function of sensory-association hierarchy, indicating that the link between parcel-size and asymmetry may vary as a function of sensory vs associative regions.
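      The rank correlation described above can be sketched as follows. The parcel sizes and gradient asymmetry values are made-up numbers for illustration, the helper names are hypothetical, and the sketch computes only the Spearman coefficient; it does not reproduce the spin-test P values reported in the response.

```python
import numpy as np

def normalized_asymmetry(left, right):
    """(left - right) / (left + right), computed per parcel."""
    left, right = np.asarray(left, float), np.asarray(right, float)
    return (left - right) / (left + right)

def spearman_r(x, y):
    """Spearman correlation as the Pearson correlation of ranks (assumes no ties)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]

# Hypothetical parcel sizes (e.g. vertex counts) for five homologous parcels.
left_size = [120, 80, 95, 60, 150]
right_size = [100, 90, 105, 55, 160]
size_ai = normalized_asymmetry(left_size, right_size)

# Hypothetical gradient asymmetry index for the same five parcels.
gradient_ai = np.array([0.02, -0.01, 0.03, 0.00, -0.02])

rho = spearman_r(size_ai, gradient_ai)
print(round(rho, 3))
```

      A coefficient near zero, as in the response, would suggest that parcel size differences account for little of the asymmetry pattern; assessing significance would still require a spatial permutation (spin) test.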

      Reviewer #2 (Public Review):

      Using recently-developed functional gradient techniques, this study explored human brain hemispheric asymmetry. The functional gradient is a hot technique in recent years and has been applied to study brain asymmetries in two papers of 2021. Compared to previous studies, the current study further evaluated the degree of genetic control (heritability) and evolutionary conservation for such gradient asymmetries by using human twin data and monkey's fMRI data. These investigations are of value and do provide interesting data. However, it suffers from a lack of specific hypotheses/questions/motivations underlying all kinds of analyses, and the rich observational or correlational results seem not to offer significant improvement of theoretical understanding about brain asymmetries or functional gradient. In addition, given the limited number of twins in HCP project (for a heritability estimation), the limited number of monkeys (20 monkeys), and the relatively poor quality of monkeys' resting functional MRI data, the results and conclusion should be taken cautiously. Below are major concerns and suggestions.

      We thank the Reviewer for the evaluation of our work and the helpful suggestions.

      The gradient from resting-state functional connectome has been frequently used but mainly at the group level. The current study essentially applied the gradient comparison (i.e., gradient score) at the individual level. Biological interpretation for individual gradient score at the parcel level as well as its comparability between individuals and between hemispheres should be resolved. This is the fundamental rationale underlying the whole analyses.

      We thank the Reviewer for this remark, and are happy to provide further rationale for using and comparing individual gradient scores to evaluate individual variation in asymmetry and associated heritability. Though gradients from resting-state functional connectivity have been frequently used at the group level, various studies have also examined individual differences. For example, studies have used linear mixed models to compare gradient scores between left and right across subjects (Liang et al., 2021), applied individual gradient scores to compare patients and controls (Dong et al., 2020, 2021; Hong et al., 2019; Park et al., 2021), and linked individual hippocampal gradients to memory recollection (Przeździk et al., 2019). Together, these studies show individual variations of local gradients, indicating changes in node centrality and hubness (Hong et al., 2019), and connectivity profile distance (Y. Wang et al., 2021). Of note, low-dimensional embeddings describe global embedding of local features, rather than local phenomena. Thus, asymmetry of areas is interpreted in a global context. The biological interpretation of individual gradients concerns the degree to which the segregation and integration of the system relate to patterns of ongoing neural activity (Mckeown et al., 2020). It reflects that individuals have different functional boundaries between anatomical regions. Individual neurons, in turn, are embedded within these global-local boundaries through a cortical wiring space consisting of intricate long- and short-range white matter fibers (Paquola et al., 2020).

      Introduction, p. 4: “We applied the individual gradient scores to study the asymmetry, consistent with prior studies (Gonzalez Alam et al., 2021; Liang et al., 2021). Individual variation along the gradients reflects a global change across subjects in the functional connectome integration and segregation, and it is under genetic control (Valk et al., 2021). Moreover, to what degree the system segregated and integrated relates to patterns of ongoing neural activity (Mckeown et al., 2020), and different individuals have different functional boundaries between anatomical regions.”

      Results, p. 5: “Next, individual gradients were computed for each subject and the four different FC modes and aligned to the template gradients with Procrustes rotation. It rotates a matrix to maximum similarity with a target matrix minimizing sum of squared differences. As noted, Procrustes matching was applied without a scaling factor so that the reference template only matters for matching the order and direction of the gradients. Therefore, it allows comparison between individuals and hemispheres. The individual mean gradients showed high correlation with the group gradients LL (all Pearson r > 0.97, P spin < 0.001).”
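      The Procrustes step without scaling can be illustrated with a short sketch. It uses the standard SVD solution to the orthogonal Procrustes problem and is not necessarily the implementation used in the manuscript; the function name `procrustes_rotate` and the synthetic gradients are assumptions for illustration.

```python
import numpy as np

def procrustes_rotate(source, target):
    """Rotate `source` (n_parcels x n_gradients) onto `target` without scaling."""
    U, _, Vt = np.linalg.svd(source.T @ target)
    R = U @ Vt   # orthogonal matrix minimizing ||source @ R - target||_F
    return source @ R

rng = np.random.default_rng(1)
template = rng.normal(size=(50, 3))   # synthetic group-level gradients

# Synthetic "individual" gradients: the template with reordered, sign-flipped axes.
Q = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 0.0, -1.0]])
individual = template @ Q.T

aligned = procrustes_rotate(individual, template)
print(np.allclose(aligned, template))
print(np.isclose(np.linalg.norm(aligned), np.linalg.norm(individual)))
```

      Because the transform is orthogonal and no scaling factor is applied, distances between parcels are preserved; only the order and direction of the axes are matched to the template.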

      Only the first three gradients are used but why? What about the fourth gradient? Specific theoretical interpretation is needed. At the individual level, is it ensured that the first gradients of all individuals correspond to each other? In this study, it is unclear whether we should or should not care about the G2 and G3. The results of G2 and G3 showed up randomly to some degree.

      In the current study we focused on the principal gradient in the main analysis, given its association with sensory-transmodal hierarchy, microstructure, and evolutionary alterations (Margulies et al., 2016; Paquola et al., 2019; Xu et al., 2020).

      Conversely, gradient 2 reflects the dissociation between visual and sensory-motor networks and gradient 3 is linked to task-positive, control, versus ‘default’ and sensory-motor regions. We analyzed asymmetry and its heritability of the first three gradients (explaining respectively 23.3%, 18.1%, and 15.0% of the variance of the rsFC matrix). However, we extracted the first ten gradients to maximize the degree of fit (Margulies et al., 2016; Mckeown et al., 2020). We have now also shown G4-10 mean asymmetry results as a supplementary figure. To ensure correspondence of gradients across individuals, we aligned the individual gradients to the group level template with Procrustes rotation. Procrustes rotation rotates a matrix to maximum similarity with a target matrix minimizing sum of squared differences. The approach is typically used in comparison of ordination results and is particularly useful in comparing alternative solutions in multidimensional scaling. Figure S1 shows the mean gradients across subjects of each FC mode, which is close to the Figure 1D template gradient space.

      Results, p. 5: “The current study analyzed asymmetry and its heritability of the first three gradients explaining most variance (Figure 1d). As they all have reasonably well described functional associations (G1: unimodal-transmodal gradient with 24.1%, G2: somatosensory-visual gradient with 18.4%, G3: multi-demand gradient with 15.1%). However, given we extracted ten gradients to maximize the degree of fit 26,52. We stated mean asymmetry of G4-10 in Figure 1-figure supplement 1.”

      The intra-hemispheric gradient is intuitive. However, it is hard to understand what the inter-hemispheric gradient means. From the data perspective, yes, you can do such a gradient comparison between the LR and RL connectome, but what does this mean? Why should we care about such asymmetry? From the introduction to the discussion, the authors simply showed the data of inter-hemispheric gradients without useful explanation. This issue should be solved.

      We are happy to further clarify. The LR and RL connectivity reflects cross-hemispheric functional signal interaction via corpus callosum, whose structural asymmetry is usually studied (Karolis et al., 2019). Such intra-hemispheric connections, compared to the inter-hemispheric connections, have been suggested to reflect the inhibition of corpus callosum, and underlie hemispheric specialization. Different information relies on hemispheric specialization (e.g., visual, motor, and crude information) and/or inter-hemispheric information transfer (e.g., language, reasoning, and attention) (Gazzaniga, 2000). To clarify and motivate the analysis of both intra- and inter-hemispheric asymmetry in functional gradients, we have now added further detail in the introduction, p. 5.

      Here is text: Introduction, p. 4. “The full FC matrix contains both intra-hemispheric and inter-hemispheric connections. Intra-hemispheric connections, compared to the inter-hemispheric connections, have been suggested to reflect the inhibition of corpus callosum and may underlie hemispheric specializations involving language, reasoning, and attention. Conversely, inter-hemispheric connectivity may reflect information transfer between hemispheres, for example a wide range of modal and motor information, and crude information concerning spatial locations 48. Previous studies have reported intra-hemispheric FC to study gradient asymmetry 6,38. By having the callosum related to association white matter fibers, one hemisphere could develop for new functions while the other hemisphere could continue to perform the previous functions for both hemispheres 48. Therefore, in addition to the intra-hemispheric FC gradients, we depicted the inter-hemispheric FC, which is abnormal in patients with schizophrenia 23,49 and autism 24.”

      as well as Discussion, p. 16 “Conversely, the transmodal frontoparietal network was located at the apex of rightward preference, possibly suggesting a right-ward lateralization of cortical regions associated with attention and control and ‘default’ internal cognition 62,63. The observed dissociation between language and control networks is also in line with previous work suggesting an inverse pattern of language and attention between hemispheres 3,64. Such patterns may be linked to inhibition of corpus callosum 65, promoting hemispheric specialization. It has been suggested that such inter-hemispheric connections set the stage for intra-hemispheric patterns related to association fibers 48. Future research may relate functional asymmetry directly to asymmetry in underlying structure to uncover how different white-matter tracts contribute to asymmetry of functional organization.”

      and Discussion, p.18 “Though overall intra- and inter-hemispheric connectivity showed a strong spatial overlap in humans, we also observed marked differences between both metrics across our analysis. For example, although we found both intra- and inter-hemispheric differences in gradient organization to be heritable, only for intra-hemispheric asymmetry we found a correspondence between degree of asymmetry and degree of heritability. Similarly comparing asymmetry observed in human data to functional gradient asymmetry in macaques, we only observed spatial patterning of asymmetry was conserved for intra-hemispheric connections. Whereas intra-hemispheric asymmetry relates to association fibers, commissural fibers underlie inter-hemispheric connections 77 It has been suggested that there is a trade-off within and across mammals of inter- and intra-hemispheric connectivity patterns to conserve the balance between grey and white-matter 76. Consequently, differences in asymmetry of both ipsi- and contralateral functional connections may be reflective of adjustments in this balance within and across species. Secondly, previous research studying intra- and inter-hemispheric connectivity and associated asymmetry has indicated a developmental trajectory from inter- to intra-hemispheric organization of brain functional connectivity, varying from unimodal to transmodal areas 78,79. It is thus possible that a reduced correspondence of asymmetry and heritability in humans, as well as lack of spatial similarities between humans and macaques for inter-hemispheric connectivity may be due to the age of both samples (young adults in humans, adolescents in macaques). Further research may study inter- and intra-hemispheric asymmetry in functional organization as a function of development in both species to further disentangle heritability and cross-species conservation and adaptation.”

      When aligning intra-hemispheric gradient, choosing averaged LL mode as the reference may introduce systematic bias towards left hemisphere. Such an issue also applies to LR-RL gradient alignment as well as cross-species gradient alignment. This methodological issue should be solved.

      We thank the Reviewer for raising this point. We also used RR as the reference, and the results were virtually identical. We have stated this in the Results, p. 13. Regarding the cross-species alignment, we averaged the left and right hemispheres to reduce the systematic bias. The correlation and comparison results remained robust. We have now updated the method and corresponding results (p.10). Here is the text:

      Results (p.15): “We also set the RR FC gradients as reference, the first three of which explained 22.8%, 18.8%, and 15.9% of total variance. We aligned each individual to this reference. It suggested all results were virtually identical (Pearson r > 0.9, P spin < 0.001).”

      Results (p.10): “To reduce a possible systematic hemispheric bias during the cross-species alignment, we averaged the left and right hemisphere. We found that the macaque and macaque-aligned human AI maps of G1 were correlated positively for intra-hemispheric patterns (Pearson r = 0.345, P spin = 0.030). For inter-hemispheric patterns, we didn’t observe a significant association (Pearson r = -0.029, P spin = 0.858)”

      The sample size of monkey (i.e., 20) is far less than human subjects (> 1000). Such limitation raises severe concern on the validity of the currently observed gradient asymmetry pattern in the monkey group, as well as the similarity results with human gradient asymmetry pattern. Despite the marginal significance of G1 inter-hemisphere gradient between humans and monkeys, I feel overall there is no convincingly meaningful similarity between these two species. However, the authors' discussion and conclusion are largely based on strong inter-species similarity in such asymmetry. The conclusion of evolutionary conservation for gradient asymmetry, therefore, is not well supported by the results.

      We agree with your comments. Although the sample is small compared to the human sample, it is a relatively decent size for NHP studies (most of which have N<10). Of note, recent work suggested that the individual variation pattern can be captured using 4 subjects in both humans and macaques (Ren et al., 2021).

      To overcome potential overinterpretation of our findings, we have now changed the title to a more descriptive format: “Heritability and cross-species comparisons of asymmetry of human cortical functional organization”

      We have also further detailed the findings in the Abstract: “These asymmetries were heritable in humans and, for intra-hemispheric asymmetry of functional connectivity, showed similar spatial distributions in humans and macaques, suggesting phylogenetic conservation.”

      We have pointed out the small sample size in the limitation. Please find the text below: Discussion, p. 18: “Due to the small sample size of macaques, it is important to be careful when interpreting our observations regarding asymmetry in macaques, and its relation to asymmetry patterning observed in humans. Therefore, further study is needed to evaluate the asymmetry patterns in macaques using large datasets 53,79”

      We have also nuanced the conclusion, p. 19: “This asymmetry was heritable and, in the case of organization of intra-hemispheric connectivity, showed spatial correspondence between humans and macaques. At the same time, functional asymmetry was more pronounced in language networks in humans relative to macaques, suggesting adaptation.”

    1. Author Response

      Reviewer #2 (Public Review):

      Weaknesses:

      1) It is surprising that certain enzymes with established depalmitoylation activity were excluded from the BrainPalmSeq database (e.g. ABHD4, ABHD11, ABHD12, ABHD6).

      We have now included additional depalmitoylating enzymes in our database and manuscript.

      2) Albeit not essential it will be of great interest to include in the established database enzymes necessary for synthesis of ACYL-CoA (e.g. ACSL enzymes). One improvement may include the ability of future researchers to add such curated analysis to the platform within future research studies.

      We agree with the reviewer there are many expansions of our gene set that would be interesting to include. Given the size of the current manuscript however, for brevity we have decided at present to curate data for the core set of genes that directly regulate dynamic palmitoylation. We have also added a ‘Contact Us’ feature to the website, so that repeatedly requested genes or datasets can be added in future.

      3) The experimental validation presented in figure 6 relies on over-expression of substrates and ZDHHC enzymes. This setup is known to often provide unspecific S-acylation events which result from excess enzyme or substrate availability. Hence, such validation would be greatly strengthened by loss of function experiments.

      We have now done loss-of-function experiments and included results in major discussion point 1 above. If the editors/reviewers think it is appropriate to add to the manuscript, we will comply. However, as our negative data does not negate the fact that ZDHHC9 is able to palmitoylate the myelin proteins tested, but merely suggests it may not be necessary for protein palmitoylation in vivo, we do not think it strengthens the manuscript.

      4) The authors relevantly use in-situ hybridization images from the Allen Brain atlas to validate their predictions. Although it is understandable that an extensive experimental validation of the predictions here established would be out of the scope of the current study, this work could be improved by validating the RNA expression at the protein level of certain abundant ZDHHC enzymes in available neuro-associated cell types.

      We have now validated RNA expression at the protein level for a few palmitoylating and depalmitoylating enzymes.

      5) It would be interesting if the authors would further compare the predicted association clusters (e.g. figure 1), substrates (figures 1 and 2), and S-acylation pairs (figure 4) determined here with previously determined ZDHHC enzyme associations described in different cell types and biological systems. Alternatively, further relevant validation could include testing whether other established ZDHHC-ZDHHC cascades (e.g. ZDHHC3-7) can also be detected within specific cells or regions of the CNS.

      On our website, all expression data can be downloaded below the heatmaps for each study, and the cell type expression relationships between any 2 genes can be plotted by the user to reveal cell types (if any) within which genes are co-expressed. In response to this comment and that of Reviewer 3 below, we have now performed such analysis on ZDHHC5/ZDHHC20 and ZDHHC6/ZDHHC16, which are to our knowledge the best established ZDHHC cascades. We have included these plots in new Figure 1 – figure supplement 2, along with discussion on line 172. Similar analysis has been performed on the known ZDHHC-accessory protein pairs (see below).

      6) Figure 3B: it is not clear why the cluster of zdhhcs with high layer specific expression displayed at the top of the graph does not follow the low-to-high expression scale of the table.

      The expression data in this figure is grouped by hierarchical clustering, rather than in order of low-to-high expression, in order to be consistent with Figure 2B. While we believe this is the better way to display the data, we are willing to modify if the editors/reviewers have a strong preference.

      7) Figure 4D: the more relevant potential cooperative pairs (ZDHHCs-APTs) could be highlighted in more contrasted colours.

      We thank the reviewer for this suggestion but at this stage would prefer to keep the color scheme as it is so that readers are better able to formulate their own hypotheses when observing these figures.

      Reviewer #3 (Public Review):

      Weaknesses:

      1) There is a vast amount of data available and the description and discussion of this could be endless, but there are a few points that could be brought out in more detail. For example, the correlation (or lack of correlation) of expression of the proposed zDHHC-PAT accessory proteins with their cognate zDHHCs. The dominance of a relatively small number of zDHHC enzymes (20, 2, 17, 3, 21, 8) in the CNS also merits some discussion. Is the combination of a high-capacity, low-specificity enzyme (zDHHC3) with others that are regarded as more 'specific'? I believe none of these are ER-resident - they represent Golgi and PM?

      The reviewer brings up many interesting questions. Indeed, we were hopeful that this type of mining of RNAseq data would bring to light many questions that can be followed up on in future publications.

      We have addressed the correlation in expression of accessory proteins with their cognate ZDHHCs with new data.

      We are unsure how to address the dominance of a relatively small number of ZDHHC enzymes (20, 2, 17, 3, 21, 8) in the CNS, beyond highlighting this expression pattern. We believe that interpretation of the expression of this in any way (e.g. co-expression of high-capacity, low-specificity enzymes (ZDHHC3) with more 'specific' ZDHHCs) would merely be speculative. However, we are open to adding further discussion with some guidance from the reviewer.

    1. [Bruno Giussani, co-curator of TED] gave the example of Steven Pinker’s popular TED talk on the decline of violence over the course of history, based on his book The Better Angels of Our Nature. Pinker is a respected professor of psychology at Harvard, and few would accuse him of pulling his punches or yielding to thought leadership’s temptations. Yet his talk became a cult favorite among hedge funders, Silicon Valley types, and other winners. It did so not only because it was interesting and fresh and well argued, but also because it contained a justification for keeping the social order largely as is. Pinker’s actual point was narrow, focused, and valid: Interpersonal violence as a mode of human problem-solving was in a long free fall. But for many who heard the talk, it offered a socially acceptable way to tell people seething over the inequities of the age to drop their complaining. ‘It has become an ideology of: The world today may be complex and complicated and confusing in many ways, but the reality is that if you take the long-term perspective you will realize how good we have it,’ Giussani said. The ideology, he said, told people, ‘You’re being unrealistic, and you’re not looking at things in the right way. And if you think that you have problems, then, you know, your problems don’t really matter compared to the past’s, and your problems are really not problems, because things are getting better.’ Giussani had heard rich men do this kind of thing so often that he had invented a verb for the act: They were ‘Pinkering’ — using the long-run direction of human history to minimize, to delegitimize the concerns of those without power. There was also economic Pinkering, which ‘is to tell people the global economy has been great because five hundred million Chinese have gone from poverty to the middle class. And, of course, that’s true,’ Giussani said. ‘But if you tell that to the guy who has been fired from a factory in Manchester because his job was taken to China, he may have a different reaction. But we don’t care about the guy in Manchester. So there are many facets to this kind of ideology that have been used to justify the current situation.’ —Winners Take All, pp. 126-127

      An early example of the verbification of Steven Pinker's name. Here it denotes the tendency of predominantly privileged men to argue that, because the direction of history has been so positive, those without power shouldn't complain.

      I've also heard it used more generally to mean amassing a preponderance of evidence on a topic, as in Pinker's book The Better Angels of Our Nature, without necessarily proving one's thesis convincingly.

    1. It is ironical that we Senators can in debate in the Senate directly or indirectly, by any form of words, impute to any American who is not a Senator any conduct or motive unworthy or unbecoming an American -- and without that non-Senator American having any legal redress against us -- yet if we say the same thing in the Senate about our colleagues we can be stopped on the grounds of being out of order. It is strange that we can verbally attack anyone else without restraint and with full protection and yet we hold ourselves above the same type of criticism here on the Senate Floor.  Surely the United States Senate is big enough to take self-criticism and self-appraisal.  Surely we should be able to take the same kind of character attacks that we "dish out" to outsiders. I think that it is high time for the United States Senate and its members to do some soul-searching -- for us to weigh our consciences -- on the manner in which we are performing our duty to the people of America -- on the manner in which we are using or abusing our individual powers and privileges.

      Aristotelian criticism is largely concerned with how effective an artifact is in reaching its intended audience. In this case, Senator Smith never mentions Sen. McCarthy by name, but given the historical context, it is obvious she refers to him and his supporters in condemning "the Senate and its members". One measure of her success in doing so appears in Lorraine Boissoneault's Smithsonian article: "The one person who didn’t forget Smith’s speech was McCarthy himself. 'Her support for the United Nations, New Deal programs, support for federal housing and social programs placed her high on the list of those against whom McCarthy and his supporters on local levels sought revenge,' writes Gregory Gallant in Hope and Fear in Margaret Chase Smith’s America. When McCarthy gained control of the Permanent Subcommittee on Investigations (which monitored government affairs), he took advantage of the position to remove Smith from the group, replacing her with acolyte Richard Nixon, then a senator from California." This may not have been an intended effect, but it nonetheless shows the degree to which Smith's speech reached her audience. Unfortunately, it is hard to say how effective her speech would ultimately have been, as its popularity waned when the Korean War broke out later the same month, inclining many toward the more right-wing, anti-communist approach favored by McCarthy and many other Republicans.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): Excellent quality of cell biology and biochemistry. Additional support is needed for the claims about actin elongation using different formin variants.

      Reviewer #1 (Significance (Required)): Ingrid Billault-Chaumartin and co-authors describe interesting research that provides insights into formin isoform-specific function in fission yeast and a new role of the Fus1 FH2 domain in cell-cell fusion. While the three formin isoforms have different localizations, the research further dissects their functional differences by examining the C-terminal regions, including FH1, FH2 and the formin C-terminus. The work also describes additional factors that regulate cell fusion, from the autotrophy effect and formin expression level, in addition to the well-accepted formin biochemical activities. Here are my comments regarding the strengths of the work and improvements that could further strengthen the story.

      Major comments 1. Fig. 1 shows Cdc12C could recapitulate Fus1 function by ~80% if fused with Fus1C, whereas deletion of the C-terminal tail of Cdc12 following FH2 introduces drastic dysfunction. Together with Fig. 3, these results indicate that the Cdc12 Cter plays a more important role than the Fus1 Cter in their respective functions. Such results suggest a Cter-mediated mechanism that differentiates the functions of the three fission yeast formin isoforms. The authors examined contributions from differences in FH1 (Figs 4, 5) and FH2 residues (Fig. 6). However, the obvious phenotype of the Cter was not investigated further and is not much discussed. The Cter of budding yeast formins interacts with nucleation-promoting factors, Bud6 and Aip5. Although S. pombe does not have orthologs of budding yeast Bud6 and Aip5, I wonder whether the authors could discuss the potential contribution of the Cter in differentiating S. pombe formins.

      The reviewer is correct that the C-terminal tail region of Cdc12 beyond the FH1-FH2 domains has a strong influence on the ability of Cdc12C to replace Fus1C. This is one reason why we specifically investigated the possible role of the Fus1 C-terminal tail, which is much shorter than that of Cdc12. We found that the Fus1 C-terminal tail plays only a very minor role in regulating Fus1 function, as described in Figure 3. We note that contrary to what the reviewer states, Bud6 exists in S. pombe and binds the C-terminal tail of the formin For3 (see Martin et al, MBoC 2007), but whether it binds Fus1 is unknown. We have expanded our discussion to include a paragraph on the role of formin C-termini.

      Because the manuscript is focused on the function of Fus1 formin, we did not explore further the role of the Cdc12 C-terminal tail. It was previously shown that this region of Cdc12 contains an oligomerization domain that promotes actin bundling (Bohnert et al, Genes and Dev 2013). It is thus likely that this helps Cdc12 FH1-FH2 perform well in replacement of Fus1. In fact, it is likely that oligomerization boosts formin function, as we have discovered that Fus1 N-terminus contains a disordered region that fulfils exactly this function. This is described in a distinct manuscript under review elsewhere and just deposited on BioRxiv (Billault-Chaumartin et al, BioRxiv 2022; DOI: 10.1101/2022.05.05.490810). We have now cited this point in the discussion.

      2. Here, the study focuses on the FH1 domains of Fus1 and Cdc12 to understand their different functions in actin polymerization. FH1 mediates actin elongation through its interaction with profilin via polyP motifs. The transfer rate of G-actin from profilin and profilin sliding depend on the polyP pattern, that is, the length of each polyP motif and its distance to FH2 (Naomi Courtemanche and Thomas D. Pollard, JBC, 2012). To better understand the mechanisms by which the engineered FH1 variants act on both Fus1 and Cdc12 in Fig. 4, the authors may want to list the sequences of these engineered FH1 domains, including the number and length of the polyP motifs, and discuss these patterns.

      This list and discussion were available in the initial paper that characterized each of the constructs in vitro (Scott et al, MBoC 2011). We have now re-drawn it in a supplemental figure for convenience (as also answered in response to minor point 2), which is already provided in the revised manuscript as Figure S1. (Previous supplementary figures are re-numbered S1>S2, S2>S3 and S3>S4).

      3. The cell biology results in Figs. 4 and 5 do not directly support the point about specific elongation rates unless the elongation speed of LifeAct-labeled actin cables can be followed and quantified. The fluorescent tagging of tropomyosin does not show the actin cable pattern, which makes it very difficult to use for studying actin cable dynamics, such as elongation. Therefore, I feel the data in the current Fig. 4 and Fig. 5 cannot support claims about differences in actin elongation without a quantitative comparison of elongation rates. I suggest a CK666 treatment to increase the visibility of the LifeAct actin cable pattern, as used before in both fission and budding yeasts, which would allow the authors to quantify the actin cable elongation rate. Another way is to use the TIRF assay used in this study, which would give a better quantitation of formin nucleation and profilin-aided elongation.

      We respectfully disagree with the reviewer on this point. All the constructs we use in vivo have been characterized in vitro and their elongation rate carefully measured (Scott et al, MBoC 2011). These values are thus known and can be directly compared to our results in vivo.

      Of course, it would be fantastic to be able to directly measure formin elongation rates in vivo, but we are not aware that this has been done in any system. The proxy experiments that the reviewer suggests would be good ones, but each faces technical challenges that make them impossible in our system. First, because the fusion focus is a structure that forms in response to cell-cell pheromonal communication, we cannot add CK-666 or any other drug during this phase, as this perturbs the pheromone signal. Indeed, we had shown that a simple buffer wash leads to loss of the fusion focus (see Dudin et al, Genes and Dev 2016). Second, the fusion focus is at the contact site between partner cells, i.e. somewhat distant (1-2 µm) from the coverslip during imaging. It is thus impossible to use TIRF. Finally, the fusion focus is a tightly packed actin structure. This, rather than the use of the tropomyosin marker, is the reason why we cannot image single actin filaments (or even bundles) whose dynamics we could follow, as has been done to measure the retrograde flow of actin cables in yeast.

      What we have done is to use a better tropomyosin tag, mNeonGreen-Cdc8, which was just described (Hatano et al, BioRxiv 2022; DOI: 10.1101/2022.05.19.492673), to quantify amounts of linear actin. Although this is not a measure of elongation rate, it gives some sense of the amount of polymer assembled. We have obtained images with mNeonGreen-Cdc8 for all experiments previously conducted with GFP-Cdc8 and have replaced them in Figure 4C, Figure 5E, Figure 6E and Figure S2B. We have also quantified the relevant strains. The relative intensities of mNeonGreen-Cdc8 at the fusion focus at fusion time reflect remarkably well the measured elongation rates of the various formin constructs characterized in vitro. These data are now provided as new panels Figure 4F and Figure 5F.

      4. I appreciated the detailed biochemical dissections of multiple aspects of WTFus1 and Fus1R1054E, although the biochemical assays could not identify the mechanism by which R1054E causes the cell fusion. In many cases, formin functions are diverse across biological processes and too sophisticated to be explained by biochemical activities in actin polymerization alone, such as the bundling, nucleation, and elongation studied in this story regarding fusion. This exciting information allows us to think of more possibilities that might regulate formin function beyond a direct change of formin activities in actin polymerization. I think a discussion of different aspects of the functional regulation of formins might inspire the community to investigate new possibilities to solve the mysteries. For example, changes in formin behaviors and functions could be regulated by stress-induced formin turnover by degradation, cell signaling-regulated formin clustering and complex assembly, and their potential relevance to recruiting protein constituents for fusion progression.

      We have added a paragraph on the role of Fus1 C-terminus. If you feel we should expand more on the diverse modes of regulation of formins, we could, but we have so far kept the discussion centred around the points of investigation in this paper, whose aim was to probe how changes in nucleation and elongation rates, rather than other regulations, affect the in vivo function of Fus1.

      Minor comments. 1. Two types of "C" are used in the manuscript, one that includes FH1/FH2 and one that follows FH2, which is a bit confusing. It would be better to differentiate them so that the text is easier to follow. Fig. 1 uses Cdc12C-deltaC, Fig. 3 uses Fus1-delta Cter.

      We have updated the nomenclature to make this clearer: the C-terminal region beyond the FH1-FH2 domains is now called Cter throughout the manuscript.

      2. It's better to specify the amino acid positions on the schematics of formins, such as panel A in many figures. It's always more informative to compare formin activities by considering the domain lengths, especially for the C-terminal tail, which is variable in length and sequence. Along the same lines, I suggest a supplementary figure that lists the sequences of all FH1 domain variants and Cter domains, as done for the FH2 domain in Fig. S1.

      We have made a supplementary figure (new Figure S1) listing all constructs with specific aa positions as well as the FH1 domain variants and their sequences (see also answer to point 2 above). We have not added the sequence of the Cter domains in this figure, as these are extremely divergent and not particularly informative at this point.

      3. "n" for the statistics needs to be provided for Fig. S3.

      We have added the information to the legend of the figure (now Fig S4).

      4. The stained SDS-PAGE gel of the purified recombinant proteins for biochemical assays should be provided, particularly for the newly reported mutant variants.

      This is now provided as new panel S4C. We show the purified recombinant Cdc122FH1-Fus1FH2 proteins, which are the newly reported ones.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): In this study, Billault-Chaumartin and colleagues investigate the molecular specialization of the S. pombe formin, Fus1. The authors systematically modulate the actin filament elongation and nucleation activities of Fus1 by expressing chimeric constructs that contain Formin Homology 1 and 2 domains from two other formins with known polymerization activities. By characterizing the architecture of the fusion focus and the efficiency of cell fusion, they find that both the elongation and nucleation properties of Fus1 are specifically tailored for its cellular role. Comparison of formin constructs with similar elongation and nucleation activities also reveals that the Fus1 FH2 domain possesses a specific property that promotes efficient cell fusion. Using sequence alignment and homology modeling, the authors identify R1054 as the residue that confers this novel, fusion-specific activity to Fus1, despite producing no effect on its bundling or polymerization properties in vitro.

      Overall, this study is well motivated, and the results support the conclusions that are drawn. I have only minor suggestions, as described below.

      Minor comments: (1) The schematic diagrams of the chimeric formin constructs are very helpful. However, it is difficult to distinguish the colors from one another, especially in the case of the Cdc12FH1-Fus1FH2 variant, which requires discernment of the relatively small purple region within the dark blue molecule. Would it be possible to modify the colors to increase their contrast? Similarly, the blue and gray data sets in Figure 3B are very difficult to discern.

      We have changed the colours to improve contrasts.

      (2) The affinities (Kd) with which the formins bind the barbed ends as described in the second-to-last paragraph on page 8, in Figure Legend 7G, and in the "Analysis of pyrene data" section of the Materials and Methods should be defined as dissociation "constants", rather than dissociation "rates". Also, these affinities are lacking units in the following sentence on page 8.

      We have corrected this. The unit is nM.

      (3) When comparing the TIRF micrographs in Figure S3A, it looks as though both formins (but especially the R1054E variant) nucleate more filaments in the presence of profilin than in its absence. Is this a reproducible effect? If so, can the authors provide an explanation for this?

      There is strong variability in the filament numbers observed by TIRF in replicate experiments, which makes it difficult to use this technique to determine the nucleation efficiency. This may be due for instance to the stickiness of the glass, which may influence the number of observed filaments. We have measured the number of filaments after 130s of polymerization for each condition to test whether there are any significant differences between conditions despite overall variability. The measurements suggest that the addition of profilin increases the number of actin filaments. However, these results should be taken very carefully due to the experimental variations (very large error bars). Additionally, because Fus1-associated filaments are very short in absence of profilin, it is quite likely that this influences their crowding at the glass surface compared to longer filaments (in presence of profilin). Since in TIRF we can only observe the filaments at the glass surface, we may miss a portion of short Fus1-bound actin filaments in absence of profilin.

      For these reasons, and because the possible role of profilin in modulating nucleation efficiency by formins is not the object of the work here, we would prefer not to include this graph in the manuscript.

      Reviewer #2 (Significance (Required)): This study contributes a key advancement towards understanding how the polymerization activities of formins are tailored to support diverse and specific cellular functions. The results in this study nicely complement and expand upon similar recent work that dissected the polymerization requirements of the formin Cdc12, which mediates cytokinetic ring assembly in S. pombe, and For2, which drives the assembly of apical networks that are necessary for polarized growth in Physcomitrella patens. As such, this work will likely be of significant interest to scientists who study mechanisms of actin dynamics regulation. The identification of R1054 as a residue that confers a novel regulatory activity to the FH2 domain of Fus1 will also likely be of great interest to biochemists and other scientists who study formins at the molecular level.

      My expertise is in the field of formins and actin polymerization.

    1. Reviewer #1 (Public Review): 

      In this article Farrell et al. leverage existing datasets which measure frailty longitudinally in mice and humans to model 'robustness' (the ability to resist damage) and 'resilience' (the ability to recover from damage), their dynamics across age, and their relative contributions to overall frailty and mortality. The concept of separating damage/robustness from recovery/resilience is valid and has many important applications including better assessment and prediction of effective intervention strategies. I also appreciate the authors' sophisticated attempts to effectively model longitudinal data, which is a challenge in the field. The use of human and mouse data is another strength of the study, and it is quite interesting to see overlapping trends between the two species. 

      While I find the rationale sound and appreciate the approach taken at a high level, there are a few key considerations of the specific data used which are lacking. The authors conceptualize resilience based on studies which primarily use short time scales and dynamic objective measures (ex. complete blood cell counts in Pyrkov et al.) often in conjunction with an acute stress stimulus. For example, they heavily cite Ukraintseva et al. who define resilience as "the ability to quickly and completely recover after deviation from normal physiological state or damage caused by a stressor or an adverse health event." 

      Given these definitions, the human data used seem to fit within this framework, but we should carefully consider the mouse data. The mouse frailty index is a very useful tool for efficiently measuring the organismal state in large cohorts. A tradeoff for quickly measuring a broad range of health domains is that the individual measurements are low resolution (categorical) and involve inherent subjectivity (which may be considered part of the measurement error). Some transitions in individual components are due to random measurement error and I believe this is especially likely with decreases (or 'resilience' transitions). 

      The reason I think the resilience transitions are subject to high measurement error is that I am skeptical as to whether many of the deficits in the mouse index are reversible under normal physiologic conditions. For example, it is exceptionally unlikely for a palpable/visible tumor to resolve in an aged mouse over the time scales studied here, thus any reversal that was observed is very likely due to random measurement error. Other components whose reversibility I doubt are alopecia, loss of fur color, loss of whiskers, tumors, kyphosis, hearing loss, cataracts, corneal opacity, vision loss, rectal prolapse, and genital prolapse.

      In summary, I applaud the authors' efforts in generating complex models to better understand longitudinal aging data. This is an important area that needs further development. I appreciate their conceptualization of resilience and robustness and think this framework has an important place in aging research. I also appreciate their cross-species approach. However, the authors may have over-conceptualized and made some assumptions about the mouse data which may not be valid. It will be important to assess the results with careful consideration of the time scales of the underlying biology and the resolution and measurement error inherent to these tools.

    1. What did Franklin himself think about abortions? In 1728 during his early years as a printer, he generated controversy over something he would end up doing himself. According to “Benjamin Franklin: An American Life” by Walter Isaacson, he “manufactured” an abortion debate, largely because he wanted to crush a rival, but his own opinions may not have been too strong about it. Franklin wrote a series of anonymous letters for another paper to draw attention away from Samuel Keimer’s paper: The first two pieces were attacks on poor Keimer, who was serializing entries from an encyclopedia. His initial installment included, innocently enough, an entry on abortion. Franklin pounced. Using the pen names “Martha Careful” and “Celia Shortface,” he wrote letters to Bradford’s paper feigning shock and indignation at Keimer’s offense. As Miss Careful threatened, “If he proceeds farther to expose the secrets of our sex in that audacious manner [women would] run the hazard of taking him by the beard in the next place we meet him.” Thus Franklin manufactured the first recorded abortion debate in America, not because he had any strong feelings on the issue, but because he knew it would help sell newspapers.

      Benjamin Franklin manufactured the first recorded abortion debate in America to help sell his newspapers and to crush a rival.

    1. The student doesn’t have a strong preference for any of these archetypes. Their notes serve a clear purpose that’s often based on a short-term priority (e.g., writing a paper or passing a test), with the goal to “get it done” as simply as possible.

      The typical student note taking method of transcribing, using (or often not using at all), and keeping notes is doomed to failure.

      Many students make the mistake of not making their own actual notes. By this I don't mean they're not writing information down. In fact many are writing information down, but we can't really call these notes. Notes by definition ought to transform something seen or heard into one's own words. Without the transformation, these students think that they're taking notes, but in reality they're focusing their efforts on being transcriptionists. They're attempting to capture something for later consumption. This is a deadly trap! By only transcribing, they're not taking advantage of transforming information by putting ideas down in their own words to test their understanding. Often worse, even if they do transcribe notes, they don't revisit them. If they do revisit them, they're simply re-reading them and not actively working with them. Only re-reading them will lead to the illusion that they're learning something when in fact they're falling into the mere-exposure effect.

      Students who are acting as transcriptionists would be better off simply reading a textbook and taking notes directly from that.

      A note that isn't revisited or revised may as well be a note not taken. If we were to consider a spectrum of useful, valuable, and worthwhile notes, these notes would be at the lowest end of the spectrum.

      link to: https://hypothes.is/a/QgkL6IkIEeym7OeN9v9New

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      From the start, the authors would like to thank all the reviewers for their careful and constructive consideration of our manuscript. We have now made several changes to the paper and believe it to be better for the feedback.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this study, Rees et al. perform an RNA-seq circadian time course experiment in the recently formed allopolyploid wheat. Through comparisons with other circadian transcriptomic datasets in other species it appears that the period of rhythmic genes is much more variable in wheat with a shift to longer periods compared to the other species examined. Interestingly, by analyzing circadian parameters among expressed genes, they find evidence that this newly formed allopolyploid already shows signs of divergence in circadian traits among homoeologs. A thorough comparison with circadian regulated genes in Arabidopsis reveals overlap in phasing of genes involved in certain biological processes such as photosynthesis and light signaling whereas genes involved in starch metabolism were found to have different levels of rhythmicity and phasing. This dataset will be a great resource for the community and enable new predictions about the influence of polyploidy on the circadian control of important crop improvement traits and the circadian regulation of gene expression.

      Major Comments

      1. The results section starts with very little explanation of the experiment. It would help to provide a little more detail at the start of the results to explain the context for the experiment and what was done, when samples were collected and for how long. For the methods section, it isn't until line 650 that it is clearly stated that the sampling started at ZT0. It would be better to put this in the plant materials and growth condition section.

      Thank you for highlighting the need for this context, we agree that the manuscript is improved by an introduction to the experiments. We have now included an “Experimental context” section in the results and have taken the opportunity to explain how the full 0-68h and 24-68h datasets are used within our analysis. Ln 74-82. We have also edited the Methods as suggested Ln 610-615.

      The low proportion of circadian regulated genes is likely due to the very low cutoff for calling a gene expressed, especially when there are three days of repeated timepoints. If a gene is expressed across the time course it should have values above TPM 0 for at least 3 time points in order for it to be expressed each day. I'd also be suspicious of a gene with a TPM value less than 0.5. Comparing these types of numbers is always challenging due to the various cutoffs used. Along those lines, why was a different filtering scheme used for Arabidopsis (line 657)?

      We completely agree that the proportion of genes described as rhythmic changes a great deal with the threshold at which you exclude low expression transcripts, as well as with the window over which measurements are taken and the q-value cut-off for rhythmicity. We performed an analysis to test the effects of applying a pre-filtering step to exclude low-expression genes and discuss our findings in Supplementary Note 1. Briefly, we removed genes with expression less than 0.1 TPM in six or more timepoints and again ran Metacycle to define numbers of rhythmic genes. Our results are discussed in Supplementary Note 1 and are presented in Supplementary Table 1. Regardless of the cut-offs applied, Arabidopsis and wheat data were treated identically, and our findings reported in the main results were consistent with those reported in the Supplementary analysis. Thank you for raising this point, as we have now improved our description of this analysis in the main text (Ln 92-95).

      Regarding the different filtering schemes, the filtering mentioned by Reviewer 1 was applied to both Arabidopsis and wheat data for a stricter retention of rhythmic genes, as part of the pre-WGCNA clustering analysis. Filtering to retain genes with >0.5TPM across 3 timepoints was applied to reduce lowly expressed genes that act as background 'noise' when defining clusters. We applied this across 3 timepoints, rather than the WGCNA suggestion of 90% of samples, because the patterns of expression in our rhythmically filtered datasets were cyclical in nature.

      In reference to the shortening of the period every day, this should be interpreted with caution. Period estimates from a single cycle are not very reliable, and the SD for each day is around 3h, so it is difficult to draw any conclusions about changes in period each day. One option would be to only include genes with an SD less than 1h or alternatively to remove the discussion surrounding the comparison of period across the three days and focus on the period results for the full 24h-68h window shown in 1b. While 2 days is better it is still not ideal for calling period; however, your first day will still have a strong diurnal driven pattern that will likely skew your circadian period.

      Thank you for your comments. Our question here was to determine whether the mean period lengths of rhythmic transcripts in wheat were always immediately longer upon transfer to constant light, or whether they got progressively longer over time. Upon reading the reviewer’s comment, we realize that the explanation provided of how we conducted this analysis was misleading. Our approach was to take a 44h sliding window (almost 2 days) and measure period at 0-44h, 12-56h and 24-68h. We have now added the previously missing statistics that support our findings in the main text, and which hopefully show the significance of the period changes over time (supplementary note 2). One of the most surprising findings from this analysis was that the periods in the first window were the longest 28.61h (SD=3.421), suggesting that the diel (driven) oscillation had little impact upon immediate transfer to free run. Our interpretation is that the mean period initially lengthens trying to follow the missing dusk signal, before the free-running endogenous period asserts itself in later cycles (Ln 129-128).
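The sliding-window layout described above (44h windows starting at 0h, 12h and 24h) can be written out explicitly. The sketch below only enumerates the windows and the sampled timepoints that fall in each; it does not estimate period. The 4h sampling interval is taken from the study description, while the helper names are hypothetical.

```python
def sliding_windows(start, end, width, step):
    """Return (window_start, window_end) pairs that fit inside [start, end]."""
    windows = []
    window_start = start
    while window_start + width <= end:
        windows.append((window_start, window_start + width))
        window_start += step
    return windows


def timepoints_in_window(timepoints, window):
    """Select the sampled timepoints lying inside a window (bounds inclusive)."""
    lower, upper = window
    return [t for t in timepoints if lower <= t <= upper]


timepoints = list(range(0, 72, 4))  # 0, 4, ..., 68 hours
windows = sliding_windows(start=0, end=68, width=44, step=12)
for window in windows:
    print(window, len(timepoints_in_window(timepoints, window)))
```

Each window in this layout contains the same number of samples, so period estimates from the three windows are based on equal amounts of data.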

      Line 87-93: If the dusk cue is important for clock expression you would think this would be biased towards genes that peak later in the day or near dusk. This argument should be connected better to the period results discussed on lines 98-101.

      Following on from our statement above, we have now combined our hypothesis for why wheat transcripts expressed at dusk have longer periods with the discussion about longer periods upon transfer to constant light. We agree that the two processes are likely to be connected and have now placed them together in Ln 129-128.

      1. Lines 650-652 of the Methods mentions that one of the main interests was the response to transfer to L:L, but this isn't mentioned in the introduction and doesn't come up much in the Results section. Most of the expression comparisons are focused on the 24-68h window. It also isn't clearly explained why the first day in LL is still a diurnal cycle. This would be helpful for non-circadian readers who may wonder why the first day is not included in all the analyses.

      We believe this point is now also addressed by the addition of an Experimental Context section in the results (Ln 74-82), in response to the reviewer’s previous comment.

      1. The phase comparisons shown in Figure suppl 4 are confusing. Suppl. Note 3 states that the period from the 24-68h data window was used to establish the bins but then the phase is shown for 3 different windows for each column? When calculating the phase for each of those 3 windows which period was used as the denominator in the phase calculation? Was it the period that matches the window used to calculate phase? What does the plot look like if phase is called on the same window used to calculate period (24-68)? What method was used to call phase in Suppl. Fig 4? As shown in Suppl Fig. 3 the method can influence the phase distributions. The methods suggest that the phase was determined with Metacycle but then FFT and MESA were used to verify. What does this mean verify, were they adjusted if FFT/MESA didn't agree?

      We agree that this Figure was unnecessarily complicated. We have now simplified Supplementary Figure 4 so that only the phases from 24-68h are presented. We have also clarified the legend to explain why we used FFT-NLLS to improve accuracy of Metacycle predictions.

      It is difficult to interpret the value of the period and phase comparisons shown in Fig. 1b, c, e and f after the preceding section about how variable the period and phase is across days. It is also surprising that the full 3 days were used to calculate the circadian statistics considering the first day is still under diurnal control. Do the ratios remain the same if the statistics are performed only on the 24h-68h window? For consistency with the rest of the paper and avoid confusion it would be best to have all circadian parameters measured using the same time window (24h-68h).

      Thank you for your comments; we can see how our logic in using the different data windows was not clear enough. As mentioned above, we have now explained the use of the full and shortened data windows in the Experimental context section (Ln 74-82). Fig 1c is a comparison between different circadian datasets and as such we have only compared periods across the 24-68h window. Similarly, Fig 1b is a global analysis of periods in rhythmic genes in comparison with Arabidopsis and so is again measured from 24-68h. We have now clarified this in the Figure legend for 1b.

      For comparisons of homoeologs within wheat triads, our question was in identifying homoeologs which behaved differently when placed under free-running conditions. We therefore still feel justified in using the full 0-68h dataset to identify homoeolog periods and phases which indicate differential circadian regulation, but we have now clarified that we are using the full dataset for the triad analysis in the results (Ln 140).

      Fig 1h-m. How were those genes chosen? It would help to see the SD of the replicates shown, since this is just showing one triad. It would be helpful to see a plot that represents the full set of triads rather than just one that looks best. If normalized to a standard phase they could be put on the same plot. For example, panel j is meant to show the 8h lag of subgenome D. If the data is normalized so that A and B are set to the same phase all the triads could be displayed with shaded SD bars to show the variation. Something like this would be a better representation of the data rather than showing just one example.

      Fig. 1h-m are case studies illustrating the different forms of circadian imbalance between homoeologs. We agree that it is helpful to see the standard deviation as error bars on these triad plots and have added it as suggested. In line with Reviewer 2’s suggestion we have removed Fig 1k and replaced it with a comparison of mean normalised data for Triad 408 and Triad 2454, highlighting the difference between imbalanced rhythmicity and imbalanced amplitudes between homoeologs. Fig 1 i and m do not have error bars, as adding standard deviations to mean normalised data wasn’t appropriate.

      Thank you for your suggestion on how to display the different phases between homoeologs. We feel that if we were to plot all of the triads displaying imbalanced phases, the differences in period length and accompanying noise differences would make the plot so busy as to be unreadable. We hope that the pie charts Fig 1 d-g give a global overview of the proportions of triads with circadian imbalance, but agree with the point that it is useful to allow readers to view triads of their own preference. Therefore, we have now provided the replicate level TPM data with the triad IDs annotated (Supplementary File 12) and Supplementary file 11 provides the classification of each triad alongside Metacycle statistics, ortholog identification and cluster information discussed elsewhere in the paper. Readers can now look up a triad or gene of interest and see how it was classified and what the expression looks like over the full dataset.

      It is surprising that there aren't more comparisons with the B. rapa dataset, especially when discussing the clock genes that show balanced or imbalanced expression. Are they similar in B. rapa and does it support your hypothesis that unbalance for certain genes are selected against?

      While we agree that a thorough, multiple species, comparative transcriptomic analysis is undoubtedly of interest for the future, we feel it is beyond the scope of the questions being addressed in this paper. We do compare paralogs defined as “similar” in the Greenham dataset with homoeologs described as “balanced” in our dataset and find that genes involved with “photosynthesis” and “generation of precursor metabolites and energy” tend to be common between the two groups, potentially suggesting conservation of balance for certain types of genes (Ln 206-217).

      Figure 2 networks. Why were these specific modules selected? Is it actually appropriate to directly compare these modules? I do see that some of the comparisons have high correlations from panel a, but not all. For example, in panel b the W9 and A9 modules have a correlation value of 0.92, which seems appropriate. However, panel c (modules W3 and A2) have a correlation of 0.42, which seems far too low to make any sort of comparison meaningful.

      The modules were selected to simplify the comparison of genes expressed in the dawn, midday, dusk, and night. We were interested in identifying common GO-enrichment in genes peaking throughout the day, although as you have identified, the differences in period length between Arabidopsis and wheat made this difficult. Our reasons for comparing module W3 with module A2, were that, even though their eigengenes are not highly correlated per se, when period length is taken into account, both modules peak during the subjective day (CT 6.34h and 6.19h) and they share commonly enriched GO terms which make sense for day peaking genes.
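Taking period length into account when comparing peak times is commonly done by converting phase to circadian time (CT), where one full cycle is rescaled to 24 circadian hours. The sketch below shows that standard conversion; the function name and the example values are hypothetical and are not taken from the wheat or Arabidopsis modules.

```python
def circadian_time(phase_hours, period_hours):
    """Rescale a phase so that one full period corresponds to 24 circadian hours."""
    if period_hours <= 0:
        raise ValueError("period must be positive")
    return (phase_hours % period_hours) * 24.0 / period_hours


# Two hypothetical rhythms that peak a quarter of the way through their cycle:
# a 28h rhythm peaking at 7h and a 24h rhythm peaking at 6h share the same CT.
long_period_ct = circadian_time(phase_hours=7.0, period_hours=28.0)
short_period_ct = circadian_time(phase_hours=6.0, period_hours=24.0)
print(long_period_ct, short_period_ct)
```

Two eigengenes with different periods can therefore peak at the same circadian time even when their raw traces drift apart and correlate poorly over a multi-day time course.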

      Further, as described in methods comments, using a cutHeight as low as 0.15 will likely lead to some number of genes in any given module that do not necessarily "share" a similar expression pattern. These genes could have a pattern that has very low correlation to their module eigengene and were only placed in that module because the pattern was "less similar" to other module eigengenes. The current expression plots in this figure follow a clear pattern, but I suspect this would be even more apparent if the genes within these modules had a higher correlation to the module eigengene. Perhaps the current genes in these modules could just be filtered to have a higher correlation score?

      Thank you for your comments, we have now made changes to the Results and Methods to clarify our approach (Ln 237-239 and Ln738-765). Merging modules with highly correlated module eigengenes (ME) is the final step in constructing our co-expression networks. To do this, as the reviewer describes - we used the WGCNA default parameter of a mergeCutHeight() of 0.15. This results in the merging of modules with highly correlated ME as the 0.15 mergeCutHeight() refers to the dissimilarity metric of 1 minus the eigengene correlation. So for WGCNA, a mergeCutHeight() of 0.15 corresponded to a correlation of 0.85. For the wheat modules, we took the additional step of merging closely related modules (mergeCloseModules()) using a cutHeight of 0.25, again a dissimilarity metric of 1 minus the eigengene correlation (corresponding to a correlation of 0.75). Reducing the stringency of the cutHeight to merge highly correlated wheat modules enabled us to more easily compare significantly correlated wheat and Arabidopsis co-expression modules to identify groups of genes in wheat and Arabidopsis expressed at similar times in the day, and enable the comparison of whether similar phased transcripts in wheat and Arabidopsis had similar biological roles.
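The relationship between a merge cut height and an eigengene correlation described above can be illustrated in a few lines. This is a pairwise simplification written in plain Python, not the WGCNA implementation (which clusters all module eigengenes hierarchically); the function names and the toy eigengenes are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def cut_height_to_correlation(cut_height):
    """Dissimilarity is 1 minus correlation, so a cut height maps to 1 - height."""
    return 1.0 - cut_height


def would_merge(eigengene_a, eigengene_b, cut_height):
    """Pairwise check: merge when eigengene dissimilarity is below the cut height."""
    dissimilarity = 1.0 - pearson(eigengene_a, eigengene_b)
    return dissimilarity < cut_height


# Hypothetical module eigengenes over five timepoints.
module_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
module_2 = [2.0, 4.0, 6.0, 8.0, 10.0]   # same shape as module_1
module_3 = [5.0, 1.0, 4.0, 2.0, 3.0]    # unrelated shape

print(cut_height_to_correlation(0.15), cut_height_to_correlation(0.25))
print(would_merge(module_1, module_2, 0.25), would_merge(module_1, module_3, 0.25))
```

A cut height of 0.15 therefore corresponds to a correlation of 0.85, and relaxing it to 0.25 lowers the required correlation to 0.75.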

      Lines 327-334: I am not following the connection between 'response to abiotic stimulus' and the photoreceptor and light signaling proteins. At the start of this section (line 308) the authors say that the GO analysis was only done on rhythmically expressed genes but the reference to only one PHYA being rhythmic and yet multiple genes are shown in the plot in fig. S16. Does this mean that all the genes were shown and not just the rhythmic ones? This would explain why many of the PHY and CRY genes don't seem to have rhythms. This should be clarified better in the text or indicated in the plot which ones were called rhythmic. Since the first day following transfer is still the diel pattern from the entrainment condition, what does the PHY and CRY expression look like? Does it appear rhythmic under diel but lose rhythmicity in LL? It should be noted in the text that arrhythmicity in circadian conditions doesn't mean there isn't rhythmicity under diel conditions. This could be an additional explanation apart from the current one in the text that the regulation is at the level of protein stability/localization. Overall, this entire section is very long and entirely based on data shown in the supplemental material. I do appreciate having the individual gene plots that supplement Figure 4 and would suggest either providing a main figure to highlight a small subset of genes or pathways in this section or shorten it and focus on the results shown in the main figures.

      Upon reading the reviewer’s comment, we realize that we should have made our motivations and processes clearer within this section. We used the data filtered for rhythmicity to conduct the GO-enrichment analysis and then used that to identify processes which should be of interest for further investigation. We have now added an additional sentence (Ln 352-354) to explain this more clearly. We then considered the orthologs of well-known Arabidopsis gene networks and extracted their expression from our circadian dataset, whether rhythmic or not. Supplementary Table 10 contains all of the genes we investigated, their expression and their MetaCycle statistics. We have also indicated here which genes are plotted in which Supplementary Figure 18-20. The reasons for plotting non-rhythmic genes in some cases was that it illustrates the differences between circadian control in Arabidopsis versus wheat (as is the case for the PHY and CRY genes). We understand that it is useful to see at a glance which genes are classified as rhythmic or arrhythmic, so have now highlighted each row in Supplementary Table 10 to make this more intuitive, and added a read me tab.

      Regarding your point about oscillation under diel cycles, we agree that some transcripts will show rhythmic behaviour under entraining environments but not under constant conditions, and may perform time-of-day specific functions. However, these transcripts are likely to not be regulated by the circadian clock (at the transcriptional level) and so are not discussed in the context of a circadian transcriptome.

      For your interest, here is the full expression of PHY and CRY transcripts starting at ZT0:

      [Image]

      It is difficult to say for definite, but it seems likely that some of these photoreceptors will have rhythmic patterns of expression under diel cycles, but these rhythms do not endogenously persist under constant conditions.

      We appreciate your feedback that this section would benefit from cutting down of text and addition of a Figure to illustrate the text. We have now cut some of this section down and created a new main figure based on some of the oscillation plots from Supplementary Figure 18 and 19. We chose examples that reflect a conservation of relationships between transcripts of different peak phases, as we find it interesting that both species have similar patterns. (Main Figure 4, Ln 361-363, 382).

      1. Primary metabolism section: in terms of the supplemental figure, similar to the previous one I think it would declutter the plots if the genes that are not rhythmic were left out and simply indicate below the plot that they didn't meet the rhythmicity cutoff. This is another area where there is more discussion surrounding the supplemental figures than the main figure 4.

      One of the overall findings of this section was that many of the genes involved in Starch and T6P metabolism which are rhythmically expressed in Arabidopsis are not rhythmically expressed in wheat. We feel removing these genes from the results would detract from the importance of this finding. We have now edited Supplementary Table 10 to highlight which genes are classified as rhythmic. We have also added in a sentence to the start of this section which lays out our motivations for this analysis, summarises our findings and better connects the text with an explanation of Fig. 5 (Ln 408-430).

      For all gene expression figures there should be SD or SE shown either as bars or ribbons to represent the variation in replicates.

      Although we agree that error bars are informative for showing variation between replicates (and have added them to Fig. 1 to show differences within wheat triads) we feel that adding error bars to the gene expression plots in Fig. 3, Fig 4 and Supplementary Fig 19-20 would make these plots difficult to read, particularly where the wheat homoeologs are very similar. The purpose of these gene expression plots is to compare circadian profiles in Arabidopsis and wheat orthologs rather than to claim significant differences in expression at any particular timepoint. This is fairly common in other circadian biology studies:

      https://www.pnas.org/doi/10.1073/pnas.1408886111

      https://www.jbc.org/article/S0021-9258(17)49454-3/fulltext#seccestitle20

      https://journals.plos.org/plosone/article/comments?id=10.1371/journal.pone.0169923

      https://www.science.org/doi/10.1126/science.290.5499.2110?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed

      https://www.frontiersin.org/articles/10.3389/fgene.2021.664334/full

      https://www.science.org/doi/full/10.1126/science.1161403

      The replication level information for each gene has now been made available in Supplementary file 12.

      1. It would be very helpful if the code used to generate the networks and perform the cross-correlation of eigengenes across networks were included in the Methods. This will also save you from responding to email requests!

      Thank you for your comment. Code for the cross-correlation analysis, Loom plots and WGCNA network construction is now available from our group's GitHub repository: https://github.com/AHallLab/circadian_transcriptome_regulation_paper_2022/tree/main

      Minor Comments

      1. Figure 1, panel d: - The "unbalanced" triads that are depicted by the lighter shading; do these in fact have a different cutoff than the original rhythmic homoeologs? In the figure it says q

      Thank you for bringing this to our attention; this has now been corrected.

      Hard to directly compare the GO term overlap in Figure 2f. Might be better to only show the results for the 4 pairs shown in b-e and put them side by side in the bubble plot.

      Thank you for this feedback. We have tried to make this plot easier to understand without losing any of the available information. Hopefully it is now more intuitive to understand which columns are being compared. We have changed the coloured lines to make them slightly wider, put the modules in corresponding coloured boxes and highlighted GO-slim terms shared by modules being compared.

      1. Line 314 -316 don't see supp tables 10, 11

      Our apologies, these files were previously missing from the upload and are now available.

      1. For the selection of B. rapa circadian paralogs with similar and differential expression patterns (starting line 714), the authors choose a hard cut off of 0.001 (differentially patterned) OR 0.1 (similarly patterned). What happens to the genes that are between these two cut offs, or is this a typo? Since all the other cutoffs for rhythmicity were set at 0.01 it seems likely that this is a typo.

      We have now clarified this in the methods (Ln 807-822). This is not a typo, but it is a different method to the Metacycle approach we have used for our wheat data. We defined similar/different paralogs as characterized in Greenham et al. (2020) using DiPALM p-values. We chose these DiPALM p-value cut-offs as they gave us approximately equal numbers of paralogs in each category, which represent tails of similarly expressed or differently expressed circadian genes. We checked these cut-offs by calculating average Pearson’s correlation statistics between paralogs and found that differential Brassica paralogs had a mean Pearson correlation coefficient of 0.31 (SD = 0.43) and similar Brassica paralogs had a mean Pearson correlation of 0.75 (SD = 0.23), which confirms that the DiPALM method of defining expression patterns makes sense in the context of this analysis.
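The two-cut-off scheme can be made concrete with a short sketch showing what happens to paralog pairs that fall between the thresholds. The direction of the inequalities (p below 0.001 is differential, p above 0.1 is similar) is an assumption based on the description above, and the function name, category labels and p-values are hypothetical.

```python
def classify_paralog_pair(p_value, differential_cutoff=0.001, similar_cutoff=0.1):
    """Assign a paralog pair to one tail of the p-value distribution.

    Pairs between the two cut-offs belong to neither tail and are left
    unclassified rather than being forced into a category.
    """
    if p_value < differential_cutoff:
        return "differential"
    if p_value > similar_cutoff:
        return "similar"
    return "unclassified"


# Hypothetical p-values for four paralog pairs.
p_values = {"pair_a": 0.0002, "pair_b": 0.45, "pair_c": 0.01, "pair_d": 0.1}
labels = {pair: classify_paralog_pair(p) for pair, p in p_values.items()}
print(labels)
```

Using two separate thresholds selects only the tails of the distribution, which is why the intermediate pairs drop out of the comparison entirely.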

      Line 681. Should be supplemental Figure 6 not 9.

      1. References to most supplemental figures are not the correct number.

      2. Labels above the plots in Supp Fig5 do not match the legend.

      We apologise for these mistakes. We realize that we had mistakenly submitted an earlier draft of the Supplementary materials file, which was missing Supplementary Figure 5, 6 and 9 which therefore shifted the order of the remaining figures. This is now updated.

      1. Suppl table 7 should be as a separate .csv file or similar to be able to see the full table.

      This is a good suggestion, and we have added this.

      1. Line 723 should be B. rapa not B. napus.

      Thank you for catching this! Corrected.

      1. Figure 4. There is no explanation for what the black boxes represent in the figure legend.

      Thank you for your comment. Figure 4 (new Figure 5) has now been updated.

      Reviewer #1 (Significance (Required)):

      This study provides new insight into the circadian regulation of the transcriptome in a new allopolyploid. It adds a valuable resource to a growing collection of circadian studies in important crops and will greatly improve our efforts to learn more about the circadian control of important crop improvement traits. The dataset will be of interest to other plant circadian biologists as well as the general plant biology community who focus on monocot crops. My expertise is more on the transcriptomic side and I do not have the expertise to evaluate the phylogenetic work presented in this study.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary Rees et al. present an RNAseq time course of bread wheat. Its recent polyploidisation is one motivation for this study as gene expression dosage is known to be important for clock function in other plants. The time course covers 3 days at sampling intervals of 4h of 2-week old wheat plants (all aerial tissues), in triplicates. The subsequent analysis of the RNAseq data includes analysis of the generated data by itself (e.g. GO analysis, rhythmicity, period and phase analysis, rhythmicity of transcription factor families as well as TF binding sites) as well as thorough comparison with published datasets of other species (Arabidopsis, Brassica rapa, Brachypodium distachyon). One of the key findings is that the mean period length and the period spread are larger in wheat than in these other species. Circadian clock genes largely have similar dynamics in wheat compared to Arabidopsis. In addition, one focus is the analysis of the dynamics of three genes of one triad and imbalance / balance of such triads. To the surprise of the authors, circadian regulated and clock genes were not necessarily balanced. Silencing is one of their explanations for imbalance of circadian genes as arrhythmic genes of one triad are typically those with the lowest expression level. Finally, the authors point out more examples of rhythmic processes and genes (photoreceptors and signalling, auxin, carbon metabolism) and their commonalities and differences with Arabidopsis.

      Major comments - The key conclusions and the data are convincing

      We thank the reviewer for their supportive comments.

      • line 120 and figure 1: In my opinion, q > 0.05 is not a good definition of arrhythmicity as non-significant q-values can result from either noise in spite of rhythmicity or from arrhythmicity. A more statistically sound way to detect arrhythmicity could for example be two-one-side tests (for example in the R package 'equivalence', e.g. see usage for time courses by Noordally et al. 2018, https://www.biorxiv.org/content/10.1101/287862v1).

      Thank you for pointing us in the direction of this package. We agree that choosing methods for circadian quantification and q-value cut-offs is always tricky and different approaches will perform better for noisier or non-sinusoidal waveforms. For future work, we will investigate the application of the suggested method in circadian rhythmicity analysis. However, we believe that the criteria used in this paper for rhythmicity quantification are suitable for addressing our questions, and overall, we are satisfied that rhythms with a q-value of >0.05 would also be classified by eye as being arrhythmic, and rhythms with a q-value Many other studies have used meta2d B.H q-values as a metric of rhythmicity: e.g. (https://bmcplantbiol.biomedcentral.com/articles/10.1186/s12870-022-03565-1 , https://link.springer.com/content/pdf/10.1186%2Fs12915-022-01258-7 , https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8782462/pdf/pcbi.1009762.pdf )

      • lines 480-484 and intro: In the introduction, the authors write that expression levels of clock components are important for the function of the clock, and that this is one motivation for the current study where polyploidisation is expected to affect the expression levels of clock genes and their outputs. I wonder what answers or speculations this study provides in the end, or whether such answers / speculations should be made clearer. For example, do the authors think that the higher variability of periods in wheat could be a consequence of lower robustness (in addition to possible spatial differences that are mentioned) due to polyploidisation? Is anything known about the period of rhythms of close wheat relatives that did not undergo polyploidisation? Did you look at dampening over the time course in wheat vs. Arabidopsis?

      The point above is an interesting one, and we thank the reviewer for raising it. We agree that the high variability of periods in wheat may be a product of polyploidisation, as functional redundancy between homoeologs may allow a tolerance for less tightly regulated, non-dominantly expressed circadian transcripts. We have now added this hypothesis to our discussion: Ln536-550.

      In our comparative analysis of period distributions, we looked at periods of transcripts from a diploid relative of hexaploid wheat, Brachypodium distachyon. In Brachypodium, period lengths have around the same SD as in Arabidopsis but the mean period length is slightly longer (Supplementary table 2). We have now edited our results to make the relationship between wheat and Brachypodium clearer (ln 109-110).

      Minor comments:

      Introduction - lines 49: it is unclear what is meant by ppd-1 at this position of the sentence

      We agree this was unclear and have revised it to “notably the ppd-1 locus within TaPRR3/7” Ln 52

      • line 54/55: clarify that this refers to Arabidopsis thaliana

      Corrected.

      Results - line 69 and 76: cite references for these tools here (not only in the methods section)

      Corrected.

      • line 90-93: Why wouldn't the same thing happen on subsequent subjective evenings?

      Thank you for your comments. We have now combined our hypothesis for why wheat transcripts expressed at dusk have longer periods with the discussion about longer periods upon transfer to constant light. We think that the two processes are likely to be connected and have now placed them together in Ln 126-131.

      The behaviour of mean period lengths of wheat transcripts upon transfer to constant light was unexpected and, we believe, quite interesting. One explanation is that the ongoing light zeitgeber at the time when dusk was expected delays the expression of evening-peaking genes. Then, on subsequent evenings the influence of the diel dusk signal is ‘forgotten’ as the governance of the endogenous clock takes over. The very long period observed at 0-24h (28.61h) may be due to a phase shift rather than an intrinsic lengthening of period per se. Whether this trait is unique to wheat or can also be seen in other plant species is, to our knowledge, unknown.

      • line 118: what is your defined cutoff for significance of the Chi square test (p=0.03 not regarded significant?)

      The reviewer is completely right, we have now clarified this. Ln 145-149

      • figure 1h,i: In order for the reader to see whether A and D (Figure 1h) or A (figure 1i) are indeed arrhythmic, one would need to see plots with a normalisation as done in figure 1m for 1l.

      We have now removed the triad showing one rhythmic gene and two arrhythmic genes (as Fig. 1h already illustrates this type of circadian imbalance) and replaced this with a side-by-side comparison of how imbalance in rhythmicity differs from imbalance in relative amplitude, as suggested.

      • figure 1h-m (and others with circadian time course traces): could a measure of variation (e.g. SD, SEM, confidence interval) be plotted as a shaded region around the curves (unless they're so small that they are there but not visible)?

      We have now added error bars to these plots to show standard deviation between replicates, in Fig. 1 h, j, k and l. We could not think of an accurate way to display this information for the mean normalised data (Fig 1. i and m) so have not put error bars on these plots.

      • line 139 (also in 737 and 450): give reference to Ramirez-Gonzalez et al in the same style as the rest of the manuscript (number)

      Thank you for raising this, we believe we have corrected all in-text citations (both narrative and fully parenthetical form) for consistency with the APA format used by the majority of Review Commons affiliate journals.

      • Clustering (modules): What is the reason for choosing 9 clusters? Was this number optimised or chosen for other reasons?

      WGCNA uses an unsupervised clustering algorithm that works within the supplied parameters to determine the optimum number of clusters to explain the dataset, without prior specification of the number of clusters. We have amended the manuscript text to clarify this Ln237-239.

      • lines 280 - 284: The TaELF3-1D phenotype could be explained a bit better to the non-wheat specialist, for example by mentioning in the beginning of this set of sentences.

      Done (Ln 314-318).

      • The authors present an analysis of TF binding sites. Can they say something about binding sites in a less sophisticated manner, such as on some very well-known motifs in promoters like the evening element?

      We agree that this is a very interesting question, and one that we may investigate in more detail with our data in the future. In this paper, we performed a global analysis of wheat TFBS predicted from orthologous Arabidopsis TF targets. These targets have been experimentally validated in Arabidopsis using DAP-seq, but we have not validated that these binding sites exist in wheat promoters. We therefore took a tentative approach, and presented only enrichments at the superfamily level rather than talking about specific regulatory motifs.

      The evening element would most likely fit within the MYB or MYB-related TFBS superfamily. However, the diversity of transcription factors in this family means that there is significant enrichment of these TFBS in multiple modules throughout the day (Supplementary Figure 11). In summary, a more in-depth TFBS analysis of known circadian motifs is of great interest, but we feel it would be a substantial work in its own right.

      • Figure 1h-l: If known or meaningful, it would be interesting to know the gene identities behind the triads shown, as in supplementary figure 5.

      These triads were selected as case studies to exemplify the ways in which we were defining imbalanced circadian triads. They have no particular relevance to the figure, but out of curiosity, these are the closest Arabidopsis orthologs for the triads displayed in Fig. 1:

      Triad 408 has highest identity to a hypothetical protein (AT4G26415).

      Triad 2454 is similar to AT3G07600, a heavy metal transport/detoxification superfamily protein

      Triad 13405 is similar to AT3G22360, encoding an ALTERNATIVE OXIDASE 1B, AOX1B

      Triad 10854 is similar to NSE4A, a δ-kleisin component of the SMC5/6 complex, possibly involved in synaptonemal complex formation (AT1G51130).

      Information about wheat gene names in each triad and their Arabidopsis orthologs can be viewed in Supplementary Table 11, so that readers can search for genes of particular interest to them.

      • Figure 4 and text: The illustration of starch metabolism is very helpful. However, I think the paper would benefit from giving a better reason for the selection of this specific set of processes, for example by relating these findings to functional differences in starch metabolism in the two species (in contrast to Arabidopsis, wheat stores little starch in leaves but uses fructans as main reserve carbohydrate)? Are there known differences in the dynamics of starch degradation during the night?

      The reviewer raises an interesting point, and we have now clarified in our results that the stated differences between starch regulation in Arabidopsis and wheat were part of the motivation behind studying this pathway. Starch is at the centre of plant primary metabolism as a carbon storage source and is arguably one of the most important features that breeders look for in regard to grain filling and yields. Additionally, it is of interest to circadian biologists as starch (as well as sucrose) has been shown to cycle transiently and to be regulated by the circadian clock. However, in wheat, carbon storage primarily uses sucrose rather than starch, and we have now added sucrose to Figure 5 to place it in this context. We think your suggestion has now improved our explanation for why we focused on starch in the manuscript, and we are grateful for your input (Ln 408-421).

      We also agree that the differences in the ways that Arabidopsis and wheat utilise starch versus sucrose, and perhaps the role that fructans have as a reserve carbohydrate and in protection against freezing in wheat, may be one of the reasons we are seeing differences in circadian regulation of starch. We have now added this to our discussion (Ln 584-592).

      • Figure 4: triose-phosphates can be transported in and out of the chloroplast, as is illustrated in the figure. However, the illustration looks as though they are converted to hexose phosphates during the transport process. In order to be consistent with other transport processes of the figure (maltose and glucose), triose-phosphate should be repeated on the cytosolic side.

      We have now amended this (new Fig. 5). Thank you for your feedback.

      Methods - line 543: if I understand correctly that triplicates were collected and analysed for each time point, '18 samples' is misleading (18 time points would be more accurate).

      We agree this was badly worded. Changed Ln 615.

      Supplementary - Supplementary figure 3: x axis label very small and contains typo

      Now corrected. Also enlarged axis for Supplementary Figure 2.

      • Supplementary table 1: Romanowski et al 2020 (add year), or use ref. number citation style as in the rest of the manuscript

      Thank you for raising this. We have now hopefully corrected all in-text citations (both narrative and fully parenthetical form) to be consistent with the APA format used by the majority of Review Commons affiliate journals.

      • Supplementary table 9, primary metabolism: does bold highlighting of Arabidopsis accession numbers have a meaning or is it accidental?

      We apologise that this was unclear. We have corrected this. Supplementary Table 10 now also has a “Read me” tab which explains that table.

      Reviewer #2 (Significance (Required)):

      I believe this is a precious, carefully generated and analysed dataset which many biologists will benefit from, beyond wheat or circadian specialists. The dataset expands the knowledge of circadian transcriptome regulation to an important crop and contributes a resource of which only a handful of others exist in other species. Many high impact papers on RNAseq include some follow-up on candidates, for example in Romanowski et al 2020, which is admittedly easier to do in Arabidopsis than wheat due to the availability of genetic resources.

      My expertise: Plant circadian clock (Arabidopsis), dataset analysis (but not specifically for RNAseq)

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This manuscript is based on the analysis of a single experiment consisting of transcriptomic profiling of one (hexaploid) wheat genotype over 3 days (samples taken every 4 hours). The experiment is performed in constant light conditions, allowing detection of transcripts controlled by the circadian clock. The bioinformatic analysis studies the dynamics of the different homoeologous transcripts in the polyploid genome and compares cycling transcripts in wheat with what is known from Arabidopsis.

      The manuscript is well written, the methods are correct, the analysis performed is sufficiently extensive and the figures are clear. The manuscript finds interesting expression patterns among homeologous genes, and goes into detail on important differences in circadian regulation of relevant gene families between Arabidopsis and wheat. The work is purely descriptive and does not aim at associations with physiological phenotypes, but the bioinformatic analysis is very thorough and uncovers interesting examples.

      Only one caveat: From what I gather, there is no replication in the RNA-seq experiment, although the exact method does not appear in the text. From the Methods section: "tissue was sampled every 4h for 3 days (18 samples in total)" and "At each timepoint, we sampled the entire aerial tissue from 3 replicate plants". Whether these samples were pooled or not is not described. The "Data Availability" section links to 18 RNA-seq paired-end libraries, which suggests that the replicates were pooled, although some type of barcoding might have been used. The text should mention whether the replicates were pooled and, if so, what method was used for pooling (tissue, RNA or libraries). Even in the case of no biological replication, the manuscript brings interesting insights into wheat transcriptomics and circadian biology. The editor (or the rules of the journal) should decide if they accept articles with no "real" biological replication (I am sure we all understand by now the benefits and limitations of pooling biological replicates into a single RNA-seq library).

      There was replication within the RNA sequencing experiment, and we apologise that this was unclear from our manuscript. Each timepoint consisted of three independent biological replicates. We have now created a new “Experimental context” section in the results to explain this (Ln 74-82) and have clarified in the methods how our data was processed (Ln 609-615 and 636-638).

      We have now included an additional matrix with TPMs at the replicate level to assist readers in looking at specific genes of interest (Supplementary Table 12).

      Minor comments:

      The description of the experimental setup in the first sentence of the Results section is too brief. Could you please talk about for how long the experiment was running? At what intervals the samples were taken? What conditions were used?

      We apologise that this was unclear. We hope that the new Experimental Context section, added in response to comments from several reviewers, makes this much clearer, alongside the clarification in the methods (Ln 609-615 and 636-638).

      Line 280: "...due *to* an introgression..."

      Corrected. Ln 315

      The legend of Figure 3l says elf4 instead of elf3

      We thank the reviewer for noticing this mistake that we have now corrected.

      Line 306 "says Supplementary Note 7 instead of Supplementary Note 7

      We are not sure what is to be corrected here!

      Reviewer #3 (Significance (Required)):

      This work advances our knowledge of how genome-wide expression levels are controlled by the circadian clock in polyploids. Although previous works had performed similar analyses in other polyploid plants, this is the first time this is done in a hexaploid. This work is a starting step to understand gene regulation in this important crop, and is of interest for researchers working in fundamental and applied plant biology.

      Thank you for your positive comments and your feedback in improving this manuscript. We would like to clarify that, to our knowledge, this work presents the first analysis of a circadian transcriptome in a polyploid crop. The work by Greenham et al, although undoubtedly providing insight into circadian regulation of ancient paralogs, was performed in the diploid Brassica rapa.

    1. • About 99% of the time, the right time is right now.
       • No one is as impressed with your possessions as you are.
       • Don’t ever work for someone you don’t want to become.
       • Cultivate 12 people who love you, because they are worth more than 12 million people who like you.
       • Don’t keep making the same mistakes; try to make new mistakes.
       • If you stop to listen to a musician or street performer for more than a minute, you owe them a dollar.
       • Anything you say before the word “but” does not count.
       • When you forgive others, they may not notice, but you will heal. Forgiveness is not something we do for others; it is a gift to ourselves.
       • Courtesy costs nothing. Lower the toilet seat after use. Let the people in the elevator exit before you enter. Return shopping carts to their designated areas. When you borrow something, return it in better shape (filled up, cleaned) than when you got it.
       • Whenever there is an argument between two sides, find the third side.
       • Efficiency is highly overrated; goofing off is highly underrated. Regularly scheduled sabbaths, sabbaticals, vacations, breaks, aimless walks and time off are essential for top performance of any kind. The best work ethic requires a good rest ethic.
       • When you lead, your real job is to create more leaders, not more followers.
       • Criticize in private, praise in public.
       • Life lessons will be presented to you in the order they are needed. Everything you need to master the lesson is within you. Once you have truly learned a lesson, you will be presented with the next one. If you are alive, that means you still have lessons to learn.
       • It is the duty of a student to get everything out of a teacher, and the duty of a teacher to get everything out of a student.
       • If winning becomes too important in a game, change the rules to make it more fun. Changing rules can become the new game.
       • Ask funders for money, and they’ll give you advice; but ask for advice and they’ll give you money.
       • Productivity is often a distraction. Don’t aim for better ways to get through your tasks as quickly as possible, rather aim for better tasks that you never want to stop doing.
       • Immediately pay what you owe to vendors, workers, contractors. They will go out of their way to work with you first next time.
       • The biggest lie we tell ourselves is “I don’t need to write this down because I will remember it.”
       • Your growth as a conscious being is measured by the number of uncomfortable conversations you are willing to have.
       • Speak confidently as if you are right, but listen carefully as if you are wrong.
       • Handy measure: the distance between your fingertips of your outstretched arms at shoulder level is your height.
       • The consistency of your endeavors (exercise, companionship, work) is more important than the quantity. Nothing beats small things done every day, which is way more important than what you do occasionally.
       • Making art is not selfish; it’s for the rest of us. If you don’t do your thing, you are cheating us.
       • Never ask a woman if she is pregnant. Let her tell you if she is.
       • Three things you need: The ability to not give up something till it works, the ability to give up something that does not work, and the trust in other people to help you distinguish between the two.
       • When public speaking, pause frequently. Pause before you say something in a new way, pause after you have said something you believe is important, and pause as a relief to let listeners absorb details.
       • There is no such thing as being “on time.” You are either late or you are early. Your choice.
       • Ask anyone you admire: Their lucky breaks happened on a detour from their main goal. So embrace detours. Life is not a straight line for anyone.
       • The best way to get a correct answer on the internet is to post an obviously wrong answer and wait for someone to correct you.
       • You’ll get 10x better results by elevating good behavior rather than punishing bad behavior, especially in children and animals.
       • Spend as much time crafting the subject line of an email as the message itself because the subject line is often the only thing people read.
       • Don’t wait for the storm to pass; dance in the rain.
       • When checking references for a job applicant, employers may be reluctant or prohibited from saying anything negative, so leave or send a message that says, “Get back to me if you highly recommend this applicant as super great.” If they don’t reply take that as a negative.
       • Use a password manager: Safer, easier, better.
       • Half the skill of being educated is learning what you can ignore.
       • The advantage of a ridiculously ambitious goal is that it sets the bar very high so even in failure it may be a success measured by the ordinary.
       • A great way to understand yourself is to seriously reflect on everything you find irritating in others.
       • Keep all your things visible in a hotel room, not in drawers, and all gathered into one spot. That way you’ll never leave anything behind. If you need to have something like a charger off to the side, place a couple of other large items next to it, because you are less likely to leave 3 items behind than just one.
       • Denying or deflecting a compliment is rude. Accept it with thanks, even if you believe it is not deserved.
       • Always read the plaque next to the monument.
       • When you have some success, the feeling of being an imposter can be real. Who am I fooling? But when you create things that only you, with your unique talents and experience, can do, then you are absolutely not an imposter. You are the ordained. It is your duty to work on things that only you can do.
       • What you do on your bad days matters more than what you do on your good days.
       • Make stuff that is good for people to have.
       • When you open paint, even a tiny bit, it will always find its way to your clothes no matter how careful you are. Dress accordingly.
       • To keep young kids behaving on a car road trip, have a bag of their favorite candy and throw a piece out the window each time they misbehave.
       • You cannot get smart people to work extremely hard just for money.
       • When you don’t know how much to pay someone for a particular task, ask them “what would be fair” and their answer usually is.
       • 90% of everything is crap. If you think you don’t like opera, romance novels, TikTok, country music, vegan food, NFTs, keep trying to see if you can find the 10% that is not crap.
       • You will be judged on how well you treat those who can do nothing for you.
       • We tend to overestimate what we can do in a day, and underestimate what we can achieve in a decade. Miraculous things can be accomplished if you give it ten years. A long game will compound small gains to overcome even big mistakes.
       • Thank a teacher who changed your life.
       • You can’t reason someone out of a notion that they didn’t reason themselves into.
       • Your best job will be one that you were unqualified for because it stretches you. In fact only apply to jobs you are unqualified for.
       • Buy used books. They have the same words as the new ones. Also libraries.
       • You can be whatever you want, so be the person who ends meetings early.
       • A wise man said, “Before you speak, let your words pass through three gates. At the first gate, ask yourself, ‘Is it true?’ At the second gate ask, ‘Is it necessary?’ At the third gate ask, ‘Is it kind?’”
       • Take the stairs.
       • What you actually pay for something is at least twice the listed price because of the energy, time, money needed to set it up, learn, maintain, repair, and dispose of at the end. Not all prices appear on labels. Actual costs are 2x listed prices.
       • When you arrive at your room in a hotel, locate the emergency exits. It only takes a minute.
       • The only productive way to answer “what should I do now?” is to first tackle the question of “who should I become?”
       • Average returns sustained over an above-average period of time yield extraordinary results. Buy and hold.
       • It’s thrilling to be extremely polite to rude strangers.
       • It’s possible that a not-so smart person, who can communicate well, can do much better than a super smart person who can’t communicate well. That is good news because it is much easier to improve your communication skills than your intelligence.
       • Getting cheated occasionally is the small price for trusting the best of everyone, because when you trust the best in others, they generally treat you best.
       • Art is whatever you can get away with.
       • For the best results with your children, spend only half the money you think you should, but double the time with them.
       • Purchase the most recent tourist guidebook to your home town or region. You’ll learn a lot by playing the tourist once a year.
       • Don’t wait in line to eat something famous. It is rarely worth the wait.
       • To rapidly reveal the true character of a person you just met, move them onto an abysmally slow internet connection. Observe.
       • Prescription for popular success: do something strange. Make a habit of your weird.
       • Be a pro. Back up your back up. Have at least one physical backup and one backup in the cloud. Have more than one of each. How much would you pay to retrieve all your data, photos, notes, if you lost them? Backups are cheap compared to regrets.
       • Don’t believe everything you think you believe.
       • To signal an emergency, use the rule of three; 3 shouts, 3 horn blasts, or 3 whistles.
       • At a restaurant do you order what you know is great, or do you try something new? Do you make what you know will sell or try something new? Do you keep dating new folks or try to commit to someone you already met? The optimal balance for exploring new things vs exploiting them once found is: 1/3. Spend 1/3 of your time on exploring and 2/3 time on deepening. It is harder to devote time to exploring as you age because it seems unproductive, but aim for 1/3.
       • Actual great opportunities do not have “Great Opportunities” in the subject line.
       • When introduced to someone make eye contact and count to 4. You’ll both remember each other.
       • Take note if you find yourself wondering “Where is my good knife? Or, where is my good pen?” That means you have bad ones. Get rid of those.
       • When you are stuck, explain your problem to others. Often simply laying out a problem will present a solution. Make “explaining the problem” part of your troubleshooting process.
       • When buying a garden hose, an extension cord, or a ladder, get one substantially longer than you think you need. It’ll be the right size.
       • Don’t bother fighting the old; just build the new.
       • Your group can achieve great things way beyond your means simply by showing people that they are appreciated.
       • When someone tells you about the peak year of human history, the period of time when things were good before things went downhill, it will always be the years of when they were 10 years old, which is the peak of any human’s existence.
       • You are as big as the things that make you angry.
       • When speaking to an audience it’s better to fix your gaze on a few people than to “spray” your gaze across the room. Your eyes telegraph to others whether you really believe what you are saying.
       • Habit is far more dependable than inspiration. Make progress by making habits. Don’t focus on getting into shape. Focus on becoming the kind of person who never misses a workout.
       • When negotiating, don’t aim for a bigger piece of the pie; aim to create a bigger pie.
       • If you repeated what you did today 365 more times will you be where you want to be next year?
       • You see only 2% of another person, and they see only 2% of you. Attune yourselves to the hidden 98%.
       • Your time and space are limited. Remove, give away, throw out things in your life that don’t spark joy any longer in order to make room for those that do.
       • Our descendants will achieve things that will amaze us, yet a portion of what they will create could have been made with today’s materials and tools if we had had the imagination. Think bigger.
       • For a great payoff be especially curious about the things you are not interested in.
       • Focus on directions rather than destinations. Who knows their destiny? But maintain the right direction and you’ll arrive at where you want to go.
       • Every breakthrough is at first laughable and ridiculous. In fact if it did not start out laughable and ridiculous, it is not a breakthrough.
       • If you loan someone $20 and you never see them again because they are avoiding paying you back, that makes it worth $20.
       • Copying others is a good way to start. Copying yourself is a disappointing way to end.
       • The best time to negotiate your salary for a new job is the moment AFTER they say they want you, and not before. Then it becomes a game of chicken for each side to name an amount first, but it is to your advantage to get them to give a number before you do.
       • Rather than steering your life to avoid surprises, aim directly for them.
       • Don’t purchase extra insurance if you are renting a car with a credit card.
       • If your opinions on one subject can be predicted from your opinions on another, you may be in the grip of an ideology. When you truly think for yourself your conclusions will not be predictable.
       • Aim to die broke. Give to your beneficiaries before you die; it’s more fun and useful. Spend it all. Your last check should go to the funeral home and it should bounce.
       • The chief prevention against getting old is to remain astonished.

      So much wisdom and stuff to think about here.

    1. Author Response

      Reviewer #1 (Public Review):

      1) The connectivity patterns along the anterior-posterior hippocampal axis broadly follow an anterior-posterior cortical bias, such that posterior regions, e.g. the visual cortex, are preferentially connected to the hippocampal tail, and anterior regions, e.g. the temporal pole, are preferentially connected to the hippocampal head. The authors focus on the twenty regions with the highest connectivity profiles, which appears to capture the majority of all connections. However, some of the present structural connectivity patterns differ in interesting ways from previously described cortical networks reported in resting-state fMRI studies. Most notably, the medial PFC and orbitofrontal regions combined account for less than 1% of all connections in the present investigation (Table S1 & S2). This is an interesting contrast to functional investigations which tend to find that these regions cluster with the aHPC (e.g., Adnan et al. 2016 Brain Struct Func; Barnett et al. 2021 PLoS Biol; Robinson et al. 2016 NeuroImage). In contrast, the present DWI results suggesting preferential pHPC-medial parietal connectivity dovetail with those observed in fMRI studies. It seems important to discuss why these differences may arise: whether this is a differentiation between structural and functional networks, or whether this is due to a difference in methods.

      We thank Reviewer 1 for making this important point and agree that these observations are deserving of further expansion. We have now included additional text where we place the surprising observation of sparse connectivity between PFC regions and the hippocampus more firmly in the context of recent evidence and argue that these observations suggest a potential differentiation between structural and functional networks.

      We have included the following text in the discussion (pp. 16-17, lines 439-457):

      “While many of our observed anatomical connections dovetail nicely with known functional associations, patterns of anatomical connectivity strength did not always mirror well characterised functional associations between the hippocampus and cortical areas. For example, a surprising observation from our study was that only weak patterns of anatomical connectivity were observed between the hippocampus and the ventromedial prefrontal cortex (vmPFC) and other frontal cortical areas. This lies in contrast to well documented functional associations between these regions (46-48). Our observation, however, supports a growing body of evidence that direct anatomical connectivity between the hippocampus and areas of the PFC may be surprisingly sparse in the human brain. For example, Rosen and Halgren (49) recently reported that long range connections between the hippocampus and functionally related frontal cortical areas may constitute fewer than 10 axons/mm² and more broadly observed that axon density between spatially distant but functionally associated brain areas may be much lower than previously thought. Our observation of sparse anatomical connectivity between the hippocampus and PFC mirrors this recent work and suggests a potential differentiation between structural and functional networks as they relate to the hippocampus. It remains possible, however, that methodological factors may contribute to these differences. We return to this point later in the discussion. A future dedicated study aimed at assessing whether the well characterised functional associations between the hippocampus and vmPFC are driven by sparse direct connections or primarily by intermediary structures is necessary to address this issue in an appropriate level of detail.”

      2) While the analytic pipeline is described in sufficient detail in the Methods, it is somewhat unclear to a non-DWI expert what the major methodological advance is over prior approaches. The authors refer to a tailored processing pipeline and 'an advance in the ability to map the anatomical connectivity' (p. 5), but it's not immediately clear what these entail. It would be useful to highlight the key methodological differences or advances in the Introduction to help with the interpretation of the similarities and differences with previous connectivity findings.

      We have now included a brief description in the Introduction highlighting the key methodological advances used in the current study.

      We have included the following text in the Introduction (pp. 4-5, lines 130-144):

      “In typical fibre-tracking studies, we cannot reliably ascertain where streamlines would naturally terminate, as they have been found to also display unrealistic terminations, such as in the middle of white matter or in cerebrospinal fluid (39). While methods have been proposed to ensure more meaningful terminations (40), for example, with terminations forced at the grey matter-white matter interface (gmwmi), this approach is still not appropriate for characterising terminations within complex structures like the hippocampus. A key methodological advance of our approach was to remove portions of the gmwmi inferior to the hippocampus (where white matter fibres are known to enter/leave the hippocampus). This allowed streamlines to permeate the hippocampus in a biologically plausible manner. Importantly, we combined this with a tailored processing pipeline that allowed us to follow the course of streamlines within the hippocampus and identify their ‘natural’ termination points. These simple but effective methodological advances allowed us to map the spatial distribution of streamline ‘endpoints’ within the hippocampus. We further combined this approach with state-of-the-art tractography methods that incorporate anatomical information (40) and assign weights to each streamline (41) to achieve quantitative connectivity results that more faithfully reflect the biological accuracy of the connection’s strength (39).”

      3) Related to the point above, it was a bit unclear to me how the present connections map onto canonical white matter tracts. In Fig. 4A, the tracts are shown for a single participant, but it would be helpful to map or quantify how many of the connections for a given hippocampal subregion are associated with a given tract to provide a link to prior work or clarify the approach. A fairly large body of prior research on hippocampal white matter connectivity has focused on the fornix, but it's a little difficult to align these prior findings with the connectivity density results in the current paper.

      We thank Reviewer 1 for this comment and agree this would be an interesting avenue to pursue. However, the reliable segmentation of white matter fibre bundles is currently an area of contention in the DWI community. This pervasive and problematic issue was highlighted in a recently published large multi-site study that revealed a high degree of variability in how white matter bundles are defined, even from the same set of whole-brain streamlines (Schilling et al., 2021, Neuroimage. Nov; 243:118502. https://pubmed.ncbi.nlm.nih.gov/34433094/). This means that, even if we were to choose a particular method to segment white matter bundles, our results would not be readily translatable to those reported in previous DWI studies. This significantly limits meaningful comparison and/or interpretation. Indeed, such an approach may paradoxically take away from the detailed characterisations we have achieved in the current study. As highlighted in that study, it is now paramount that consensus is reached in this field to define criteria to reliably and reproducibly define white matter fibre bundles. Once that is achieved, we plan to conduct a follow-up study to characterise this in more detail, with bundles that will be able to be reliably reproduced by others.

      4) Finally, on a more speculative note: based on the endpoint density maps, there seems to be a lot of overlap between the EDMs associated with different cortical regions (which makes sense given the subregion results). Does this effectively mean that the same endpoints may be equally connected with multiple different cortical regions? Part of the answer can be found in Fig. 3D showing the combined EDM for three different regions, but how spatially unique is each endpoint? This is likely not a feasible question to address analytically but it might be helpful to provide some more context for what these maps represent and how they might relate to differences across individuals.

      The primary aim of the current analysis was to characterise broad patterns of endpoint density captured by our averaged group level analysis. However, Reviewer 1 is astute in assuming that, although there is overlap in the group averaged endpoint density maps (EDMs) associated with different cortical areas, at the single participant level, there are both overlaps and spatial uniqueness in the location of individual endpoints. For example, while group level analysis revealed that area V1 and area V2 showed preferential connectivity with overlapping regions of the posterior medial hippocampus, when visualising individual endpoints associated with each of these areas at the single participant level, we can see that some endpoints overlap while others display spatially unique patterns (see image below). Although a more in-depth analysis of individual variability in these patterns was beyond the scope of this investigation (as noted on Page14; Lines 379-381), we agree with Reviewer 1 that this is an important point to note in the manuscript. We have, therefore, included additional text touching on this and have included a new Supplementary Figure (Page 42; also see below) to emphasise that, at the single participant level, different cortical areas display both overlapping and spatially unique endpoints within specific regions of the hippocampus (using areas V1 and V2 as an example).

      We have included the following text in the Results section (Page 14; Lines 370-379);

      “Finally, while we observed clear overlaps in the group averaged EDMs associated with specific cortical areas, a closer inspection of individual endpoints at the single participant level revealed that endpoints associated with different cortical areas displayed both overlapping and spatially unique characteristics within these areas of overlap. For example, at the group level, areas V1 and V2 showed preferential connectivity with overlapping regions of the posterior medial hippocampus (see Supplementary Figure S5) while, at the single participant level, individual endpoints associated with each of these areas display both overlapping and spatially unique patterns (see Supplementary Figure S6). This suggests that, while specific cortical areas display overlapping patterns of connectivity within specific regions of the hippocampus, subtle differences in how these cortical regions connect within these areas of overlap likely exist.”

      Reviewer #2 (Public Review):

      Dalton and colleagues present an interesting and timely manuscript on diffusion weighted imaging analysis of human hippocampal connectivity. The focus is on connectivity differences along the hippocampal long axis, which in principle would provide important insights into the neuroanatomical underpinnings of functional long axis differences in the human brain. In keeping with current models of long-axis organisation, connectivity profiles show both discrete areas of higher connectivity in long axis portions, as well as an anterior-to-posterior gradient of increasing connectivity. Endpoint density mapping provided a finer grained analysis, by allowing visualisation of the spatial distribution of hippocampal endpoint density associated with each cortical area. This is particularly interesting in terms of the medial-lateral distribution within the hippocampal head, body and tail. Specific areas map to precise hippocampal loci, and some hippocampal loci receive inputs from multiple cortical areas.

      This work is well-motivated, well-written and interesting. The authors have capitalised on existing data from the Human Connectome Project. I particularly like the way the authors try to link their findings to human histological data, and to previous NHP tracing results.

      Many thanks.

      1) There are some important surprises in the results, particularly the relatively strong connectivity between hippocampus and early visual areas (including V1) and low connectivity with areas highly relevant from functional perspectives, such as the medial prefrontal cortex (rank order by strength of connectivity 7th and 78th of all cortical structures, respectively). This raises a concern that the fibre tracking method may be joining hippocampal connections with other tracts. In particular, given the anatomical proximity of the lateral geniculate nucleus to the body and tail of the hippocampus, the reported V1 connectivity potentially reflects a fusion of tracked fibres with the optic radiation. In visualizing the putative posterior hippocampus-to-V1 projection (Figure 4B, turquoise), the tract does indeed resemble the optic radiation topography. Although care was taken to minimise the hippocampus mask 'spilling' into adjacent white matter, this was done with focus on the hippocampal inferior margin, whereas the different components of the optic radiation lie lateral and superior to the hippocampus.

      We agree with Reviewer 2 that our observations relating to area V1 could be the result of limitations inherent to current tracking methodology. Indeed, probabilistic tracking can result in tracks mistakenly ‘jumping’ between fibre bundles. Unfortunately, primarily due to limitations in image resolution, we do not believe that we can categorically rule this possibility out in the current dataset beyond the measures we have already taken in our analysis pipeline. We have now included additional text in the Discussion acknowledging and emphasising this possible limitation of our study.

      We have included the following text in the Discussion section (Page 25; Lines 694-699);

      “Also, we cannot rule out that some connections observed in the current study may result from limitations inherent to current probabilistic fibre-tracking methods whereby tracks can mistakenly ‘jump’ between fibre bundles (e.g. for connections between the posterior medial hippocampus and area V1 due to the proximity to the optic radiation), especially in “bottleneck” areas. Again, future work using higher resolution data may allow more targeted investigations necessary to confirm or refute the patterns we observed here.”

      Beyond the possibility of tracks jumping between fibre bundles, we feel it is important to emphasise that an integral part of our analysis was the detailed attention we took to minimise mask ‘spillage’ of the entire hippocampus mask. It is not the case that we primarily focussed on inferior portions of the hippocampus as stated by Reviewer 2. Equal focus was paid to medial, lateral and superior portions of the mask which lie adjacent to visual thalamic nuclei, the optic radiation posteriorly and a number of other structures. We can see that our description relating to this lacked the necessary detail to convey this important point clearly and we apologise for the confusion. We have, therefore, included additional text in the Methods section clarifying this further.

      We have included the following text in the Methods section (Page 26; Lines 751-755);

      “We took particular care to ensure that all boundaries of the hippocampus mask (including inferior, superior, medial and lateral aspects) did not encroach into adjacent white or grey matter structures (e.g., amygdala, thalamic nuclei). This minimised the potential fusion of white matter tracts associated with other areas with our hippocampus mask.”

      These points notwithstanding, our results support recently observed structural and functional associations between the posterior hippocampus and early visual processing areas. We agree that these findings are potentially of great conceptual importance for how we think about the hippocampus and its connectivity with primary sensory cortices in the human brain and we have now included a brief comment relating to this in the Discussion.

      We have included the following text in the Discussion (Page 23-24; Lines 638-644);

      “However, this observation supports recent reports of similar patterns of anatomical connectivity as measured by DWI in the human brain (38) and functional associations between these areas (43, 60). Collectively, these findings are potentially of great conceptual importance for how we think about the hippocampus and its connectivity with early sensory cortices in the human brain and open new avenues to probe the degree to which these regions may interact to support visuospatial cognitive functions such as episodic memory, mental imagery and imagination.”

      2) A second concern pertains to the location of endpoint densities within the hippocampus from the cortical mantle. These are almost entirely in CA1/subiculum/presubiculum. It is, however, puzzling why, in Supp Figure 2, the hippocampal endpoints for entorhinal projections are really quite similar to what is observed for other cortical projections (e.g., those from area TF). One would expect more endpoint density in the superior portions of the hippocampal cross section in head and body, in keeping with DG/CA3 termination. I note that streamlines were permitted to move within the hippocampus, but the highest density of endpoints is still around the margins.

      We agree with Reviewer 2 that, in relation to the entorhinal cortex, we would expect to see more endpoint density in areas aligning with the dentate gyrus (DG) and CA3 regions of the hippocampus. We noted in the discussion that “Despite the high-quality HCP data used in this study, limitations in spatial resolution likely restrict our ability to track particularly convoluted white-matter pathways within the hippocampus and our results should be interpreted with this in mind”. We believe that this limitation applies to pathways between the entorhinal cortex and DG/CA3. We have now included additional text specifically noting that this limitation likely affects our ability to track streamlines as they relate to DG/CA3. A targeted investigation of this effect using higher resolution diffusion MRI data may help address this issue, and this will be the subject of future work.

      We have included the following text in the Discussion (Page 25; Lines 690-693);

      “Indeed, this may explain the surprising lack of endpoint density observed in the DG/CA4-CA3 regions of the hippocampus where we would expect to see high endpoint density associated with, for example, the entorhinal cortex which is known to project to these regions. Future dedicated studies using higher resolution data are needed to assess these pathways in greater detail.”

      3) On a related point, the use of "medial" and "lateral" hippocampus can be confusing. In the head, CA2/3 is medial to CA1, but so are subicular subareas, just that the latter are inferior.

      We agree that applying the terms ‘medial’ and ‘lateral’ to our three-dimensional representations can lead to some ambiguities and confusion. We have included a new description defining our use of these terms in the Results section.

      We have included the following text in the Results section (Page 10; Lines 268-273).

      “In relation to nomenclature, our use of the term ‘medial’ hippocampus refers to inferior portions of the hippocampus aligning with the distal subiculum, presubiculum and parasubiculum. Our use of the term ‘lateral’ hippocampus refers to inferior portions of the hippocampus aligning with the proximal subiculum and CA1. In instances that we refer to portions of the hippocampus that align with the DG or CA3/2 we state these regions explicitly by name.”

    1. Discussion, revision and decision


      Decision

      Verified with reservations: The content is scientifically sound, but has shortcomings that could be improved by further studies and/or minor revisions.

      Dr. Bañuelos: Verified manuscript

      Dr. Morris: Verified with reservations


      Revision

      Response to Reviewer 1 (Dr. Bañuelos)

      1. Most importantly, I would like to see an introduction that explains the authors’ general arguments about grading changes – including the trajectory of these changes at Dalhousie and why this arc contributes to our knowledge of the history of higher education more broadly. Then, the authors might continually remind us of the arc they present at the outset of their paper – especially when they are highlighting a piece of evidence that illustrates their central argument. To me, the quotes from students and faculty responding to grading changes are among the most interesting parts of the paper and placing these in additional context should make them shine even more brightly!

      Our Response: Thank you so much for your thoughtful review. We have added a larger new introduction section of the paper (paragraphs 1-5 in the latest draft are new) that outlines the general importance of the topic, the Canadian context, details on Dalhousie University, and our overall thesis statement (i.e., most decisions were to improve the external communication value of grades). We have also added three new student quotes from the Dalhousie Gazette to build a stronger picture of student reactions and a better case for our overall thesis statement. Finally, throughout the paper we have added details on the overall funding trajectory for institutions in Canada that created some pressure to standardize grading. We think that these changes have improved the manuscript.

      1. I’d like to read a little more about Dalhousie itself – why it is either a remarkable or unremarkable place to study changes in grading policies. Is it representative of most Canadian universities and thus, a good example of how grading changes work in this national context? Is it unlike any other institution of higher education and thus, tells us something important about grades that we could not learn from other case studies? I don’t think this kind of description needs to be particularly long, but it should be a little more involved than the brief sentences the authors currently include (p.3, paragraph 1) and should explain the choice of this case.

      Our Response: This comment revealed that two additional pieces of context were needed for the introduction: (a) some national context for higher education policy in Canada and (b) some extended description of Dalhousie University when compared to other universities in Canada. To this end, two new paragraphs have been added to the paper (paragraphs 2 & 3 in the current draft).

      Jones (2014) observes that “Canada may have the most decentralized approach to higher education than any other developed country on the planet” (p. 20). With this in mind, any historical review of education policy is by necessity specific to province and institution – that is, the information can be placed in its context, but resists wide generalization to the country as a whole. In the newest draft, we tried to describe the national, provincial, and institutional context in some more detail in paragraphs 2 & 3.

      1. I’d also like to know more about the archival materials the authors used. The authors mention that they drew from “Senate minutes, university calendars, and student newspapers” (p. 3), but what kinds of conversations about grades did these materials include? At various points, the authors engage in “speculation” (e.g. p.4) about why a particular change occurred. This is just fine and, in fact, it’s good of the authors to remind us that they are not really sure why some of these shifts happened. But, they might go one step further and tell us why they have to speculate. Were explicit discussions of grading changes – including in inter- and intradepartmental letters and memos, reports, and other documents – not available in these archives? Why are these important discussions absent from the historical record?

      Our Response: We have added a new paragraph (paragraph 4) to the paper discussing the sources in some more detail. It is true that the verbatim discussions are frequently absent from the record, especially earlier in history – or if they exist, we have not found them! Instead, we frequently are reviewing meeting minutes or committee reports, which are summaries of discussions. As we now note in the paper, “Thus, the sources used showed what policy changes were implemented, when they were implemented, and a general sense of whether there was opposition to changes; however, there were notable gaps in faculty and student reactions to grade policy changes, as these reactions were frequently not written down and archived.”

      This gap was most apparent in the Senate minutes around the 1940s, where I (the first author) could not find any direct discussions of why changes were implemented. Under the 1937-1947 heading, we more clearly indicate that the rationale for the changes was absent from the Senate minutes during this period. I add some further speculation on why these records might be absent, based on summaries from Waite (1998b); specifically, the university president of the time often made unilateral decisions, circumventing Senate, which might account for why the changes are absent from the records.

      This will hopefully make the limitations of what can be learned from this approach more apparent.

      1. At various points, the authors make references to the outside world – for example, WWII (p. 5), the Veteran’s Rehabilitation Act (pp. 6-7), and British versus American grading schemas (p. 6). But, these references are brief and seem almost off-handed. I know space is limited, but putting these grading changes in their broader context might help make the case for why this study is interesting and important. Are the changes in the 1940s, for example, related to the ascendance of one national graduate education model over another (e.g. American versus British)? Are there any data on how many Canadian undergraduates enrolled in British versus American graduate programs over time? If so, I would share any information you might have on these broader trends.

      Our Response: To our knowledge, there isn’t any comparable report to what we’ve written here documenting the transition from British “divisions” to American “letter grades” in Canadian Universities, making our report novel in this regard. It might well be that a similar historical arc exists in many of the 223 public and private universities in Canada, but we don’t believe such data exists in any readily accessible way – excepting perhaps undergoing a similar deep dive into historical documents at each respective institution! So, we do not have the answer to your question: “Are there any data on how many Canadian undergraduates enrolled in British versus American graduate programs over time?” However, we did add one reference which provided a snapshot point of comparison in 1960, noting in the paper “Baldwin (1960) notes that the criteria for “High First Class” grades in the humanities was around 75-80% at Universities of Toronto, Alberta, and British Columbia in 1960, suggesting that Dalhousie’s system was similar to other research-intensive universities around this time.” That said, there are a few major national events related to the funding of universities in Canada that we have elaborated on in the text to address the spirit of your recommendation for describing the national context:

      a) In the “Late 1940s” section of the paper, we added: “Though Dalhousie had an unusually high proportion of veterans enrolled relative to other maritime universities during this period (Turner, 2011), the Veteran’s Rehabilitation Act was a turning point for large increases in enrollment and government funding Canada-wide, at least until the economic recession of the 1970s (Jones, 2014).”

      b) In the 1990s, there were major government cuts to funding, creating challenging financial times for the university. We discuss the funding pressures that likely contributed to standardization of grading during this time by saying the following in the 1980s-2000s section: “Starting in the 1980s-1990s there were major government cuts to university funding nation-wide, with the cuts becoming more severe in the 1990s (Jones, 2014; Higher Education Strategy Associates, 2021). Because of the nature of the funding formulas, cuts in Nova Scotia were especially deep. Beyond tuition increases, university administrators knew that obtaining external research grants, Canada Research Chairs, and scholarship funding was one of the few other ways for a university to balance budgets, so there was extra pressure to be competitive in these pools. […] The increased standardization was likely related to increased financial pressures at this time – standardization is an oft-employed tool to deal with ever-increasing class sizes with no additional resources.”

      c) In the 2010s section of the paper, we added context on how universities country-wide have become increasingly dependent on tuition fees for funding: “Following the 2008 recession, federal funding decreased again (Jones, 2014; Higher Education Strategy Associates, 2021); however, this time universities tended to balance budgets by increasing tuition and international student fees. This trend towards increased reliance on tuition for income is especially pronounced in Nova Scotia, which has the highest tuition rates in the country (Higher Education Strategy Associates, 2021). Thus, the university moved closer to a “consumer” model of education, so it makes sense that a driving force for standardization was student complaints.”

      1. This is a very nitpicky concern that doesn’t fit well elsewhere, so please take it with a grain of salt. I was surprised at the length of the reference list – it seemed quite short for a historical piece! I wonder, again, if more description of the archival material - including why you looked at these sources, in particular, and what was missing from the record – would help explain this and further convince the reader that you have all your bases covered.

      Our Response: In the introduction section, paragraph 4, we describe our sources in more detail, including why we used them and what is likely missing from the record. Regarding the length of the reference list, we did add ~12 new references in the course of making various revisions, which partially addresses your concern. Beyond this, it’s worth noting that some of the sources are more extensive than they seem, even though they don’t take up much space in the reference list (e.g., there is one entry for course calendars, but this covers ~100 documents reviewed!). Moreover, there were many dead-ends in the archives that are not cited (e.g., reviewing 10 years of Senate minutes in the 1940s produced little of relevance), so the reference list is curated to only those sources where relevant materials were found.

      Reviewer response to revisions

      The new introduction to the piece addresses many of my previous questions about the authors’ general arguments, the Dalhousie context, and the source material. Thank you for addressing these! Reading this version, it is much clearer that the key argument is that standardized, centralized grading practices were “to improve the external communication value of the grades, rather than for pedagogical reasons” (p. 6). I also really enjoyed the added quotes from students in the Dalhousie Gazette.

      The authors’ response to Reviewer 2 really gave me a better sense of why they wrote this piece and also helped me to more clearly put my finger on what was troubling me in the first round. It still reads a little like a report for an internal audience – which is just fine and, in fact, can be extremely useful for historians of the future. But, as Reviewer 2 notes, this means it does not really seem like a piece of historical scholarship. I do worry that shaping it into this form would take an extensive revision and might not be in the spirit of what the authors intended to do.

      A different version of this article might start with this idea that grades were standardized for external audiences and in response to financial pressures. It would then develop a richer story behind the sudden importance of these external audiences and the nature (i.e. source, type) of financial pressures Dalhousie was facing. It would highlight the impact such changes had on students and their future careers/graduate experiences. It could then connect these trends to other similar changes for external audiences and the increasing interconnectedness of American, Canadian, and British systems through graduate education. It might even turn to sociological theories of organizational change and adaptation and make an argument for when (historically) similar forms of decoupling were likely to occur in the Canadian higher education system. Finally, it might connect these grading changes to current trends – including accusations of grade inflation and accepted best practices for measuring learning outcomes.

      But, it doesn’t seem that the authors necessarily want to do this, which I can understand and respect. I think there is enormous value in a piece of scholarship like this existing – both for internal audiences and for future historians. Indeed, imagine if every university had a detailed history of its grading policies like this available somewhere online! Comparing such practices across institutions would certainly tell us a lot about why grading currently looks the way it does.

      Decision changed

      Verified manuscript: The content is scientifically sound, only minor amendments (if any) are suggested.


      Response to Reviewer 2 (Dr. Morris)

      The authors dove headfirst into Dalhousie’s archives, unpacking the subtle shifts in grading policy. Their work seems to be comparable to archaeologists, digging deep beneath mountains of primary sources to find nuggets of clues into Dalhousie’s grading evolution. I particularly liked when the authors were able to link these changes to student voices, as seen in moments when they referenced student publications.

      Ultimately, I kept coming back to one main comment that I wrote in the margins: “So what?” I would humbly suggest that the authors reflect on why this history matters to them. Granted, they do this in the conclusion, where they touch on Schneider & Hutt’s argument that grades evolved to increasingly be a form of external communication with audiences beyond school communities. Sure. But I want more. I wanted to see a new insight that makes this microhistory of Dalhousie significant to the history of Canada or the history of education more generally.

      If the authors are so inclined, there might be several approaches to transform this manuscript. I would suggest the following. First, instead of tracing the entire history of grading at the institution, choose one moment of change that you think is the most important. Perhaps in the 1920s and the lack of transparency in grading, or the post-war shift toward American grading. Second, show me – don’t tell me – what Dalhousie was like at this moment. Paint a picture of the institution with details about student demographics, curriculum, educational goals, the broader town, etc. Make the community come alive. Show me what makes Dalhousie unique from other institutions of higher ed. Once you establish that picture, perhaps you could link the change in grading practices to subtle changes at the university community, thereby establishing a before and after snapshot. This will require considerable amounts of work, and the skills of a historian. You will have to find primary and secondary sources that go far beyond what you’ve relied on thus far.

      In the end, I found myself wanting the authors to humanize this manuscript, meaning I wanted them to show me that changes in grading practices have tangible effects on real-life human beings. A humanization of their research would mean going narrower and deeper; or, in other words, eliminating much of what they have documented.

      However, if that is too tall of an order, I would ask that the authors clarify for themselves who this manuscript is for. Is this a chronicling of facts for an internal audience at Dalhousie’s faculty, alumni, and students? Fine. But my guess is that even members of the Dalhousie community want to read something relatable.

      I am suggesting revisions, although not because of objective errors. History is more of an art, in my opinion. With that in mind, I would suggest that the authors paint a more vivid picture (metaphorically) of Dalhousie, showing me how one moment of change in grading practices impacted the lives of human beings.

      Our Response: Thank you very much for taking the time to read our paper and provide your thoughts and recommendations. It may be helpful to begin by describing why I (the first author) decided to write this paper. Ultimately, I wrote this paper to satisfy my own personal curiosity and to connect with other people at my own place of employment by exploring our shared history. At present, Dalhousie has a letter grading scheme with a standardized percentage conversion scheme that all instructors use. I wanted to know why this particular scheme was used, but I quickly realized that nobody at Dalhousie really knew how we ended up grading this way! There was an institutional memory gap, and a puzzle that was irresistible to me. So, I wrote this paper for the most basic of all academic reasons: Pure curiosity. I do very much recognize that the subject matter is very niche, perhaps too niche for a traditional journal outlet. Thus, my publishing plan is to self-publish a manuscript to the Education Resources Information Center (ERIC) database and a preprint server as a way of sharing my work with others who might be interested in what I found. Nonetheless, I believe in the importance and value of peer review, especially since I am writing in a field different than most of my scholarly work. That is why I chose PeerRef as a place to submit, so that I could undergo rigorous peer review to improve the work while still maintaining the niche subject matter and focus that drives my passion and curiosity for the project. Of course, if you feel the whole endeavor is so flawed that it precludes publication anywhere, then we can consider this a “rejection” and I will not make any further edits through PeerRef.

      The core of your critique suggested that I should write a fundamentally different paper on different subject matter. While I don’t necessarily disagree that the kind of paper you describe might have broader appeal, it would no longer answer the core research question I wanted an answer to: How has Dalhousie’s grading changed over time? So, I must decline to rewrite the paper to focus on a single timeframe as recommended. All this said, I did try my best to address the spirit of your various concerns to improve the quality of the manuscript. Below, I will outline the major changes that we made along the lines you described, while maintaining our original vision for the structure and focus of the paper. The specific changes are outlined below:

      a) Two new paragraphs (now paragraphs 1-2 of the revised manuscript) were added to explain the “so what” part of the question. Specifically, we describe why we think the subject matter might be of interest to others and summarize the general dearth of historical information on grading practices in Canada as a whole.

      b) Consistent with recommendations from the other reviewer, we now state a core argument (i.e., that most major grading changes were implemented to improve the external communication value of the grades) earlier in the introduction in paragraph 5 and describe how various pieces of evidence throughout the manuscript tie back to that core theme.

      c) In an attempt to “humanize” the manuscript more, we added more student quotes from the Dalhousie Gazette throughout the paper so that readers can get a better sense of how students thought about grading practices at various times throughout history. Specifically, three new quotes were added in the following sections: 1901-1936, late 1940s, 1950s-1970s. We also added this short note about the physical location where grades used to be posted: “Naturally, this physical location was dreaded by students, and was colloquially referred to as “The Morgue” (Anonymous Dalhousie Gazette Author, 1937).”

      d) Early in the paper, we describe why we chose Dalhousie and the potential audience of interest: “As employees of Dalhousie, we naturally chose this institution as a case study due to accessibility of records and because it has local, community-level interest. The audience was intended to be members of the Dalhousie community; however, it may also be a useful point of comparison for other institutions, should similar histories be written.”

      e) We have described some of the limitations of our sources in paragraph 4, which may explain why the manuscript takes the form it does – it has conformed to the information that is available!

      f) We have linked events at Dalhousie to the national context in some more detail, by detailing some national events related to the funding of universities in Canada. See our response to Reviewer 1, #4 above for more details on the specific changes.

      g) Consistent with your stylistic recommendations, we have changed various spots throughout the paper from the present tense (e.g., “is”) to the past tense (e.g., “was”), and were careful in our new additions to maintain the past tense, when appropriate. If there are any spots that we missed, let us know the page number / section, and we will make further changes, as necessary.

      h) We retained the first person in our writing – this may be discipline-specific, but in Psychology (the first author’s home discipline), first person is acceptable in academic writing. If you feel strongly about this, we can go through the manuscript and remove all instances of the first person, but we would prefer to keep it, if at all possible.

      Hopefully this helps address the spirit of your concerns, and I look forward to hearing your thoughts in the second round of reviews.

      Decision changed

      Verified with reservations: The content is scientifically sound, but has shortcomings that could be improved by further studies and/or minor revisions.

    1. Author Response

      Reviewer #1 (Public Review):

      We thank the reviewer for a very constructive evaluation of our work and for a fair summary of its main strengths. We have addressed her/his main concerns as follows:

      1) The experiments involve an invasive neurosurgical procedure used to perform hippocampal imaging, which removes the ipsilateral overlying somatosensory cortex, and it is not possible to evaluate from the data provided that this surgery does not disrupt network function, especially given the focus on movement-related activity patterns.

      We thank the reviewer for bringing up this important issue. Indeed, our experimental access to early hippocampal activity with 2-photon calcium imaging relies on a quite invasive procedure. However, the many control experiments we have performed indicate that early hippocampal dynamics were not significantly altered by the surgery. First, our extracellular electrophysiological recordings from a sample of 6 mice (ranging from P6 to P11, Figure 1- figure supplement 1C) show that the frequency of early sharp waves (eSW) was slightly but not significantly reduced in the ipsilateral hemisphere compared to the contralateral one. Of note, a similar “non-significant” decrease had been previously reported by another group (Graf et al 2021 Fig S6C). As suggested by the reviewer, we can speculate that this slight decrease may result from a reduction of the sensory feedback re-afference originating from the right limbs. Indeed, we observed that movements of the right limbs (contralateral to the window implant) elicited a slightly smaller response than those from the left limbs. This observation has been added to Figure 1 - Supplement 1E and described in the results (lines 128-134) and discussion (lines 314-320).

      We have performed additional control experiments using EMG nuchal electrodes in two pups aged P5 and P6. We observed that, an hour following the surgery (corresponding to the recovery time in our experimental procedure), the composition of the sleep-wake cycle (with 70 to 80 % of active sleep) was comparable to previous reports (Jouvet-Mounier, 1969, Fig 4). This quantification was added to Figure 1- figure supplement 1B (lines 82-86).

      2) State-dependent parameters are not adequately described, controlled, and examined quantitatively to ensure that data from similar behavioral states is being used for analysis across ages. Network activity from wakefulness, REM/active sleep and NREM/quiet sleep should not be presumed to be indistinguishable.

      We would like to point out that our analysis across ages focused on the population response following animal movements, and not across all behavioral states. That said, it is true that two types of movements can be distinguished, namely the twitches and the complex ones. To take this behavioral heterogeneity into account, we have now separately quantified the hippocampal activation following twitches (movement during active sleep) and complex movement (during wakefulness). We show in Figure 2 - figure supplement 1B that the hippocampal response to twitches and complex movements is similar across ages. Thus, even if the amount of time spent in each behavioral state is modified over the developmental period that we have studied, we are pretty confident that it does not impact the transition we have described in the relationship between animal movements and hippocampal activity. Additionally, we were able to combine in one P5 mouse pup 2p-imaging with nuchal EMG recordings and separately computed the PMTH for movements observed during REM or wakefulness (Figure 2 - figure supplement 1C). We show that CA1 hippocampal neurons were activated time-locked to movement in both behavioral states, with only the amplitude of the population response differing between wakefulness and REM. This point is now included in the results section (lines 148-152) and discussed (lines 324-327).

      3) Currently employed statistics are not rigorous, unified, or sensitive, and do not support all of the authors' claims. Data shown suggest potentially significant changes that have not been identified due to suboptimal statistical approach and/or underpowering.

      We obviously agree with this reviewer that rigorous statistics should be employed and can certify that the data analyzed in the submitted manuscript were carefully examined following that principle. We feel that his/her strong criticism regarding that point was not fully justified. In particular, we do not understand why statistical tests should be “unified” across different figures of the paper. Rather, statistical tests should be adapted to the sample size and distribution. Of course, the same tests were used for similar datasets. This revised manuscript now contains further description and justification of all the tests included in every figure panel.

      4) The authors use an artificial neural network approach to infer cell classification (pyramidal cell vs. interneuron). From the data provided, it is not possible to adequately evaluate whether these 'inferred' interneurons represent the same population as conventionally labeled interneurons.

      We thank the reviewer for this important remark and apologize for the lack of detailed description of our method to ‘infer’ interneurons. This method was previously published (Denis et al., 2020), and designed to identify interneurons from their calcium fluorescence signals in the absence of a reporter. Most importantly, this cell type classifier was trained and tested on a dataset in which interneurons were labeled using a reporter mouse line (GAD67-Cre). This dataset is included in this article. This means that all the ‘labeled’ interneurons included here were also used for the training and the test dataset. As for the activity classifier, the training and test data sets covered all the developmental ages used in the study. Thus, the previously published statistics (accuracy/sensitivity) of this classifier should well account for the present analysis. This method is now described in better detail in the results (line 183) and methods parts (lines 616-619). We now also illustrate in the figures how this classifier can infer interneurons with 91% precision (split up of prediction vs ground truth in test data are reported from Denis et al) and that these ‘inferred’ interneurons are activated with movement just as genetically ‘labeled’ interneurons (Figure 3 - figure supplement 1B-E).

      5) Functional GABAergic activity is not assessed across development (only at P9-10), limiting mechanistic conclusions that can be drawn.

      We thank the reviewer for this comment that reveals some lack of clarity in the previous description of our experiments. Indeed, functional GABAergic activity was also assessed before P9; however, given that there are no GABAergic axons in the CA1 pyramidal layer at early stages (for both CCK cf. Morozov and Freund 2003, and prospective PV cells cf. Figure 4A,B), there is no signal to be measured either. We have now added a new figure (Figure 4 - figure supplement 1) to clarify this point. In agreement with our Syt2 longitudinal quantification, we show, using tdTomato expression in the Gad67cre driver mouse line, that GABAergic perisomatic innervation is only visible after P9. This also matches our attempted imaging experiments using axon-enriched GCaMP in mice before P9.

      6) The present analyses are almost exclusively focused on movement-related epochs, substantially limiting conclusions that can be drawn as to what neural dynamics are actually occurring during epochs that the authors propose comprise internal representations.

      We agree with this reviewer that our study is focusing on movement-related episodes and that we are not assessing hippocampal representations, especially since the pups are recorded in conditions that minimize external environmental influences. Still, we observe that there is a switch in the distribution of spontaneous activity in CA1 after P9, with most activity occurring outside the synchronous calcium events and detached from movement. The exact nature of this activity remains to be studied; however, it is most likely not evoked by extrinsic phasic inputs and rather represents local dynamics. We have now removed reference to “internal representations” or “internal models” in the two previous instances of use (in the abstract and discussion) and replaced them, when possible, by “self-referenced” representations alluding to self-generated-movement-triggered activity.

      Reviewer #2 (Public Review):

      The study by Dard et al aims to uncover the post-natal emergence of mature network dynamics in the hippocampus, with a particular focus on how pyramidal cells and interneurons change their response to spontaneous limb movement. Several previous studies have investigated this topic using electrophysiology, but this study is the first to utilize 2-photon calcium imaging, enabling the recording of hundreds of individual neurons, and discrimination between pyramidal cell and interneuron activity. The aims of the study are of broad interest to all neuroscientists studying development (including neurodevelopmental disorders) and the basic science of network dynamics.

      The main conclusions of the study are that (1) in early life, most pyramidal cell activity occurs in bursts synchronized to spontaneous movement, (2) by P12, pyramidal cell activity is largely desynchronized from spontaneous movement, and indeed movement triggers an inhibition in the pyramidal network (approximately 2-4sec following movement), (3) unlike pyramidal cells, interneuron activity remains positively modulated by movement, throughout the period P1-P12, (4) the changes in pyramidal cell activity are achieved by means of increases in perisomatic inhibition, between P8 and P10.

      It should be noted that conclusion (1) and to some extent conclusion (2) have already been reported, by previous studies using electrophysiology (as clearly acknowledged by the authors).

      A principal strength of this manuscript is the extremely high quality of the data that the authors are able to use in support of (1) and (2), with very large numbers of neurons being analyzed to clearly delineate the relationship between neural activity and movement. The finding that pyramidal cells become inhibited following movement is novel, I believe. Furthermore, this study offers the first description of the development of interneuron activity, in this experimental context.

      The main weakness of the manuscript is that the authors cannot provide direct functional evidence for the conclusion (4). As shown by the analysis in support of conclusion (3), interneuron activity with respect to movement does not actually change during the developmental period being studied, making it prima facie unlikely that this is the cause of changes in pyramidal network responses to movement. To overcome this, the study describes the activity of GABA-ergic axon terminals in the pyramidal cell layer at P9-10, but it appears that due to technical problems this was not possible in younger animals. It, therefore, remains unknown if the functional inhibitory inputs to pyramidal cells are changing over the ages studied.

      We thank this reviewer for acknowledging the broad interest of the study, its novelty, and the high quality of our dataset. The main concern raised by this reviewer (lack of axonal activity experiments in younger pups) was in fact a misunderstanding of the experiments performed and we apologize for this lack of clarity. Reviewer #2 is correct in that the relationship between interneuron activity and movement does not change over the developmental period studied. However, we have only included GABAergic axonal imaging after P9, not due to a technical problem but rather because there are no GABAergic axons in the pyramidal layer before (we see GABAergic neurites only outside the layer). We have now dedicated a new supplementary figure (Figure 4 - figure supplement 1) to explain why we could not image GABAergic axons in the pyramidal cell layer at earlier developmental stages.

      The study does describe increases in the protein synaptotagmin-2, in the pyramidal cell layer, between P3 and P11, but in my opinion, this molecular evidence for increases in perisomatic inhibition does not match the (very high) standards of neuronal function/activity reported elsewhere in the manuscript.

      In the absence of parvalbumin expression in early development, synaptotagmin-2 has been described as the best marker of prospective PV boutons in the cortex (Someijer et al. 2012). This molecular marker has been used in other studies (Modol et al. Neuron 2020, Sigal et al. PNAS 2019). We respectfully disagree with this reviewer, and think that quantification from immunohistochemistry experiments is as high of a standard as functional imaging as it is the only way to describe the anatomical structure of active neuronal processes.

      Reviewer #3 (Public Review):

      Dard and colleagues use both in vivo calcium imaging and computational modelling to explore the relationship between early movement and CA1 hippocampal activity in neonatal mice.

      The manuscript represents a significant technical advance in that the authors have pioneered the use of multiphoton imaging to record activity in the hippocampus of awake neonates. Overall the presentation of the data is convincing although I would recommend a number of tweaks to the figures and the inclusion of some raw data to better direct and inform non-expert readers. I also believe that the assessment of long-range inputs using pseudo-rabies virus should be present in the main body of the manuscript as opposed to supplemental material. The computational modeling supports their idea but does not exclude other possibilities. Further, it is not clear to what extent the strengthening of local excitatory input onto the interneurons - the dominant route of recurrent input in the hippocampus, is important; something that the authors acknowledge in the discussion.

      Overall, I believe the paper adds to our knowledge of the timeline of development and further identified the postnatal day (P)9-P10 window as important in emergent cortical processing. The fact that this is linked to an increase in GABAergic innervation has implications for our understanding of both normal and dysfunctional brain development.

      We thank the reviewer for his constructive comments and helpful suggestions. As suggested, this revised version now includes some raw-data and better descriptions to guide non-expert readers. Regarding the inclusion of rabies-tracing experiments in the main part of the MS, we would like to state here that there are still a number of limitations with the use of this method during development (incubation time, spatial precision of the injection site, etc. ) that limit the interpretation and quantification of the results. As a result, we have decided to remain only qualitative, focusing on identifying the brain regions that could send projections onto CA1 pyramidal cells and interneurons. We believe that this type of description is more suited for a supplementary figure than a principal figure, but will be happy to change this, if the reviewer and editors think otherwise.

    1. Some students do as well in online courses as in in-person courses, some may actually do better, but, on average, students do worse in the online setting, and this is particularly true for students with weaker academic backgrounds.

      I think this statement is important because it shows that the argument is not as simple as, "Online courses are bad and in-person classes are good". It shows that, while plenty of students do just fine learning online, the online courses themselves lack a lot of the edge that an in-person course can give a student. This is an important observation because we can use this research to optimize the way we learn online moving forward!

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2021-01219

      Corresponding author(s): Rajan, Akhila

      1) General Statements [optional]

      The goal of this study is to:

      • Define how prolonged exposure to a high-sugar diet (HSD) regime alters both the lipid landscape and feeding behavior.
      • Determine how changes in lipid classes within the adipose tissue regulate feeding behavior.

      Key findings:

      In this study, by taking an unbiased systems level and genetic approach, we reveal that phospholipid status of the fat tissue controls global satiety sensing.

      Impact of Key findings:

      By uncovering a critical role for adipose tissue phospholipid balance as a key regulator of organismal feeding, our work raises the possibility that the rate-limiting enzymes in phospholipid synthesis, including Pect, are potential targets for therapeutic interventions for obesity and feeding disorders.

      Peer review comments:

      This study has immensely benefited from the thoughtful peer-review of three reviewers. As per their recommendations, we have performed a major revision by performing additional experiments (see summary table below in next section) and strived to address the major concerns raised. Based on our reading, the reviewers raised two major overlapping concerns. They are as follows:

      • Does the genetic disruption of Pect in fly fat body alter phospholipid levels? Two reviewers (#2 and #3) recommended that we perform lipidomic analyses on adult flies with adipose tissue specific knockdown of Pect. For the revised version, we have completed this lipidomic experiment, and present results as a new main Figure 6, Supplemental S7 and S9.
      • Is the dampened HSD induced hunger-driven feeding (HDF) behavior because of increased baseline feeding (#1 and #3)? In addition, reviewer #1, asked us whether HSD flies experience an energy-deficit? In other words, we were asked to uncouple whether what we observed was HSD-driven allostasis or indeed, as we had interpreted, that HSD dampened hunger-driven feeding response.

      Hence, they recommended that we:

      1. Re-analyze our hunger-driven feeding datasets and present non-normalized data (also requested by Reviewer #3) and show baseline feeding behavior on HSD. To address this, we have completed this analysis and present our results in Figure 1B-D and S1.
      2. Determine whether the HSD fed flies display an energy deficit on starvation. To this end, we assayed starvation-induced fat mobilization on HSD; results for this are now presented in Figure 1E-G and S2.

      Conclusions after the revision:

      First, it is important to note here that the additional experiments have not caused a significant revision of the major conclusions of the original version of our study. In fact, we hope that the revised version provides clarity and further substantiation to our original arguments.

      • The lipidomics experiments on Pect fat-specific knock-down flies show that reducing Pect in fat-body causes a significant reduction in certain PE lipid species (PE 36.2 specifically- Figure 6B). This is consistent with a prior report on lipidomics of the Pect null allele by Tom Clandinin’s group (PMID: 30737130). Furthermore, we note that when Pect is knocked down in the fat body, there is a significant increase in two other classes of phospholipids, LPC and LPE (Figure 6A). Together, this suggests an imbalance in phospholipid composition in the absence of Pect activity in fat.
      • The starvation-induced fat mobilization experiments show that despite being fed a prolonged HSD, adult flies sense starvation and effectively mobilize fat stores, at a level comparable to Normal food (NF) fed adult flies, suggesting that even despite HSD exposure, adult flies experience an energy deficit on starvation.
      • In our non-normalized data, we find that the baseline feeding events are not significantly altered between HSD and NF-fed flies (Figure 1D). This suggests that the effects we observe are not due to an increase in the “denominator”, but a dampening of hunger-driven feeding on HSD.

      With regard to our original version, all three peer-reviewers found that the study was interesting, significant, important, and novel – Reviewer #1: “The work is potentially novel and interesting”; #2 : “I find the study to be potentially very important - the authors combine a longitudinal study that would be difficult in any other model with the powerful genetic tools available in the fly. The conclusions are mostly convincing”; #3: “This manuscript demonstrates how fat body Pect levels affect HSD induced changes in hunger-driven feeding response. I agree with all the reviewers points; potentially very interesting”. They had, however, requested that we provide further substantiation and clarification.

      We sincerely hope that the peer-reviewers find that our revised version with additional new experimental datasets, improved data visualization, and the presentation of non-normalized raw data points, makes this study clear, compelling, and well-substantiated.

      • Point-by-point description of the revisions

      Below we summarize in Part A, the key experiments that were performed to address the major concerns. In Part B, we provide a point-point response to each reviewer with embedded datasets.

      Part A:

      We performed several new experiments, including:

      • To address the primary concern of Reviewer #1 regarding whether the HSD flies have a similar energy deficit to Normal food (NF) fed flies, we performed analysis of stored neutral fat Triacylglycerol (TAG) reserves and how HSD fed flies mobilized fat stores on starvation. We present these results in Figure 1E-G, S2. These results show that HSD-flies, despite accumulating more TAG (S2), break down a similar amount of fat reserves as NF-fed flies on starvation at any time-point (Figure 1E-G). This suggests that HSD-fed flies do sense and respond to energy deficit.
      • To address concerns of reviewer #2 and #3 on whether Pect genetic manipulation affects specific phospholipid classes, we performed lipidomic analyses. The table below summarizes the 3 new figures and 4 new supplemental figures (new figure numbers and figure panels are in blue text) and three new Supplementary files as per reviewer’s request.

      | Figure # | Main point | New datasets in revision | Companion Supplement |
      |---|---|---|---|
      | 1 | HSD alters feeding behavior, but flies still breakdown TAG on starvation. | TAG storage and breakdown over longitudinal HSD shows that HSD and NF fed flies show similar levels of TAG breakdown on starvation, despite consistently elevated TAG on HSD. This supports the idea that flies do sense starvation even on HSD, but there is a uncoupling of the feeding behavior after Day 14. Revised the data representation of Figure 1 to show non-normalized data over time. S1 and S2 companions are new in the revision. Panels 1D to 1E are new for the revision. | S1- Raw data of feeding events plotted. S2 Elevated TAG at all time points. |
      | 2 | HSD causes insulin resistance | S3A added to show that insulin transcript levels remain the same in response to reviewer #3’s concerns. | S3 |
      | 3 | Phospholipid concentration raw data from lipidomic on Day 7 and Day 14 HSD suggest that PC, PE levels are increased on Day 14 HSD. | Figure 3 revamped to show new data visualization and non-normalized raw data to address Reviewer #2’s major concerns. S4A and S4B added. In addition Supplementary File 1 and 2 provided with raw lipidomics data as per reviewer #2’s request. | S4. S4A- non normalized raw data of all other lipid classes on HSD. S4B- fatty acid species data on Day 14 added as per request of rev.#2. |
      | 4 | HSD regulate Apo-I levels in the IPCs and phenocopies Pect KD. | Added Figure 4A to show that HSD phenocopies Pect-KD in terms of delivery to brain | S5 showing the validation of the Apo-I antibody. S6 validation of Pect KD and over-expression and Pect mRNA levels dysregulation on HSD. |
      | 5 | Pect RNAi is insulin resistant | N/A | N/A |
      | 6 | Pect knockdown shows significant increase in LPC and LPE, and a non-significant reduction in PC, PE levels. Specifically, the PE lipid class PE36.2 is downregulated. | Fig 6, S7, S9 are completely new based on reviewer #2 and #3 requests. In addition Supplementary File 3 provided with raw lipidomics data as per reviewer #2’s request | S7, S8, S9#. S7- new Pect KD other classes. S8- new PE classes for day 14 and Pect associated classes. S9- Pect OE lipidomics |
      | 7 | Pisd and Pect activity in adipocytes are required for hunger-driven feeding behavior in normal diets | Pisd RNAi data was moved from supplement to main figure. | N/A |

      Note on revised text: We have revised text not only in the results section, but also as per reviewer #2’s recommendation, we have revamped our introduction and discussion as well. Since the manuscript has been significantly revised to include a main figure 6, fully altered Figure 1 and 3, multiple new supplemental figures, the changes in text are extensive. Hence, they are unmarked in the main text. Nonetheless, we hope that the reviewers will be able to evaluate these changes, as we have provided the specific locations in text and embed key figures in the point-point response below.

      Part B: Point-by-point responses to reviewer comments.

      Reviewer #1 comments in Blue, author response in black.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Kelly et al. show that the difference between the feeding behavior of fed and starved flies (hunger-driven feeding; HDF) is absent in animals fed a high-sugar diet (HSD) for two weeks or more. The disappearance of HDF with HSD coincides with changes in phospholipid profiles caused by HSD. Furthermore, RNAi-mediated downregulation of Pect in the fat body-a key enzyme in the PE biosynthesis pathway-phenocopies physiological effects of HSD. Moreover, downregulation or overexpression of Pect in the fat body abolishes or induces HDF, respectively, independent of HSD treatment.

      Overall, the manuscript is well-written and the phenotypes are clear. However, I have major concerns regarding the authors' interpretation of the data and their conclusion. Most importantly, while it is clear that the authors' high-sugar dietary treatment affects feeding behavior and physiology, I am not convinced that the changes can be considered "hunger-driven"-which is central to the main point of the manuscript. Therefore, it is my recommendation that the authors substantially revise the manuscript by either showing additional/re-analyzed data that rule out alternative hypotheses, or rewriting the manuscript keeping alternative interpretations in mind.

      We are thankful to this reviewer for their thoughtful critique, and constructive and specific suggestions on how we can redress these concerns. We have taken on board the concerns of this reviewer regarding our interpretation of whether the changes in feeding behavior can be considered hunger-driven or not. Based on their advice, we have made significant changes by addressing: i) whether HSD increases baseline feeding: we now show non-normalized raw data, and the data support the conclusion that baseline feeding is not higher; ii) whether HSD-fed flies can sense an energy deficit at levels similar to NF-fed flies: we show that HSD flies sense an energy deficit. We have provided a detailed response below, and we hope the reviewer finds that the additional datasets and re-analyzed data are consistent with the interpretation that prolonged HSD dampens starvation-induced feeding. In addition to this key concern, this reviewer has made many other salient points that we have addressed with additional data or by clarifying the text.

      Major comments:

      1) The data do not sufficiently show that the long-term HSD regime disrupts "hunger-sensing." The manuscript should address alternative hypotheses by showing raw instead of normalized data, rewriting the manuscript with a new central conclusion, or running additional experiments that actually show a defect in hunger-driven response.

      a. The main results that the authors rely on for the argument is that the ratio of feeding events that the starved and non-starved flies eat is different between the groups fed normal or HSD. However, because the authors only show normalized data (normalized to non-starved flies; Fig. 1), it is difficult to tell whether the change is due to a chronically increased feeding in non-starved HSD flies-maybe in perpetual hunger-like allostasis-or dampened starvation response. Indeed, the data shown in Fig S1 show that flies fed HSD for as short as 5 days show more frequent feeding events compared to age-matched controls fed normal food. It is possible that because the HSD-fed flies eat more than NF-fed flies, even without being starved, the ratio of starved/non-starved feeding is lower in the HSD-fed group-due to changes in the denominator, rather than the numerator.

      We have taken on board this concern that presenting only normalized data clouded the interpretation and left open other possibilities. In the completely revised Figure 1 and S1, we now show non-normalized data as a function of time. First, we note that HSD-fed flies do not show higher baseline feeding than NF-fed flies, except on Day 10 of HSD, when there is a modest but significant elevation (Figure 1D).

      Nonetheless, on Day 10 of HSD, flies still display increased hunger-driven feeding (HDF; Figure 1C); it is only after Day 14 of HSD that the starvation-induced feeding is dampened.

      b. It is also possible that the HSD-fed flies are simply not in as big an energy deficit physiologically, due to the increased fat deposits they've accumulated (as the authors show later in the manuscript). It may take longer for the fat HSD flies to reach substantial energy deficiency than the NF flies, but they still may eventually be able to appropriately respond to hunger, just like NF flies. In such case, it would be a misnomer to call this behavioral change a 'defect in hunger-driven feeding behavior.' Maybe an experiment with a dose-response curve of "hunger driven feeding response" as a function of duration of starvation would help?

      Prompted by this reviewer's question, we asked whether HSD-fed flies, which have a higher baseline neutral fat store (triacylglycerol, TAG) level, can sense an energy deficit. For this, we revisited the longitudinal assays for TAG storage that our lab had generated, along with the HSD-HDF studies. We now present this evidence as Figure 1E-1G and Figure S2. Overall, our experiments point to the idea that adult flies fed HSD are able to sense starvation and mobilize TAG stores effectively throughout the 28-day period that we analysed.

      First, as shown in Figure S2, flies fed HSD display an increase in TAG levels. Note, however, that while TAG stores increase, the increase is not linear with time. Adult flies exposed to HSD store excess energy as TAG, but the increased TAG stores stay within a certain range regardless of the length of HSD exposure. This suggests that adult flies on HSD still display TAG homeostasis.

      Next, to directly address the reviewer's point about HSD-fed flies not sensing an energy deficit, we subjected HSD-fed flies to overnight starvation, the same regime used in the overnight feeding experiments, and asked whether they mobilize TAG. We noted that flies exposed to HSD break down TAG throughout the 28-day exposure, at statistically significant levels from Day 3 to Day 28 except on Days 14 and 21 (Figure 1F). While there is TAG mobilization on Days 14 and 21, the difference is not statistically significant. Nonetheless, we note the same levels of TAG breakdown for flies fed normal lab food (NF) on Days 14 and 21 (Figure 1E). Overall, HSD-fed flies sense and display an energy deficit, as measured by TAG store mobilization, throughout the 28 days of HSD exposure, at levels comparable to NF-fed flies (Figure 1G).

      Taken together, these results suggest that HSD-fed flies experience an energy deficit on starvation, at levels comparable to NF-fed flies, throughout the 28-day time course assayed. However, their starvation-driven feeding response is dampened by Day 14, and by Day 28 the non-starved HSD flies display more feeding events than the starved HSD flies. These results are consistent with the interpretation that in HSD-fed flies the starvation-induced feeding behavior becomes desynchronized from starvation-induced TAG mobilization, suggesting an absence of hunger-driven feeding.

      2) How can you be sure that lower Dilp5 immunofluorescence is indicative of increased Dilp5 secretion? Wouldn't decreased production of dilp5 also have the same results?

      It has been shown previously that HSD-fed larvae are hyperinsulinemic, i.e., they have a 55% increase in circulating Dilp2 (PMID: 22567167). Additionally, we have shown that ectopic activation of the insulin-producing neurons by expressing TRPA1, an ion channel that activates neurons, reduces Dilp5 accumulation without a change in Dilp5 mRNA levels (PMID: 32976758), suggesting that reduced Dilp5 accumulation without alterations to mRNA levels is a proxy for increased secretion. In response to this concern, we have added qPCR data for Dilp2 and Dilp5 to the revised manuscript (Figure S3A), which show no difference in expression levels after 14 days on HSD. Therefore, there is no dip in Dilp5 mRNA production. Given that Dilp2 and Dilp5 mRNA levels remain the same but Dilp5 accumulation is reduced, we interpret this to mean that Dilp5 secretion is increased.

      1. Also, the authors should state in the main text that it is Dilp5, not just any Dilp.

      Thanks for this suggestion. We have fixed this and now refer to Dilp5 specifically throughout the Results section.

      3) Data presentation: a. Sometimes the data are normalized to NF (Fig 4B-C), sometimes not (ex. Fig 4A, S4C). Unless there is a specific rationale for the data transformation, it would be more appropriate to show untransformed data (ex. Fig 4A, S4C), especially as the authors use two-way ANOVA to determine significance. Only showing the differences implies comparison against a hypothetical mean (i.e. μ0=0), not between two group means.

      We thank the reviewers for bringing this issue to our attention. We updated all the figures to show untransformed data in the revised manuscript.
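
      The reviewer's statistical point about normalized data can be sketched as follows. This is a minimal illustration under stated assumptions: the measurements are hypothetical, the function names are ours, and simple t statistics stand in for the two-way ANOVA used in the manuscript. Subtracting the NF mean and testing the result against zero treats the control mean as a fixed constant, so the control group's variability is ignored and the test statistic comes out larger than in a genuine two-group comparison.

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """t statistic for testing the mean of `values` against a fixed value mu0."""
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return (statistics.mean(values) - mu0) / standard_error


def welch_t(group_a, group_b):
    """Welch t statistic comparing two group means with unequal variances."""
    se_a_squared = statistics.variance(group_a) / len(group_a)
    se_b_squared = statistics.variance(group_b) / len(group_b)
    mean_difference = statistics.mean(group_a) - statistics.mean(group_b)
    return mean_difference / math.sqrt(se_a_squared + se_b_squared)


# Hypothetical measurements (illustrative only, not data from the study).
nf = [4.0, 5.0, 6.0, 5.0]
hsd = [7.0, 8.0, 9.0, 8.0]

# "Normalized" presentation: subtract the NF mean, then compare against zero.
nf_mean = statistics.mean(nf)
hsd_minus_nf_mean = [value - nf_mean for value in hsd]
t_normalized = one_sample_t(hsd_minus_nf_mean, mu0=0.0)

# Untransformed presentation: compare the two groups directly.
t_two_group = welch_t(hsd, nf)

print(t_normalized, t_two_group)
```

      Whenever the control group has non-zero variance, the one-sample statistic exceeds the two-group statistic in magnitude, which is one reason untransformed data are the safer basis for a between-group test.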

      1. Some figures show both individual data points and summary statistics (mean, SD, ...; e.g., Fig 2A), which I believe is ideal, but some show only one or the other (e.g., Fig 2B, no summary statistics; Fig. 3, no data points). The manuscript would read more convincingly if data visualization were consistent across figures.

      We thank the reviewers for their feedback. We have made changes to all the figures in the revised manuscript to improve visual consistency.

      Minor comments: 1) High sugar diet: what is the actual sugar concentration in the NF v. HSD diets? The authors write that the HSD diet contains "30% more sugar" than the NF, but providing the final sugar concentrations-sucrose or others-would be informative for other scientists studying the effect of high sugar diets.

      We thank the reviewer for their suggestion and have now updated the Methods to include this sentence: “After 7 days, flies were either maintained on normal diet or moved to a high sugar diet (HSD), composed of the same composition as normal diet but with an additional 300g of sucrose per liter”.

      1. Additionally, the definition of HSD is inconsistent. Main text (Page 5, line 17) states that their HSD is "60% more sugar than normal media," whereas the figure legend (Fig 1) and the Methods state that the HSD contains "30% more sugar."

      We apologize for this egregious typo in the figure legend! We have now fixed this to say 30% HSD. Only 30% HSD was used throughout this study.

      2) Starvation medium: please provide justification for why the authors used 1% sucrose/agar for starvation medium, instead of plain agar/water that most labs use. At least clarify and provide a reference for the claim that the 1% sucrose/agar "is a minimal food media to elicit a starvation response."

      We are very grateful to the reviewer for identifying this error in the methods description and bringing it to our attention. We used 0% sucrose agar for overnight starvation in this study, as most labs do. The error occurred because we used another manuscript from the lab to help draft the Methods section (PMID: 29017032). In that study, where we assayed the effect of chronic starvation, our lab used “1% sucrose agar for 5 days at 25C”. In the current study, however, because we are testing the acute effects of overnight starvation, we use 0% sucrose agar.

      3) Pect mRNA level is higher with HSD. This is surprising not only because, as the authors mention, increased PC32.2 with HSD suggests lower Pect activity, but also because Pect RNAi phenocopies long-term HSD in HDF behavior, lipid morphology, and FOXO accumulation in fat body. The authors speculate that the data "likely shown an upregulation in an attempt to mediate the Pect dysregulation occurring at the protein level." If that were true, a western blot may be informative. Zhao and Wang (2020, PLoS Genetics) generated a Pect antibody that seems compatible with western blot applications. That being said, I don't think such data is critical for the manuscript. I mention this simply as a suggestion for the authors.

      a. page 8, line 22-23, did you mean to write "Given how PC32.2 is elevated after 14 days of exposure to HSD, we assumed that Pect levels would be low for flies under HSD," not "high?" Otherwise the subsequent 2 sentences don't make sense.

      We agree that the most confusing aspect of the study was that Pect mRNA levels were very high on Day 14 of HSD, yet the effects of Pect-KD phenocopied HSD. To resolve this, we have now performed lipidomic analyses on whole adult flies in which Pect is knocked down (KD) by RNAi in the fat tissue. We now present a new dataset in Figure 6. Two striking changes occur:

      1. Pect-KD shows an increase in the phospholipid classes LPC and LPE (Figure 6A). In contrast, LPE is significantly downregulated on HSD Day 14 (Figure 3).
      2. Pect-KD shows a significant reduction in a specific PE species, PE 36.2 (Figure 6B). Our data regarding the decrease in PE 36.2 agree with a previous lipidomic analysis of Pect mutant retina (PMID: 30737130). In contrast, PE 36.2 trends upwards on 14-day HSD (Figure S7C), though not significantly. Consistent with the extreme upregulation of Pect mRNA in 14-day HSD-fed flies (Figure S6A; Pect mRNA 200-250 fold), PE trends upwards on 14-day HSD (Figure 3) and PE 36.2 trends higher (Figure S7C).

      We note that, on the surface, the changes in PE and LPE go in opposite directions in the 14-day HSD lipidome and in fat-specific Pect-KD. But there is a significant commonality: under both states there is an imbalance of the phospholipid classes PE and LPE. Hence, we propose that maintaining the compositional balance of the phospholipid classes PE and LPE is critical to hunger-driven feeding and insulin sensitivity, and that either an increase or a decrease in these key phospholipid species may lead to abnormal hunger-driven feeding.

      We agree that a western blot would be informative as well, but we were unable to obtain the reagent from Dr. Wang’s group, which precluded us from performing this experiment. See email snapshot.

      To ensure that we appropriately discuss and clarify this issue, we have now included a section in the Discussion (Page 14, Lines 26-34) under the subtitle “The implications of relationship between Pect levels and HSD”. We have pasted an excerpt from that subsection below for the reviewer's assessment.

      “Also, we note that over-expression of Pect cDNA in the fat-body does not alter phospholipid balance (Figure S9) and indeed improves HDF on HSD (Figure 7B). While this may appear inconsistent, it is critical to note that over-expression of Pect cDNA using UAS/Gal4 only increases Pect mRNA expression by 7-fold (Figure S6A), whereas HSD causes its upregulation by 250-fold (Figure S6B). Hence, we speculate that an increased ‘basal’ level of Pect, such as that provided by cDNA over-expression in fat, may be protective against the negative effects of HSD (Figure 7B) without affecting overall phospholipid levels (Figure S9), but extreme upregulation of Pect on HSD affects the PE and LPE balance (Figure 3).”

      Reviewer #1 (Significance (Required)):

      The work is potentially novel and interesting, but at this stage it's difficult to interpret what the phenotype signifies. Although the manuscript could be revised simply by modifying the text, experimentally addressing the concerns would significantly improve the work.

      In sum, we hope we have addressed the key concern for Reviewer #1 as to whether the behavior we report here is indeed a dampening of starvation-induced feeding, or an effect of increase in baseline feeding. We hope that by reviewing our non-normalized data, they can appreciate that it is the former. Also, we hope that Reviewer #1 appreciates that we have strived to address the concerns by additional experiments, to clarify our findings and improve the impact of the work.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This intriguing manuscript by Kelly and colleagues uses the fruit fly Drosophila melanogaster as a model to understand how diet-induced obesity alters the feeding response over time. In particular, the authors' findings indicate that chronic exposure to a high-sugar diet significantly alters the starvation-induced feeding response. These behavioral studies are complemented by a lipidomics approach that reveals how a chronic high-sugar diet affects many lipid species, including phospholipids. The authors then pursue mechanistic studies that indicate phospholipid metabolism within the fat body appears to remotely affect insulin secretion from the insulin producing cells. Moreover, the changes in phospholipid abundance are associated with changes in insulin-signaling, including increased insulin secretion from the IPCs and elevated levels of FOXO within the nucleus.

      I find the study to be potentially very important - the authors combine a longitudinal study that would be difficult in any other model with the powerful genetic tools available in the fly. The conclusions are mostly convincing, but a few follow-up experiments are required:

      We are grateful for the reviewer's constructive, detail-oriented, and balanced feedback, and for their recognition of the value of this study. We have now performed additional experiments to address the key concerns raised by all reviewers. We hope that, on reading the revised version of our study, the reviewer continues to feel positive about its message and potential impact.

      1. The key conclusions from the manuscript assume that manipulation of Pect expression levels alters phosphatidylethanolamine (PE) levels. However, the authors make no attempt to verify that the genetic experiments described herein actually affect PE levels. At a minimum, changes in PE levels should be verified for the Pect knockdown and overexpression lines. Similarly, there is no evidence that manipulation of either EAS or Pcyt2 induces the expected metabolic effects. I'm not asking that the longitudinal feeding experiments be repeated, simply that the authors measure the relevant lipid species, preferably with a targeted LC-MS approach.

      Prompted by this reviewer, we performed targeted LC-MS on whole adult flies, on normal diet, to assess lipid levels for fat-specific Pect-KD and overexpression. We decided to focus on Pect, as its knock-down even on normal diet causes a dampened hunger-driven feeding behavior (Figure 7A) and phenocopied a 14-day HSD feeding phenotype.

      We now present a new dataset in Figure 6. Two striking changes occur:

      • Pect-KD shows a significant reduction in a specific PE species, PE 36.2 (Figure 6B). Our data regarding the decrease in PE 36.2 agree with a previous lipidomic analysis of Pect mutant retina (PMID: 30737130). Note that although overall levels of all PE species trend downwards, as in the Clandinin lab study on Pect (PMID: 30737130), we did not find a significant change in the overall PC and PE levels.

      • Pect-KD shows an increase in the phospholipid classes LPC and LPE (Figure 6A). In contrast, LPE is significantly downregulated on HSD Day 14 (Figure 3). Consistent with the extreme upregulation of Pect mRNA in 14-day HSD-fed flies (Figure S6A; Pect mRNA 200-250 fold), PE trends upwards on 14-day HSD (Figure 3) and PE 36.2 trends higher (Figure S7C).

      We note that, on the surface, the changes in PE and LPE go in opposite directions in the 14-day HSD lipidome and in fat-specific Pect-KD. But there is a significant commonality: under both states there is an imbalance of the phospholipid classes PE and LPE. Hence, we propose that maintaining the compositional balance of the phospholipid classes PE and LPE is critical to hunger-driven feeding and insulin sensitivity, and that either an increase or a decrease in these key phospholipid species may lead to abnormal hunger-driven feeding.

      Finally, fat-specific Pect-OE did not cause significant changes to lipid species (Figure S9). This could be because the fat-specific Pect-OE flies were kept on normal food, or because we assayed whole-body lipid levels rather than fat-specific lipid changes. Against the latter explanation, even a 60% reduction in Pect mRNA levels (Figure S6A) was sufficient to produce an effect on whole-body phospholipid balance (Figure 6). Hence, we speculate that maintaining a basally higher Pect level (7-fold higher Pect mRNA; Figure S6A) might allow 14-day HSD-fed flies to buffer the negative effects of HSD, and we predict that it might take longer to disrupt the phospholipid balance and HDF response.

      We have now included a section in the Discussion (Page 14, Lines 26-34) under the subtitle “The implications of relationship between Pect levels and HSD”. We have pasted an excerpt from that subsection below for the reviewer's assessment.

      “Also, we note that over-expression of Pect cDNA in the fat-body does not alter phospholipid balance (Figure S9) and indeed improves HDF on HSD (Figure 7B). While this may appear inconsistent, it is critical to note that over-expression of Pect cDNA using UAS/Gal4 only increases Pect mRNA expression by 7-fold (Figure S6A), whereas HSD causes its upregulation by 250-fold (Figure S6B). Hence, we speculate that an increased ‘basal’ level of Pect, such as that provided by cDNA over-expression in fat, may be protective against the negative effects of HSD (Figure 7B) without affecting overall phospholipid levels (Figure S9), but extreme upregulation of Pect on HSD affects the PE and LPE balance (Figure 3).”

      A central hypothesis in the study is that HSD over a period of 14 days results in insulin resistance and that these changes lead to changes in hunger-dependent feeding. I would encourage the authors to determine if Foxo mutants are resistant to these HSD-induced effects on HDF.

      We thank the reviewers for this suggestion. However, given that dFOXO nuclear localization, rather than its expression level, regulates insulin sensitivity, we feel that disrupting dFOXO levels via mutation or knockdown will produce a plethora of indirect effects, including developmental abnormalities (PMID: 24778227, PMID: 16179433, PMID: 29180716, PMID: 12893776). Our data suggest that chronic HSD treatment and Pect affect insulin sensitivity in fat tissue. However, we feel that investigating whether insulin sensitivity/FOXO signaling in fat tissue regulates feeding behavior is outside the scope of our work.

      1. In lines 25-30, the authors draw the conclusion that an increase in unsaturated fatty acid species is associated with the HSD and that these changes result in a more fluid lipid environment. While I agree with the model, the manuscript contains no evidence to support such a model. Either test the hypothesis or move the last line of the section to the discussion.

      We thank the reviewer for this important and insightful comment. We agree that the interpretation we presented and discussed in the original version is, at the moment, speculative. Testing the hypothesis that an increase in unsaturated fatty acid species results in a more fluid lipid environment will require us to build tools and expertise. Hence, this hypothesis is better suited for exploration in a future study. Given this, we have moved it out of the Results section into the Discussion section titled “HSD and fat-specific PECT-KD causes changes to phospholipid profile” (see excerpt below from page 13, lines 24-35).

      “In addition to changes in phospholipid classes, we found that HSD caused an increase in the concentration of PE and PC species with double bonds (Figure S4C and S4D). Double bonds create kinks in the lipid bilayer, leading to increased lipid membrane fluidity which impacts vesicle budding, endocytosis, and molecular transport (refs. 14, 92). Hence it is possible that a mechanism by which HSD induces changes to signaling is by altering the membrane biophysical properties, such as by increased fluidity, which would have a significant impact on numerous biological processes including synaptic firing and inter-organ vesicle transport.”

      Also, as per the reviewer’s guidance, given that we are speculating here, we have also shifted this dataset from Main figure 4 to supplement S4C and S4D.

      In addition, lines 25-30 state that FFAs are increased after 14 days of a HSD. Figure 3A shows the exact opposite: FFAs are significantly decreased in 14 day fed animals despite being elevated in the 7 day fed animals. This is an interesting result that warrants discussion. Moreover, I would encourage the authors to examine the lipidomic data more carefully to ensure that the text accurately portrays the lipid profiles.

      We apologize for misstating in lines 25-30 that FFAs are increased on 14-day HSD. This was an error and we have corrected it. We agree with the reviewer that the reduction of FFA on Day 14 of HSD is an intriguing and unexpected observation that needs to be emphasized and further discussed. To this end, we have added Figure S4B, in which we provide the difference in FFA concentration (by species) after Days 7 and 14.

      Furthermore, we discuss the potential implications of reduced FFA at Day 14 on page 12, lines 19-27 of the Discussion section titled “HSD and fat-specific PECT-KD causes changes to phospholipid profile”. We state the following:

      “We speculate that this reduction in FFA may be due to their involvement in TAG biogenesis (PMID: 13843753). We were interested to see if the decrease in FFA correlated to a particular lipid species, as PE and PC are made from DAGs with specific fatty acid chains. However, further analysis of FFAs at the species level did not reveal any distinct patterns. The majority of FFA chains decreased in HSD, including 12.0, 16.0, 16.1, 18.0, 18.1, and 18.2 (Figure S4B). This data was more suggestive of a global decrease in FFA, likely being converted to TAG and DAG, rather than a specific fatty acid chain being depleted.”

      The processed lipidomics data should also be included as supplementary data table so that they can be independently analyzed by the reader.

      We thank the reviewer for this suggestion. As per the reviewer's request, we have included the raw data as an attachment in our supplementary material (Supplementary Files 1-3), so that interested readers can use the datasets generated in this study for future work and further analysis.

      Beyond these experimental suggestions, the manuscript needs significant editing for clarity. While I won't provide a comprehensive list, the authors need to provide accurate descriptions and annotation of genotypes (including w[1118], which is written as W1118), typos, and formatting. I've listed a few examples below:

      1. Page 3, Line 1 and 2: "...have been shown to impact feeding behavior and metabolism that leads to..." This is an awkward and grammatically incorrect sentence.
      2. Page 3, Lines 7-32 is one very large paragraph but contains concepts that should be broken down over at least three paragraphs.
      3. Page 3, Line 25: A description of the reaction catalyzed by Pect would be helpful for a manuscript focused on Pect activity.
      4. Page 4, Line 10: "previously characterized method of eliciting diet induced feeding behavior." As stated in the text, the method is previously described yet the manuscript characterizing the method isn't cited.
      5. Figure legend 3 contains a random assortment of capitalized lipid species. Also, the names of lipid species are inappropriately broken into multiple names. Please use correct nomenclature throughout the manuscript.

      The list above is nowhere near comprehensive. The manuscript requires significant editing.

      We are grateful to the reviewer for drawing our attention to these errors. We have made significant edits to the revised manuscript to address the above-mentioned concerns, made additional textual changes throughout, and copyedited it. We hope that the reviewer will find that the manuscript reads better and that its clarity and precision are significantly improved.

      Reviewer #2 (Significance (Required)):

      I find the study to be potentially very important - the authors combine a longitudinal study that would be difficult in any other model with the powerful genetic tools available in the fly. The findings will significantly advance our understanding of how lipid metabolism links dietary nutrition with feeding behavior.

      Once again, we are grateful for this reviewer’s thoughtful critique and encouraging words regarding our work and its potential impact.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: This manuscript uses Drosophila to investigate how diet-induced obesity and the changes in the lipid metabolism of the fat body modulate hunger-driven feeding (HDF) response. The authors first demonstrate that chronic exposure (14 days) of high sugar diet (HSD) suppresses HDF response. Through lipidome analysis, the authors identify a specific class of lipids to be elevated upon chronic HSD feeding. This coincided with the changes in expression of Pect, an enzyme that regulates the biosynthesis of these lipids. Modulating the expression of Pect specifically in the fat body affected HDF response.

      We thank this reviewer for their rigorous and thoughtful critique and for identifying a key issue with our original study: the gap in explaining how Pect mRNA levels are elevated on 14-day HSD while Pect-KD phenocopies the HDF defect. By performing whole-body adult fly lipidomics on fat-specific Pect-KD, we have now resolved this issue and clarified the role of Pect in maintaining phospholipid homeostasis, which in turn impacts hunger-driven feeding. We hope the reviewer finds that the revised manuscript further clarifies the functional link between Pect’s role in the fat body and hunger-driven feeding.

      Major comments: The authors claim that the HDF response in HSD is distinct between early (5d, 7d) and chronic (day 14) HSD feeding. However, the data seem to indicate that HDF response is significantly decreased at all time points in HSD. For example, at day 5 HDF response was increased only 3-fold in HSD (Figure 1C) compared to around 50-fold increase in NF (Figure 1B). The scale of the Y-axis in Figure 1B and 1C is an order of magnitude different. Including the starved data (NFstv and HSDstv) in Figure S1, normalized to NF fed group, would better visualize the overall trends. Related to this, having the source data for the actual number of feeding events would be useful (e.g., to see the baseline changes in feeding in different time points in Figure 1 and the effect of genetic manipulations in Figure 7).

      As per the reviewer's request, we have now modified our graphs to show the source data (Figure S1) and the raw feeding events.

      In the non-normalized graphs, we then plot baseline and hunger-driven feeding events over a longitudinal time course (Figure 1B-D). We also show that HSD-fed flies do not display increased baseline feeding (Figure 1D), suggesting that the effect we see on HDF is not clouded by increased baseline feeding.

      Yes, the reviewer makes an important point that the HDF response of HSD-fed flies is of a lower magnitude than that of NF-fed flies. We think this is a biologically meaningful observation, as it suggests that flies have a remarkably fine-tuned ability to coordinate food intake with nutrient store levels.

      To ensure that readers appreciate this salient point raised by the reviewer, we have now included the following paragraph in the Discussion (Page 11, Lines 23-27):

      “It is to be noted that the HDF response of HSD-fed flies (Figure 1C, Days 3-10) is of lower order of magnitude than the NF-fed flies. This suggests that in addition to sensing an energy deficit and mobilizing fat stores (Figure 1F, 1G, S1), HSD fed flies calibrate their starvation-induced feeding to compensate only for the lost amount of fat. Overall, this suggests that flies have a remarkably fine-tuned ability to coordinate food-intake with nutrient store levels.”

      The association between fat body Pect level and phospholipid levels is not clear. Day 14 of HSD feeding shows high expression of Pect in the fat body and elevated levels of PC32.0 and PC32.2. The authors assume the high expression of Pect in the fat body is due to the compensatory response, but there are no data indicating downregulation of Pect levels at the earlier time points of HSD feeding. A previous study demonstrated that Pect mutant flies have lower levels of PC32.0 but higher PC32.2 (PMID: 30737130).

      We agree that one puzzling aspect of the original version of this study was that Pect mRNA levels were very high on Day 14 of HSD, yet the effects of Pect-KD phenocopied HSD. To resolve this, prompted by the concerns of Reviewers #2 and #3, we have now performed lipidomic analyses on whole adult flies in which Pect is knocked down (KD) by RNAi in the fat tissue. We now present a new dataset in Figure 6. Two striking changes occur:

      1. Pect-KD shows an increase in the phospholipid classes LPC and LPE (Figure 6A). In contrast, LPE is significantly downregulated on HSD Day 14 (Figure 3).
      2. Pect-KD shows a significant reduction in a specific PE species, PE 36.2 (Figure 6B). Our data regarding the decrease in PE 36.2 agree with a previous lipidomic analysis of Pect mutant retina (PMID: 30737130). In contrast, PE 36.2 trends upwards on 14-day HSD (Figure S7C), though not significantly. Consistent with the extreme upregulation of Pect mRNA in 14-day HSD-fed flies (Figure S6A; Pect mRNA 200-250 fold), PE trends upwards on 14-day HSD (Figure 3) and PE 36.2 trends higher (Figure S7C).

      We note that, on the surface, the changes in PE and LPE go in opposite directions in the 14-day HSD lipidome and in fat-specific Pect-KD. But there is a significant commonality: under both states there is an imbalance of the phospholipid classes PE and LPE. Hence, we propose that maintaining the compositional balance of the phospholipid classes PE and LPE is critical to hunger-driven feeding and insulin sensitivity, and that either an increase or a decrease in these key phospholipid species may lead to abnormal hunger-driven feeding.

      On day 14, HDF response was increased 70-fold in w1118 flies in NF (Figure 1B; w1118), but only 2.5-fold in lpp>LucRNAi control flies in NF (Figure 7A). This suggests that lpp-gal4 driver lines have a significant effect on HDF response. Using a different fat-body specific Gal4 line would be necessary to validate conclusions.

      Regarding the reduced HDF magnitude: in our experience, using UAS-Gal4 consistently reduces the magnitude of the HDF response, so these flies cannot be compared to w1118, whose response is more robust. To account for background differences, we use UAS-Gal4 with a control RNAi. The control clearly shows a difference in HDF response on starvation, whereas Pect and Pisd RNAi do not (Figure 7A). Hence, given that this experiment internally controls for any changes in HDF response for UAS-Gal4>RNAi, we conclude that the HDF response is disrupted in Pect-KD and Pisd-KD flies (Figure 7).

      We presented only the Lpp driver in our study because it is the only fat-specific driver that has no leaky expression in other tissues: the apolpp promoter used to generate this Gal4 line is expressed only in fat tissue (Eaton and colleagues, PMID: 22844248). Other widely used fat-specific drivers, including the pumpless-Gal4 (ppl-Gal4) driver, have leaky expression in gut or other tissues (see Table 2 of this detailed study by Dr. Drummond-Barbosa: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7642949/). If the reviewer is aware of a fat-specific Gal4 line other than Lpp-Gal4 that has highly specific expression in the fat tissue without leaky expression in other tissues, we are happy to take on board the reviewer’s suggestion and try that line.

      HSD feeding promotes Pect expression (Figure S3C) and global changes in phospholipid levels (Figure 3, 4). Therefore, shouldn't Pect overexpression (not Pect RNAi) in a normal diet mimic HSD feeding state and promote loss of HDF response? Conversely shouldn't knockdown of Pect in HSD rescue loss of HDF response?

      We agree that a puzzling aspect is that Pect mRNA levels are significantly elevated on Day 14 of HSD, yet Pect-KD displays the inappropriate HDF response. As described in our response to this reviewer on Page 19, we believe that Pect-KD and HSD both disrupt the overall PE and LPE balance, but in different ways. In contrast, Pect-OE using cDNA expression in the fat body does not cause a significant change to any lipid class (Figure S9), and our results suggest that a basally higher level of PECT is likely to be protective on HSD with respect to HDF (Figure 7B).

      To ensure that we appropriately discuss and clarify this issue, we have now included a section in the Discussion (Page 14, Lines 26-33) under the subtitle “The implications of relationship between Pect levels and HSD”. We have pasted an excerpt from that subsection below for the reviewer's assessment.

      “Also, we note that over-expression of Pect cDNA in the fat-body does not alter phospholipid balance (Figure S9) and indeed improves HDF on HSD (Figure 7B). While this may appear inconsistent, it is critical to note that over-expression of Pect cDNA using UAS/Gal4 only increases Pect mRNA expression by 7-fold (Figure S6A), whereas HSD causes its upregulation by 250-fold (Figure S6B). Hence, we speculate that an increased ‘basal’ level of Pect, such as that provided by cDNA over-expression in fat, may be protective against the negative effects of HSD (Figure 7B) without affecting overall phospholipid levels (Figure S9), but extreme upregulation of Pect on HSD affects the PE and LPE balance (Figure 3).”

      We would have liked to test Pect protein expression on HSD, but we were unable to access the antibodies for Pect published in a prior study (PMID: 33064773) from Dr. Wang’s lab (see Pages 10-11 of the response to Reviewer #1). Hence, we were unable to test how the protein levels of Pect correlate with the 250-fold increase in mRNA expression.

      In conclusion, we hope the reviewer appreciates that our results regarding Pect function are consistent with the main conclusion that achieving the right phospholipid balance between PE and LPE is critical for an organism to display an appropriate HDF response.

      Minor comments: All graphs should plot individual data points and be shown as box-and-whisker plots as much as possible.

      Thanks for this suggestion; we have added individual data points to the vast majority of figures in the paper. We have made exceptions for graphs such as those in Figure 1 and Figure S4B-D, where we find that individual data points add an unnecessary layer of complexity. We hope these changes provide additional clarity and strength to the claims made in this manuscript.

      Data for day 14 missing in Figure S4A and S4B.

      We have provided Day 14 data for the PC composition and PE composition. Due to changes in the figures, these panels are now S7A and S7B.

      Reviewer #3 (Significance (Required)):

      The interaction between diet-induced obesity, peripheral tissue homeostasis and feeding behavior is an interesting topic that can be addressed using Drosophila. This manuscript demonstrates how fat body Pect levels affect HSD-induced changes in hunger-driven feeding response. However, at this point, the functional association between fat body Pect level, global phospholipid level, and loss of hunger-driven feeding response in chronic HSD feeding is not clear.

      We hope the revised data and discussion provide a well-substantiated functional association that establishes phospholipid balance, driven by the Pect enzyme, as a critical regulator of hunger-driven feeding behavior. As stated in the revised discussion, the key take-home message of our manuscript is that on prolonged HSD exposure PC, PE and LPE levels are dysregulated, and this loss of phospholipid homeostasis coincides with a loss of hunger-driven feeding. Following this lead on phospholipid imbalance, we then uncovered a critical requirement for the activity of the rate-limiting PE enzyme PECT within the fat tissue in controlling hunger-driven feeding.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In this manuscript, Kelly et al. show that the difference between the feeding behavior of fed and starved flies (hunger-driven feeding; HDF) is absent in animals fed a high-sugar diet (HSD) for two weeks or more. The disappearance of HDF with HSD coincides with changes in phospholipid profiles caused by HSD. Furthermore, RNAi-mediated downregulation of PECT (a key enzyme in the PE biosynthesis pathway) in the fat body phenocopies physiological effects of HSD. Moreover, downregulation or overexpression of PECT in the fat body abolishes or induces HDF, respectively, independent of HSD treatment.

      Overall, the manuscript is well-written and the phenotypes are clear. However, I have major concerns regarding the authors' interpretation of the data and their conclusion. Most importantly, while it is clear that the authors' high-sugar dietary treatment affects feeding behavior and physiology, I am not convinced that the changes can be considered "hunger-driven," which is central to the main point of the manuscript. Therefore, it is my recommendation that the authors substantially revise the manuscript by either showing additional/re-analyzed data that rule out alternative hypotheses, or rewriting the manuscript keeping alternative interpretations in mind.

      Major comments:

      1. The data do not sufficiently show that the long-term HSD regime disrupts "hunger-sensing." The manuscript should address alternative hypotheses by showing raw instead of normalized data, rewriting the manuscript with a new central conclusion, or running additional experiments that actually show a defect in hunger-driven response.
        • a. The main result that the authors rely on for their argument is that the ratio of feeding events between starved and non-starved flies differs between the groups fed normal food (NF) or HSD. However, because the authors only show normalized data (normalized to non-starved flies; Fig. 1), it is difficult to tell whether the change is due to chronically increased feeding in non-starved HSD flies (perhaps in a perpetual hunger-like allostasis) or to a dampened starvation response. Indeed, the data in Fig S1 show that flies fed HSD for as few as 5 days show more frequent feeding events than age-matched controls fed normal food. It is possible that because the HSD-fed flies eat more than NF-fed flies, even without being starved, the ratio of starved/non-starved feeding is lower in the HSD-fed group due to changes in the denominator rather than the numerator.
        • b. It is also possible that the HSD-fed flies are simply not in as big an energy deficit physiologically, due to the increased fat deposits they've accumulated (as the authors show later in the manuscript). It may take longer for the fat HSD flies to reach substantial energy deficiency than the NF flies, but they still may eventually be able to appropriately respond to hunger, just like NF flies. In such case, it would be a misnomer to call this behavioral change a 'defect in hunger-driven feeding behavior.' Maybe an experiment with a dose-response curve of "hunger driven feeding response" as a function of duration of starvation would help?
      2. How can you be sure that lower Dilp5 immunofluorescence is indicative of increased Dilp5 secretion? Wouldn't decreased production of dilp5 also have the same results?
        • a. Also, the authors should state in the main text that it is Dilp5, not just any Dilp.
      3. Data presentation:
        • a. Sometimes the data are normalized to NF (Fig 4B-C), sometimes not (ex. Fig 4A, S4C). Unless there is a specific rationale for the data transformation, it would be more appropriate to show untransformed data (ex. Fig 4A, S4C), especially as the authors use two-way ANOVA to determine significance. Only showing the differences implies comparison against a hypothetical mean (i.e. μ0=0), not between two group means.
        • b. Some figures show both individual data points and summary statistics (mean, SD, ...; ex. Fig 2A), which I believe is ideal, but some show only one or the other (ex. Fig 2B, no summary statistics; Fig. 3, no data points). The manuscript would read as more convincing if data visualization were consistent across figures.
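
      The normalization concern in comment 1a can be made concrete with a small arithmetic sketch. The feeding-event counts below are hypothetical values chosen only for illustration; they are not taken from the manuscript. The sketch shows that the starved/non-starved ratio can halve even when the starved response is identical, purely because baseline feeding is higher.

```python
# Hypothetical feeding-event counts, chosen only to illustrate the arithmetic.
# None of these numbers come from the manuscript under review.

def starved_to_fed_ratio(starved_events: float, fed_events: float) -> float:
    """Return feeding events of starved flies normalized to non-starved flies."""
    return starved_events / fed_events

# Same starved response (numerator) in both diets; only the baseline differs.
nf_ratio = starved_to_fed_ratio(starved_events=60, fed_events=20)
hsd_ratio = starved_to_fed_ratio(starved_events=60, fed_events=40)

print(f"NF ratio:  {nf_ratio}")   # 3.0
print(f"HSD ratio: {hsd_ratio}")  # 1.5
```

      Under these assumed counts the normalized ratio drops from 3.0 to 1.5 without any change in starved feeding, which is why the raw counts for both conditions are needed to tell the two explanations apart.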

      Minor comments:

      1. High sugar diet: what is the actual sugar concentration in the NF v. HSD diets? The authors write that the HSD diet contains "30% more sugar" than the NF, but providing the final sugar concentrations (sucrose or others) would be informative for other scientists studying the effect of high sugar diets.
        • a. Additionally, the definition of HSD is inconsistent. Main text (Page 5, line 17) states that their HSD is "60% more sugar than normal media," whereas the figure legend (Fig 1) and the Methods state that the HSD contains "30% more sugar."
      2. Starvation medium: please provide justification for why the authors used 1% sucrose/agar for starvation medium, instead of plain agar/water that most labs use. At least clarify and provide a reference for the claim that the 1% sucrose/agar "is a minimal food media to elicit a starvation response."
      3. PECT mRNA level is higher with HSD. This is surprising not only because, as the authors mention, increased PC32.2 with HSD suggests lower PECT activity, but also because PECT RNAi phenocopies long-term HSD in HDF behavior, lipid morphology, and FOXO accumulation in the fat body. The authors speculate that the data "likely shown an upregulation in an attempt to mediate the PECT dysregulation occurring at the protein level." If that were true, a western blot may be informative. Zhao and Wang (2020, PLoS Genetics) generated a PECT antibody that seems compatible with western blot applications. That being said, I don't think such data is critical for the manuscript. I mention this simply as a suggestion for the authors.
        • a. page 8, line 22-23, did you mean to write "Given how PC32.2 is elevated after 14 days of exposure to HSD, we assumed that PECT levels would be low for flies under HSD," not "high?" Otherwise the subsequent 2 sentences don't make sense.

      Significance

      The work is potentially novel and interesting, but at this stage it's difficult to interpret what the phenotype signifies. Although the manuscript could be revised simply by modifying the text, experimentally addressing the concerns would significantly improve the work.

      The co-reviewer and I have expertise in Drosophila neurobiology and behavior.

      Referees cross-commenting

      Hi all, although the reviews hit upon some overlapping, but mostly different points, I agree with all of the concerns raised. There's some really interesting stuff here but some of the results, as presented, don't make sense. It's possible this will be clarified by revising the text, although I suspect it's more likely that the authors will have to add a number of the experimental suggestions made by the reviewers.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewers' comments in italics

      We thank all reviewers for their positive and encouraging comments and for their criticisms, which helped improve our work. Here we present a revised version of the manuscript that addresses the comments raised.

      • Reviewer #1 (Evidence, reproducibility and clarity (Required)): This is an interesting paper that identifies Tns3 as a potential effector of oligodendrocytes differentiation based on an ingenious strategy comparing regulatory binding sites of known master regulators of differentiation, and then shows using in vivo genetics that this role is indeed correct. Next, a potential mechanism is identified by showing co-localization with beta 1 integrin, known to regulate apoptosis of newly-formed oligodendrocytes. The results are well illustrated and the experiments performed with appropriate power using a broad range of techniques that combine in silico, in vitro and in vivo work to great effect.

      I think this represents an important contribution that will be of significant interest to neuroscientists - the mechanisms regulating oligodendrocyte generation remain poorly understood and the evidence that this contributes to adult learning (adaptive myelination) and CNS regeneration makes this a key question. I would suggest that the following are considered before publication:

      We thank the reviewer for these positive comments and critiques to improve the manuscript.

      The work describing the KO mice that were not used as they proved unsuitable need not be described - it breaks the logical flow.

      In agreement with the reviewer's comment, we have reduced this part to a short paragraph indicating that our analyses of several Tns3 constitutive KO lines showed developmental lethality and possible genetic compensation in Tns3 expression, leading us to conclude that they are inappropriate tools to study Tns3 function in oligodendrogenesis. We have summarized the data in Fig. S7 and the description in the Methods section.

      It would be useful to compare the extent of cell death in the Tns3 cKO mice with that described in the alpha6 integrin KO and the integrin beta1 cKO (the Colognato and Benninger papers). Do they match? If not (and I suspect the Tns3 cKO death is greater) could other mechanisms be downstream of the Tns3?

      In agreement with the reviewer comment, we have added the following paragraph to the discussion:

      ‘Knockout mice for integrin-a6 present a 50% reduction in brainstem MBP+ OLs at E18.5, just before they die at birth, accompanied by an increase in TUNEL+ dying OLs (Colognato et al, 2002). Similarly, conditional deletion of integrin-b1 in immature OLs by Cnp-Cre also leads to a 50% reduction in cerebellar OLs at P5, with a parallel increase in TUNEL+ dying OLs (Benninger et al., 2006). Therefore, given that Tns3-induced deletion in postnatal OPCs also leads to 40-50% reduction in OLs in both grey and white matter regions of the postnatal telencephalon (this study), paralleled by similar increase in TUNEL+ apoptotic oligodendroglia, we suggest that Tns3 is required for integrin-b1 mediated survival signal in immature oligodendrocytes.’

      I'm not sure why the authors argue that the activation of beta 1 would not be an informative experiment. This will regulate actin dynamics just as it regulates other integrin signaling pathways. Indeed, I would argue that an integrin activation experiment would be a neat way to prove mechanism (as it would be predicted to rescue the Tns3 cKO phenotype).

      In agreement with the reviewer comment, we have removed this sentence: ‘If so, exogenous activation of integrin a6b1 in cultured OPCs by Mn2+ (Colognato et al., 2004) would not be expected to increase oligodendrogenesis in Tns3-iKO oligodendroglia.’

      In an effort to understand Tns3 function by acute Tns3 deletion in postnatal OPCs, we have compared the transcriptome of Tns3-iKO oligodendroglia with that of control cells. We present these results in Figure 7, pinpointing deregulated genes associated with reduced oligodendroglial differentiation, integrin dysregulation, increased apoptosis, and conflicting cell cycle signaling. We leave for further studies the full characterization of how the loss of Tns3 leads to the deregulation of these processes.

      Can the authors provide any data on GM oligos and their OPCs? Is the requirement for Tns3 the same, and if so what might the implications be in the adult where new oligodendrocytes are being generated throughout life?

      Indeed, in our analyses of Tns3-iKO mice, we provide quantifications of the cortex as a grey matter territory, showing a similar 40-50% reduction in OLs as in white matter areas (corpus callosum and fimbria) and mixed regions such as the striatum.

      I note in S13 that integrin beta1 is not highly expressed in human oligos at the time in question. Does this call into question the relevance for human disease?

      We realize that scRNAseq plots are never easy to interpret, but it is important to note that the level of expression is coded by the intensity of the color scale, while the area of each dot indicates the proportion of cells in a given cluster/cell type in which the transcript was detected (which depends on experimental sensitivity, given the dropout limitation of current single-cell RNA-seq technologies). Considering this, please note that beyond a stronger expression in neural progenitor cells (NPCs, blue color), integrin-b1 (Itgb1) transcripts are expressed at medium to high levels (green to blue) in human immature OLs (Fig. S13B), similar to their pattern of expression in mouse oligodendroglia (Fig. S13A).

      Reviewer #1 (Significance (Required)): See above

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      *In this article, the authors identify and characterise Tensin3 (Tns3) as a target of key oligodendroglial transcription factors driving differentiation in the mouse. They use multiple transgenic models to describe loss of function, and suggest Tns3's action through integrin B1 signalling, with the key function being oligodendroglial survival.

      There is extensive and impressive work here, including identification of Tns3 by ChIPseq, expression of Tns3 in brain development, analysis of human (ES-derived) and mouse scRNAseq to infer timing of expression in the differentiation pathway, generation of V5-tagged Tns3-KI mice to overcome antibody limitations, identification of its expression in mouse remyelination, generation of a new Tns3KO mouse, in vivo Crispr Tns3KO in development, generation of a conditional KO, for deletion in adulthood, and finally some culture work to investigate potential mechanisms of actions. The bottom line is that Tns3 is required for survival of OPCs and immature oligodendrocytes in development/remyelination in mouse at least, and loss leads to apoptosis (through p53 increase and loss of integrin-B1 signalling), leading to a failure of proper differentiation.

      The experiments are carefully done, convincing and the tools generated impressive. There is clearly more to be done on clarifying the mechanism of action of Tns3, but I do not think further experiments on this topic are needed for this paper - they can wait for the next.*

      We thank the reviewer for the positive and encouraging comments. In an effort to understand Tns3 function by acute Tns3 deletion in postnatal OPCs, we have compared the transcriptome of Tns3-iKO oligodendroglia with that of control cells. We present these results in Figure 7, pinpointing deregulated genes associated with reduced oligodendroglial differentiation, integrin dysregulation, increased apoptosis, and conflicting cell cycle signaling. We leave for further studies the full characterization of how the loss of Tns3 leads to the deregulation of these processes.

      My only query is whether the expression of Tns3 is also in immature OLs in human brain (rather than human ES-derived OLs). This should be easily checked with interrogation of online Shiny apps from already published snRNAseq from various groups on human post mortem adult brain, but if not present then in also baby/fetal brain. This would be interesting and may well be different from the ES_derived cells which tend to be very immature and would add interest to the possible translational impact.

      Following the reviewer's suggestion, we analyzed a snRNAseq dataset (69,174 cells) from GW9-GW22 fetal cerebellum (Aldinger & Miller, 2021; https://doi-org.proxy.insermbiblio.inist.fr/10.1038/s41593-021-00872-y), which we now present in Figure S3, finding a cluster of cells expressing iOL markers, including NKX2-2, TNS3, ITPR2, and BCAS1, similar to the hiPSC-derived iOL1/iOL2 clusters and mouse iOL1/iOL2 clusters shown in Fig. S2.

      We also analyzed other datasets without finding iOLs given their age or numbers, including:

      • Immunopanned PDGFRA+ cells from human cortex GW20-GW24 (2690 cells, Huang and Kriegstein, Cell 2020), containing OPCs but not iOLs.

      • The recently published dataset from GW8-GW10 human forebrain oligodendroglia (van Bruggen & Castelo-Branco, Dev Cell 2022; https://doi.org/10.1016/j.devcel.2022.04.016), containing OPCs but not iOLs.

      • The GW17 to GW18 human cortex (40,000 cells, Polioudakis & Geschwind, 2019, https://doi.org/10.1016/j.neuron.2019.06.011), containing OPCs but not iOLs.

      Reviewer #2 (Significance (Required)): This work extends our knowledge of oligodendroglial differentiation, links it to the ECM and provides interest in manipulating this in diseases including glioma. My expertise: myelin, oligodendroglia, remyelination, human neuropathology

      *Reviewer #3 (Evidence, reproducibility and clarity (Required)): *

      see below

      Reviewer #3 (Significance (Required)): Using purified oligodendrocytes, target genes of key regulators of oligodendrocyte differentiation were analyzed, which led to the identification of Tensin-3. The authors performed a detailed characterization of Tensin-3 expression. They found that Tensin-3 is highly expressed in immature mouse and human oligodendrocytes. Interestingly, Tensin-3 is selectively enriched in immature oligodendrocytes, and not present at detectable levels in OPCs and mature oligodendrocytes. Subsequently, the authors characterized Tensin-3 function by a series of knockdown approaches in vitro and in vivo. This series of experiments revealed an essential function of Tensin-3 in supporting oligodendrocyte survival. In the absence of Tensin-3, a large fraction of oligodendrocytes undergo apoptosis while differentiating to mature oligodendrocytes. This is a remarkable study applying an impressive array of methods that led to an important discovery in the field of oligodendrocyte biology. The main advances for the field are: 1) identification of a novel marker for premyelinating oligodendrocytes, 2) elucidation of Tensin-3 as a pro-survival factor in oligodendrocyte differentiation, 3) evidence of a link between Tensin-3-integrin signaling and oligodendrocyte survival. The data is well presented and organized, and the paper well written. I recommend publication with only minor suggestions for a revision:


      We thank the reviewer for these positive comments and critiques to improve the manuscript.

      In Figure 2, only images are shown, and the data is referred to as highly expressed or strong co-localization. Even if the data looks clear, the authors should provide some quantification of the data in the figure.

      We thank the reviewer for his comment and we have now provided a quantification of the fraction of Tns3+ cells expressing different markers of oligodendrocyte lineage progression/stages, and the percentage of each stage expressing Tns3.

      Figure 3 is given too much weight in the manuscript text. I would recommend to shorten the text in the result section, and to move this figure to the supplement as it does not advance the story. It mainly shows that the KO mice still express transcripts in the brain. Were the transcripts lost in peripheral tissue?


      As mentioned above, in agreement with the comments of reviewers #1 and #3, we have reduced this part to a short paragraph indicating that our analyses of several Tns3 constitutive KO lines showed developmental lethality and possible genetic compensation in Tns3 expression, leading us to conclude that they are inappropriate tools to study Tns3 function in oligodendrogenesis. We have summarized the data in Fig. S7 and the description in the Methods section.

      Page 11: the authors describe in the text how the floxed allele was generated. This should be shifted to the supplement.

      Following the reviewer's suggestion, we have moved the description of Tns3 floxed allele generation to the Methods section.

      Page 16: the authors refer to Bcas1 as a problematic marker for immature oligodendrocytes, because the transcript is also expressed in mature oligodendrocytes. The authors are correct that the transcript is expressed in mature oligodendrocytes. However, the protein changes its localization when oligodendrocytes mature. At the protein level, it is a valuable and selective marker, as antibodies only label pre-myelinating and actively myelinating cells. In mature oligodendrocytes, antibodies against Bcas1 do not label the cell, only myelin. The text is misleading and needs to be corrected.

      In agreement with the reviewer's comment, we have modified the text as follows: ‘An optimized protocol for immunodetection using Bcas1-recognizing antibodies has been shown to label iOLs (Fard et al., 2017).’

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      The manuscript by Tran et al. describes the mechanism by which IFNa treatment prevents the development of liver CRC metastasis in several mouse models. They show how continuous administration of IFNa strengthens the liver vascular barrier through a direct effect on endothelial cells and prevents the trans-sinusoidal migration of tumour cells.

      Major points:

      1. Authors use an elegant orthotopic model of liver metastasis to confirm the effect of continuous IFNa on hepatic colonization (Fig.3). Although they extensively characterize the metastatic lesions, they do not show data on the potential impact of IFNa treatment on the primary caecum tumour. Authors should clarify whether the described effects take place in the liver and/or in the caecum. It would be interesting to show whether IFNa affects the primary tumour size, the extravasation of cancer cells and the immune infiltration, since all these factors could have an impact on the number of liver lesions.

      We thank the reviewer for acknowledging the importance of our results, particularly in the context of the orthotopic mouse model we developed. We agree that displaying the results of continuous IFNα therapy on primary intracecal tumors, as well as the results pertaining to the few mice that develop microscopic or macroscopic liver metastasis, is important for the interpretation of our work. Thus, we evaluated the dimension of primary intracecal CRC lesions (Fig 3D,E) and we performed additional IHC characterization of the primary tumors (Fig S4A,B). The analysis showed that the dimension of the primary lesions and the markers we analyzed were not significantly modified by continuous IFNα therapy (Fig 3D,E and Fig S4A,B). These results favor the hypothesis that IFNα therapy does not modify the number of cells that spread from the primary tumors and seed into the liver, but rather impinges on the intravascular containment of CRC cells circulating within the liver (Fig 3F). As noted earlier, the data also highlight the possibility that CRC tumors may become refractory to IFNα or that the dose and schedule we adopted do not significantly affect the growth of established liver CRCs at late time points. The data are also consistent with results obtained with MC38Ifnar1_KO CRC cells indicating that continuous IFNα therapy does not require Ifnar1 expression by tumor cells to exert its antimetastatic function (Fig 4A,C-D). This is also in line with the high IFNα concentrations required to activate the "tunable" direct antiproliferative functions of this cytokine, which exceed those achieved in our system (Catarinella et al, 2016; Schreiber, 2017). Text has been added in the revised manuscript at lines 175-197 and in the discussion lines 425-431.

      1. Figure 3f right shows liver images without any obvious metastatic lesion. Since authors are analysing the effect of IFNa treatment in proliferation, vascularization and immune composition in liver tumours, they may show and quantify images with metastatic lesions and restrict the analysis to the tumour area.

      Since the main finding of our manuscript regards the prevention of hepatic colonization by continuous IFNα therapy, we think that the original data presented in Fig 3G,H are representative of the overall efficacy of our strategy, which confers protection in up to 60% of the mice carrying intramesenteric tumors of increasing dimensions (Fig 3H). We have thus maintained our original results, adding the quantification of all IHC data on groups of Sham control livers (n=6), as suggested. In any case, we also included the same IHC characterization of the few and small intrahepatic lesions that have bypassed the intravascular antimetastatic barrier (Fig S4C,D). Indeed, in agreement with the results observed in primary intracecal lesions, the metastatic lesions that developed in IFNα-treated mice showed similar markers of cell proliferation, neoangiogenesis, F4/80+ macrophages and CD3+ T cells to control lesions detected in NaCl-treated mice. Once again, the results highlight the possibility that CRC tumors, once established as micro/macroscopic metastases, may become refractory and resistant to IFNα therapy by downregulating Ifnar1 in various components of the tumor microenvironment (Boukhaled et al., 2021; Katlinski et al., 2017). Text has been added in the revised manuscript at lines 175-197 and in the discussion lines 496-515.

      1. Authors analyse the recombination efficiency of different mouse CRE lines by non-quantitative methods (PCR of hepatic genomic DNA and GFP expression by immunofluorescence in healthy liver). Since PDGFRβ-Cre/ERT2 and CD11c-Cre lines are used to exclude a role of IFNa on the targeted cells, authors should provide stronger evidence to support this. They may consider studying the ablation of Ifnar1 in FACS-sorted fibroblasts and myeloid cells. Moreover, it would be important to show the proportion of GFP+ cells in the sorted populations to understand how broadly these stromal populations are targeted.

      We thank the referee for raising this important issue, which is related to the relative efficiency of Ifnar1 recombination in each of the Cre-expressing mouse models we have used in the study. In this regard, we performed a new, extensive colocalization analysis quantifying the percentage of GFP+ cells that colocalize with cell-specific markers (i.e., PDGFRβ, CD11c, F4/80 and CD31) in the various mouse models (PDGFRβCreERT2, CD11cCre and VeCadCreERT2, respectively) crossed with RosaZsGreen reporter mice. Colocalization analysis of GFP in the different systems was performed using the ImageJ “colocalization” algorithm developed by Pierre Bourdoncle (Institut Jacques Monod, Service Imagerie, Paris; 2003–2004). The method allows the generation of unsupervised profiles of co-localized pixels between two channels. This methodology has been included in the section Methods and Protocols, lines 806-809. Of note, we observed an almost complete recombination in liver fibroblasts (GFP+/PDGFRβ+), with about 98.2 ± 0.72% of hepatic stellate cells co-expressing GFP and PDGFRβ signals (see the new Fig S5E). Similarly, hepatic DCs (GFP+/CD11c+) had 94.17 ± 2.16% colocalization, while F4/80+ KCs or LCMs (GFP+/F4/80+) colocalized in 78.14 ± 5.03% (see the new Fig S5E). Finally, HECs, including LSECs (GFP+/CD31+), showed 85.3 ± 5.03% colocalization (see the new Fig S5E,F), with no expression of GFP signals in cells other than CD31+ cells. Note that these values indicate an almost complete colocalization of the Cre recombinase in the target cell types analyzed (see representative IF shown in Fig S5E). Text has been added in the revised manuscript at lines 225-233.
Moreover, DEGs analysis between NaCl-treated VeCadIfnar1_KO and Ifnar1fl/fl HECs showed a significant downregulation of Ifnar1 expression in CD31+ VeCadIfnar1_KO cells, with a log2 fold-change of -0.387 and an adjusted p-value of 0.033, further confirming Cre recombination in HECs isolated from VeCadIfnar1_KO mice (as depicted in the heatmap of Fig 6B; the 12th gene of the Type I IFN response is Ifnar1). We have prepared all source images at higher dimension to better appreciate the colocalization within liver microvasculature. In addition, we performed several flow cytometry analyses to identify liver cell populations of Cre-recombinant mice that express Ifnar1. Unfortunately, the predicted low cellular surface expression of this molecule coupled with the experimental conditions needed to extract viable non-parenchymal cells from the liver have prevented us from obtaining informative results.
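
      For readers less used to log2 fold-changes, the reported value of -0.387 can be converted to a linear fold-change. The short sketch below is arithmetic only: it assumes nothing about how the differential expression was computed and uses no data beyond the single number quoted above.

```python
# Convert the reported log2 fold-change for Ifnar1 into a linear fold-change.
# Only the value -0.387 comes from the text; everything else is arithmetic.

log2_fold_change = -0.387

linear_fold_change = 2 ** log2_fold_change          # KO expression relative to control
percent_reduction = (1 - linear_fold_change) * 100  # reduction relative to control

print(f"Linear fold-change: {linear_fold_change:.3f}")
print(f"Reduction vs control: {percent_reduction:.1f}%")
```

      A log2 fold-change of -0.387 therefore corresponds to expression at roughly three quarters of the control level in the sorted CD31+ population.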

      1. Ifnar1 ablation in VeCad+ cells prevents the effect of IFNa on tumour growth (Fig. 4d), suggesting the existence of anti-tumour mechanisms beyond the effects on hepatic colonization. Authors may consider checking proliferation, vascularization and immune infiltration in these tumours to enhance their conclusion.

      We fully agree with the referee’s concern and, as mentioned above, we have followed his/her suggestion and examined the existence of antitumor mechanisms beyond the effects on hepatic colonization in VeCadIfnar1_KO mice treated with NaCl or IFNα. To this end, 4 NaCl-Ifnar1fl/fl, 7 IFNα-Ifnar1fl/fl, 4 NaCl-VeCadIfnar1_KO and 4 IFNα-VeCadIfnar1_KO mice were intrasplenically injected with MC38 CRC cells (Fig S7A,B). Twenty-one days after injection, mice were euthanized and their livers analyzed for tumor size, proliferation, signs of angiogenesis (as denoted by CD34 staining) and immune infiltration (F4/80+ macrophages and CD3+ T cells). Consistent with data presented in Fig 4D, histological analysis showed that IFNα-treated Ifnar1fl/fl mice did not develop liver metastases. Furthermore, metastatic lesions detected in VeCadIfnar1_KO mice treated or not with IFNα did not show significant differences in Ki67 positivity, CD34 staining or the amount of F4/80+ resident macrophages and CD3+ T cells. This further supports the view that the antimetastatic potential of IFNα therapy may primarily depend on the inhibition of hepatic trans-sinusoidal migration, a limiting step in the metastatic cascade that could secondarily influence colonization and outgrowth (Chambers et al, 2002). Corresponding text has been added at lines 248-252.

      1. Immune properties of LSECs are analysed in vivo by using a mouse CRE line that targets all endothelial cells, including those located in lymphoid organs, and evaluating T cell composition in the spleen. I found it difficult to conclude that these properties are exerted directly by LSECs and not by other endothelial cells in vivo. To clarify the local effect of LSECs in modulating anti-tumour immunity, T cell composition and activation should be checked in tumours shortly after tamoxifen administration.

      We thank the reviewer for pointing out this issue, which cannot be tested directly because, as also mentioned by reviewer 2, LSEC-specific Cre-recombinant driver mice do not exist. As also indicated in the cited literature, central memory T cells accumulate after peripheral priming in secondary lymphoid organs such as the spleen (Sallusto et al, 2004; Stone et al, 2009; Yu et al, 2019). In this regard, the generation and regulation of antitumor immunity is a highly orchestrated multistep process involving the uptake of tumor-associated antigens by professional APCs, their time-consuming migration to draining lymph nodes and the generation of protective T cells. Unlike other APCs, HECs/LSECs do not need to migrate to draining lymph nodes to activate effector T cells, leading to a rapid intrahepatic CD8+ T cell activation. In this context, LSECs must not only efficiently take up, process and present CRC-derived antigens coming from intravascularly contained tumor cells, but they also require the attraction and retention within the liver microvasculature of T cell populations necessary for the generation of effective antitumor immune responses, where chemokines play an important role (Lalor et al, 2002). As shown in Fig 6A-C, two prominent chemokines (Cxcl10 and Cxcl9) required for T cell recruitment to the liver are specifically upregulated only in HECs/LSECs from IFNα-treated Ifnar1fl/fl mice, whereas HECs from VeCadIfnar1_KO mice maintained low expression of these chemoattractants in both NaCl- and IFNα-treated mice. 
These data are also consistent with the in vitro cross-priming results (see Fig 7A,B) showing that in the absence of IFNα, HECs have a low capacity to prime naïve T cells (Katz et al, 2004), indicating that LSECs primed by tumor-derived antigens coming from apoptotic intravascular CRC metastatic cells play an important role in inducing tolerance (Berg et al, 2006; Katz et al., 2004), especially when CRC cells quickly extravasate and position within the space of Disse, likely becoming less accessible to intravascular patrolling by naïve and effector T cells (Benechet et al, 2019; Guidotti et al, 2015). On the contrary, in IFNα-treated Ifnar1fl/fl mice, CRC cells are rapidly contained in the liver microvasculature (Fig 5A,B) with CRC-derived antigens that could be immediately taken up by LSECs due to their anatomical proximity and efficient endocytosis capacity, which is among the highest of all cell types in the body (Sorensen, 2020). Here, the continuous sensing of IFNα by LSECs upregulates several genes related to antigen processing and presentation pathways (Fig. 6B,D), leading to efficient cross-priming of tumor-specific CD8+ T cells to the same extent as professional APCs, such as splenic DCs (Fig 7B). Text has been added in the revised manuscript at lines 496-515. Finally, regarding the suggestion to analyze the role of HECs/LSECs in inducing antitumor T cell immunity shortly after tamoxifen administration, while we agree that it would be interesting to analyze HEC/LSEC-mediated T cell activation by treating NaCl- and IFNα-treated Ifnar1fl/fl and VeCadIfnar1_KO mice with tamoxifen after CRC cell injection, we would like to point out that tamoxifen treatment will not only induce Cre recombination and Ifnar1 loss on endothelial cells but may also induce several “off-target” effects, complicating the interpretation of the results. 
Indeed, tamoxifen is known to i) inhibit the in vitro proliferation of several CRC cell lines (Ziv et al, 1994), ii) impair the growth of CRC liver metastases in vivo (Kuruppu et al, 1998) and iii) modify matrix stiffness to reduce tumor cell survival (Cortes et al, 2019). Further, as IFNα modifies the hepatic vascular barrier and the accessibility of antigens to LSECs, the specific timing of tamoxifen treatment could also affect the immunological consequences of Ifnar1 deletion, making this experiment impractical. For these reasons, we would prefer not to perform the suggested experiment with tamoxifen.

      Reviewer #1 (Significance):

      The conclusions of this study are consistent with previously published literature and the biological insights are potentially useful to the cancer biology community.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this study Dr. Sitia's group investigated the effect of IFNα1 as perioperative agent preventing liver metastasis formation of colorectal carcinoma (CRC). To this end, various mouse models were used such as liver colonization models, i.e. intrasplenic and mesenterial injections of MC38 and CT26 CRC cell lines. Besides, spontaneous metastasis of CRC was analyzed by orthotopic injection of MC38 into the cecum. To study the influence of IFNα1 in these settings mini-osmotic pumps releasing IFNα1 were used. Moreover, conditional mouse models with a cell-type specific deficiency of Ifnar1 were compared. Altogether, the application of IFNα1 led to a reduction in liver colonization of CRC in all models studied. This was ascribed to decreased trans-sinusoidal migration of CRC and increased cross-priming by LSEC resulting in T cell activation.

      Major comments:

      Overall the study is well performed and the major conclusions seem to be drawn well. However, there are certain points I would like to address:

      • First, the authors started their experiments with MC38 and CT26 CRC cell lines. At the end they just applied MC38. The rationale behind this should be clearly stated. Second, as in their previous publication (Catarinella et al, 2016) F1 hybrids of C57BL/6 x BALB/c mice were used for the experiments. However, I believe that the genetic heterogeneity might be strongly increased by this approach which might lead to difficult reproducibility of the results.

      We thank the referee for raising this important issue; additional text describing the reason for our choice has been introduced at lines 203-205. We respectfully disagree with the comment that CB6F1 hybrids may increase genetic heterogeneity and impair reproducibility of our results. Each CB6F1 hybrid individual is genetically identical to its littermates, sharing 50% of genes of each parental mouse line and being tolerant to reciprocal MHC-I genes (thus permitting the correct engraftment of both cell lines). We agree that the use of mismatched backcrosses after the F1 generation would increase genetic heterogeneity and thus may affect outcome. This is also the reason why we could not perform experiments with CT26 in the Ifnar1fl/fl conditional lines, which are in the C57BL/6 background and would have needed at least 10 generations of backcrossing into the BALB/c background before being suitable for such experiments. Finally, all experiments described in Fig 4, 5, 6 and 7 were performed in C57BL/6 mice using MC38 CRC cells, with results that reproduced those obtained in CB6F1 hybrids and were very similar to what we have previously reported with MC38 in C57BL/6 mice (see Fig 5 (Catarinella et al., 2016)).

      • At page 16 the authors conclude that "patients suffering from chronic liver fibrotic disease... display lower incidence of hepatic metastases". In the community there is contradictory data (see Kondo et al, BJC, 2016, https://www.nature.com/articles/bjc2016155). This should be precisely discussed, otherwise this claim should be removed.

      We thank the referee for raising this issue and modified the discussion accordingly. Text has been added in the revised manuscript at lines 455-457.

      We agree with the reviewer's suggestion and added new text to recognize the interplay between different cell types such as dendritic cells within the hepatic niche (see new text at lines 505-515).

      • Last, multiple times the authors write about data that is "not shown". Please either include these data in the manuscript or delete corresponding phrases because it is not possible for the reader to scrutinize it.

      We fully agree with the referee’s concern and displayed all “not shown results” in Fig S1E and Fig S9C-I.

      • Besides, I suggest additional experiments further substantiating the study:
      • To see if this effect of IFNα1 is cell type-specific, liver metastasis of other solid tumors such as breast cancer or melanoma should be investigated.

      We agree with the reviewer's suggestion, as also indicated in our original discussion. We believe that additional experiments with other solid tumor cell lines would be important to generalize the potential of perioperative IFNα therapy. In particular, we believe that pancreatic ductal adenocarcinoma (PDAC), a highly lethal disease that most commonly metastasizes to the liver (Lambert et al, 2017), may benefit from our approach. It should be noted, however, that the pleiotropic nature of IFNα allows this cytokine to inhibit tumor growth by several mechanisms. Above all, the ability of IFNα therapy to directly reduce tumor growth depends on the relative surface expression of Ifnar1 on each tumor cell and the ability to maintain such expression in the harsh tumor microenvironment during IFNα therapy. As the degradation of Ifnar1 by CRC tumors has been well described (Katlinski et al., 2017), it is possible that CRC tumors thus escaping the antitumor properties of endogenous type I interferons may respond less efficiently to therapeutic IFNα regimens such as those herein described. This notion is consistent with our data on primary orthotopic tumors (Fig. 3D,E), which are no longer responsive to continuous IFNα therapy as early as 7 days after implantation of CT26LM3 cells. In addition, the definition of the HEC/LSEC antimetastatic barrier has been possible only because CRC cells are not directly susceptible to the IFNα antiproliferative activity, which we observed in vitro at extremely high IFNα dosages (Catarinella et al., 2016) but not in vivo (as formally demonstrated by using MC38Ifnar_ko cells, Fig 4A). At any rate, we followed the reviewer’s suggestion and performed an additional experiment in which we intramesenterically injected the PDAC cell line Panc02 (H-2b, C57BL/6-derived) (Soares et al, 2014) into C57BL/6 mice 7 days after initiation of NaCl or IFNα therapy. 
As shown below, MRI analysis at day 21 showed that none of the IFNα-treated, Panc02-challenged mice developed metastatic lesions, while NaCl controls displayed a high metastatic burden that required euthanasia, for ethical reasons, of about 67% of these mice shortly after MRI analysis. These data indicate that perioperative IFNα therapy completely curbs metastatic development in PDAC-challenged animals. The notion that these cells may be more IFNα-susceptible than CRCs may well depend on the relative capacity of the former cells to maintain Ifnar1 expression, as suggested by others (Zhu et al, 2014). Properly addressing the reviewer’s comment would thus require extensive investigations involving the establishment of new mouse models of metastases from other solid tumors, starting from the in vitro and in vivo regulation of surface Ifnar1 expression in each tumor cell. We strongly believe that this work has merit but we think that it should be reported separately.

      • The authors applied a broad range of cell type-specific mice. However, a thorough characterization of the deletion of Ifnar1 in the corresponding cell types is missing. This is crucial for the manuscript.

      We fully agree with the referee’s concern and as previously mentioned, we have improved the characterization of Ifnar1 deletion (see response to the same critique received from reviewer 1, comment 3).

      • The capillarization of the hepatic vascular niche is a crucial point in this story. I believe that the hepatic endothelium should be further characterized by additional vascular markers.

      In response to the reviewer’s suggestion, we have included in our analysis the characterization of Lyve-1, a marker of hepatic capillarization (Pandey et al, 2020; Wohlfeil et al, 2019). Indeed, IFNα treatment of Ifnar1fl/fl mice significantly increased the expression of Lyve-1, whereas IFNα treatment of VeCadIfnar1_KO mice showed no effect (Fig S9A,B), further corroborating our findings. Text has been added in the revised manuscript at lines 291-294. To better aid readers, we have prepared high-resolution images for each IF channel and have provided these data as source data for Fig S9A.

      • Last, the data and methods appear adequately presented and experiments seem to be reproducible. Just in Figure 4 the exact number of mice and replicates are not clearly presented. Otherwise, everything is fine.

      We thank the reviewer for raising this issue, which apparently was not properly described in our original submission. We have now included the exact number of mice in each experimental group in the figure legend to Fig 4.

      Minor comments:

      Overall the text and figures are accurately presented. However, I would like to add further minor comments:

      • In Fig. 1 you present the IFNα dosing regimen. How do you explain the decrease in serum IFNα after day 2? Besides, the data points at day 0 should be excluded since measuring started from day 2! Why did you decide to treat for seven days until the start of the experiment? One could think 2 days might already be enough.

      We thank the reviewer for raising these important points. Regarding the pharmacokinetic-pharmacodynamic (PK-PD) behavior of our approach, we do not believe that MOP reduced its pumping efficacy after day 2 (Theeuwes & Yum, 1976), nor that counterregulatory mechanisms, such as the induction of anti-IFNα blocking antibodies, occurred in such a short time frame (Wang et al, 2001). Nor is it plausible that IFNα treatment significantly downregulated Ifnar1 in the liver (as demonstrated by pSTAT1 activation after MOP treatment in Fig S1E). Rather, our results reflect the PK-PD behavior of other long-lasting formulations of IFNα, which depend on intrinsic pharmacological properties of IFNα already described (Jeon et al, 2013). Text has been added in the revised manuscript at lines 110-112. We also corrected the figures in which we quantified serum IFNα. Indeed, blood was drawn one day before MOP implantation rather than on the same day of surgery to avoid additional blood loss, which could be a source of unnecessary stress for the animals. Therefore, we corrected the results section and Fig S1A-C and Fig 1A,B. The decision to start treatment 7 days rather than 2 days before seeding was made for several reasons: i) this study follows our previous gene/cell therapy approach, in which the time interval between reconstitution of the transduced bone marrow with Tie2-IFNα and tumor challenge was at least 7-8 weeks. We therefore thought that 7 days might be a sufficient/necessary time period to induce similar phenotypes in the liver after continuous IFNα administration; ii) 7 days is a time frame compatible with the perioperative period in humans (Horowitz et al, 2015). Furthermore, the side effects that patients may experience after IFNα therapy are generally limited to the first few days after administration, allowing patients to benefit from IFNα-induced vascular antimetastatic barriers at the time of surgery without potential side effects of IFNα. 
Because oncologic guidelines recommend starting adjuvant chemotherapy at least 4 weeks after surgery in stage 2-3 CRC patients at risk of later developing liver metastases (Engstrand et al, 2019; van Gestel et al, 2014), our proposed perioperative time frame does not conflict with these indications (Van Cutsem et al, 2016). We have included additional text at lines 131-132 to motivate the timing of our regimens.

      • Fig. 2: Did you check for metastases in organs other than the liver at the timepoint of euthanization, e.g. lungs? In the discussion section you talk about a potential influence of IFNα1 on other organs. Therefore, I think that the mice should be thoroughly analyzed and the data presented. The manuscript will benefit from it.

      We thank the reviewer for this valuable comment. Indeed, we always check for dissemination of CRC metastases on MRI analysis and necropsy. As stated at lines 146-147 and 158, CRC tumors seeded in the liver vasculature do not spread to other organs such as the lungs after colonizing the liver. Indeed, CRC cells intravascularly seeded in the portal circulation are trapped at the beginning of hepatic sinusoids because their diameter is larger than that of liver sinusoids (Fig S8A,B). These micro-anatomic peculiarities are also thought to impede the spreading of tumor cells from periportal to centrilobular areas and to the general circulation (Catarinella et al., 2016; Vidal-Vanaclocha, 2008), and this is consistent with studies showing that in CRC patients undergoing surgery the majority of CRC-derived circulating tumor cells are found in the portal vein (Deneve et al, 2013).

      • Overall, MRI pictures and pictures of IHC or IF are sometimes too small to see. Please provide pictures with larger magnification or enlarge the images.

      We thank you for this suggestion and we have indeed increased the size of all MRI, IHC, and IF images to the maximum that will fit within the figure. In addition, we presented the images at the highest magnification available, without making digital enlargements that would significantly reduce resolution.

      • Fig. 3 F, G: immune cell infiltration in the liver was analyzed. Please compare it to untreated, tumor-free wildtype liver tissue.

      We appreciated the reviewer's suggestion and included the results of six Sham mice for each marker in our analysis. Text was added to the figure legends of Fig 3H and Fig S4B,D.

      • Fig. 6: the graphs are too small to be read, especially the volcano plot and the gene names of the heatmap.

      We increased the font size of genes in the volcano plots and heatmap in Fig 6A,B, as suggested.

      • Fig. S6: Pictures of co-immunofluorescences are presented. For the reader it is really hard to distinguish the stainings and to identify colocalized areas. Please provide pictures with one channel to better compare the marker expression.

      We thank the reviewer for pointing this out and we have tried to make each panel as large as possible to fit into a two-column figure. We have also prepared high magnification images of each channel for all immunofluorescence images, which we provide as source data. We hope that this is sufficient to help readers to interpret our results without increasing the number of main or supplementary figures.

      • From page 8 onwards (section about transgenic mice) LSEC was used as kind of synonym for hepatic endothelial cells. Since there is still no LSEC-specific driver mouse, it should be stated "hepatic endothelial cells" instead.

      We agree with this suggestion and thus have indicated that the results refer to HECs but include a large majority of LSECs. Indeed, LSECs make up the majority (~89%) of the total HEC population (Su et al, 2021). In addition, some SEM and TEM analyses were performed only on LSECs, as well as the IF analyses. Therefore, we believe that LSECs play an important role in this process. Although not specifically suggested, we have also changed the title of our manuscript to reflect the reviewer's suggestion. Thus, we propose "Continuous sensing of IFNα by hepatic endothelial cells shapes a vascular antimetastatic barrier" as new title.

      • P. 11: there is a typo: Fig. Fig. S6G,H

      We corrected this typo.

      • P. 13: the authors describe Gata4 as inhibitor of subendothelial matrix deposition. This should be precisely written, since Gata4 originally is described as master-regulator of liver sinusoidal differentiation which leads to liver fibrosis development upon loss of Gata4. Besides, I came across a study of the same group that investigated the role of Notch signaling in hepatic CRC and melanoma metastasis (Wohlfeil et al, Cancer Res, 2019, https://aacrjournals.org/cancerres/article/79/3/598/638600/Hepatic-Endothelial-Notch-Activation-Protects). Similar to your study they tie the reduction in hepatic metastasis to capillarization of the hepatic microvasculature.

      We agree with this suggestion and modified the text accordingly. We are also glad that our results agree with previously reported literature, which has now been correctly cited at lines 351-356 and in the discussion at lines 474-476.

      • The discussion reads like paraphrasing the results section. The manuscript would clearly benefit if the discussion section were rewritten more concisely.

      We agree with this suggestion, and we have modified the discussion accordingly. We are also willing to shorten the discussion by removing the schematic model that could possibly be used as a graphical abstract.

      References

      Benechet AP, De Simone G, Di Lucia P, Cilenti F, Barbiera G, Le Bert N, Fumagalli V, Lusito E, Moalli F, Bianchessi V et al (2019) Dynamics and genomic landscape of CD8(+) T cells undergoing hepatic priming. Nature 574: 200-205

      Berg M, Wingender G, Djandji D, Hegenbarth S, Momburg F, Hammerling G, Limmer A, Knolle P (2006) Cross-presentation of antigens from apoptotic tumor cells by liver sinusoidal endothelial cells leads to tumor-specific CD8+ T cell tolerance. Eur J Immunol 36: 2960-2970

      Boukhaled GM, Harding S, Brooks DG (2021) Opposing Roles of Type I Interferons in Cancer Immunity. Annu Rev Pathol 16: 167-198

      Catarinella M, Monestiroli A, Escobar G, Fiocchi A, Tran NL, Aiolfi R, Marra P, Esposito A, Cipriani F, Aldrighetti L et al (2016) IFNalpha gene/cell therapy curbs colorectal cancer colonization of the liver by acting on the hepatic microenvironment. EMBO Mol Med 8: 155-170

      Chambers AF, Groom AC, MacDonald IC (2002) Dissemination and growth of cancer cells in metastatic sites. Nat Rev Cancer 2: 563-572

      Cortes E, Lachowski D, Robinson B, Sarper M, Teppo JS, Thorpe SD, Lieberthal TJ, Iwamoto K, Lee DA, Okada-Hatakeyama M et al (2019) Tamoxifen mechanically reprograms the tumor microenvironment via HIF-1A and reduces cancer cell survival. EMBO Rep 20

      Deneve E, Riethdorf S, Ramos J, Nocca D, Coffy A, Daures JP, Maudelonde T, Fabre JM, Pantel K, Alix-Panabieres C (2013) Capture of viable circulating tumor cells in the liver of colorectal cancer patients. Clin Chem 59: 1384-1392

      Engstrand J, Stromberg C, Nilsson H, Freedman J, Jonas E (2019) Synchronous and metachronous liver metastases in patients with colorectal cancer-towards a clinically relevant definition. World J Surg Oncol 17: 228

      Guidotti LG, Inverso D, Sironi L, Di Lucia P, Fioravanti J, Ganzer L, Fiocchi A, Vacca M, Aiolfi R, Sammicheli S et al (2015) Immunosurveillance of the liver by intravascular effector CD8(+) T cells. Cell 161: 486-500

      Horowitz M, Neeman E, Sharon E, Ben-Eliyahu S (2015) Exploiting the critical perioperative period to improve long-term cancer outcomes. Nature reviews Clinical oncology 12: 213-226

      Jeon S, Juhn JH, Han S, Lee J, Hong T, Paek J, Yim DS (2013) Saturable human neopterin response to interferon-alpha assessed by a pharmacokinetic-pharmacodynamic model. Journal of translational medicine 11: 240

      Katlinski KV, Gui J, Katlinskaya YV, Ortiz A, Chakraborty R, Bhattacharya S, Carbone CJ, Beiting DP, Girondo MA, Peck AR et al (2017) Inactivation of Interferon Receptor Promotes the Establishment of Immune Privileged Tumor Microenvironment. Cancer cell 31: 194-207

      Katz SC, Pillarisetty VG, Bleier JI, Shah AB, DeMatteo RP (2004) Liver sinusoidal endothelial cells are insufficient to activate T cells. Journal of immunology 173: 230-235

      Kuruppu D, Christophi C, Bertram JF, O'Brien PE (1998) Tamoxifen inhibits colorectal cancer metastases in the liver: a study in a murine model. Journal of gastroenterology and hepatology 13: 521-527

      Lalor PF, Shields P, Grant A, Adams DH (2002) Recruitment of lymphocytes to the human liver. Immunol Cell Biol 80: 52-64

      Lambert AW, Pattabiraman DR, Weinberg RA (2017) Emerging Biological Principles of Metastasis. Cell 168: 670-691

      Pandey E, Nour AS, Harris EN (2020) Prominent Receptors of Liver Sinusoidal Endothelial Cells in Liver Homeostasis and Disease. Front Physiol 11: 873

      Sallusto F, Geginat J, Lanzavecchia A (2004) Central memory and effector memory T cell subsets: function, generation, and maintenance. Annu Rev Immunol 22: 745-763

      Schreiber G (2017) The molecular basis for differential type I interferon signaling. J Biol Chem 292: 7285-7294

      Soares KC, Foley K, Olino K, Leubner A, Mayo SC, Jain A, Jaffee E, Schulick RD, Yoshimura K, Edil B et al (2014) A preclinical murine model of hepatic metastases. J Vis Exp: 51677

      Sorensen KK, Smedsrod B (2020) The Liver Sinusoidal Endothelial Cell: Basic Biology and Pathobiology. In: The Liver: Biology and Pathobiology, Sixth Edition, pp. 422-434. John Wiley & Sons Ltd.

      Stone JD, Chervin AS, Kranz DM (2009) T-cell receptor binding affinities and kinetics: impact on T-cell activity and specificity. Immunology 126: 165-176

      Su T, Yang Y, Lai S, Jeong J, Jung Y, McConnell M, Utsumi T, Iwakiri Y (2021) Single-Cell Transcriptomics Reveals Zone-Specific Alterations of Liver Sinusoidal Endothelial Cells in Cirrhosis. Cell Mol Gastroenterol Hepatol 11: 1139-1161

      Theeuwes F, Yum SI (1976) Principles of the design and operation of generic osmotic pumps for the delivery of semisolid or liquid drug formulations. Ann Biomed Eng 4: 343-353

      Van Cutsem E, Cervantes A, Adam R, Sobrero A, Van Krieken JH, Aderka D, Aranda Aguilar E, Bardelli A, Benson A, Bodoky G et al (2016) ESMO consensus guidelines for the management of patients with metastatic colorectal cancer. Ann Oncol 27: 1386-1422

      van Gestel YR, de Hingh IH, van Herk-Sukel MP, van Erning FN, Beerepoot LV, Wijsman JH, Slooter GD, Rutten HJ, Creemers GJ, Lemmens VE (2014) Patterns of metachronous metastases after curative treatment of colorectal cancer. Cancer Epidemiol 38: 448-454

      Vidal-Vanaclocha F (2008) The prometastatic microenvironment of the liver. Cancer microenvironment : official journal of the International Cancer Microenvironment Society 1: 113-129

      Wang DS, Ohdo S, Koyanagi S, Takane H, Aramaki H, Yukawa E, Higuchi S (2001) Effect of dosing schedule on pharmacokinetics of alpha interferon and anti-alpha interferon neutralizing antibody in mice. Antimicrob Agents Chemother 45: 176-180

      Wohlfeil SA, Hafele V, Dietsch B, Schledzewski K, Winkler M, Zierow J, Leibing T, Mohammadi MM, Heineke J, Sticht C et al (2019) Hepatic Endothelial Notch Activation Protects against Liver Metastasis by Regulating Endothelial-Tumor Cell Adhesion Independent of Angiocrine Signaling. Cancer research 79: 598-610

      Yu X, Chen L, Liu J, Dai B, Xu G, Shen G, Luo Q, Zhang Z (2019) Immune modulation of liver sinusoidal endothelial cells by melittin nanoparticles suppresses liver metastasis. Nat Commun 10: 574

      Zhu Y, Karakhanova S, Huang X, Deng SP, Werner J, Bazhin AV (2014) Influence of interferon-alpha on the expression of the cancer stem cell markers in pancreatic carcinoma cells. Exp Cell Res 324: 146-156

      Ziv Y, Gupta MK, Milsom JW, Vladisavljevic A, Brand M, Fazio VW (1994) The effect of tamoxifen and fenretinimide on human colorectal cancer cell lines in vitro. Anticancer Res 14: 2005-2009

      Reviewer #2 (Significance):

      • Since liver metastases of various tumors are tremendously hard to treat and mediate therapy resistance, the authors focus on a very important field of research - prevention of liver metastasis formation.
      • This study adds insights into the mechanisms of action of IFNα1 in the hepatic microenvironment. It extends previous findings of Toyoshima who described anti-tumoral effects of IFNα1 released by dendritic cells in the liver.
      • The study is well designed and will be of great interest for the scientific community. Besides, it will be appreciated by physicians. However, as mentioned in the discussion, further clinical studies by physicians are needed to translate its findings into the clinic.
      • The author of this review works as physician and often deals with liver metastasis. It is one field of focus of her/his research.
    1. Author response


      • A comment on the overall organization of the paper. Figure 2 has a major location in the paper, but it seems that its main takeaway is that these MAPs aren't really involved in the main process this paper is probing. While these are important findings, it might be more satisfying to move some of the central results earlier.

      We agree that this figure displays mostly negative results. However, most work on anaphase B microtubule dynamics from our group and others has focused on the effect that motors and MAPs may have on microtubule dynamics (EB1 and kinesin-8 in budding yeast, klp9 in fission yeast). Therefore, we consider it important to clearly show that previously proposed candidates are not required for the observed decrease in microtubule growth speed, prior to introducing the unexpected effect of the membrane.

      • A model schematic might drive home the main finding of the paper, and be particularly useful for readers who are not experts in microtubule or spindle dynamics. That said, the Discussion does an excellent job of summarizing the findings and explaining the takeaway message(s), even for the non-expert.

      We have added a model schematic and we have referred to it in the main text.

      Specific comments

      • ‘In higher eukaryotes’ - Suggest avoiding the terms higher and lower when describing organisms, and instead, directly defining which organisms, for instance in animals/metazoans that would be a better description.

      We have removed this terminology.

      • Figure 1 E-F - It is hard to see the difference in the distribution, maybe a different color could be used instead of stars.

      We have used a different color.

      • Figure 1 Data shown in pink in G comes from 832 midzone length measurements during anaphase, from 60 cells in 10 independent experiments - The pink here does not correspond to the pink coding in D, consider colour choice for clarity across panels.

      We have changed this.

      • Finally, yeasts undergo closed mitosis - How does this relate to the findings in the Dey paper (cited here) which shows it was somewhat semi-closed or semi-open. According to the Dey paper, the membrane disassembles locally twice, at the SPB and the bridge.

      Membrane disassembly at the nuclear membrane bridge occurs at late anaphase, and leads to the disassembly of the spindle, presumably by the action of cytoplasmic factors (Dey et al. 2020). We do not believe the membrane disassembly itself has a role in spindle elongation or microtubule dynamics, because the spindle is disassembled once it occurs. However, the fact that les1Δ reduces the decrease in microtubule growth speed associated with internalisation of microtubules in the nuclear membrane bridge suggests that the organisation of the nuclear membrane bridge required for its local disassembly at late anaphase might affect microtubule growth (see section “Formation of Les1 stalks […]”).

      • ‘vertical comets in kymographs (Fig. 1C) do not correspond to non-growing microtubules, but rather microtubules that grow at a speed matching the sliding speed’- For clarity, it might be nice to add: "(as the SPB moves away from the plus end in the kymograph)".

      We have included this useful clarification.

      • ‘significantly shorter than in interphase, where growth events last more than 120 seconds on average [42, 43]. Microtubule shrinking speed did not change during anaphase either (Fig. 1-Supplement 1D), and was on average 3.56±1.75 μm/min, also lower than in interphase (~8 min/μm)’ - This comment concerns the comparison of growth and shrinking rate as well as growth duration. The authors did not measure microtubule dynamics in interphase in this manuscript but compared their numbers to literature values. The comparison raises some questions for three reasons: 1) the microscopy method used is different in this paper and the two references provided, 2) the sample is mounted differently compared to the two references provided - 1) and 2) combined could lead to different levels of stress on the cells which could affect MT dynamics-, 3) (probably the most important caveat) the experiments are done at different temperatures: 27C in this paper versus 25C in the references provided. Microtubule dynamics are sensitive to temperature so this could explain part of the differences observed. Also, there are multiple values published for MT dynamics in interphase depending on the strain used and the microscopy method used. Suggest that the authors measure microtubule dynamics in interphase cells at 27C in SIM to ensure that the differences are not due to the technical parameters employed. Small item - should ‘8 min/μm’ read “8 μm/min"?

      We have measured microtubule growth speed and growth event duration using GFP-Mal3 during interphase and anaphase B in the same conditions as proposed (see Figure 1 – Supplement 2). Unfortunately, shrinkage speed cannot be measured using GFP-Mal3, so we cannot confirm that the difference between our measurements and the literature values would be observed.

      • ‘we observed two populations of microtubules (fast and slow growing)’ - Does this statement about the fast and slow growing populations refer to the data in Fig. 1C and 2A?

      Yes, we have added references to these figures in the next sentence (mentioned below).

      • ‘In some cells, all microtubules seemed to switch to the slow growing phase simultaneously (Fig. 1C), while in others fast and slow growing microtubules co-existed (Fig. 2A)’ - This is a very interesting observation, could we know how many cells (%) were detected in each case? Is it that in 90% of the cells the switch is simultaneous, and hence the microtubule growth is somehow synchronized? Or is it more random, e.g. around 50%?

      This was just to point the reader to two kymographs and show that a clear point where all microtubules change speed is not present in all kymographs, as one may think from Fig. 1C. Later in the paper, we show that the change in growth depends on whether the microtubule rescue occurs inside or outside the nuclear membrane bridge, so it is a matter of where microtubules are rescued once the dumbbell transition occurs, which is a stochastic process. We have added another sentence pointing the reader to examples in the kymograph (see line 152, This representation captures…).

      • ‘On such a plot, the data points visibly cluster in two separate clouds and the variation of growth speeds can be fitted by an error function (Fig. 1F)’ - It is unclear that there are two distinct clusters, maybe the assertion should be toned down, or some sort of cluster analysis provided.

      We acknowledge that the data are widely spread along the y axis and that, because the magnitude “distance to the closest pole at rescue” is continuous, the transition is not clear-cut. However, we consider the fact that the averaged curve closely matches the error function fit to be sufficient evidence for the existence of two populations of microtubule growth. Additionally, the R2 of the fit is ~0.5, indicating that half of the variance is explained by this model. In any case, we show later that these two populations do exist (Fig. 3D), and why plotting microtubule growth against distance to the closest pole at rescue is a good way to segregate them (Fig. 3E).
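      The error-function description above can be sketched in a few lines. This is only an illustrative parameterisation, not the fitting code used for Fig. 1F: the function names, the four parameters (`v_fast`, `v_slow`, `d0`, `width`) and the assumption that speed falls from a fast plateau to a slow plateau as distance to the closest pole increases are ours. `r_squared` shows how a value such as ~0.5 would be read as the fraction of variance explained.

```python
import math


def erf_model(distance, v_fast, v_slow, d0, width):
    """Growth speed that falls smoothly from v_fast to v_slow around d0.

    All parameters are hypothetical names chosen for this sketch.
    """
    mid = 0.5 * (v_fast + v_slow)
    half_drop = 0.5 * (v_fast - v_slow)
    return mid - half_drop * math.erf((distance - d0) / width)


def r_squared(observed, predicted):
    """Fraction of the variance in `observed` explained by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

      With such a model, a broad scatter of individual events around a well-defined step still gives an intermediate R2, which is why the averaged curve and the later figures carry most of the argument.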

      • ‘speed of interphase microtubules (~2.3 μm/min)’ - It would be interesting to see the dynamics in a les1 mutant (Dey Nature 2020) paper. Just as a control for presence/absence of the bridge?

      We thank the reviewers for kindly suggesting this interesting experiment. We have included it after the ase1 section. Les1 forms stalks at the edges of the nuclear membrane bridge that restrict nuclear membrane disassembly to the center of the bridge at the end of mitosis (Dey et al. 2020). While les1 deletion does not prevent the formation of the nuclear membrane bridge, it has been proposed that Les1 stalks may constitute sites of close interaction between the nuclear membrane and the spindle. Therefore, these sites may influence microtubule growth. Indeed, we have found that removing these Les1 stalks by either deleting les1 or nem1 leads to a smaller decrease in microtubule growth speed when plus ends enter the nuclear membrane bridge (see section “Formation of Les1 stalks […]”).

      • ‘Figure 2, Transition from fast to slow microtubule growth occurs in the absence of known anaphase MAPs’ - It looks like the overlap zone is larger on the mal3 kymograph. Is the size of the midzone changed in some of the mutants? It could be important to report. Related to it, is the spindle length changed in some of the mutants? (It does not look like it from the kymographs displayed).

      The midzone is indeed longer in mal3D strains, now this can be seen in Fig. 2 – Supp. 2 and it is mentioned in the main text in line 272. As for the spindle length, diverse kinds of alterations in spindle length have been previously reported for the mutants that we used in this study. For instance, ase1D /cls1off cells have shorter spindles at anaphase onset (Loiodice et al. 2005 and data not shown), and klp5Dklp6D have longer spindles at anaphase onset (Syrivatkina et al. 2013). klp9D / clp1D / dis1D cells have lower spindle elongation velocity and may not reach the wild-type spindle length by the end of anaphase (Kruger et al. 2019). Despite these differences, the decrease in microtubule growth as a function of distance to the closest pole has a similar tendency across conditions, suggesting that the mentioned differences in spindle length are unlikely to have an important effect.

      • Additionally, adding the data about rescue localization in the mutant (equivalent of Fig 1 G) would be interesting to better describe the role of these different proteins. Figure 2, Panel G to L - Could the authors indicate the value for the average +/- error in each bin for the WT and the mutants? Also, it is hard to say from the plots, but it looks like the WT average speed in the first bin is different in every panel, that would be good to know to have an idea of the reproducibility/variability.

      We have added a figure with the rescue distribution (see Fig. 2 – Supp. 2). This apparent difference in the wt speed in different experiments might have come from looking at normalised data. The new way of representing the data in fig. 2H and J shows that the microtubule growth velocity in the wild-type is very consistent across experiments. We have added a table with microtubule growth velocity values (Table 1), and the source data is available.

      • The dots making up the "thick lines" are centered on 1.5/2.5/etc.. in some panels (G and K) and centered on 1/2/3/etc.. the others (I,J,L). Could the authors provide some clarification?

      We have fixed this inconsistency across the paper.

      • Figure 3 - Can the authors indicate the average values +/- error for each of the distributions in Fig. 3D? Maybe on the plot itself, in the legend or as a table. This would make them easily available without having to infer them from the Y axis. This comment is also valid for Fig 4I and 4J.

      We have added tables with average values and confidence intervals in the appendix.

      • Figure 3E ‘Distance from the plus-end to the nuclear membrane bridge edge at rescue as a function of distance from the plus-end to the closest pole at rescue’ - The Y axis reads as "distance to the bridge edge" but it shows negative values, could this be "position to the bridge edge" instead? (same item throughout the text).

      We have fixed this.

      • Figure 3 ‘Number of events: 442 (30 cells) wt, 260 (27 cells) klp9OE, 401 (35 cells) cdc25-22, from 3 independent experiments’ - P values this small raise a concern. Presumably the number of degrees of freedom in the regression analysis should not exceed the number of independent experiments. Instead, the DoF listed under "error" in the analysis output is hundreds or thousands instead of 3. To address this, the regression analysis should use either the "Error" function in R or a linear mixed-effects model to account for the nesting of the repeated measurements within each independent experiment. Alternatively, it is also possible to just calculate summary means for each independent experiment, and calculate p values based on that N=3. See: Lazic. Experimental Design for Laboratory Biologists. p. 157. and the supplemental file of: https://doi.org/10.1371/journal.pbio.2005282 and the additional file 1 of: https://doi.org/10.1186/s12868-015-0228-5 and this for an alternative plotting approach: https://doi.org/10.1083/jcb.202001064 Recommend either recalculating the p values by one of the methods above or removing the reported p values from the paper. The large effects observed in many cases are self-evident without a significance metric, so eliminating the p values would be acceptable here. (This comment applies to other figures through the paper that report p values based on number of cells or number of measurements instead of number of independent samples/experiments.)

      We thank the reviewers for suggesting the improvements to the statistical analysis, as well as for pointing us to useful resources that described the statistical methods and their implementation in detail. We have followed Aarts et al. 2015 and used a linear mixed effects model (see Methods>Statistical Analysis).

      Due to the change in statistical analysis method, to show that some of the differences we had reported previously were significant, we included more cells in the analysis from our existing data. We did this for klp5Dklp6D kymographs (Fig. 2I and Fig. 2 – Supp. 1), for spindle dynamics in ase1D (Fig. 5D and Fig. 5 – Supp. 1) and klp9D (Fig. 2 – Supp. 3 A, C), and for cell length (Fig. 3 – Supp. 1A).
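      For readers unfamiliar with the nesting issue, the simpler alternative mentioned in the comment (one summary mean per independent experiment, then a test on those means) can be sketched as follows. This is not the linear mixed effects analysis used in the paper; the function names and the data layout (a dictionary mapping an experiment label to its measurements) are assumptions made for illustration, and the sketch stops at the test statistic and degrees of freedom.

```python
from statistics import mean, variance


def experiment_means(measurements):
    """Collapse {experiment_id: [values]} to one mean per experiment."""
    return [mean(values) for values in measurements.values()]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

      With three independent experiments per condition, `welch_t` applied to the two lists of experiment means returns at most 4 degrees of freedom, however many events or cells were measured within each experiment.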

      For the same reason, we measured anaphase spindle elongation velocity (Fig. 3 – Supp. 1C) from kymographs instead of measuring them from the 1 minute interval movies that we had used previously (from Fig. 3 – Supp 1B). We have reflected this in the methods (see added text in line 800 and deleted text in line 809 in the document with changes highlighted).

      None of these changes has altered our conclusions.

      • Figure 4 - Nice experiment. It brings the question of how cell-shape affects all these dynamics (probably out of the scope of this work). But a for3 mutant for example?

      This is an interesting suggestion, to be tested in the future. Furthermore, we believe that nuclear shape should also have an important effect, since the spindle is confined inside the nuclear membrane. We would expect that mutants that perturb nuclear shape might have effects on microtubule growth. We have observed that the decrease in growth speed associated with internalisation of microtubules in the nuclear membrane bridge is reduced upon nem1 deletion, which increases nuclear membrane surface, and produces membrane ruffling (Fig. 4-Supplement 2). However, nem1 deletion also removes les1 stalks from the nuclear bridge (Dey et al. 2020). It would be interesting to find a perturbation of the nuclear membrane that does not remove the les1 stalks.

      • ‘Ase1 is required for microtubule growth speed to decrease during anaphase B, this is unlikely to be a direct effect’ - If it is unlikely to be a direct Ase1 effect is the title of the section accurate? "Ase1 is required for normal rescue distribution and for microtubule growth speed to decrease in anaphase B"

      Ase1 recruits multiple proteins to the spindle midzone, so the fact that ase1 deletion produces a given phenotype does not necessarily mean that this phenotype results from the absence of Ase1 protein activity. For instance, deleting ase1 perturbs rescue distribution, but it does not mean that Ase1 acts as a rescue factor itself, or at least to a relevant extent, given that deletion of cls1 completely prevents rescue, but ase1 deletion does not. In the discussion we propose some indirect effects of ase1 deletion that may produce this effect. In any case, upon more careful analysis we have found that ase1 deletion does not prevent the decrease in microtubule growth speed during anaphase B, but rather makes it smaller (see section “The decrease in growth speed associated with internalisation of microtubules in the nuclear membrane bridge is reduced upon ase1 deletion”).

      • Figure 5 - What about an ase1 lem1 double mutant?

      We suppose that the intended gene is les1. We have studied the effects of les1 deletion in the new version of the manuscript. However, we do not see what additional information we would obtain from a double deletion ase1D les1D.

      • ‘In summary, Ase1 is required for rescue organisation and for microtubule growth speed to decrease during anaphase B ‘- In this context it could make sense to discuss the observations from this paper (doi:10.1371/journal.pone.0056808) about the role of Ase1 ortholog's MAP65-1 in coordinating MT dynamics within bundles.

      In the mentioned paper, the authors showed that the presence of PRC1 (ase1 orthologue) in bundles increases microtubule rescue rate, and that it slightly reduces microtubule growth speed.

      We observe a small increase in microtubule growth speed throughout anaphase upon ase1 deletion (Fig. 5), which is consistent with the in vitro observation that PRC1 decreases microtubule growth. However, once more this might not be a direct effect of Ase1, since less Cls1 is recruited if ase1 is deleted, and Cls1 reduces microtubule growth speed (Fig. 2). In addition, this can also be a result of higher concentration of tubulin / MAPs resulting from less polymerised tubulin in ase1 deleted cells, which have fewer spindle microtubules on average.

      Regarding the increase in rescue rate produced by PRC1 in vitro, it is possible that Ase1 contributes to microtubule rescue in the spindle. However, given that no rescues occur upon inactivation of cls1 (Bratman et al. 2007), we believe Cls1 is the dominant factor, and Ase1 contribution is likely negligible.

      • ‘We initially set the microtubule growth velocity to 1.6 μm/min (early anaphase speed, Fig. 1F), and aimed to reproduce the experimental distribution of positions of rescue and catastrophe at early anaphase (spindle length < 6 μm’ - Kudos to the authors for detailing the model and its parameters in a way that even non-modelling experts can understand.

      Discussion - ‘Our data suggests that microtubule growth speed is mainly governed by spatial cues’ - Is it right to assume that in the cases where fast and slow growing microtubules were simultaneously observed, the fast microtubules were not/had not yet reached the midzone?

      Our data suggests that it’s not about being inside the midzone, but rather inside the nuclear membrane bridge formed after the dumbbell transition. We have elaborated more on this in the main text, pointing the reader to examples in the kymograph, and giving a quantitative argument for distance to the closest pole being a better predictor than anaphase progression or position with respect to the center (which is equivalent to distance to the midzone), see line 152.

      • Methods - ‘PIFOC module (perfect image focus), and sCMOS camera’ - Is this Nikon's "Perfect Focus" autofocus, or some other manufacturer's system? And back-thinned sCMOS.

      We have clarified this in the Methods section.

    1. Author Response

      Reviewer #1 (Public Review):

      1) In terms of the prior hypothesis here I think the authors justify a prior with respect to striatum and I think the most principled analysis of their hypothesis would be based on volumes of interest in striatum. Figure 1 does show difference in MTsat in striatum between neurotypicals and DLDs but the changes are all in the caudate I think- I cannot see anything in putamen. The authors actually describe changes in only one part of anterior caudate. The authors do describe a number of previous conflicting studies that examine caudate structural changes but that is not their hypothesis. The discussion goes into developmental changes affecting striatum at different times that might be relevant and would require a longitudinal study for a definitive study - as the authors acknowledge.

      The reviewer is correct that at this statistical threshold we only observe MTsat differences in the caudate nucleus. Changes in the putamen did not survive this threshold. Lowering the threshold for MTsat (our maps are openly available on Neurovault), or an ROI analysis (see (https://osf.io/2ba57/)) does not reveal significant statistical differences in the putamen. As we noted in the paper, there are differences in the putamen in R1 (these are also observed in the ROI analysis).

      2) There is a lot of overlap between the caudate signal in the two groups - although the correlation of individual differences is reasonable. The caudate signal would not allow group classification.

      Yes, it is clear that these differences would not be sufficient to allow for group classification of DLD. We have discussed this overlap in the discussion.

      3) Outside of the caudate they do show changes in left IFG and auditory cortex that are hypothesised. But there is a lot else going on - I was struck by occipital changes in figure 1 which are only mentioned once in the manuscript.

      We now discuss these differences in the discussion. Note that we did not have any a priori hypotheses about these regions; to our knowledge, they have not been previously described and are not predicted by any theoretical accounts of DLD.

      4) Should I be concerned by i) apparent signal changes in right anterior lateral ventricle from group comparison in figure 1 ii) signal change correlation in right anterior lateral ventricle in figure 4 (slice 22) and iii) signal change outside the pial surface of the occipital lobe in figure 1?

      No – these may be accounted for by smoothing during analyses. Note, these changes at tissue boundaries are fairly commonly seen in statistical maps following smoothing but are not evident when data are projected onto a 3D surface.

      Reviewer #2 (Public Review):

      This work demonstrates the value that multiparameter mapping imaging protocols can have in uncovering microstructural neural differences in populations with atypical development. Previous studies looking at differences in brain structure have typically used voxel based morphometry (VBM) approaches where differences in volumes can be hard to interpret due to complex tissue compositions. The imaging protocol outlined in this paper can specifically index different tissue properties e.g. myelin, giving a much more sensitive and interpretable measure of structural brain differences. This paper applies this methodology to a population of adolescents with developmental language disorder (DLD). Previous evidence of structural brain differences in DLD is very inconsistent and, indeed, using traditional VBM the authors do not find a difference between children with DLD and those with typical language development. However, they provide convincing evidence that despite no macrostructural differences, children with DLD show clear differences in levels of myelin in the dorsal striatum and in brain regions in the wider speech and language network. This can help to reconcile previous inconsistent findings and provide a useful springboard for both theoretical and empirical work uncovering the nature of the brain bases of language disorders.

      We are grateful for these comments, and to the reviewer for pointing out some key strengths of this work.

      Strengths:

      The imaging protocol is robust and is explained very clearly by the authors. It has been used before in other populations so is an established method but has not been applied to populations of children with DLD before, yielding novel and very interesting results. The authors demonstrate that this is a methodology which could have great value in other populations that display atypical development, increasing the impact of these findings.

      The sample size is large for research in this area which increases confidence in the results and the conclusions.

      Rather than relying solely on group differences in brain microstructure to draw conclusions about neural bases of language development, the authors correlated brain microstructural measures with performance on standardised language tests, allowing stronger inferences to be drawn about the relationships between structure and function. This is often an important omission from developmental neuroimaging work. It gave increased confidence in the finding that alterations in striatal myelin are linked to language difficulties.

      Weaknesses:

      The authors rightly use the CATALISE definition of developmental language disorder, which differs from much of the previous literature by not requiring that children with language difficulties have nonverbal ability that is in the normal range. As can be common when using this definition of DLD, the group with DLD have significantly weaker nonverbal ability than the typically developing group. The authors show that brain microstructural differences correlate with language ability but they don't rule out a correlation with nonverbal or wider cognitive skills. Given the widespread differences in myelination across areas of the brain, including those that weren't predicted e.g. medial temporal lobe, it is plausible that perhaps some of the brain microstructural differences are not linked directly to language impairment but a broader constellation of difficulties. Some of the arguments in the paper would be strengthened if this interpretation could be ruled out.

      To rule out the effect of nonverbal IQ or wider cognitive differences, we have conducted stepwise regression analyses on the quantitative data extracted from the statistical cluster covering the caudate nuclei, assessing the influence of factors such as language proficiency, verbal memory and IQ. We find that language status accounts for the most variance, rather than nonverbal IQ or verbal memory (details are included in the paper).

      We also discuss this point in the discussion, pointing to the presence of co-occurring differences in DLD and how these might account for some of the broader group differences we observe.

      The authors acknowledge in the limitations section that their data cannot speak to whether brain differences are a cause or consequence of language impairment. However, there are some implied assumptions throughout the discussion of the results that brain differences in myelination have functional consequences for language learning. A correlation between structure and function does not indicate this level of causality, particularly in an adolescent population - function could just as easily have had structural consequences or environmental differences could have influenced both structure and function. In my view, the speculations about functional consequences of myelin differences are not fully supported by the data collected.

      The reviewer is correct in saying that the myelin deficit could be either a cause or a consequence of DLD or even that both are caused by a third factor. We specifically address this in the discussion section, and note a longitudinal analysis would be the best way to address this question. Indeed, R3 notes about our paper, “…it does a very good job of avoiding the common trope of assuming neural differences play a causal role in DLD (when in fact, reduced atypical development could cause neural differences)”.

      The data suggest that there is much greater variability in left caudate nucleus MTsat values for the DLD group than the other two groups. The impact this may have on the results is not discussed in the interpretation and it is unclear whether this greater variability occurs throughout all of the key MPM measures for the DLD group.

      Thank you for raising this important issue. In figure 1, we only plot the MTsat values from the caudate nucleus for visualisation, and as you note, there is a considerable degree of variability within the DLD group. However, and crucially, this difference would not influence statistical interpretation of our results. The whole-brain analysis used involves permutation testing, and is robust to a difference in group variability. However, the issue of variability within DLD is important and we now highlight this in our discussion, noting that not every child with DLD will have reduced striatal myelin. Indeed, this variability is even more evident in figure 4. An important challenge for future studies is to understand the link between striatal myelination and the spectrum of language variability.
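      As an illustration of the logic of permutation testing, the following minimal sketch compares two groups on a single value per participant by shuffling group labels. It is not the whole-brain procedure used in the paper, which works voxel-wise with its own test statistic and correction for multiple comparisons; the function name, the default number of permutations and the use of a difference in means are assumptions made for this example.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided p value for a difference in group means by label shuffling."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the p value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

      The null distribution is built from the data themselves rather than from an assumed parametric form, which is the property the response above relies on.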

      Reviewer #3 (Public Review):

      Developmental Language Disorder (DLD) is observed in children who struggle to learn and use oral language despite no obvious cause. It is extremely wide-spread affecting 7-10% of children, and extremely consequential as it persists throughout life and has downstream effects on reading, academic outcomes, and career success. A large number of prior studies have attempted to identify the structural neural differences that are associated with DLD. These have generally shown mixed results, but support a number of candidate regions including left hemisphere language areas (particularly the inferior frontal gyrus), and striatal regions that are possibly linked to learning. However, these studies have suffered from small sample sizes and conflicting results. Part of this may be their reliance on traditional voxel-based-morphometric techniques which estimate cortical thickness and gray matter density. The authors argue that these measures are biologically imprecise; gray matter can be thinner for example, due to synaptic pruning or increased myelination.

      The authors of this study offer a powerful new tool for understanding these differences. Multi-Parameter Mapping (MPM) is based on standard MRI techniques but offers several measures with much greater biological precision that can be tied specifically to myelination, a key marker of efficient neural transmission. They test a very large number of children (>150) with and without DLD using MPM and show strong evidence for fundamental biological differences in these children.

      This study features a number of key strengths. First, at the level of neuro-imaging, the MPM technique is new in this population and offers fundamental insight that cannot be obtained by other measures. Indeed, the authors wisely use a traditional gray matter approach (voxel based morphometry) and find few if any differences between children with DLD and typical development. This offers a powerful proof of the sensitivity of this approach. Moreover, the authors analyze their data comprehensively, looking at two measures of myelin (MTsat and R1) and their convergence.

      However, at the most important level, I think structural approaches (like MPM, diffusion weighted imaging and so forth) offer tremendous promise for dealing with this as they avoid the ambiguity associated with interpreting functional MRI. Are children showing reduced BOLD because they are less good at language processing? Or do the differences in brain function cause poorer language processing? Structural approaches - and MPM in particular - offer tremendous promise as they unambiguously assess the fundamental neuro-biology.

      Beyond the neuro-imaging this study is also strong in their sample and the measurements of language. The sample size is very large and an order of magnitude larger than existing studies. It is well characterized, and the authors use a large set of well-motivated measures that capture the relevant dimensionality of language. Moreover, the authors treat language both as a clinical category and a continuous measure which is consistent with current thinking on the nature of DLD as potentially the low end of a continuous scale rather than a discrete disorder.

      Finally, the discussion of this paper for the most part does a good job of fitting these neurobiological findings into our broader understanding of DLD. It does an excellent job of mapping the observed brain differences onto functional differences in the child. Importantly, in doing this it does a very good job of avoiding the common trope of assuming neural differences play a causal role in DLD (when in fact, reduced atypical development could cause neural differences).

      We are very grateful to the reviewer for taking the time to read our work so closely and pointing out these strengths in the work.

      Despite these strengths, I have a number of substantive concerns that if addressed will improve the overall impact of this paper.

      First, as the authors are aware, there is a long running and active debate in DLD as to whether DLD is the tail end of continuous distribution of children or a unique disorder (Leonard, 1987, 1991; Tomblin, 2011; Tomblin & Zhang, 1999). The results here offer great promise for informing that debate. And in that vein the authors quite appropriately analyze their data in two ways: once using DLD as a categorical variable and once using continuous measures of language. However, they don't really attempt to wrestle with the differences between the models.

      We have now included a section on the implications of our results for DLD in the discussion.

      Second, I was a little surprised to see the authors highlight left IFG in the discussion to the degree they did. While there was clear evidence for reduced myelin there in the MTsat analysis, this did not hold up in R1 analysis, and even in the MTsat, IFG was clearly not the primary locus. Rather the areas of differences seemed to be centered at Pre- and Post-Central gyrus and extending ventrally (to IFG) and posteriorly from there. Given debate on the role of IFG in language specific processing in general (Diachek, Blank, Siegelman, Affourtit, & Fedorenko, 2020; Fedorenko, Duncan, & Kanwisher, 2013), it was not immediately clear to me why that area was important to highlight. For example, some of the posterior temporal areas (and motor areas) that were found were equally important for perceptual, lexical and phonological processing that are important for other theories of DLD.

      We do see group differences in left IFG in the R1 analysis (see Figure 2) and they were more extensive than those seen in the MTsat analysis with which they overlapped. The reviewer is correct that the differences were limited to the opercular part of the IFG in both analyses whereas they extended more dorsally in the R1 analysis. They also extended ventrally to the anterior insular cortex. We respectfully disagree with the reviewer about the importance of highlighting these differences, given the importance of this region for language processing, and our previous hypotheses about this region. Even so, we agree that the posterior temporal and motor areas are of equal importance and have highlighted these in the discussion.

      The authors rightly point to their differences in the striatum as supporting theories of DLD centered around differences in learning. However, as they discuss, there are also large differences throughout the brain in perceptual, motor and language areas. These would seem to support theories of DLD centered around processing and representation. In particular, the differences in myelination are likely linked to differences in the efficiency of neural coding. This would seem to favor two theoretical views that might be worth mentioning - speed of processing (Miller, Kail, Leonard, & Tomblin, 2001), and approaches based on lexical processing (McMurray, Klein-Packard, & Tomblin, 2019; McMurray, Samelson, Lee, & Tomblin, 2010; Nation, 2014). I was surprised these were not mentioned, given the clear link to the timecourse of processing. Does this then suggest that these theories might complement each other? It would be useful to see some more discussion of the implications of these findings for broader theories.

      We have now incorporated mention of these theories in the discussion and discuss implications. We agree with the reviewer that it would be interesting to see whether the different theories could be reconciled.

    1. Reviewer #2 (Public Review):

      Suvorov and colleagues present a well-supported genome-scale phylogeny for 149 Drosophila species based on thousands of single-copy-orthologs. They then use several approaches to estimate the extent of introgression across the phylogeny, and report that it is common both recently and deeper in the past.

      The main strength of this paper is that it uses a scale of sequencing that allows an assessment of genus-wide trends with reasonably good power. It also presents two new analysis approaches, but these represent fairly minor modifications of existing techniques to suit multiple gene alignments, and unfortunately their reliability is not evaluated in this paper. Nevertheless, the main finding that introgression is common appears to be well supported. This finding echoes those of similar recent studies on taxa such as cichlid fishes and Heliconius butterflies. The different approaches used, and different levels of sampling in these different studies do not allow for quantitative comparisons, leaving us with the somewhat vague conclusion that introgression is 'common' in all of these taxa. Perhaps most critically, the present paper does not delve any deeper into the evolutionary impacts of introgression, nor the factors at the species or genomic level that might determine its frequency. Below I describe some areas of concern in more detail.

      1. Extent of introgression

      Perhaps equally as interesting as the frequency of introgression per species across the phylogeny is the proportion of the genome of each species that is affected. Without such estimates, the full extent of introgression is difficult to assess.

      2. Sampling effects

      Since this paper is attempting to make an (admittedly crude) estimate of the extent of introgression in the entire genus, some discussion is needed to address the possible consequences of the fact that only around 10% of species in the genus are represented. For example, if sampling is very even, perhaps most ancient events would be detectable, but more recent events may tend to be missed simply because the species involved are not sampled.

      3. Ancestral structure

      The reasoning provided for dismissing the possible effect of ancestral population structure is unconvincing. First, the authors argue that it "seems less likely" that non-sister taxa would have bred more frequently in the ancestral population. However, this is the entire basis of the problem: it might be unlikely, but it can happen. Eriksson and Manica (2012 https://doi.org/10.1073/pnas.1200567109) provided a very reasonable scenario in which colonisation of a new region can lead to this pattern.

      Second, the authors argue that QuIBL "should not be impacted by ancestral structure because this method searches for evidence of a mixture of coalescence times: one older time consistent with ILS and one time that is more recent than the split in the true species tree and that therefore cannot be explained by ancestral structure." This argument needs clarification. My understanding is that the split in the "true species tree" would also be inflated if there was ancestral structure.

      My view is that ancestral structure leading to discordance between gene trees and species trees is itself an interesting phenomenon. In some ways, it is not conceptually distinct from introgression occurring soon "after" speciation if we consider ancestral structure as the beginning of a continuous speciation process, so I don't think it would weaken the paper to accept this as a possible contributing process.

      4. Discordant count test

      The statistical analysis in the DCT accounts for multiple testing of many triplets for introgression, but there is no mention of the fact that these triplets are non-independent. It is not clear to me whether this makes the correction used more or less conservative.

      If there are any cases where the internal branch is long and the number of ILS gene trees is very small or zero, use of a chi-squared test may not be appropriate.
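
      To illustrate the concern about small counts, the sketch below replaces the chi-squared approximation with an exact two-sided binomial test on the two discordant topology counts. It assumes the usual null expectation that, under ILS alone, the two discordant topologies are equally frequent; the function name and its inputs are hypothetical and are not taken from the manuscript.

```python
from math import comb


def discordant_count_pvalue(n_disc1: int, n_disc2: int) -> float:
    """Two-sided exact binomial p-value for two discordant gene tree counts.

    Assumption: under ILS alone each discordant topology is equally likely,
    so n_disc1 ~ Binomial(n_disc1 + n_disc2, 0.5).
    """
    n = n_disc1 + n_disc2
    if n == 0:
        # No discordant trees at all: no evidence of asymmetry.
        return 1.0
    k = min(n_disc1, n_disc2)
    # With p = 0.5 the distribution is symmetric, so double the lower tail.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * lower_tail)
```

      Unlike the chi-squared approximation, this test remains valid when one or both discordant counts are very small or zero. The p-values would still need a multiple-testing correction across triplets.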

      5. Branch length test

      The authors acknowledge that the BLT is "conceptually similar" to that of Hahn and Hibbins 2019 https://doi.org/10.1093/molbev/msz178, but to me it seems that the only material difference is the statistical procedure for testing for a significant difference between branch lengths.

      An important consideration that appears to have been ignored is whether selection can impact the distribution of branch lengths, especially since many of the BUSCO genes used here will be under strong selective constraint.

      6. Intra-locus recombination

      The paper needs to address the possible impact of intra-locus recombination on all of the introgression tests. For the DCT, I imagine that counts would be biased toward the species tree topology if the inferred trees span multiple distinct genealogies (see for example simulations by Martin and Van Belleghem 2017 https://doi.org/10.1534/genetics.116.194720 Figure S7). This might reduce test sensitivity.

      Similarly, for the BLT, I would expect that true introgression would be more difficult to detect in the presence of recombination. It is possible that the block jackknife procedure of Hahn and Hibbins (2019, https://doi.org/10.1093/molbev/msz178) may be more suitable than the comparison of distributions of point estimates for genes used here.
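
      As a rough illustration of what a block jackknife involves, the sketch below computes a delete-one-block jackknife standard error for a D3-style statistic. It is a generic jackknife, not necessarily the exact procedure of Hahn and Hibbins (2019). The form D3 = (d_BC - d_AC) / (d_BC + d_AC), the per-block input of summed pairwise distances, and all function names are assumptions made for this example.

```python
from math import sqrt


def d3(d_bc: float, d_ac: float) -> float:
    """D3-style statistic (assumed form): normalised distance difference."""
    return (d_bc - d_ac) / (d_bc + d_ac)


def block_jackknife_d3(blocks):
    """Delete-one-block jackknife for the D3-style statistic.

    blocks: list of (sum_d_bc, sum_d_ac) tuples, one per genomic block
    (hypothetical input layout). Returns (estimate, standard_error).
    """
    n = len(blocks)
    if n < 2:
        raise ValueError("need at least two blocks for a jackknife")
    total_bc = sum(bc for bc, _ in blocks)
    total_ac = sum(ac for _, ac in blocks)
    estimate = d3(total_bc, total_ac)
    # Recompute the statistic with each block left out in turn.
    leave_one_out = [d3(total_bc - bc, total_ac - ac) for bc, ac in blocks]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((x - mean_loo) ** 2 for x in leave_one_out)
    return estimate, sqrt(variance)
```

      Because whole blocks are dropped together, correlation among linked sites within a block does not deflate the standard error, which is the property that matters when loci may span several genealogies.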

    2. Reviewer #3 (Public Review):

      The authors compiled a collection of published and newly sequenced genomes to assemble the largest collection of Drosophila genomes to date. Using this dataset they extracted a set of single copy orthologs to use for phylogenomic analyses, with a focus on estimating a time-calibrated phylogeny and introgression.

      This new dataset is a valuable resource that will serve the broader community of Drosophila researchers opening many new avenues for future phylogenomics research. The workflow of focusing on BUSCO genes for all comparative analyses is simple in a good way -- it is easy to understand how the data were collected and it should be easily reproducible -- which makes it easy to read past the genomics details and focus on the analyses of these data.

      However, I feel this is an important aspect of the paper that should receive more details, perhaps in the supplement. I may have missed it, but I could not find statistics about this ortholog data set. On average, how long is each locus, how many variable sites are there, how many taxa are missing data for any given locus due to paralogy? Do the BUSCO genes include both introns and exons? It is also unclear from the description exactly how the BUSCO genes were extracted from genomes. Are they extracted from the final assembled genomes, or do you perform variant calling after identifying them to call heterozygous sites? If heterozygosity is excluded, how might this impact metrics such as the branch length tests, especially among close relatives? It likely impacts node age estimates as well?

      The authors use this dataset to infer phylogenetic relationships among taxa using both ML concatenation (IQtree) and a two-step MSC approach (Astral) which yielded quite similar topologies, and they examined the impact of filtering loci with treeshrink, which had minimal impact. This new topology represents a substantial step forward for understanding the relationships among major Drosophila clades.

      One of the main results of this study is a new set of node age estimates on the tree. For this they estimated branch lengths in mcmctree from a concatenated matrix of 1000 loci in the presence of fossil calibrations. The fossil calibration scheme selected as the best option includes three fossils, one dating the divergence at the split from mosquitos (uniform 195-230Ma) and two ingroup calibrations (U(43,64) and U(15,43)). To me, the credible intervals on node ages seem incredibly narrow. The authors mention this as an improvement compared to earlier studies, but they also mention later that the total amount of sequence data does not greatly impact node dating. So I'm a bit confused why the node ages are expected to be more accurate here. It seems to me that time calibrations should be most accurate when the greatest number of fossils are available, and when very appropriate Bayesian priors are set on the analysis. The effect of sequence variation is then relatively small. But here there are very few fossils, one of which is hugely distant, and so I would not expect highly precise age estimates. So I guess my question to the authors is, what do you think is going on here? Perhaps further description in the supplement of how the mcmctree method implemented here differs from traditional node dating done in a program like BEAST would help to clarify.

      Considering that this paper aims to infer the new best time calibrated tree for the Drosophila community, I think that the current description of fossil calibration schemes, which primarily refers to other publication names in the supplement, is insufficient. Which fossils are used in those studies, are you using those fossils as calibrations here, or are you implementing secondary calibrations based on their phylogenetic results? The reader should not have to read every one of those papers to understand the basis of the calibrations in this paper.

      Fig.1 shows nodal age posterior probabilities. Are these 95% confidence intervals? The taxon labels are too small in this figure, both on the large tree and especially in the inset figure. The legend refers to fossil taxon names used for calibrations, but it is still unclear to me where the fossils are placed on the tree. Are the calibrations indicated somewhere in the figure?

      The authors demonstrate evidence of introgression by showing mostly overlapping evidence from two different types of tests. Together, these tests show that most major clades contain significant imbalanced discordance in gene tree counts or branch lengths. The taxon labels in Figure 2 are unfortunately quite unreadable, especially the matrix labels, which makes it difficult to interpret.

      I do not see a reason for presenting new names and acronyms for the introgression tests used in this study. The "DCT" is described as being similar to a suite of existing tests which are also based on comparison of rooted-triplet gene tree frequencies. These methods have been presented in many frameworks (BUCKy, D-stat, f4, etc.) and the only difference here seems to be the precise method used to determine significance. Similarly "the BLT is conceptually similar to the D3 test" could be replaced by just saying we implemented the D3 test which we refer to here as a 'branch length test (BLT)' to clarify that you have not in fact created a new test (e.g., you say "The first method we developed was the discordant-count test...")

      I am not very satisfied with the estimates of the "upper bounds" of introgression used here. It seems that there could possibly be many ways in which admixture edges could be drawn on the tree to explain the matrix of significant test results, and it is better to let formal network inference methods (e.g., SNAQ, Phylonet) infer these edges rather than guess at their placement. The current approach of "placing introgression events between pairs of branches for which most descendant extant taxa show evidence of introgression" leaves significant room for subjectivity.

      The authors did implement phylonet, but not very exhaustively. Why only fit a single edge on the tree instead of multiple? The authors state "networks with more reticulation events would most likely exhibit a better fit to observed patterns of introgression but the biological interpretation of complex networks with multiple reticulations is more challenging". I don't think this type of result is any more complicated to understand than the current approach used by the authors of drawing edges manually. And it is much less subjective. The authors say that it is computationally intractable, and this may be true for clades above ~15 tips, but testing on smaller trees by subsampling 10-12 tips seems feasible. From my experience network inference using pseudo-likelihood methods in SNAQ or phylonet takes a few minutes to fit 1 edge, and a few hours to fit 2-3 edges.

      Currently the two major results of the paper seem disjointed. The authors infer a time-calibrated tree, and they infer introgression events, but there is not much connection between the two. I applaud the authors on one hand for being cautious in interpreting their "upper bounds" of introgression to say too much about when they think introgression has occurred in the context of the time-calibrated tree. I think there is insufficient confidence in the introgression timing estimates to do that. But, what about the inverse relationships? Does this extent of introgression across the tree impact your confidence in the estimated timing of divergence events? One expectation would be that it is biasing all of the divergence times to appear younger. See my suggestions for addressing this.

      Overall, this study presents an impressive new dataset and important new results that greatly impact our understanding of the evolutionary history of Drosophila. Although the estimates of node ages and introgression events may be imperfect, they are clearly a step forward. It is clear from these results that introgression has occurred throughout the history of Drosophila, and this study paves the way for further investigation of these patterns, as the authors propose in their conclusions.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      1. General Statements

      We thank the reviewers for their careful and constructive analysis of our work. Our manuscript aims to exemplify the use of cryo-soft-X-ray tomography (cryoSXT) as a technique to study the dynamic changes to host-cell morphology that accompany virus infection. This emerging method has several strengths when compared to other ultrastructural analysis techniques. Specifically, cryoSXT does not require the addition of contrast agents and therefore samples can be prepared via plunge cryopreservation alone, allowing us to capture them in a near-native state. Furthermore, the penetrating power of soft X-rays and large field of view in cryoSXT allow rapid data acquisition, facilitating quantitative analysis of 10s to 100s of individual cells. We combined high-throughput cryoSXT data collection with semi-automated tomogram segmentation and fluorescence cryo-microscopy to study a recombinant herpes simplex virus (HSV)-1 that produces a pattern of fluorescence indicative of the stage of the infection in a single cell (‘timestamp’ HSV-1) and quantitatively monitored changes in lipid droplet, vesicle and mitochondrial morphology as HSV-1 infection progresses. In response to the reviewers’ comments, we have expanded our analysis of lipid droplet morphology, identifying a transient increase in the size of lipid droplets at early stages of HSV-1 infection, and completed additional fluorescence microscopy analysis to support our statements about the changes to microtubule, mitochondrial and Golgi morphology that accompany infection. Furthermore, we have included additional discussion on the relative merits of cryoSXT versus other ultrastructural analysis techniques like transmission electron microscopy, electron cryo-microscopy and electron cryotomography. We believe that our study serves as a powerful example of how cryoSXT can be used for quantitative cell biology and will be of broad interest to an audience of cell biologists and colleagues who study infection processes.

      2. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      The authors have performed an explorative study, investigating morphological changes that occur in cells upon infection with Herpes Simplex Virus 1 (HSV-1) by the use of cryo soft X-ray tomography (cryoSXT). cryoSXT is an emerging technique for imaging of biological material, that allows for 3D imaging of significant volumes of cells under near-native conditions, without the need for sectioning or sample preparation other than rapid freezing. Reference (Groen et al. 2019) provides a nice list of examples from various biological samples. By the use of cryoSXT, the authors confirm findings that they have previously published by use of light and expansion microscopy (ref 16 from manuscript), namely an enrichment of small vesicles close to the nucleus and elongation and branching of mitochondria into interconnected networks in infected cells.

      Infection experiments were done in two different cell types in this study (HFF and U2OS), and a timestamp reporter virus that distinguishes between early and late stages of infection was used to provide more context to the observed morphological changes in the cells.

      Major comments

      It is a bit difficult to follow the main message throughout the manuscript, as the topics brought up in the introduction, results and discussion sections are not very coherent. The introduction gives some background on the virus and the timestamp reporter system, and further focuses on cryoSXT as a method and how this can overcome sample preparation artefacts that might be introduced by chemical fixation and sample processing. The results do not contain any direct comparisons between cryoSXT and other methods or sample preparations (light microscopy or EM-based), and the discussion only to a small extent comes back to the advantages brought by cryoSXT compared to other methods. Rather the discussion largely revolves around the possible involvement of microtubules in generating the observed morphological changes, and the possible meaning of elongated mitochondria in infected cells. Both of these topics are barely introduced, and not at all experimentally interrogated in the case of microtubules. There is also some discussion about Golgi fragmentation, although this is also not directly interrogated by cryoSXT in the current manuscript.

      We thank the reviewer for these comments. We have:
      - Updated the introduction to enunciate more clearly the aims of our study
      - Included a substantial comparison of the relative merits of cryoSXT versus other ultrastructural analysis techniques (TEM, cryoEM and cryoET) in the discussion
      - Updated the introduction to introduce the concepts of microtubule and mitochondrial morphology changes during infection that are covered in depth in the discussion
      - Included additional microscopy experiments, including super-resolution structured illumination microscopy (SIM), to demonstrate the changes in Golgi (Figures 6 and 7), microtubule (Figure 8) and mitochondrial (Suppl. Figure 4) morphology that accompany HSV-1 infection

      These additional experiments support the hypotheses presented in the submitted manuscript, namely that microtubule organising centres are disrupted, Golgi membranes dispersed, and mitochondria redistributed as HSV-1 infection progresses.

      The authors perform imaging with a 40nm or a 25nm zone plate, where the 25nm zone plate provides improved resolution of a smaller volume compared to the 40nm zone plate. The authors do not really make use of the improved resolution offered by the 25nm zone plate in the results, so the motivation for turning to this (and therefore also changing cell line) is a bit unclear. The reason for the U2OS cell line to be better preserved during X-ray imaging is also not discussed; maybe it has to do with the thickness of the cells (as the U2OS cells are very flat). Furthermore, images from the 25 nm zone plate are not compared side by side to either the 40nm zone plate or standard TEM, which makes it hard to judge what the increased resolution really brings.

      Only one zone plate can be installed at any one time in the microscope and altering the zone plates requires extensive hardware changes that are outside the control of beamline users. We agree that this was not clearly discussed in the text. We have included additional text in the results (lines 207–208) and methods (lines 633–638) explaining this operational limitation and clarifying which zone plate was used for which experiment. In this study we observed that tomograms acquired with the 25 nm zone plate did not provide significantly more biological information than with the 40 nm zone plate, and thus both are suitable for characterisation of overarching cellular ultrastructural changes that accompany infection. We have added a sentence to this effect to the discussion (lines 410–412). Like U2OS cells, HFF-hTERT cells are also very flat. They appear more robust compared to HFFs when used for protracted exposures to soft X-rays and less likely to suffer from heat deposition after an extensive data collection round. We can speculate at this point that this could conceivably be due to the particular chemical composition of the intracellular environment in different cell lineages but it is impossible to offer anything other than speculation and therefore we have refrained from commenting further on this in the manuscript.

      The switch from a 40 to a 25nm zone plate required a switch in the model system, as mentioned above. The chosen cell types are not linked to biological relevance however (neurons and epithelial cells are mentioned as relevant cell types in the introduction), and it is therefore a bit unclear what the relevance is of keeping results from both cell types and comparing the two, rather than sticking to the one that works with cryoSXT. The results from the U2OS cells could still be compared by LM to the HFF cells if this contributes to the aim of the study.

      U2OS cells were chosen because they have been used previously for studies of HSV-1 infection (references 55–56) and are known to be well suited to cryoSXT analysis (references 32–33). We have added a sentence to this effect to the results (lines 208–211).

      The distribution of the viral proteins of the timestamp reporter virus is used to categorize infected HFF cells into 4 infection stages. In the U2OS cells the protein distribution is a bit different, which only allows them to be categorized into early (stage 1+2) and late (stage 3+4) stage of infection. Although this is what the authors state in the text, all 4 stages are included in Fig.2 for the U2OS cells, so it is not clear how this subdivision is performed and it does not seem like an accurate representation of the data. Furthermore, the uninfected population is not included in the timecourse, and there is not really a gradual change in infection states over the different timepoints as one could have expected. Therefore it is a bit hard to see the relevance of the timecourse. In the paper where the reporter virus is published (ref 16), shorter infection times were used, which leads to a more gradual change in infection stages.

      We thank the reviewer for pointing out these omissions. We have updated Figure 2A to only show the categories early (stage 1+2) and late (stage 3+4) for the U2OS cells. Furthermore, we have repeated the infection time course experiment, quantitating uninfected cells in addition to infected cells and including additional time points (2-, 4- and 6-hours post-infection). This new data (Figure 2B) demonstrates that the temporal profiles of infection progression are similar in HFF-hTERT and U2OS cells. Furthermore, it supports our choice of 9 hours post-infection as a suitable time point for plunge freezing of samples in order to obtain a mixture of cells at early and late stages of infection.

      There is a lot of importance given to the morphological changes of mitochondrial networks in infected cells. However, the quantification represented in Fig.5B is a bit unclear. The mitochondria are classified into different groups, but there is no specific description of the definition and cutoff values of each group. The name of some groups is also confusing, such as "short and long" mitochondria. Furthermore, there are large differences between replicates (suppl. fig. 2). The authors state that some mitochondria are swollen, which they interpret as a sign of apoptosis. They find these swollen mitochondria in 75% of the tomograms of uninfected cells in replicate number 3. If this is indeed cell death this replicate is not healthy.

      We apologise that the categorisation of mitochondria was not sufficiently clear in the submitted manuscript. The categories were percentage of tomograms that had the different mitochondrial morphologies present, not percentages of mitochondria. Thus, tomograms with both short and long mitochondria were classified as “short and long”. We have re-generated Figure 5C and Suppl. Figure 2C as a Venn diagram to illustrate this point more clearly. We have also updated the legend of Figure 5C (lines 845–850) to state clearly that the diagram shows percentage of tomograms with the relevant mitochondrial morphologies. The categorisation was performed manually and we have included examples of each category in Figure 5A. Manual classification can be subjective but, given the large number of tomograms analysed and the clear distinction between morphology in uninfected vs early- and late-stage infected cells, we are confident that our results are robust. We note that we have deposited all of the source tomograms in the Apollo repository at the University of Cambridge (https://doi.org/10.17863/CAM.78593); the data we used for this analysis are thus freely available for inspection and re-analysis by interested colleagues. We note that the swollen mitochondria were observed in multiple samples of uninfected and infected cells. This suggests that, regardless of infection, this is a common phenotype of U2OS cells. Others have observed this morphology by EM in the context of apoptosis and suggest it may represent porous mitochondria (reference 61). Although the proportion of tomograms containing these swollen mitochondria was higher in the uninfected sample of replicate 3, the other 25% contained typical mitochondrial morphologies that we could include in our analysis.
      The presence of inter-cell morphological variability such as this highlights the importance of imaging multiple cells within a population and performing several distinct biological replicates, as we have done in this study, to ensure project-relevant information is captured and delineated from the background structural variability inherent within a cell population. Previous cryoSXT studies had observed (but did not specifically comment on) a similar swollen mitochondrial morphology (reference 59). However, out of an abundance of caution we excluded all tomograms with swollen mitochondria from our analysis of mitochondrial branching (Figure 5C). Moreover, Tukey tests were performed per replicate for each pair of conditions in Figure 5C and statistical significance was reported only if it was observed independently in all three replicates. We are thus confident that any sampling error in replicate 3 that may arise from excluding tomograms will not have meaningfully altered our conclusions.
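
      The reporting rule described above (a pairwise difference counts as significant only if it is significant in every replicate) can be sketched as follows. This is a minimal illustration that takes already-computed pairwise p-values as input; it does not perform the Tukey test itself. The function name, the input layout and the 0.05 threshold are assumptions, since the threshold is not stated in this response.

```python
def significant_in_all_replicates(pvalues_by_replicate, alpha=0.05):
    """Apply an 'all replicates must agree' significance rule.

    pvalues_by_replicate: list with one dict per biological replicate,
    each mapping a (condition_a, condition_b) pair to its p-value
    (hypothetical layout). alpha is an assumed threshold.
    Returns a dict mapping each pair present in every replicate to True
    only if its p-value is below alpha in all replicates.
    """
    if not pvalues_by_replicate:
        return {}
    # Only pairs that were tested in every replicate can be assessed.
    shared_pairs = set(pvalues_by_replicate[0])
    for replicate in pvalues_by_replicate[1:]:
        shared_pairs &= set(replicate)
    return {
        pair: all(replicate[pair] < alpha for replicate in pvalues_by_replicate)
        for pair in sorted(shared_pairs)
    }
```

      Requiring agreement across replicates is conservative: a difference driven by a single unusual replicate, such as one with many excluded tomograms, would not be reported.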

      Minor comments

      Results section 1, line 115-117: Where the authors state that it is unclear whether "naked" HSV-1 capsids would be visible by cryoSXT, it would be useful to refer to literature where these are observed by TEM, or to compare to TEM in their own experiments.

      We have included references to previous TEM studies in the results (lines 128–129), as requested. However, we note that TEM and cryoSXT are fundamentally different as TEM uses contrast agents whereas contrast in cryoSXT arises from differential elemental densities (in particular the density of oxygen versus carbon or phosphorus). We have updated the results (lines 129–131) to clarify this point.

      Results line 143: The authors state that it's hard to observe the perinuclear viruses with TEM, but there are several examples of this in the literature that could be referenced, e.g. (Skepper et al. 2001; Leuzinger et al. 2005; Baines et al. 2007; Johnson and Baines 2011), although this does not mean that they are not hard to find or that 3D is not advantageous.

      We thank the reviewer for these references and we have added them to the manuscript.

      Fig.4: It is unclear why all the vesicles are open-ended

      This is due to the differential path-length of carbon rich (and thus high contrast) membrane traversed by the X-rays for the membranes normal or parallel to the incident X-ray beam. We have clarified this point in the results (lines 290–301).

      Some places in the manuscript PFU per cell is used, other places MOI

      Thank you for pointing this out. For consistency, we have changed all instances of PFU per cell to MOI.

      If some specific adjustments to the methods had to be implemented for biosafety reasons (virus work), this should be stated in the methods.

      We have added a section on biosafety measures to the methods (lines 562–568).

      Access to the synchrotron should also be described

      We have expanded the synchrotron access attribution in the Acknowledgments section (lines 737–738).

      Discussion line 320: "consistent with previous research" - there is a reference missing.

      Thank you for spotting this. We have now added the reference.

      The quantifications are based on a limited number of tomograms, but there is no statement as to how the specific tomograms were selected. With a variability between replicates and tomograms, a random selection is important.

      We included all tomograms collected for the relevant experimental condition in all our analyses unless otherwise stated. For the vesicle segmentation we chose four reconstructed tomograms from each condition at random (lines 690–691). For lipid droplet volume analysis and mitochondrial branching analysis we included all tomograms that matched our quality-control criteria. We have added a few sentences to the Segmentation and Graphs and Statistics sections of the methods (lines 691–694 and 724–733) describing our selection criteria for the lipid droplet, vesicle and mitochondrial branching analysis, respectively.

      If gold fiducials are visible in the tomograms it could be useful to indicate, as they can look similar to lipid droplets to a non-expert reader.

      We have indicated gold fiducials in Figure 1H, the only figure in which they are visible, with a gold star as requested.

      Suppl. Fig.2: For clarity it would be good not to use the same color arrows to indicate different things in A and B.

      Suppl. Figure 2B has been removed in response to another reviewer request.

      Reviewer #1 (Significance):

      The authors of this study demonstrate that cells infected by HSV-1 virus can be investigated by the use of cryoSXT, and use this to show that infected cells have more elongated and interconnected mitochondria, and an enrichment of small vesicles close to the nucleus. They thereby also show that cryoSXT offers a nice resolution for characterizing morphological changes in significant volumes of near native-state cells, and that the method offers a promising throughput for screening of large numbers of cells. However, the study does not really present new biological or technical advances compared to previously published literature, see e.g. Müller et al. 2012, Duke et al. 2014, Perez Berna et al. 2016, Groen et al. 2019, Weinhardt et al. 2020, Loconte et al. 2021 (not cryo but demonstrates the advantage of capillaries), Kounatidis et al. 2020, Scherer 2021 (ref 16 from paper), some of which are also referenced in the current study. The study could thus have profited from a more defined focus and possibly further experiments (live-cell imaging, CLEM, TEM, microtubules or more mechanistically focused) depending on the main interest of the authors. The advantage with the current broad focus (assuming that the main concerns are addressed) is that the study could interest a larger audience, ranging from virology, cell biology and immunology to microscopy and methods development.

      We thank the reviewer for recognising the broad audience that will be interested in our manuscript. We believe that our analysis highlights the broad applicability of cryoSXT for analysing cell ultrastructure and changes that occur in response to infection. Furthermore, we think that our use of robust numerical analysis to quantitate the phenotypes we observe highlights the strength of cryoSXT as a high throughput technique for ultrastructural analysis. Our study is the first to investigate HSV-1 infection using cryoSXT and, in addition to confirming previous ultrastructural changes observed using other methods, we present new biological insight in organelle architecture and distribution such as that lipid droplets undergo a transient size increase during early stages of infection. We believe that we have demonstrated the robust utility of cryoSXT as a tool to study ultrastructural changes in response to insults, such as infection by intracellular pathogens, and hope that our manuscript will act as inspiration for others seeking to use cryoSXT to image cellular ultrastructure.

      Reviewer #2 (Evidence, reproducibility and clarity):

      The authors use soft X-ray tomography to examine cell structure following infection by herpes simplex virus-1 (HSV-1). This imaging method can provide 3D images of cryo-preserved intact cells without chemical fixation or staining. The authors find several morphological differences between uninfected and infected cells, including changes in the number and size of vesicles and in the size and shape of mitochondria.

      This is a well-done study with careful and extensive analysis that in general produces convincing images to support the authors' conclusions. The procedures are clearly described and reproducible, and the authors have examined an impressive number of images and have performed appropriate statistical analyses.

      We thank the reviewer for their positive comments.

      I had two comments / suggestions regarding the findings about changes in morphology after infection. First, in the Discussion, the authors consider the possibility of Golgi fragmentation. Can the authors test this by counting Golgi before and after fragmentation?

      We did not frequently observe well-defined Golgi apparatuses in our tomograms, consistent with previous cryoSXT studies (reference 61). We therefore performed new experiments using SIM microscopy to demonstrate the disruption of Golgi apparatus and trans-Golgi network in fixed U2OS cells stained with the markers GM130 and TGN46, respectively. These new results are presented in Figures 6 and 7 and in the results (lines 342–355).

      Second, in the Results the authors report that they did not observe a change in lipid droplets after infection. However, the late-stage image in Fig. 5A seems to show such a change, with the lipid droplets becoming larger and darker relative to the early stage or uninfected cells. Maybe this is just the particular image that was selected, but perhaps it is worth looking at more images by eye just in case the segmentation procedure somehow missed this change.

      We thank the reviewer for suggesting we re-visit the properties of lipid droplets. Based on this suggestion we segmented the lipid droplets from 94 tomograms and found a robust change in the median volume of lipid droplets at early stages of infection. We have included this new data in Figure 4C, Suppl Figure 2 and the text of the results (lines 302–312). The observation that lipid droplet volumes change is particularly interesting as another group recently observed similar changes in lipid droplets in response to HSV-1 infection of astrocytes and they postulate that this may modulate the cellular immune response (reference 85). Our data support and extend their conclusions, as described in the discussion (lines 476–494).

      Minor comments:

      Line 127 - As I understand it, the alignment by fiducial markers corrects primarily for small inaccuracies in tilting of the stage. Hopefully there are not significant vibrations in the microscope because this would also lead to loss of resolution during the exposure of each tilt angle.

      Thank you, we have corrected “vibrations” to “small inaccuracies in tilting of the microscope stage”.

      Line 145 - "electron light" Is this common usage? To me it seems more accurate to just say electrons because light to me means photons.

      Thank you, we have corrected “electron light” to “electrons”.

      Line 390 - detection OF ("of" is missing)

      Thank you, we have made the correction.

      Line 564 - Fig. 2 legend. "partial retention in the nucleus of U2OS cells". I am not sure where the nucleus is in the images. To me, it looks like there is almost no stain for ICP0 in hTERT at stage 1 and stage 3, and then cytoplasmic stain at stage 2 and stage 4. In contrast, for U2OS, the stain looks mostly nuclear until stage 4 when it is partially cytoplasmic. This all needs to be better explained, and perhaps arrows added to the images such that the reader does not have to guess.

      We agree and have added a silhouette around each nucleus in Figure 2 to make this clearer. We have also added arrows to indicate the gC-mCherry enriched juxtanuclear compartment in cells at stage 3 (HFF-hTERT) or a late stage (U2OS) of infection.

      Line 585 - The authors could consider rotating the images by 180° in panel A (late) in order to maintain the same orientation of nucleus and cytoplasm. This would make it easier for readers to see the point.

      Done as requested.

      Line 614 - I could not find the length of the scale bar in the legend.

      We apologise for omitting this – it has now been added.

      Reviewer #2 (Significance):

      The significance of the study is two-fold. First, it is a nice technical demonstration of what can be accomplished using soft X-ray tomography. I am qualified to evaluate this, since my expertise is in biological applications of this technique. The second significant aspect of the study is the demonstration of morphological changes in mitochondria and vesicles. I am not a virologist, so I do not know the literature on this point with regard to virus infection, but I find it interesting that the authors were able to detect such changes.

      We thank the reviewer for their positive assessment of our work.

      I believe the authors should cite a couple of papers:

      10.1016/j.cell.2015.11.029 which looks at HSV infection and reports viral particles between the inner and outer nuclear membrane.

      We have included a citation to this work as requested (lines 162–165).

      10.1016/j.jsb.2011.11.025 which also reports nuclear membrane separations or bulges by soft X-ray tomography.

      We have elaborated on this section and incorporated the reference as requested (lines 265–276).

      Regarding these nuclear membrane bulges, there are a number of papers that show they can also arise from mutations in nuclear-lamin associated proteins like nesprin and SUN (see for example https://doi.org/10.1093/hmg/ddm338). This is perhaps something interesting for the authors to think about, but not necessary for the current manuscript.

      Thank you for this comment. We did consider studying the breakdown of the nuclear lamina during HSV-1 infection, as this has been shown in previous studies [e.g. 10.1101/2021.06.02.446771]. However, we could not robustly resolve the nuclear lamina from the nuclear envelope in uninfected cells. The nuclear lamina is quite thin (30–100 nm in width) and this may have confounded its identification.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      The manuscript by Nahas et al. describes the structural studies performed in U2OS cells infected with a recombinant HSV-1 virus that enables tracing the stage of the infection using fluorescent markers. This system was used to determine major structural changes in HSV-1 infected cells using cryo-soft X ray tomography (cryo-SXT) on near native-state samples. The data presented complement previous studies (particularly ref.16) using similar reagents but different microscopy techniques. While the data are generally well presented and discussed, they do not provide any substantially novel information on the structural changes in HSV-1. Nevertheless, they constitute an interesting technical achievement.

      We thank the reviewer for supporting the technical quality of the analysis. In response to the comments of another reviewer we have extended our analysis and documented new biological information for this system relating to lipid droplet re-shaping and distribution in response to HSV-1 infection; all our new findings are included in the updated manuscript.

      Major comments:

      There are no major concerns on the data, although some of the statements could be revised for a more realistic interpretation of the results.

      • In Figure 1F and lines 152-156 it is stated that a bulging of the nuclear envelope occurs around some of the putative particles, while in lines 243-244 and lines 625-628, it is stated that bulging occurs both in mock and infected cells. This should be clarified to avoid confusion. It is possible that authors differentiate both situations and this should be more clearly stated.

      Many thanks for identifying a possible area of confusion. We have updated the results to clearly distinguish the expansion of the perinuclear space that accompanies virus nuclear egress (lines 160–175) from the bulges of the nuclear envelope that are observed in uninfected and infected cells (lines 265–276).

      • Different statistical tests are used for different hypotheses throughout the manuscript. The authors should justify in the methods section the use of one or another test. This will clarify which hypothesis is being tested and the reason for the selected test.

      We have significantly expanded the Graphs and Statistics section of the methods (lines 703–734) to further justify the statistical tests used throughout our study.

      • Sentence: "Our observation..." in lines 349-352. Even though the sentence is in the Discussion it is wildly speculative. The authors could use different approaches to tackle experimentally the question of whether active fusion or faulty fission is involved, but this is not the main subject of the manuscript. Please revise the sentence or address it experimentally; the latter would provide new insight into the impact of HSV-1 infection on mitochondrial network morphology. This sentence could be qualified as "speculative".

      We agree that this section of the discussion strayed into speculative territory and have removed it from the updated manuscript.

      • Although ref.16 provides evidence supporting Golgi fragmentation and mitochondrial elongation after HSV-1_timestamp virus infection in HFF cells, it would be important to show confocal microscopy data in U2OS cells, which were used for cryo-SXT, particularly since the authors refer to differential virus kinetics and subcellular distribution of viral antigens in these cells. These data would greatly help to support the statements regarding these two phenomena. It is very likely that the authors already have the data and could easily show them.

      We have included new microscopy experiments to demonstrate changes in mitochondrial (Suppl. Figure 4) and Golgi (Figures 6 and 7) morphology that accompany HSV-1 infection, and these new experiments are now included in the results (lines 335–310 and 342–355).

      • Line 269: Apposition of lipid droplets and mitochondria is not thoroughly described. This statement requires quantitation. Optimally, confocal imaging using Mitotracker and bodipy493/503 or superresolution imaging using specific antibodies may also contribute to strengthen the statement.

      We agree with the reviewer that we do not at this stage have adequate data to support this assertion and have therefore removed it from the manuscript.

      • It would be of great interest to document the budding events observed by cryo-SXT using higher resolution techniques and the kinetic resolution provided by the fluorescent infection fiducials. This would confirm the nature of the particles (using immunogold) and would demonstrate the usefulness of the cryo-SXT data. This by itself would justify the use of cryo-SXT to temporally locate events that are difficult to visualize otherwise (as stated by the authors).

      We agree with the reviewer that a correlative imaging strategy involving cryoSXT and fluorescence microscopy could aid in identifying features of infection, and have highlighted this interesting future direction in the discussion (lines 406–409). However, performing such analysis will be a substantial experimental commitment on its own and is outside the scope of our current manuscript.

      Minor comments:

      • Given that the software used for segmentation (Contour) is not published, a minimal comparative description between manual and semi-automated segmentation may be shown in the supplementary, to illustrate the robustness of the new method and the reliability of the measurements.

      We have now published a preprint (recently accepted in the journal Biological Imaging) that describes Contour in detail, which we have referenced in the updated manuscript: Nahas, K. L., Ferreira Fernandes, J., Crump, C., Graham, S. C. & Harkiolaki, M. (2021) Contour, a semi-automated segmentation and quantitation tool for cryo-soft-X-ray tomography. http://biorxiv.org/lookup/doi/10.1101/2021.12.03.470962

      • Lines 278-280: statistical test and p value are not shown.

      We have updated the text to include details of the statistical test and p value as requested (lines 326–330 of the updated manuscript).

      • After line 376: It would be interesting to mention that transient elongation of mitochondria is observed during dengue virus infection (https://doi.org/10.1016/j.chom.2016.07.008) and that this has also consequences for innate immunity against viruses.

      We thank the reviewer for this suggestion, which we have incorporated into the discussion (lines 522–523).

      • Given that HSV-1 is a BSL-2 level virus and that a recombinant version (GMO) has been used in the study, the authors should describe the biosafety measures taken to image non-inactivated infectious samples by cryo-SXT. The authors should state that a biosafety committee has reviewed these activities.

      We have included a Biosafety Measures section to the methods (lines 562–568) that details the biosafety measures used and their approval by the relevant committees.

      Reviewer #3 (Significance):

      This study constitutes an incremental technical advance in the study of HSV-1 infection. The broad context and the quasi-native structure of the cells enable documenting events that are difficult to observe in thin sections for TEM.

      This study is one of the few examples of the use of cryo-SXT for infected cell imaging. Other examples from the literature are cited as well as previous structural studies performed with higher resolution techniques.

      The manuscript may be suitable for HSV-1 specialists and cell biologists interested in using near-native samples for gross cellular imaging and documentation of low-resolution maps revealing alterations in large subcellular structures.

      We thank the reviewer for highlighting that ours is one of only a few comprehensive studies using cryoSXT, illustrating how it can be used to image cellular processes that are hard to ‘catch’ using techniques that require ultra-thin sectioning, and as such that it will be of interest to cell biologists studying infection processes in cellulo.

    1. Reviewer #3 (Public Review):

      The manuscript presents data that high expression of Protein Phosphatase 1 inhibitor in triple-negative breast cancer contributes to the poor outcome by downregulation of an important kinase, GSK3β. If substantiated, this would enhance our understanding of the pathophysiology of this important disease and might suggest new treatment options. Indeed, changes in PPP1R14C expression alter the behaviour of TNBC in cells and in mouse models, but the mechanistic links to GSK3 are not robustly established.

      Fig 1-2 identified PPP1R14C as upregulated in TNBC and with a significant correlation with worse outcome. Fig 3 and 4 show in vitro and in vivo effects of changes in PPP1R14C consistent with increased proliferation, migration and metastasis in vivo. These studies look very solid and appear to identify a role for this phosphatase regulator in TNBC.

      The weaker part of the manuscript is the mechanistic link to GSK3 regulation. Over-expression and knockdown of PPP1R14C have effects on GSK3β phosphorylation and downstream targets, but the direct connection is unclear and made challenging by a number of complex experimental issues.

      The big questions -

      1. Is GSK3 directly ubiquitylated by TRIM25 on K183? I don't think the data are strong here, for reasons elaborated on below.

      2. Is GSK3 really the important target of PPP1R14C/PP1 complex? The biological data are correlative and the direct experiment, does GSK3β (S9A/K183R) rescue PPP1R14C over-expression, would need to be done. But since I suspect K183R is kinase-dead, this may fail.

      3. The studies with C2 are confounded by the broad effects (including on PP2A) of treating cells with ceramide. Calling C2 a specific PP1 activator is I think unwarranted.

      Specific comments:

      Why is there a band in Fig 5D lane 2, the Flag-PPP1R14C lane, in the absence of Flag-PPP1R14C?

      Why in Fig 5E, F, G are there two bands in the pGSK3bS9 blot?

      The authors would need to show the total GSK3 coming down here too, and the total GSK3 present in Fig 5H as well.

      I have trouble understanding the result in Fig 5H. According to this, global PP1 phosphatase activity increases 3 fold when PPP1R14C is knocked down. First, there is no method noted for this assay. How do we know this is specific to PP1? Second, PPP1R14C is only one of many PP1 interactors. How can its knockdown change cellular PP1 activity 3-fold? I note the knockout mouse for PPP1R14C had a 15% increase in thalamus PP1 activity (see fig 3, https://doi.org/10.1016/j.neuroscience.2009.10.007). This experiment needs much more in the way of controls.

      Fig 6 evaluates the role of PPP1R14C in GSK3 protein stability. There is a fundamental weakness here - How do the authors know the ubiquitylated smear in the various Fig 6 assays is GSK3 versus a ubiquitylated protein that interacts with active GSK3? GSK3 phosphorylation directs many proteins (famously β-catenin and Myc) for ubiquitylation and degradation, so the co-IP of ubiquitylated proteins with GSK3 is to be expected if the IP stringency is not very very high. This is consistent with inactive pSER9 GSK3 not bringing down ubiquitylated proteins. An IP after for example boiling in SDS to break up large complexes would be needed to test if GSK3 itself, rather than associated substrates, is directly ubiquitylated.

      Is TRIM25 specific for GSK3? It's identified by mass spectrometry. However, when I plug TRIM25 into the CRAPome database (https://reprint-apms.org) I find it comes down in 136/716 (19%) of all MS IP studies, making it a very common contaminant in IP. Thus the bar is high to show this is specific. Here the interaction is validated with over-expression of various truncation mutants.

      Line 235: "K183 of GSK3β has been recognized as the ubiquitylation site". First, what is the reference for this statement? I found one paper (https://doi.org/10.1074/jbc.M116.771667) that claims this residue is important for FBXO17 K48 modification, not the K63 linkage associated with TRIM25. In the crystal structure of GSK3β, K183 appears to coordinate the phosphates of ATP, so the effect of the K183R mutation may be to make the kinase inactive, which would confound their results. So an important experiment is, does K183R retain wildtype kinase activity? Or is it inactive, and so does it act like the phosphorylated S9 GSK3?

      The reference for ceramide as a PP1 activator is not a primary reference; it is to a paper in the Journal of Endodontics, which uses it. It would be important to cite primary literature for this usage of C2. I note that many papers cite C2 ceramide as a PP2A activator. What is the rationale for using it as a specific PP1 activator?

    1. I’d want to learn a lot from Professor Zimmerman so that I may obtain as much information as possible and use it in reality. It’s not about the work.

      This is a "free write" that we did in class recently to think on how we want our experiences in this class to play out during the rest of the semester. As you can see from the first few phrases, I explained how I wanted to learn as much as possible to help me in the future. I made it very obvious that "it wasn't about the work" and that it goes far deeper than that.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      1. General Statements [optional]

      We would like to thank the reviewers for their helpful and constructive comments.

      2. Point-by-point description of the revisions

      Reviewer #1

      This reviewer thought our findings would be of interest to a broad range of scientists from both the centrosome and mitosis fields, but noted some important aspects for improvements.

      Additional Experiments (we number these points for ease of discussion).

      1. Figure 3. The reviewer points out that because our analysis of Ana2-∆CC and Ana2-∆STAN mutant proteins was conducted in the presence of endogenous WT protein, we should be more cautious in our interpretation. We agree and apologise for overstating these findings. We have now rewritten the title and text of this section to be more cautious (p11, para.2).

      2. Figure 5A. The reviewer wonders whether the reduced recruitment of Sas-6 in the presence of Ana2(12A) is due to reduced binding, and they request we test this biochemically. This is our favoured interpretation, but we have been unable to test this biochemically for two reasons. First, although we have successfully purified several recombinant Sas-6 and/or Ana2 fragments (Cottee et al., eLife, 2015), the full-length proteins are poorly behaved (tending to precipitate, likely due to their inherent ability to self-oligomerise). Thus, we have been unable to reconstitute their interaction in vitro. Second, as we show here, the proteins are normally expressed in embryos at surprisingly low concentrations (~5-20nM), and we can detect no interaction between them in coimmunoprecipitation experiments from embryo extracts (not shown). Indeed, this concentration is so low that Sas-6 does not even appear to form a homo-dimer in the embryo, even though Sas-6 clearly functions as a homo-dimer in centriole assembly (new Figure S4A). We now explain these points, and state that our favoured hypothesis, that Ana2(12A) has reduced affinity for Sas-6 (or other core duplication proteins), remains to be tested (p22, para.2).

      3. The Reviewer wonders if all 12 of the potential Cdk1 phosphorylation sites that we mutate in Ana2(12A) are important in vivo, and whether we have tested whether mutating fewer sites (e.g. the two sites [S284/T301] that we show are phosphorylated by Cdk1/Cyclin B in vitro) might be sufficient to recapitulate the Ana2(12A) phenotype. We have now tested this by mutating just the S284/T301 sites to Alanine [Ana2(2A)], but the results were not very informative (Reviewer Figure 1 [RF1]). Whereas Ana2(12A) is recruited to centrioles for a longer period and to higher levels than WT Ana2 (Figure 4A), Ana2(2A) is recruited to centrioles for a normal period but to lower levels (RF1A,B). The interpretation of this result is complicated because western blots show that Ana2(2A) is also present at lower levels than normal (RF1B). Thus, it is clear that Ana2(2A) does not recapitulate well the behaviour of Ana2(12A). We have decided not to present this data as it is difficult to interpret and it does not change any of our conclusions.

      4. Figure 6. The reviewer asks whether the 12A mutations impair the interaction with Plk4, influence Plk4’s kinase activity or the ability of Plk4 to phosphorylate Ana2. These are excellent questions but, for the same reasons described in point 2 above, we cannot address them biochemically as we cannot purify well-behaved recombinant full-length Ana2 or active Plk4 in vitro, and both proteins are present at such low levels in the embryo that we cannot detect any interaction between them in embryo extracts. We are working hard to reconstitute in vitro systems to probe these important points, but it may be some time before we are able to do so.

      5. Figure 7. The reviewer suggests that the 12D/E phosphomimetic substitutions introduce more negative charge than the putative phosphorylation of Ser/Thr residues and they ask if the Ana2(2D/E) [stated as Ana2(3D/E)] is, like the Ana2(12D/E) mutant, not efficiently recruited to centrioles. This is a fair comment, but we have not analysed an Ana2(2D/E) mutant because, as described in point 3 above, the Ana2(2A) mutant did not recapitulate well the Ana2(12A) phenotype.

      Minor comments

        • Figure S1. The reviewer requests that we show that the mNG tag on its own is not recruited to centrioles. We do not show this (as it would create a lot of white space in this Figure), but now state that mNG and dNG do not detectably localise to centrioles (p7, para.1).
        • Figure S4C. We have included the missing error bars (now Figure S4B).
        • Figure S5A. The reviewer asks about the expression levels of the Ana2(12A) mutant, which are not shown in this Figure. They also state that the expression levels of the transgenes shown in Figure 5A are not similar. The expression level of Ana2(12A) is shown in Figure S9, as this data was analysed independently of the other mutant proteins shown in Figure S5. We agree that it was overly simplifying the situation to state that the expression levels of WT Ana2-mNG, eAna2(∆CC)-mNG and eAna2(∆STAN)-mNG were “similar” (Figure S5), and we now specifically mention the differences between them (p11, para.3).

      Reviewer #2

      This reviewer found this a rigorous study that advances our understanding of the regulation of centriole duplication, but raised some minor points.

      Minor Points

      The reviewer requests that we mention the literature describing how Ana2/STIL can influence the abundance and centriolar localisation of Plk4. We apologise for this omission, and have amended our description of this literature in the Introduction to include this point (p3, para.2).

      The reviewer notes that we interpret the ability of the Ana2(12A) mutant to keep incorporating into the centrioles for a longer period as being consistent with our idea that rising levels of Cdk activity during S-phase normally reduce the ability of WT Ana2 to bind to the centriole. They ask us to show how Cdk activity increases over this time-course, and to test whether dampening Cdk has the same effect on Ana2 recruitment (i.e. allows Ana2 to be recruited for a longer period). The time-course of Cdk activation in these embryos has been reported previously (Deneke et al., Dev. Cell, 2016; we present the relevant data from this paper in RF#2A [black line]). This reveals how Cdk activity rises throughout S-phase, which is crucial for our model. To assess the effect of dampening Cdk activity in these embryos we have now analysed the effect of halving the genetic dose of Cyclin B (RF#2B). This perturbation extends S-phase length, but has a complicated effect on the recruitment dynamics of Ana2 (RF#2B). As we would predict, Ana2 is recruited to centrioles for a longer period in these embryos, but it is also recruited more slowly (so it accumulates to lower levels). This is consistent with our hypothesis that Cdk1 activity might first stimulate and then ultimately inhibit the centriolar recruitment of Ana2. The interpretation of this experiment is not straightforward, however, as dampening Cdk1 activity alters Ana2 recruitment dynamics (and many other processes in the embryo) in complicated ways, so we have decided not to include it in the manuscript.

      The reviewer suggests that it would be valuable to show that all 12 of the potential Cdk1 phosphorylation sites in Ana2 can be phosphorylated by Cdk1 in vitro. We think this would not be particularly informative as our hypothesis does not rely on all 12 sites being phosphorylated to generate the Ana2(12A) phenotype. We simply mutate all 12 sites because we don’t know which, if any, are relevant. Thus, showing that some/all of the 12 sites can/cannot be phosphorylated in vitro does not test any hypothesis and would not change any of our conclusions. We now explain our thinking on this in more detail (p12, para.2)

      Other points

      Figure 3. We have corrected the amino-acid numbering mistakes.

      Figure 5Aii. We have changed the x-axis (time) labelling in this and all other Figures.

      Figure Legends. We have tried to eliminate the typos from the Figure legends, and apologise that these errors made it through to the final submitted version of our manuscript.

      Reviewer #3

      This reviewer thought our manuscript would be of great interest to not only the centrosome field but also to cell biologists more generally. Although they had no major concerns, they made a number of suggestions for improvements.

      1. As the reviewer suggests, we now explicitly state that although the Ana2(12A) mutant appears to be largely functional, the overall conformation of the protein may be altered, changing its function in ways we do not appreciate (p21, para.2).

      2. The reviewer suggests we include a multiple sequence alignment of Ana2/STIL proteins to provide more context about the distribution and conservation of the 12 S/T-P sites mutated in Ana2(12A). This is an excellent idea, and we now include this in a new Figure S6, where we also provide more information about which of these sites have been shown to be phosphorylated in embryo or S2-cell extracts.

      3. The reviewer is confused as to why the 12A and 12D/E mutants rescue the ana2-/- mutant flies so well, which suggests that the mechanism we propose here cannot be essential for centriole duplication. We understand this confusion and we now make this point more clearly and explain why we think this occurs in more detail (e.g. p22, para.1). We propose that Cdk normally phosphorylates Ana2 to inhibit its ability to promote centriole duplication, but this phosphorylation does not entirely block this function. So, if all other elements of the system are functional, Ana2(12A) is recruited to centrioles for longer than normal, but this does not dramatically perturb centriole duplication because the many other factors that regulate centriole duplication (such as the pulse of Plk4 recruitment to centrioles [Aydogan et al., Cell, 2020]) still occur normally and are sufficient to ensure that centrioles still duplicate normally. When Ana2 phosphorylation is mimicked [Ana2(12D/E)], the ability of Ana2 to promote centriole duplication is perturbed (but not abolished). This perturbation is lethal in the early embryo, where the centrioles must duplicate in just a few minutes to keep pace with the rapid nuclear divisions. In somatic cells S-phase is much longer, so these cells can still duplicate their centrioles (as we observe) even though Ana2(12D/E) does not function efficiently. As we now explain, this phenotype (being lethal in the early embryo, but not in somatic cells) is a common feature of mutations that influence the efficiency of centriole and centrosome assembly (p17, para.2).

      4A. The reviewer asks us to comment in more detail on why centrioles do not seem to be elongated in the Ana2(12A) mutant wing disc cells (now Figure S8C), even though we show that Ana2(12A) (Figure 4A), and also Sas-6 (Figure 5), are recruited to centrioles for an abnormally long period. This is an excellent question and, although we do not know the answer, we now discuss this interesting point in more detail (p16, para.1). We think this is likely due to the “homeostatic” nature of centriole growth: in our hands, almost any perturbation that makes centrioles grow for a longer/shorter period, also makes them grow more slowly/quickly, so that they tend to grow to a similar size (Aydogan et al., JCB, 2018; Cell, 2020). This is fascinating, but poorly understood. When we perturb the system by expressing Ana2(12A), both Ana2(12A) and Sas-6 incorporate into centrioles for a longer period, as we predict (Figure 4A and 5A). Unexpectedly, however, Sas-6 is also recruited to centrioles much more slowly. Thus, as so often happens, when we perturb the system so the centrioles grow for a longer time, the centrioles “adapt” by growing more slowly. We do not currently understand why this occurs (although we speculate that Ana2 may also be regulated by Cdk/Cyclins to help recruit Sas-6 to centrioles in early S-phase). In the embryo, where S-phase is very short, this homeostatic compensation is not perfect, and the centrioles appear to actually be shorter than normal. In somatic wing-disc cells, where S-phase is much longer, we suspect that there is more scope for homeostatic compensation and so the centrioles grow to the correct size.

      4B. In this point (also labelled [4] by the reviewer, so we have retained this numbering but labelled the points A and B) the reviewer asks why levels of Ana2(12A) eventually decline at centrioles once the embryos actually enter mitosis. The reviewer notes our rheostat theory, but suggests a discussion of other mechanisms might be interesting. This is a good point, and we agree that the observation that Ana2(12A) levels ultimately still decline at centrioles during mitosis is likely to be important in explaining why centriole duplication is not more dramatically perturbed by Ana2(12A). We now expand our discussion of this point, highlighting that other mechanisms must help to ensure that Ana2 is not recruited to centrioles during M-phase, and discussing the possibility that the receptors that recruit Ana2 to centrioles are themselves inactivated during mitosis by high levels of Cdk activity (p15, para.1). In such a model, the rapid drop in WT Ana2 centriolar levels is due to a combination of switching off Ana2’s ability to bind to centrioles (as we propose here) and switching off the ability of the centrioles to recruit Ana2. For Ana2(12A), only the latter mechanism would operate, so Ana2(12A) levels would start to drop later in the cycle (as the inflexion point at which Ana2 recruitment and loss balances out would be moved to later in the cycle), and these levels would drop more slowly—as we observe.

      • The reviewer is confused as to how the Ana2(12D/E) mutant can rescue the mutant phenotype when it is recruited to centrioles so poorly. Ana2(12D/E) is indeed recruited very poorly to centrioles in the experiment shown in Figure 7. However, this experiment had to be conducted in the presence of WT untagged Ana2, as the embryos do not develop in the presence of only Ana2(12D/E). We would predict that WT Ana2 would bind more efficiently to centrioles than Ana2(12D/E) (which appears to behave as if it has been phosphorylated by Cdk/Cyclins, and so cannot be recruited to centrioles efficiently). Thus, in the experiment we show in Figure 7, the Ana2(12D/E) protein is probably being “outcompeted” for binding to the centriole by the WT protein. In somatic cells expressing only Ana2(12D/E), presumably sufficient mutant protein can be recruited to centrioles to support normal centriole duplication (as it no longer has to compete with the WT protein). We now explain our thinking on this point (p18, para.1).

      • The reviewer wonders whether Ana2(12D/E) may be unable to homo-oligomerize, and this may explain why the protein is not recruited to centrioles efficiently even in the presence of WT protein. This is indeed a possibility, but we think it unlikely as it is widely believed that Ana2/STIL proteins must multimerize to be functional (Arquint et al., eLife, 2015; Cottee et al., eLife, 2015; Rogala et al., eLife, 2015; David et al., Sci. Rep., 2016). As Ana2(12D/E) strongly restores centriole duplication in ana2-/- mutant somatic cells, it seems unlikely that it cannot multimerize. Nevertheless, we now specifically highlight that the 12D/E (and 12A) mutations might alter the ability of Ana2 to multimerize (p21, para.2).

      We thank the reviewers again for their thoughtful and constructive comments. We hope they will agree that the revised manuscript is now improved and would be appropriate for publication in The Journal of Cell Biology.

      With best wishes,

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Point-by-point description of the revisions

      Black: Comments from reviewers

      Green: Answers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Yamamoto and colleagues have investigated the interplay between microtubules (MTs) and actin in positioning the MTOC at "the cell centre". They have developed a novel experimental setup akin to a synthetic cell to study this question. Essentially a cell-sized (15 µm) microwell that is coated in lipid and then tubulin/actin added and the positioning of a MTOC proxy is studied by microscopy. This is a well executed study. These complicated biochemical reconstitutions are the hallmark of Blanchoin and Théry's group, but even so, it's clear that the exact conditions (e.g. tubulin concentration) are fiddly and critical for these experiments to work. The data are clear, well analysed and presented. In brief, the conditions for centring a cytoskeletal network and decentring/polarising it are recapitulated. This is a short, straightforward paper and I found the results to be clear and the authors' interpretation to be well supported by the data.

      Two questions occurred to me as I read the paper: 1. While the setup is reminiscent of a cell, I suspect that the edge/wall of the microwell is much stiffer than the plasma membrane. So a MT that encounters the wall may behave differently in the cell. This would affect the non-actin conditions but possibly also the conditions where an actin mesh is present. Maybe my intuition is not even correct, but I think this issue should be discussed in the paper as a potential limitation of the system.

      Author response: We thank the reviewer for this wise comment. Indeed, the deformation of the container may impact the organization of the MT network, the force balance and the final position of the MTOC. We comment on this limitation in the revised discussion (page 10 line 31). However, it should be noted that in the presence of a cortical actin network, MTs are much less capable of deforming the cell than in a vesicle or in a cell treated with actin drugs, so our conditions with a cortical actin network are physiologically relevant even though the container cannot be deformed.

      2. The graphs in 3C and 4G (lesser extent Fig 1) show nicely that the aMTOC position has apparently rested at a steady state. Some representative trajectories are shown in some figures, but not mentioned much in the text. How does the pathlength (cumulative distance) over time compare to the "distance to centre" measurement? Is there more or less travel under the different conditions? From the supplementary videos it looks like there is a difference. An apparent resting position may still represent significant motion, e.g. circling the centre. What does an analysis of tracklength tell us, if anything?

      Author response: We appreciate the reviewer’s comment and followed their advice. We measured the pathlength (cumulative distance moved) based on the data shown in Figure 3C and 4G. The analysis confirmed that the MTOC was static in the presence of a bulk actin network (shown in the new Supplementary Figure 6B). Interestingly, it also showed that the final position adopted by the MTOC in conditions where it could move more freely was also static, as revealed by the saturation of the pathlength after 1 hour. These analyses are shown in the new Supplementary Figure 6B for centering in the absence of cortical actin, and in Supplementary Figure 7E both for non-centering with long microtubules and for centering with long MTs and a cortical actin network.
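
      The difference between the two measurements discussed here can be illustrated with a minimal Python sketch. It is not the analysis code behind the figures; the function names and the example track are hypothetical, and positions are assumed to be given as coordinate tuples in a common unit. A track that circles the centre keeps a constant distance to the centre while its cumulative pathlength keeps growing, whereas a static MTOC shows a pathlength that saturates.

```python
import math

def distance_to_centre(track, centre):
    """Distance from each tracked position to the well centre."""
    return [math.dist(point, centre) for point in track]

def cumulative_path_length(track):
    """Running sum of frame-to-frame displacements (first entry is 0)."""
    total = 0.0
    lengths = [0.0]
    for previous, current in zip(track, track[1:]):
        total += math.dist(previous, current)
        lengths.append(total)
    return lengths

# Hypothetical track: four positions on a circle of radius 1 around the centre.
centre = (0.0, 0.0)
circling = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]

radial = distance_to_centre(circling, centre)   # stays constant
travelled = cumulative_path_length(circling)    # keeps increasing
```

      Because `math.dist` accepts points of any dimension, the same functions apply unchanged to 3D tracks.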

      Very minor clerical point: - the first two sentences of the abstract could be clearer. "The position of centrosome, the main microtubule-organizing center (MTOC), is instrumental in the definition of cell polarity. It is defined by the balance of tension and pressure forces in the network of microtubules (MTs)." In the second sentence, "it" and "defined" are confusing. Are you talking about the position of the centrosome or cell polarity?

      Author response: We thank the reviewer for this comment. As the reviewer suggested, this was a confusing description. Accordingly, we corrected the sentence in the abstract to:

      The orientation of cell polarity depends on the position of the centrosome, the main microtubule-organizing center (MTOC). It is determined by the balance of tension and pressure forces in the network of microtubules (MTs).

      Reviewer #1 (Significance (Required)):

      As I see it, the main advance here is in the novel experimental setup, which has real potential in the field. Existing methods such as MTs inside lipid bubbles are limited, whereas the microwell method with fabrication methods allows the shape of the "synthetic cell" to be carefully modulated. Tying the results together with cytosim simulations is also a powerful combination. There is a lot of interest in bottom-up reconstitution of cell biological phenomena, especially those that underlie specialised cell processes, e.g. polarity. My expertise: microtubules in a cellular context with limited experience of MT reconstitution assays.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: This manuscript describes the use of an elegant in vitro reconstitution system to study the effect of variations in the organization of the actin network on the positioning of a microtubule organizing center (MTOC) within the cell. By using a reconstituted system the authors are able to specifically study the contribution of the "pushing" forces generated by microtubule (MT) growth, without the confounding influence of other factors, like pulling forces from MT motors. The authors find that bulk actin networks at sufficient density can impair MTOC displacement, likely a result of the large viscous drag of the MTOC. Next they show that MTOC centering is more resilient to changes in microtubule length. Finally they show that an asymmetric actin network can cause asymmetric positioning of the MTOC.

      Major comments: 1) The model the authors put forth is that the growth of long MTs leads to decentering as a result of the MTs slipping along the well edge. The presence of a cortical actin mesh prevents this slipping. Their argument would be strengthened with an analysis of the MT behaviors in the various conditions. For example, when discussing MTOCs in wells without actin...

      "As they grew, they first ensured a proper centering but after an hour, MT elongation and slippage along microwell edges broke the network symmetry and MTs pushed aMTOC away from the center (Figure 1I, J and Supplementary Movie 2)"

      In this movie I don't see evidence of MTs hitting the cortex and sliding on the "short" side of the well relative to the MTOC. An analysis of the behavior of MTs in various circumstances would help link the behavior of MTs to the movement of the MTOC for all of their conditions. What fraction of MTs hit the cortex and remain relatively motionless, what fraction slide, what fraction catastrophe, what fraction turn and follow the curve of the well? And how does this behavior change for microtubules that end up on the short side vs. the long side of the MTOC? This type of analysis would solidify their model for how centering/decentering occurs in the various conditions they test.

      Author response: This is a fair criticism. The possibility of performing a fine analysis of MT dynamics is technically limited by the fluorescent background due to free tubulin dimers. This is the reason why classical in vitro assays are monitored by TIRF microscopy, which is not possible here since MTOCs move in 3D in the microwells. In addition, working with higher laser power to increase the signal-to-noise ratio causes severe photodamage to MTs. Nevertheless, we could visualize MT dynamics and displacements near the edge of the microwells and describe their behavior more precisely than in the previous version of our manuscript. New images and tracking of MT behavior are now reported in the new Figure 4E, 4F and 5G, as well as the new Supplementary Figure 4C, 4D, 7B, and 7C. We also replaced Supplementary Movie 2 and Figure 1I in order to show more clearly MTs hitting and slipping along the well boundary. In addition, we characterized the pivoting of MTs around the MTOC and near the edge of the microwell in order to better characterize the effect of cortical actin. This is now shown in the new Figure 4G and 4H as well as in the new Supplementary Figure 7C-D. We found that the changes in MT orientation and position, at the centrosome and at the contact with the microwell, were clearly prevented by the presence of cortical actin.

      2) The authors use simulations to support their in vitro findings. However, their simulations have many more microtubules emanating from the MTOC than their experiment (Looks like about 50 in the cytosim and they state they are aiming for 15-20 in the aMTOCs). Do the simulations still reproduce the behavior of the in vitro system with a similar number of MTs?

      Author response: This is another fair criticism. We addressed this point by performing simulations with 10 to 30 microtubules (the number of MTs is variable because of MT dynamics), which is more similar to the number of MTs that we obtained in our experimental conditions. Results were consistent with previous simulations with a higher number of MTs and are now shown in the new Supplementary Figures 6E-F, 7G and 8I.

      3) When the actin networks are asymmetric, the authors see decentering of the MTOC towards the side with less actin. However, there is still actin on the side where the MTOC will move to, and in some of their images it looks pretty thick. Is the actin on that side not dense enough to prevent MT sliding along the "cortex"? If so, can they generate less dense but uniform actin networks on the "cortex", where MTs can slide? Again, descriptions of MT behaviors would be useful in understanding what is happening.

      Author response: We thank the reviewer for asking this important question. We followed the reviewer’s advice and generated a homogeneous and less dense cortex by working at a lower concentration of actin (0.5 mM). In such conditions, we could not see the centering effect that was observed with a dense cortex. These new data are now shown in the new Supplementary Figure 7I. This effect was also tested with numerical simulations (new Supplementary Figure 7J), which were consistent with the key role played by actin network density in MT network positioning by cortical friction.

      Minor Comments: 1) Title - the current title implies that actin is balancing the forces generated by the MTs. I'm not sure this is a good description of what is shown in the paper.

      Author response: We thank the reviewer for pointing out this issue. We revised the title to:

      Reconstitution of centrosome positioning by the production of pushing forces in microtubules growing against the actin network.

      2) The discussion would benefit from more explanation about how the results of this paper relate to the classic examples of MTOC positioning they cite. How do they envision the actin and MTs interacting in these systems, and what new insight have we gained from the experiments in this manuscript?

      Author response: This is a good suggestion. We added some comments in our discussion about the actin network asymmetry in several classical examples of cell polarization, and explained how our observations suggest a new interpretation of the role of this asymmetry in the reorganization of forces in the MT network and of the consequent peripheral positioning of the MTOC.

      Reviewer #2 (Significance (Required)):

      Overall, this work is a significant advance in our understanding of the potential mechanisms of MTOC movement in cells via pushing by MT growth. The experimental system they have developed is a powerful advance, allowing meaningful MTOC reconstitution experiments to be performed in chambers of approximately cellular size. This is an important contribution to understanding the interaction between microtubule pushing and the actin cortex.

      Reviewer expertise: Cell biology of MTOC assembly and positioning. I do not have the expertise to assess the parameters used to generate their cytosim models.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Review of "The architecture of the actin network can balance the pushing forces produced by growing microtubules" by Yamamoto et al.

      The means by which cells maintain their characteristic cytoskeletal architectures is not well understood. This is in part because there is considerable variation in such architectures among, for example, fibroblasts, neurons, and epithelial cells. It is also in part because the microtubule, actin and intermediate filaments engage in a wide range of mechanical and signaling crosstalk mediated by a wealth of proteins and signaling networks, which further complicates the picture.

      In the current study, Yamamoto et al. take the welcome step of developing a simplified system for assessing the mutual contributions of microtubules and F-actin to general cytoskeletal organization in vitro (specifically, in lipid-lined microwells). This allows them to define basic principles of microtubule-F-actin interactions in the absence of the various confounding factors alluded to above. Using their model, they show that artificial MTOCs (aMTOCs) alone will center, but as a complex function of microtubule length (controlled by varying tubulin concentrations). That is, the aMTOCs are randomly positioned with short microtubules, stably centered with intermediate length microtubules, and randomly oriented with very long microtubules (following symmetry breaking).

      They then assess the contributions of F-actin to the centering process. In low concentrations of "bulk" F-actin (ie F-actin distributed throughout the droplet) there is no effect on centering whereas at higher concentrations of bulk F-actin, centering is impaired as is the translocation of the aMTOCs. In the presence of uniform peripheral F-actin, in contrast, aMTOC centering is enhanced, and rendered less sensitive to variations in microtubule length. Finally, when the authors contrive a situation in which the peripheral F-actin is non-uniform (by lowering the concentration of actin and adding alpha-actinin, which creates a peripheral ring of F-actin with (I think) relatively less F-actin within the ring), the aMTOCs position themselves within the ring.

      Finally, the authors extend their results with simulations that indicate that the various behaviors can be explained by a combination of friction, pushing and slippage.

      This study is fascinating and will be of general interest to anyone who seeks to understand the contributions of mechanical forces to cytoskeletal organization in a minimal system. I have only minor concerns; these are listed below.

      1. Some of the terminology was a little confusing. The authors introduce the term "inner zone" (pg. 8) without defining it. From the context, it seems like they are talking about the approximate center of the ring of peripheral F-actin. If so, why not just do away with the term "inner zone" and refer to the ring center? If it isn't the ring center, then more explanation is needed as to what the inner zone actually is.

      Author response: We apologize for this confusion and appreciate the reviewer’s comment. We earlier coined the term “actin inner zone” to define the central cytoplasmic region in cells that is devoid of actin filaments (Jimenez et al., Current Biology, 2021). Because it was a confusing point, we clarified this in the revised version of the manuscript (Page 8, Line 20). What we would like to call the “inner zone” is the region inside the actin cortex. The definition of this zone and of its geometrical reference points is also pictured more precisely in the new Supplementary Figure 9B.

      2. It is not clear from the text or the images if the region within the F-actin ring has less F-actin, more F-actin, or the same amount of F-actin as the region outside the F-actin ring. This point should be clarified, as it makes a big difference in the interpretation of the findings.

      Author response: We apologize for this lack of clarity. In the revised version of our manuscript, we plotted a line scan intensity profile of the actin fluorescence (new Supplementary Figure 9B). It showed that the region within the actin inner zone contained much less actin than in the cortex. This is consistent with our interpretation of a region-selective pattern of friction acting on microtubules.

      3. Ideally, the authors would include manipulations in which the high concentration of peripheral F-actin is combined with alpha-actinin because, as currently presented, the authors are drawing conclusions from changing two variables at once (ie going from a high concentration of peripheral F-actin to a lower concentration with added alpha-actinin). Thus, the authors cannot cleanly distinguish between effects that arise from F-actin asymmetry versus the presence of an F-actin crosslinker. Since the crosslinking is likely to change the mechanical properties of the peripheral F-actin network, this point should at least be addressed in the text, if not by experiments.

      Author response: We are not sure we fully understand the reviewer’s point. We don’t understand how the crosslinking of a symmetric actin network could break the symmetry of the MT network and force its off-centering. The opposite is clearer to us. A homogeneous and loose actin network can allow MT gliding and MTOC off-centering (as in Supplementary Figure 7J). The mechanical reinforcement of this network by crosslinkers could indeed resist gliding. But the consequence of this resistance would be similar to the consequence of a dense network: a more robust centering (as in Figure 4). So we don’t understand how the crosslinking by alpha-actinin, rather than the asymmetry of the actin network, could be at the origin of the off-centering we observed. In addition, the off-centering of the MTOC was systematically aligned with the asymmetry of the actin network, so both parameters were clearly connected.

      Reviewer #3 (Significance (Required)):

      This is an elegant, well-designed study that provides a clear description of how basic mechanical forces can contribute to cytoskeletal organization in a simplified model system.

    1. It's a little hard to tell if "IndieWeb" is in practice just its own community of people who like to talk about #indieweb things. (That's what gets surfaced when I try to learn more, but of course it is.) I like the idea more than most "fediverse" incarnations, though.

      The Logos, Ethos, and Pathos of IndieWeb

      Where is the IndieWeb?

      Logos

      One might consider the IndieWeb's indieweb.org wiki-based website and chat the "logos" of IndieWeb. There is a small group of about a hundred active to very active participants who hang out in these spaces on a regular basis, but there are also many who dip in and out over time as they tinker and build, ask advice, get some help, or just show up and say hello. Because there are concrete places online as well as off (events) for them to congregate, meet, and interact, it's the most obvious place to find these ideas and people.

      Ethos

      Beyond this there is an even larger group of people online who represent the "ethos" of IndieWeb. Some may have heard the word before, some have a passing knowledge of it, but an even larger number have not. They all act and operate in a way that either seemed natural to them because they grew up in the period of the open web, or because they never felt accepted by the thundering herds in the corporate social enclosures. Many are not necessarily easily found or discovered because they're not surfaced or highlighted by the sinister algorithms of corporate social media, but through slow and steady work (much like the in-person social space) they find each other and interact in various traditional web spaces. Many of them can be found in spaces like Tilde Club or NeoCities, or through movements like A Domain of One's Own; some can be found through a variety of webrings, via blogrolls, or just by following someone's website and slowly seeing the community of people who stop by and comment. Yes, these discovery methods may involve a little more work, but shouldn't healthy human interactions require work and care?

      Pathos

      The final group of people, and likely the largest within the community, are those that represent the "pathos" of IndieWeb. The word IndieWeb has not registered with any of them and they suffer with grief in the long shadow of corporate social media wishing they had better user interfaces, better features, different interaction, more meaningful interaction, healthier and kinder interaction. Some may have even been so steeped in big social for so long that they don't realize that there is another way of being or knowing.

      These people may be found searching for the IndieWeb promised land on silo platforms like Blogger, Tumblr or Medium where they have the shadow on the wall of a home on the web where they can place their identities and thoughts. Here they're a bit more safe from the acceleration of algorithmically fed content and ills of mainstream social. Others are trapped within massive content farms run by multi-billion dollar extractive companies who quietly but steadily exploit their interactions with friends and family.

      The Conversation

      All three of these parts of the IndieWeb, the logos, the ethos, and the pathos comprise the community of humanity. They are the sum of the real conversation online.

      Venture capital backed corporate social media has cleverly inserted itself between us and our interactions with each other. It privileges some voices not only over others, but often at the expense of others and only to its own benefit. We have been developing a new vocabulary for these actions with phrases like "surveillance capitalism", "data mining", and analogizing human data as the new "oil" of the 21st century. The IndieWeb is attempting to remove these barriers, many of them complicated, but not insurmountable, technical ones, so that we can have a healthier set of direct interactions with one another that more closely mirrors our in-person interactions. By having choice and the ability to move between a larger number of service providers, there is an increasing pressure to provide service rather than the growing levels of continued abuse and monopoly we've become accustomed to.

      None of these subdivisions---logos, ethos, or pathos---is better or worse than the others, they just are. There is no hierarchy between or among them just as there should be no hierarchy between fellow humans. But by existing, I think one could argue that through their humanity they are all slowly, but surely making the web a healthier, happier, fun, and more humanized and humanizing place to be.

    1. Author Response

      Reviewer #2 (Public Review):

      Schumacher and Carlson present volumetric data on the brain and main brain areas in several lineages of fish that have independently evolved electroreceptors and electrogenesis. The main question is whether the evolution of this novel sensory system has led to similar changes in the brain. Previously, the same authors (Sukhum et al 2018) have shown an increase in the relative size of the cerebellum and hindbrain in mormyrid fishes, one group of electrogenic fish. Here they have collected data on South American weakly electric fishes (Gymnotiformes) and weakly electric catfishes (Synodontis spp.) as well as some outgroups (22 additional species). I think the question is very interesting, and the inclusion of electrogenic catfishes is particularly interesting as they are a largely understudied group. I do have some concerns about how the data has been analysed and presented.

      1) A first conclusion is that gymnotiform and siluriform brains are not as enlarged as mormyrid brains, and that this suggests that an increase in brain size is not directly tied to the evolution of an electrosensory system. I think the story here is more complicated than that. From the data presented, it seems that mormyrids have a different body size-brain volume slope than other groups, but it is unclear whether this was tested in the PGLS model for brain vs body size, although mormyrids show different slopes than other groups in the scaling of the cerebellum to brain volume. This difference in slope for body-brain allometry has been confirmed by a manuscript published after the submission of this manuscript (Tsuboi 2021 BBE) with a large data set (~850 species, 21 of them Osteoglossiformes). This steep slope, close to one, means that mormyrids with large body size have very large relative brain sizes but smaller mormyrids don't (this can be seen in figure 2). I think this needs to be addressed more carefully: first by testing in the PGLS for body size vs brain size whether mormyrids have a different slope, and then in the discussion. Why have mormyrids, but not other electrogenic fish, evolved such a unique brain scaling?

      We thank the reviewer for this suggestion. We combined our data with the data from Tsuboi 2021 and assessed how the brain-body allometry has changed across 870 actinopterygians. We identified 3 shifts in lineages with at least 3 descendants and 7 shifts total that were supported by both the OUrjMCMC and PGLS analyses. One of these identified shifts was along the branch leading to osteoglossiforms, with a secondary decrease in one lineage within mormyroids. A second identified shift was along the branch leading to Synodontis multipunctatus. However, we find no shifts along the branches leading to other electrosensory lineages. This suggests that although mormyroids do have a different brain-body allometry compared with other electrogenic fishes, this shift predates the origin of mormyrids as it is found in all osteoglossiforms and thus is unlikely to be related to the evolution of electrosensory systems. These changes are reflected in lines 778-826, 110-153, 513-528, 530-538, 569-575 and figure 3 and associated source data files. See also our detailed response to essential revision 1.

      2) I think the number of outgroup species used is too small, and they are spread among several different lineages of teleosts. I think this unfortunately tempers some of the conclusions. In particular, it seems to leave unanswered the question of whether other electrogenic fish have larger brains than non-electrosensory or non-electrogenic fish. A large data set of brain and body size data for teleosts has been published (Tsuboi et al 2018; 2021). Adding these data should allow testing for changes in body-brain size relationships in each of the electrogenic clades. The additional data should allow an accurate test for differences in relative brain size between and within electrogenic clades and make it possible to test exactly when in the phylogeny of teleosts grade shifts in the body-brain allometry have happened.

      We thank the reviewer for this suggestion. We explicitly addressed this question by fixing shifts along the branches that evolved our three electrosensory phenotypes: evolution of electrogenesis, tuberous electroreceptors, and ampullary electroreceptors. After comparing these models to the unfixed shift model, a model where only osteoglossiforms have a shifted allometry (following the finding of Tsuboi 2021), a model where only intercept can shift, and a model with one shared allometry across all actinopterygians, we found that the unfixed shift model has a better fit than any of the electrosensory phenotype associated models. This further supports the conclusion that a shifted allometry/ large brain size is not necessary to evolve an electrosensory system. These additions are reflected in lines 778-826, 110-153, 513-528, 530-538, 569-575 and figure 3 and associated source data files. See also our detailed response to essential revision 1.

      3) Next, the authors use a principal component analysis and phylogenetic linear models to test how much of brain variation is explained by concerted vs mosaic evolution and where the mosaic changes have happened. Here, despite the few non-electrogenic/non-electroreceptive species, the differences are clearer. I do think that in the case of the linear models, the use of brain volume as the independent variable is unnecessary. By regressing against the total brain volume, the authors are regressing each structure partially against itself, and not surprisingly, this generates tight linear correlations. Further, this makes grade shifts (i.e. changes in relative size) less apparent. I think only brain volume minus the structure should be used and shown in all figures. This has been the standard in the field when testing for grade shifts.

      We thank the reviewer for this comment. There is much debate in the field regarding whether to use brain volume or brain volume – region of interest as the independent variable, and both are commonly used. Originally, we had looked at both and found qualitatively similar results, but only presented the ‘region x brain volume’ results in the main text for brevity. We have revised this to include the results of statistical analyses for ‘region x brain volume – region’ and the accompanying figures in the main text for both the electrosensory phenotype comparisons and the within electrosensory phenotype comparisons (broadly distributed throughout the results and figure 5—figure supplement 1, figure 5—source data 4-6, figure 7—figure supplement 1, figure 7—source data 2). All of the major findings of relative mosaic shifts between tuberous receptor taxa and non-electric taxa, between electrogenic + ampullary only and non-electric taxa for cerebellum and torus, and no mosaic shifts with electrosensory phenotype in telencephalon hold regardless of the method, and we only find minor differences between the analyses for comparisons that had p values near 0.05. These discrepancies do not change any major conclusions. However, we have kept the reporting of ‘region x total brain volume’ analyses in the main text figures to be consistent with other large comparative studies in the field and our group’s previous work (Yopak et al 2010, Sukhum et al 2018).
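      The part-whole concern behind this exchange can be illustrated with a minimal sketch. It assumes simulated, independent volumes and an ordinary (non-phylogenetic) correlation, so it is not the PGLS analysis used in the manuscript; the sample size, distribution parameters, and the helper name `log_corr` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_species = 200  # hypothetical sample size

# Simulated volumes (arbitrary units): one region and the remaining brain,
# drawn independently so that there is no true association between them.
region = rng.lognormal(mean=3.0, sigma=0.5, size=n_species)
rest = rng.lognormal(mean=3.0, sigma=0.5, size=n_species)
total = region + rest  # total brain volume contains the region itself

def log_corr(x, y):
    """Pearson correlation between log-transformed volumes."""
    return float(np.corrcoef(np.log(x), np.log(y))[0, 1])

r_total = log_corr(total, region)  # 'region x brain volume' (part vs whole)
r_rest = log_corr(rest, region)    # 'region x brain volume - region'

print(f"region vs total brain:          r = {r_total:.2f}")
print(f"region vs total minus region:   r = {r_rest:.2f}")
```

      In this toy setting the correlation with total volume is inflated purely because the region is part of the total, which is why comparing both choices of independent variable, as done in the revision, is informative.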

      4) Related to the previous point, the authors report significant decreases in electrogenic clades in the size of the olfactory bulb, rest of the brain and optic tectum. I think this is an artifact that results from including the cerebellum and other enlarged areas (TS and hindbrain) in the independent variable (total brain volume). Similarly, the authors state that they found no increase in the size of the telencephalon in electrogenic clades and that non-electric osteoglossiforms have a mosaic increase in telencephalon relative to non-electric otophysans. Again, I think this suffers from the same problem. Figure 4-figure supplement 2 actually provides some insight in this respect. When plotted against the rest of the brain, no apparent differences are found in the size of the optic tectum. In the case of the olfactory bulb, only two of the outgroup species seem to have a larger OB than all other species. Regarding the telencephalon, when plotted against RoB, all osteoglossiforms seem to have similar telencephalon size. These conclusions need to be carefully evaluated.

      We thank the reviewer for identifying this miscommunication. We have moved previous figure 4—figure supplement 2 to the main text (now figure 6) and have added the statistical analyses and discussion of this point to both the results and discussion. We have also clarified the distinction between relative and absolute shifts in region sizes throughout but see in particular lines 261-295, 307-317, 330-331, 473-499. See also our detailed response to essential revision 3.

      Reviewer #3 (Public Review):

      The authors use micro-CT scanning and sophisticated statistical techniques to compare the sizes of various major brain regions across a sample of 32 fish species, including lineages that have independently evolved passive electroreception and, in a smaller subset, the ability to generate and sense weakly electric fields. They found that most of the variation in brain region sizes is linked to variation in total brain size, indicating concerted evolution. However, the analysis also reveals that the electrogenic lineages/species have selectively enlarged the cerebellum, the midbrain torus semicircularis, and the hindbrain. These findings are interesting and usefully extend the last author's prior work on a subset of these species.

      A significant strength of the work is that it includes a relatively large number of species, makes a good attempt to understand how these species are related to one another (though the authors admit that the phylogeny is tentative), and that the analytical methods are quantitative and relatively sophisticated. It is also true that other researchers have long argued about the relative frequency and importance of concerted versus mosaic evolution. The present study is a valiant attempt to address this issue.

      However, some key results must be viewed cautiously. Most important is that the dramatic increase in the cerebellum (and torus semicircularis and hindbrain), relative to the rest of the brain, must necessarily lead to some other brain regions appearing to have decreased in size. Therefore, their absolute size may well have stayed the same or even increased in evolution; it's just that the enlarged brain regions decrease the proportions of at least some other regions. The authors mentioned this caveat in their previous paper on mormyroids (Sukhum et al., 2018), but not in the present manuscript. As a result of the problem, it is difficult to interpret the documented variation in olfactory bulb, optic tectum, or telencephalon size; is that variation "real" or just artifacts of major changes in the size of other brain regions (mainly cerebellum, torus, and hindbrain). The best way to address this problem would have been to repeat the analysis using a "reference" brain region that is thought not to vary dramatically in size across the species of interest (e.g., "rest of brain"). However, I acknowledge that this approach also has limitations. Still, the problem should be addressed somehow.

      We thank the reviewer for identifying this miscommunication. We have moved previous figure 4—figure supplement 2 to the main text (now figure 6) and have added the statistical analyses and discussion of this point to both the results and discussion. We have also clarified the distinction between relative and absolute shifts in region sizes throughout but see in particular lines 261-295, 307-317, 330-331, 473-499. See also our detailed response to essential revision 3.
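      The distinction between relative and absolute shifts raised here can be made concrete with a small arithmetic sketch. The region names follow the text, but the volumes are invented for illustration and do not come from the study's data.

```python
# Hypothetical volumes in arbitrary units (not measured values).
ancestral = {"cerebellum": 20.0, "telencephalon": 15.0,
             "optic_tectum": 15.0, "rest_of_brain": 50.0}

# Derived state: only the cerebellum enlarges; every other region keeps
# exactly the same absolute volume.
derived = dict(ancestral, cerebellum=60.0)

def proportions(volumes):
    """Return each region's share of total brain volume."""
    total = sum(volumes.values())
    return {name: v / total for name, v in volumes.items()}

p_anc = proportions(ancestral)
p_der = proportions(derived)

for name in ancestral:
    print(f"{name:14s} absolute {ancestral[name]:5.1f} -> {derived[name]:5.1f}   "
          f"share {p_anc[name]:.3f} -> {p_der[name]:.3f}")
```

      The optic tectum's share of the brain falls even though its absolute volume is unchanged, which is the kind of apparent decrease the reviewer cautions against over-interpreting.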

      One strength of the manuscript is that it provides information about y-intercepts and slopes. Many other studies simply note increases or decreases in average volume (before or after correcting for absolute brain size). I like knowing which changes in relative brain region size are grade shifts (changes in intercept) versus changes in slope. However, the authors don't really do anything with those results. What do they mean? Are there different kinds of evo-devo mechanisms that underlie the two types of changes (slope versus intercept)?

      We thank the reviewer for this suggestion. We have added some discussion on potential mechanisms for evolutionary changes in intercept and slope (lines 543-559). Unfortunately, this topic is not well studied in fishes, which have extensive adult neurogenesis.
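      For readers less familiar with the terminology, the two kinds of change can be written in the standard log-log allometric form. The symbols below are introduced here for illustration and are not taken from the manuscript: $y$ is a region volume, $x$ is the reference volume (total brain, or brain minus region), and the subscript $g$ indexes a group of species.

```latex
% Allometric power law and its log-linear form for group g
y = a_g \, x^{b_g}
\quad\Longleftrightarrow\quad
\log y = \log a_g + b_g \log x

% Grade shift between groups 1 and 2: same slope, different intercept
b_1 = b_2, \qquad \log a_1 \neq \log a_2

% Slope change between groups 1 and 2: different scaling exponent
b_1 \neq b_2
```

      Under a grade shift the two groups differ by a constant factor $a_1 / a_2$ at every brain size, whereas under a slope change the difference between groups grows or shrinks with brain size.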

      On a related note, do the major brain regions vary in allometric slope within a given lineage? The realization that such differences do exist (at least in mammals and cartilaginous fishes) contributed much to the excitement around the concept of concerted evolution, since it means that evolutionary changes in absolute brain size can lead to major shifts in brain region proportions, but the authors seemingly ignore this point.

      We thank the reviewer for this suggestion. We do find variability in slope for different regions of each lineage. We reported these values (figure 5—source data 1, figure 7—source data 1) and added discussion of this point (lines 539-542).

      Finally, I must confess that some of the study's findings didn't surprise me. It is well known among fish neurobiologists that mormyrids have a dramatically enlarged cerebellum and that all electrogenic gymnotoids and mormyroids have a very large torus semicircularis and dorsal/alar hindbrain. One didn't need the fancy analytical techniques to confirm this. To be fair, however, it had not been clear whether the cerebellum is enlarged in gymnotoid electric fish and their non-electrogenic relatives (the authors report that it is). Nor was it known that the weakly electric catfishes have a larger cerebellum (not so much for the torus) than their non-electric relatives. This is new information that raises interesting questions about how the electric catfishes are using their electrosensory system (I would have liked to see some discussion of this).

      We thank the reviewer for this comment. We too agree that electric catfishes warrant further study into which species are electrogenic, whether their discharges are sporadic versus continuous, and how they are using their electrosensory systems. We have added further discussion on electric catfishes (lines 411-416, 425-437).

      On balance, I appreciate that the authors have provided a large and useful data set, which they used to address an interesting set of questions about how brain evolution "works." I'm just disappointed that, for me, there are relatively few significant, novel insights. For example, the notion that "selection can impact structural brain composition to favor specific regions involved in novel behaviors" (last sentence of the abstract) is one that I've accepted for a long time. Maybe the conclusion can be made more interesting by focusing more explicitly on changes in the size of major brain regions versus smaller cell groups (where mosaic evolution is widely accepted).

      We thank the reviewer for this suggestion. We agree that mosaic evolution is more readily detected in smaller subregions/ nuclei/ circuits and is found less so at the scale of major brain regions. We have adjusted the text throughout to further highlight this distinction, but see in particular lines 42-48, 500-528.

      Reviewer #4 (Public Review):

      The authors present a detailed and thorough comparative analysis of brain composition across 3 different lineages of weakly electric fish, and several non-electric fishes. The goal of this comparison was to determine whether the evolution of electrosensory systems is associated with common changes in brain composition across the three lineages. Several aspects of this research are highly novel, such as the use of μ-CT imaging and phylogeny-informed multivariate statistics. Overall, the authors show that cerebellar enlargement is key to the evolution of electrosensory systems of all three groups and the enlargement of the hindbrain and torus semicircularis varies depending on the types of electroreceptors and electrical signals produced. This is one of very few examples in evolutionary neuroscience of convergent evolution of brain anatomy and behaviour and sets the stage for future research on other sensory specialists and clades.

      Strengths

      The comprehensive analysis provided by Schumacher and Carlson has several strengths. First, the use of μ-CT scans to derive neuroanatomical measurements in fish is relatively novel and the detailed descriptions of brain region borders were greatly appreciated. Few papers that focus on comparative neuroanatomy put this degree of effort into describing how regions were differentiated and defined, but the level of detail provided here will allow other researchers to acquire data in an identical manner and is therefore an important resource.

      Second, the statistical analysis is phylogeny-informed and uses an array of approaches. Too many neurobiology papers either avoid phylogeny-informed statistics or execute them poorly. This paper is neither of those and should serve as a template for future studies in the field.

      Third, the inclusion of some recording data for Synodontis is an important contribution. I am not an expert on weakly electric fish, but I do know that the catfish are understudied compared with gymnotiforms and mormyroids. Hopefully, this will result in some well-deserved attention to the diversity of catfishes.

      Fourth, I found the manuscript as a whole well written and presented. In particular, the authors provided a novel way of incorporating additional statistical information into Figures 3 and 4.

      Last, the supplemental video was a great addition to the data presented.

      Weaknesses

      First, the Introduction was a bit brief for readers unfamiliar with weakly electric fishes. It would be helpful to provide a bit more information to a general audience. Including a figure depicting the phylogenetic relationships among some (not all) bony fish clades to illustrate the independent evolution of electrosensory systems across the three clades would be particularly helpful in this regard.

      We thank the reviewer for this comment. We have included more background on the evolution of electrosensory systems in actinopterygians and included a figure showing this (lines 76-83, figure 1).

      Second, I think it is important to determine if the principal component analysis changes if the volumetric data is scaled. One issue that can affect multivariate analyses is including variables that differ greatly in scale. For example, if one brain region varies between 0.5-1.2 mm3, but another varies from 10-50 mm3 across species, that difference in scale can sometimes affect the PCA. I suggest checking that the analyses are broadly the same if the volumetric data is scaled (e.g., converting to z-scores).

      We thank the reviewer for this suggestion. We z-score normalized the regions and repeated the pPCA and found nearly identical results (lines 175-177, figure 4—figure supplement 1).
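      The scaling check described in this exchange can be sketched as follows. This is an ordinary PCA on simulated data, not the phylogenetic PCA (pPCA) used in the manuscript; the data, the three column scales, and the helper names `zscore` and `pca_variance_ratio` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical measurements for 30 species x 3 regions. All regions share one
# underlying "size" factor but are expressed on very different scales.
size = rng.normal(size=(30, 1))
X = size * np.array([0.1, 1.0, 10.0]) + rng.normal(scale=[0.05, 0.5, 5.0], size=(30, 3))

def zscore(data):
    """Center each column and scale it to unit sample standard deviation."""
    return (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

def pca_variance_ratio(data):
    """Fraction of variance explained by each principal component."""
    cov = np.cov(data - data.mean(axis=0), rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # sort eigenvalues descending
    return eigvals / eigvals.sum()

raw_ratio = pca_variance_ratio(X)             # dominated by the largest-scale column
scaled_ratio = pca_variance_ratio(zscore(X))  # each column contributes equally

print("variance explained, raw:     ", np.round(raw_ratio, 3))
print("variance explained, z-scored:", np.round(scaled_ratio, 3))
```

      When the two sets of results are broadly similar, as the authors report for their data, differences in scale among regions are unlikely to be driving the PCA.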

      Third, is there any information regarding malapterurid catfish? Are they similar enough to Synodontis or could they exhibit yet another brain type from that discussed in this study? The reason I ask is that the authors raise the issue of Torpedo, but do not discuss other strongly electric fish like Malapterurus (which is a siluriform related to Synodontis).

      We thank the reviewer for this comment. We too agree that they would be worthwhile species to add. Unfortunately, there are no data available on malapterurid catfish, and we were unable to sample any. We have added discussion of this point to lines 411-416.

      Last, some of the graphs in the supplemental material are too small with datapoints too crowded to effectively read them. Larger graphs would enable a more effective evaluation of how the various clades differ from one another.

      We thank the reviewer for this comment. We enlarged the region x region plots and plotted species means instead to make it easier to visualize these data (Figure 6, figure 7—figure supplement 2-4).

    1. Not at this time. You know, we believe that the way we collect images is just like any other search engine. And you know, this is stuff in the public domain. And for the purposes that it’s being used for I think, they can be very pro-social. I don't think we want to live in a world where any big tech company can send a cease and desist, and then control, you know, the public square. So, I think it's an issue that is really important because the issue of collecting publicly available online data is not just images, any kind of data. It affects researchers who may be, you know, studying things like discrimination or studying other things like misinformation, and it affects academics and a whole wide range of other types of use cases as well.

      the companies that have asked Clearview to delete these images, has Clearview done so?

      • Clearview did not delete anything
      • He thinks they collect data just like any other search engine
      • He believes the purposes for which the data are collected are pro-social
      • He does not want big tech companies to control public data by simply sending a cease and desist
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Unlike other cell organelles, mitochondria contain a small fraction of their genetic information. However, most of the genetic information about mitochondrial proteins is still in the cell's nucleus and the localization of the respective proteins to mitochondria is facilitated by localized translation of their mRNAs. In turn, the mRNA localization to the mitochondria is partly due to the co-translational association, via the mitochondrial target sequence (MTS) of the nascent peptide.

      The manuscript "Mitochondrial mRNA localization is governed by translation kinetics and spatial transport" investigates the mechanisms of mRNA transport and attachment to mitochondria. Concerning mitochondria-localized mRNAs, two types of mRNAs have been distinguished before: mRNAs that are always attached to the mitochondrion (called "constitutively binding" by the authors) and mRNAs that become "sticky" only under certain conditions (called "conditionally binding" by the authors). Modeling the corresponding cellular processes biophysically, the authors infer that yeast cells exercise control over the localization of mRNA (and consequently over their metabolism) in two ways: via varying the mitochondrial volume fraction, and via varying the speed of translation elongation. Data from previously published genome-wide measurements of mRNAs that localize constitutively and conditionally via their MTS in budding yeast S. cerevisiae were used to investigate these mechanisms.

      The manuscript is very well written and the analysis is of high quality. It starts with an introduction that thoroughly reviews many facets around the conducted research and briefly, but self-consistently, summarizes the current knowledge regarding mitochondrial localization of mRNAs. Next, the consequences of the modeling work (presented in the "methods"-section) are explored in the "Results"-section, which contains meaningful and instructive figures and explanations. The manuscript concludes with a comprehensive evaluation of the consequences of the conducted research. All in all, there are only very few minor changes that could be considered.

      Content-wise, we suggest:

      The modeling of translation kinetics is pretty coarse-grained, using only an average elongation rate per amino acid. Much work in this field was done using totally asymmetric simple exclusion process (TASEP)-based models (e.g. MacDonald, J.H. Gibbs, A.C. Pipkin: Kinetics of biopolymerization on nucleic acid templates; Duc, Saleem, Song: Theoretical analysis of the distribution of isolated particles in totally asymmetric exclusion processes: Application to mRNA translation rate estimation). Perhaps this work can be mentioned, and furthermore, the consequences of inhomogeneity of elongation rate for different codons and amino acids could be explored or at least discussed. In particular, this could shed light on the question of whether ribosome interference and tRNA charging times have any impact on mitochondrial mRNA localization.

      Thank you to the reviewer for pointing us to these relevant papers. As suggested, we have added a paragraph to our Discussion that mentions this work and discusses the possible implications of inhomogeneous elongation along mRNA sequences. We find this suggestion (and the similar one made by the other reviewer) to explore inhomogeneous elongation particularly encouraging, because we are in the early stages of actively pursuing such work. We feel that beyond discussion, exploring the consequences of inhomogeneous elongation is beyond the scope of this work because significant further experimental work would be needed to quantify the impact of specific sequences on translation progress.

      To our Discussion, we have added the following paragraph.

      "In this work our quantitative model assumed uniform ribosome elongation rates along mRNA transcripts. In the presence of ribosome interactions, such dynamics can lead to both uniform and non-uniform ribosome densities and effective elongation rates along the transcript (MacDonald et al., 1968; Duc et al., 2018). With these uniform ribosome elongation rates, previous theoretical results suggest that collisions will be rare (Duc et al., 2018). However, elongation may not be homogeneous along an mRNA transcript, due to factors such as tRNA availability (Varenne et al., 1984), boundaries between protein regions (Thanaraj and Argos, 1996), amino acid charge (Charneski and Hurst, 2013), and short peptide sequences related to ribosome stalling (Sabi and Tuller, 2017). We have found that slow (homogeneous) elongation facilitates mitochondrial mRNA localization, by providing time for MTS maturation, diffusive search, and to maintain binding-competent MTS-mediated mRNA binding to mitochondria. We expect that inhomogeneities in elongation rate along mRNA could either enhance or reduce mitochondrial mRNA localization, controlled by whether slower elongation is in regions that favor longer MTS exposure. For example, a ribosome stall site following full MTS translation could provide more time for MTS maturation and facilitate mitochondrial localization. Future experimental work could identify such stalling sequences and point towards how modeling can improve understanding of sequence impact on localization."
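      To make the TASEP picture mentioned in this exchange concrete, the following is a minimal sketch of an open-boundary TASEP with random sequential updates. It is not the authors' model: the function name `simulate_tasep`, the lattice length, and all probabilities are hypothetical, and each ribosome occupies a single site, whereas a real ribosome covers roughly ten codons.

```python
import random

def simulate_tasep(hop_prob, init_prob, steps, seed=0):
    """Random-sequential-update TASEP on an open lattice.

    hop_prob[i] is the probability that a ribosome at site i advances
    (or terminates, at the last site) when that site is selected.
    Returns the time-averaged occupancy of each site.
    """
    rng = random.Random(seed)
    n = len(hop_prob)
    lattice = [0] * n
    occupancy = [0] * n
    for _ in range(steps):
        i = rng.randrange(-1, n)  # -1 stands for an initiation attempt
        if i == -1:
            if lattice[0] == 0 and rng.random() < init_prob:
                lattice[0] = 1
        elif lattice[i] == 1 and rng.random() < hop_prob[i]:
            if i == n - 1:
                lattice[i] = 0                      # termination
            elif lattice[i + 1] == 0:               # exclusion: next site must be empty
                lattice[i], lattice[i + 1] = 0, 1
        for j in range(n):
            occupancy[j] += lattice[j]
    return [count / steps for count in occupancy]

# Homogeneous elongation versus a single slow ("stall") site at position 10.
uniform = simulate_tasep([1.0] * 20, init_prob=0.1, steps=100_000)
slow = [1.0] * 20
slow[10] = 0.1
stalled = simulate_tasep(slow, init_prob=0.1, steps=100_000)

print("occupancy at site 10, uniform:", round(uniform[10], 3))
print("occupancy at site 10, stalled:", round(stalled[10], 3))
```

      A slow site raises the local ribosome density at and upstream of the stall, which is the kind of inhomogeneity the added Discussion paragraph suggests could lengthen or shorten MTS exposure depending on where it falls along the transcript.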

      Ribosome occupancy data from Arava used to infer translation parameters. But there are more recent data sets based on ribosome profiling. Any reason for not using the more recent data?

      We thank the reviewer for bringing up this important point. Our text describing the origin of data for ribosome occupancy in the inset of Figure 2A lacked a citation to the dataset used, and we agree that more recent ribosome occupancy datasets are more appropriate. For the cumulative distributions of ribosome occupancy shown in the inset of Figure 2A, we used the ribosome occupancy data from Zid and O'Shea from 2014. The Arava data from 2003 was used for the cumulative distributions of Figure S1, to show that the similarity between conditional and constitutive genes in the inset of Figure 2A was present in more than a single dataset.

      We have clarified the origin of the ribosome occupancy data in the text.

      In the text description of the inset of Figure 2A, we now include a direct citation of Zid and O'Shea from 2014.

      "These measurements (Zid and O'Shea, 2014) indicate that conditional and constitutive genes have similar distributions of ribosome occupancy (Fig. 2A, inset; see Fig. S1 for similar distributions of conditional and constitutive gene ribosome occupancy derived from (Arava et al., 2003))."

      We also added a citation of Zid and O'Shea to the caption describing the inset of Figure 2A.

      "Inset is cumulative distribution of ribosome occupancy (Zid and O’Shea, 2014), showing ribosome occupancy and β have similar distributions. "

      To determine the translation parameters in our quantitative model, we applied the datasets of Couvillion et al from 2016 for relative protein per mRNA measurements and Zid and O'Shea from 2014 for ribosome occupancy measurements, combined with individual measurements from Morgenstern et al from 2017 and Riba et al from 2019. How these datasets and measurements are used is described in the Methods subsection “Calculation of translation rates”. In addition to the citations in the methods, we have added citations to the briefer description in the Results section.

      "Using protein per mRNA and ribosome occupancy data (Couvillion et al., 2016; Morgenstern et al., 2017; Zid and O’Shea, 2014; Riba et al., 2019), we estimated the gene specific initiation rate kinit and elongation rate kelong for 52 conditional and 70 constitutive genes (see Methods)."

      The effect of the mitochondrial volume fraction on mRNA localization is investigated with a diffusive model. However, the authors make a two dimensional Ansatz for the cell and mitochondrion while it would seem more natural to assume diffusion in three spatial dimensions, as the cell and mitochondria are both three dimensional objects and diffusion strongly depends on the number of dimensions it occurs in. Why was that Ansatz made and why is it justified?

      Our diffusion model is in fact three-dimensional, rather than two dimensional. Specifically, we treat the search process as occurring in a three-dimensional cylinder, whose cross-section is shown in Figure 1D. We have added to Figure 1D to further describe how three-dimensional cylinders represent the mitochondrial proximity in the cell.

      In the Results, we now write:

      “Specifically, we treat the geometry as a sequence of concentric three-dimensional cylinders, each representing an effective region surrounding a tubule of the mitochondrial network. Figure 1D shows a two-dimensional cross-sectional view of these cylinders. The innermost cylinder represents a mitochondrial tubule…”

      We have also clarified the caption of Figure 1D to include:

      "Schematic of mRNA diffusion in spatial model, shown in cross-section. The cytoplasmic space is treated as a cylinder centered on a mitochondrial cylinder: the three dimensional volume extends along the cylinder axis (not shown)."
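      As a hedged illustration of how such a concentric-cylinder geometry relates to a mitochondrial volume fraction, consider a tubule of radius $r_m$ inside a cytoplasmic cylinder of radius $R$, both of length $L$. These symbols are introduced here for illustration; this is not necessarily the expression used in the manuscript's equations.

```latex
% Volume fraction of the inner (mitochondrial) cylinder within the outer one
f = \frac{\pi r_m^{2} L}{\pi R^{2} L} = \frac{r_m^{2}}{R^{2}}
\qquad\Longrightarrow\qquad
R = \frac{r_m}{\sqrt{f}}
```

      Under this assumption, a larger volume fraction $f$ at fixed tubule radius corresponds to a smaller outer radius $R$, so a diffusing mRNA has a shorter radial distance to cover before reaching the tubule surface.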

      The range of variability in the localized fraction +/- CHX is smaller in the experiment compared to the model (Fig. 4B, C). What could be the rationale?

      We agree that the variability in localized fraction from applying CHX is smaller in the experiment (Figure 4C) in comparison to the model (Figure 4B). Our model uses translation parameters (initiation and elongation rates) that are derived from experimental measurements that are expected to be quite noisy. We expect that this noise in the model parameters will expand the range of localization changes predicted by the model for CHX application.

      In l. 417, the authors remark that "constitutively localized mRNAs are on average longer [...] than conditionally localized mRNAs." Yet constitutively localized mRNAs seem to have higher localized fraction than conditionally localized mRNAs. This is somewhat surprising. While it's clear that a higher diffusivity would be compatible with a faster response time of shorter, conditionally-localized mRNAs, it is not clear how the longer, less diffusive mRNAs would have a higher localization fraction. Perhaps the authors can clarify this point.

      The reviewer is correct that experimental measurements show that constitutively-localized genes are, on average, longer than conditionally-localized genes. In our quantitative model, we assume the mRNA of all genes have the same diffusivity. We have used the same diffusivity for different genes because experimental measurements suggest that mRNA length and the number of translating ribosomes on an mRNA do not substantially impact mRNA diffusivity. In our Methods section, we have added citations to papers indicating lack of dependence of mRNA diffusivity on mRNA length.

      "Simulated mRNA have a diffusivity of 0.1 μm²/s. This diffusivity remains constant across genes and mRNA states, consistent with experimental measurements showing little dependence of mRNA diffusivity on mRNA length (Calderwood et al., 2016) or number of translating ribosomes (Wang et al., 2016)."
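      To give a sense of scale for this diffusivity, a rough estimate of the free-diffusion time over a distance $\ell$ follows from the mean squared displacement in $d$ dimensions. The distance $\ell = 1\,\mu\mathrm{m}$ is an assumed illustrative value, not a number from the manuscript, and the estimate ignores confinement and binding.

```latex
% Mean squared displacement for free diffusion in d dimensions
\langle r^{2} \rangle = 2 d D t
\qquad\Longrightarrow\qquad
t \approx \frac{\ell^{2}}{2 d D}

% With D = 0.1 \mu m^2/s, d = 3, and \ell = 1 \mu m (assumed)
t \approx \frac{(1\,\mu\mathrm{m})^{2}}{2 \cdot 3 \cdot 0.1\,\mu\mathrm{m}^{2}/\mathrm{s}} \approx 1.7\,\mathrm{s}
```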

      We have additionally clarified the part of our Discussion where we explain the distinction of our results from proposals based on differential mRNA diffusion speed.

      "Lower occupancy was proposed to drive mRNA localization through increased mRNA mobility of a poorly loaded mRNA (Poulsen et al., 2019), as more mobile mRNA could more quickly find mitochondria when binding competent, increasing the localization of these mRNA. By contrast, our results imply an alternate prediction – that translational kinetics lead to enhanced localization of longer mRNAs, due to the increased number of loaded ribosomes bearing a binding-competent MTS. Indeed, constitutively localized mRNAs are on average longer than conditionally localized mRNAs."

      Minor formal changes would be:

      Setting the expressions of the fraction in the binding-competent state in l. 118 and the fraction of the mRNA-accessible volume in l. 123 in normal math-environments instead of the inline-environment since they are of key importance to the following discussion.

      These two equations (now equations (1) and (2)) are set as distinct equations that are now referred to by their equation numbers later in the manuscript.

      l. 414 contains the verb "vary" twice

      Thank you to the reviewer for pointing out this redundancy. The sentence now reads

      "Translation kinetics can widely vary between genes ... "

      l. 438 lacks an "h" in the word mitochondria

      Thank you to the reviewer for pointing this out, this spelling error has been corrected. The sentence now reads "all mRNA transcripts studied would be highly localized to mitochondria in all conditions."

      Reviewer #1 (Significance (Required)):

      All in all, this is a strong manuscript that contains solid, simple but meaningful and by no means oversimplified models with impactful consequences on the understanding of mitochondrial mRNA localization. Furthermore, it is likely that the approach applies to other cellular compartments like the ER. The research is explained in a remarkably clear and focussed style which makes it easy to follow and meanwhile succeeds in not omitting any details.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      Arceo et al. have developed a stochastic, quantitative model of mitochondrial targeting sequence (MTS)-mediated mRNA localization to mitochondria in yeast. They use this model to investigate the role of translation- and diffusion kinetics in controlling mitochondrial mRNA localization of conditional as well as constitutional genes.

      Most importantly, they find that neither mRNA diffusivity nor ribosome density alone are sufficient to account for the differences in localization that were experimentally observed for the two types of genes. Therefore, they implement an MTS maturation time into their model and find that they can now predict gene specific localization rates. Based on these observations, the authors conclude that yeast cells can regulate the localization of mRNAs to mitochondria through (controlling mitochondrial volume fractions and) differences in translation kinetics, which adjust the exposure time and numbers of mature MTSs that are presented on the mRNP and convey binding-competence.

      Major comments:

      Overall, the manuscript is well written and the conclusions are convincing. The underlying assumptions of the model make sense, but I have no background in modelling and can therefore only comment on the RNA biology aspects and general comprehensibility of the work.

      • The authors calculate gene-specific translation initiation and elongation rates to model localization on different transcript classes. In this context,

      (i) They use a single decay rate to estimate trajectory lifetime and this decay rate is such (1 nt / 600 s) that it would take the average yeast mRNA (~ 1400 nt; Smith et al., JCB, 2015) 10 days to be turned over. This is not consistent with physiological decay rates and as a consequence, they are essentially not accounting for mRNA turnover. This should be explained in the Methods.

      The reviewer has highlighted a lack of clarity in our model description. The mRNA decay rate in the model is (1/600) inverse seconds per entire mRNA molecule, rather than (1/600) inverse seconds per nucleotide. This leads the typical mRNA lifetime to be 600 seconds. The sentence in the Methods section describing the decay timescale now reads "The mRNA decay rate is set to k_decay = 0.0017 s⁻¹ per mRNA molecule, such that the typical decay time for an mRNA molecule is 600 s. This decay time is consistent with measured average yeast mRNA decay times ranging from 4.8 minutes (Chan et al., 2018) to 22 minutes (Chia and McLaughlin, 1979)."
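
      The two readings of the "1/600" rate can be checked with a few lines of arithmetic. The sketch below assumes first-order decay, so that the mean lifetime is the inverse of the rate; the variable names are illustrative and do not come from the manuscript.

```python
# Minimal sketch, assuming first-order decay (mean lifetime = 1 / rate).
# Variable names are illustrative, not taken from the manuscript.

K_DECAY_PER_S = 1.0 / 600.0    # s^-1 per mRNA molecule (authors' clarification)
MEAN_MRNA_LENGTH_NT = 1400     # average yeast mRNA length cited by the reviewer

# Authors' reading: the rate applies to the whole molecule.
mean_lifetime_s = 1.0 / K_DECAY_PER_S

# Reviewer's initial reading: 600 s per nucleotide, summed over the transcript.
per_nt_reading_s = MEAN_MRNA_LENGTH_NT * 600.0
per_nt_reading_days = per_nt_reading_s / 86400.0

print(f"rate, rounded:       {K_DECAY_PER_S:.4f} s^-1")
print(f"per-molecule lifetime: {mean_lifetime_s / 60.0:.1f} min")
print(f"per-nucleotide reading: {per_nt_reading_days:.1f} days")
```

      Under the per-molecule reading the lifetime is about 10 minutes, which lies inside the 4.8 to 22 minute range quoted above; the per-nucleotide reading gives roughly the 10 days the reviewer computed.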

      (ii) Translation and decay are intrinsically linked and translation machinery also recruits decay enzymes. What is more, decay rates differ greatly for different mRNA transcripts. I cannot judge how feasible this is, but it might benefit the model if variable decay rates (i.e. modelled based on translation efficiency?) could be included.

      We appreciate this suggestion from the reviewer. We have added a supplemental figure (Figure S4) to explore how mRNA decay rate can impact mitochondrial localization of mRNA. While longer decay times have little impact on localization, if the decay rate is sufficiently high, the mRNA will have limited opportunity for translation to initiate and a binding-competent MTS to develop, substantially reducing localization. This analysis does not consider how the mRNA lifetime might be coupled with translational effects (such as ribosome stalling). Accounting for the impact of such more complex decay mechanisms would require substantial expansion of the model and extensive additional experiments to parameterize the coupling effects; we believe this extension would be beyond the scope of this manuscript.

      To our Discussion, we have added

      "While we have focused on how variation in translational kinetics between genes can impact mitochondrial mRNA localization, there is also significant variation in mRNA decay timescales (Chia and McLaughlin, 1979; Chan et al., 2018). Our model suggests (see Fig. S4) that the mRNA decay timescale has a limited effect on mitochondrial mRNA localization, unless the decay time is sufficiently short to compete with the timescale for a newly-synthesized mRNA to first gain binding competence. We leave specific factors thought to modulate mRNA decay, such as ribosome stalling (Mishima et al., 2022), as a topic of future study."
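
      The competition between decay and the time needed to gain binding competence can be illustrated with a rough survival estimate. This is only a sketch: it assumes exponentially distributed mRNA lifetimes, uses the 40 s maturation time as a lower bound on the time to first binding competence (initiation and MTS translation would add to it), and the three lifetimes are hypothetical values chosen for illustration, not results from Fig. S4.

```python
import math

def survival_fraction(elapsed_s: float, mean_lifetime_s: float) -> float:
    """Fraction of mRNAs not yet decayed after elapsed_s, assuming
    exponentially distributed lifetimes with the given mean."""
    return math.exp(-elapsed_s / mean_lifetime_s)

MATURATION_TIME_S = 40.0  # MTS maturation time discussed in the response

# Hypothetical mean lifetimes (s); 600 s is the model's value, the others
# are illustrative extremes.
for lifetime_s in (60.0, 600.0, 6000.0):
    frac = survival_fraction(MATURATION_TIME_S, lifetime_s)
    print(f"mean lifetime {lifetime_s:6.0f} s -> {frac:.2f} survive 40 s")
```

      In this simplified picture, lengthening the lifetime beyond 600 s changes the surviving fraction very little, while a lifetime comparable to the maturation time removes a large share of mRNAs before they can become binding competent.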

      (iii) Along the same lines: Rare codons as well as specific stalling sequences, are known to slow down translation elongation on many transcripts (and will effectively increase MTS exposure time). Can the authors identify transcripts with such signal sequences (on a global scale, apart from TIM50) and incorporate in their model?

      We find this suggestion (and the similar one made by the other reviewer) to explore stalling sequences particularly encouraging, because we are in the early stages of actively pursuing such work. We feel that beyond discussion, exploring the consequences of inhomogeneous elongation is beyond the scope of this work because significant further experimental work would be needed to quantify the impact of specific sequences on translation progress.

      To our Discussion, we have added the following paragraph.

      "In this work our quantitative model has applied uniform ribosome elongation rates along mRNA transcripts, which with ribosome interactions can lead to both uniform and non-uniform ribosome densities and effective elongation rates along the transcript (MacDonald et al., 1968; Duc et al., 2018). With these uniform ribosome elongation rates, previous theoretical results suggest that collisions will be rare (Duc et al., 2018). However, elongation may not be homogeneous along an mRNA transcript, due to factors such as tRNA availability (Varenne et al., 1984), boundaries between protein regions (Thanaraj and Argos, 1996), amino acid charge (Charneski and Hurst, 2013), and short peptide sequences related to ribosome stalling (Sabi and Tuller, 2017). We have found that slow (homogeneous) elongation facilitates mitochondrial mRNA localization, by providing time for MTS maturation and diffusive search, and by maintaining binding-competent, MTS-mediated mRNA binding to mitochondria. We expect that inhomogeneities in elongation rate along mRNA could either enhance or reduce mitochondrial mRNA localization, controlled by whether slower elongation is in regions that favor longer MTS exposure. For example, a ribosome stall site after the MTS is fully translated could provide more time for MTS maturation and facilitate mitochondrial localization. Future experimental work could identify such stalling sequences and point towards how modeling can improve understanding of sequence impact on localization."

      • Reduced mature MTS exposure time is presented as one of the determining factors that regulate mitochondrial localization of conditionally localized transcripts. For my background, the underlying mechanisms that determine MTS maturation are insufficiently explained. I understand how chaperone recruitment can contribute to MTS maturation. However, it is not obvious to me how receptor binding would account for such long maturation times as the 40 s used here (Fig. 3, 4). I would appreciate if the authors could elaborate and possibly point to directions that their model could be used to study those.

      We agree with the reviewer that the diffusive search time for a chaperone to find a newly-synthesized MTS would be very short (a small fraction of the proposed 40-second MTS maturation time), and we expect that this maturation period is largely controlled by chaperone and co-chaperone interaction timescales. There is a wide range of timescales for newly-synthesized (or misfolded) proteins to productively interact with a chaperone, and the literature provides examples of timescales comparable to 40 seconds, which we now cite.

      To our Discussion, we have added

      "While the diffusive search for a newly-synthesized MTS by chaperones is expected to be very fast ( 100 seconds for human chaperone-mediated folding (Wu et al., 2020)."

      We feel that modeling chaperone facilitation of MTS folding, to determine the timescale of this process, is very distinct from the topics covered in our manuscript, and thus beyond the scope of this work.

      • One of the two main conclusions (at least according to the abstract) from the work is that yeast cells modulate mitochondrial volume fractions to regulate mRNA localization to mitochondria. This is a fact, not a novel finding. The other main conclusion, which is that cells use different translation dynamics to control mRNA localization, is intriguing and deserves more attention. It would be great if the authors could suggest/discuss an experimental approach (i.e. a single mRNA imaging experiment quantifying mitochondrial co-localization and translation kinetics of different reporter constructs) to test this hypothesis.

      We appreciate the reviewer raising the point that yeast cells modulate mitochondrial volume fraction to regulate mitochondrial mRNA localization. While we previously showed this relationship between mitochondrial volume fraction and localization, we used experimental techniques (mutations, nutrient sources) that changed many other factors beyond mitochondrial volume fraction. In this work we have used a quantitative model, lacking those extraneous factors, to demonstrate that a change to mitochondrial volume fraction alone can lead to a change in mitochondrial mRNA localization. This work supports our interpretation of those previous experimental results.

      To our Discussion we have added the sentence

      "Previous experimental work suggested that changing mitochondrial volume fraction could control mitochondrial mRNA localization (Tsuboi et al., 2020) --- our quantitative modeling work provides further support for this mechanism of regulating mRNA localization."

      The reviewer also requests a discussion of an experimental approach to test how cells use translational dynamics to control mRNA localization. With the advent of combined mRNA imaging and live translational imaging it would be interesting to directly measure translation in live cells to correlate localization with a time delay. Unfortunately there are currently no published live translational imaging studies in yeast, and thus such a measurement would require the development of the technique in yeast.

      To our Discussion, we have added

      "Experimentally testing our proposal for translation-controlled localization would involve using combined mRNA and live translational imaging (as yet undeveloped in yeast), to directly measure translation and correlate localization with a time delay, presenting a fruitful pathway for future study."

      Minor comments:

      • Figure 1: X axis labels between panel E and F are not consistent. Inset in panel F is mainly and first discussed in text. Please do not show data as tiny inset but as separate panel.

      We have changed the axis label of Figure 1E to match the axis label of Figure 1G (previously Figure 1F). The inset of the old Figure 1F is now the new Figure 1F, and the old Figure 1F is now the new Figure 1G. We have adjusted the Figure 1 caption and the text description of Figure 1 to match these changes.

      Elongation rates of 250 aa per second are not physiological. In mammalian cells elongation has been quantified to proceed between 1 and app. 20 aa per second (Wang et al, 2016; Wu et al., 2016; Yan et al., 2016; Morisaki et al., 2016).

      The reviewer is correct that the elongation rates of 50/s and 250/s are too large to be physiological. These large values have been deliberately selected to probe the nonequilibrium behavior of the quantitative model to test the prediction of the simpler four-state model, rather than represent physiological behavior.

      To the text in the Results section discussing Figure 1F, we have added the following sentence.

      "We include unphysiologically high elongation rates to compare to the expected behavior from the 4-state model."

      Panel E: elongation rate range does not match Fig 1F nor median in Fig 3A.

      The reviewer is correct that the elongation rate parameter range of Figure 1E does not match the elongation rates of Figure 1F or the median in Figure 3A. In Figure 1E, we aimed to show that the physiological range of translation parameters can produce a wide range of both MTSs per mRNA and mRNA binding competence for mitochondria.

      We have expanded the description of Figure 1E in the text.

      "By exploring the physiological range of translation parameters, many orders of magnitude of the mean number of translated MTSs per mRNA (β, see Eq. 5) are covered, which also covers the full range of mRNA binding competence (Fig.1E). We find that, for any set of physiological translation parameters, the number of binding-competent MTS sequences (β) is predictive of the fraction of time (fs) that each mRNA spends in the binding competent state (Fig.1E)."

      • Figure 2A and S1: Please explain how ribosome occupancy is defined here and why it is so different between figures

      We have inserted a citation for Zid and O’Shea (2014) to make clear that the ribosome occupancy measurements in Figure 2A (Zid and O’Shea) and Figure S1 (Arava et al.) come from two different techniques. Zid and O’Shea used ribosome profiling, which gives a relative rather than absolute measurement. Arava et al. fractionated mRNAs by the absolute number of loaded ribosomes across 14 fractions of a sucrose gradient, and measured the relative amount of mRNA in each fraction by microarray. Although ribosome occupancy was calculated very differently in the two papers, the comparison between conditionally and constitutively localized mRNAs shows a very similar trend: neither measurement shows a significant difference in ribosome occupancy between these two classes of mRNAs.

      To the caption of Figure S1, we have added

      "These ribosome occupancy values cover a distinct range, in comparison to those of Fig. 2A, due to distinct experimental measurement techniques."

      • Figure 2C: please show experimental data along with model prediction (in the same graph) so that conclusion becomes immediately apparent from figure not just main text. Label clearly (in figure) when experimental and when model data is shown (maybe by using consistent color scheme?)

      We have added experimental data to Figure 2C. Throughout the manuscript, we have kept a consistent color scheme for data for mitochondrial localization for ATP3, TIM50, conditional, and constitutive mRNA, whether from model or experimental data. We have applied distinct line types (e.g. solid for model vs. dot-dashed with circles for experimental).

      • Figure 4B and C: clearly indicate in figure which are experimental and which are modelled data

      In Figures 4B and 4C, we have clarified which data is experimental and which is modeled by adding to the labels for each violin plot. Violin plot labels for model data now read "Model Conditional" or "Model Constitutive" and labels for experimental data now read "Expt Conditional" or "Expt Constitutive".

      • Figure 4D: show experimental vs. model data in same graph (at same axis scaling) for comparability

      We have added the experimental data, previously in the inset of Figure 4D, to the main part of Figure 4D.

      • Line 305: "constitutive" mRNA

      Thank you to the reviewer for pointing out this redundancy. The sentence now reads

      "Figure 3C shows how the localization for the prototypical conditional and constitutive mRNA varies with the maturation time."

      • Line 334: "other changes, such as diffusivity, are unable to separate the two gene groups" - what other changes? The authors only show diffusivity (Fig S3).

      Thank you to the reviewer for pointing this out. We have revised this sentence to only refer to diffusivity changes.

      "While introduction of this maturation time distinguishes the mitochondrial localization of conditional and constitutive gene groups (Fig. 4A vs Fig. 2B), changes to diffusivity are unable to separate the two gene groups (Fig. S3)."

      • Line 403-405: maybe useful to argue against lower ribosome occupancies as drivers of nascent chain complex mobilities: Wang et al., Cell, 2016; single translation site imaging experiments indicating that ribosome occupancy is not the main determinant of mRNP mobility.

      We thank the reviewer for the direction to this paper, which indeed indicates that ribosome occupancy has limited impact on mRNA diffusivity.

      We now cite this paper in our Methods section.

      "Simulated mRNA have a diffusivity of 0.1 μm²/s. This diffusivity remains constant across genes and mRNA states, consistent with experimental measurements showing little dependence of mRNA diffusivity on mRNA length (Calderwood et al., 2016) or number of translating ribosomes (Wang et al., 2016)."

      • Line 601-607: include experimental references to explain how measures (25 nm vs 250 nm) were determined/selected.

      The reviewer raises a valuable point, as it is important to motivate these lengthscales used in the model.

      Microscopy with visible light has a lateral resolution limit of approximately 250 nm, often known as the Abbe limit. Accordingly, we assume that mRNA within 250 nm of mitochondria will be measured as adjacent to mitochondria. To the Methods section, we now include a short explanation and a citation.

      Unlike the 250-nm diffraction limit, there is no widely-used reaction range for mRNA binding to intracellular substrates, nor a measurement of the required proximity for an MTS-bearing mRNA to bind to mitochondria. We estimate the 25-nm distance for mRNA binding to mitochondria from the following contributions:

      • The yeast ribosome is 25 - 28 nm in diameter, or 13 - 14 nm in radius.
      • Yeast MTSs have a length of up to 70 amino acids, with 20 estimated yeast MTS lengths having a mean of 31 amino acids. The MTS forms an amphipathic helix (an alpha helix), which has a pitch of 0.54 nm and 3.6 amino acids per turn, so the 31 amino acids will be approximately 5 nm long.
      • The MTS will be attached to the ribosome/mRNA by other peptide regions, expected to typically be a few nanometers in length.

      So overall we estimate a 25 nm range for an MTS-bearing mRNA to bind to mitochondria.

      To our methods, we have added this reasoning and accompanying citations.

      "We estimate the 25-nm binding distance by combining several contributions. The yeast ribosome has a radius of 13 - 14 nm (Verschoor et al., 1998). The MTS region, up to 70 amino acids long, forms an amphipathic helix (Bacman et al., 2020), a form of alpha helix. With an alpha helical pitch of 0.54 nm and 3.6 amino acids per turn, a 31 amino acid MTS (the mean of 20 yeast MTS lengths (Dong et al., 2021)) is approximately 5 nm in length. An additional few nanometers of other peptide regions bridging the MTS to the ribosome provides an estimate of 25 nm for the range of an MTS-bearing mRNA to bind mitochondria. The 250-nm imaging distance is based on the Abbe limit to resolution with visible light (Georgiades et al., 2016)."
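
      The arithmetic behind the 25-nm estimate can be laid out explicitly. The sketch below uses only the numbers quoted in the response; the variable names are illustrative, and the "implied linker" is simply what remains of the 25 nm after the ribosome radius and the helix length are subtracted, not a measured quantity.

```python
# Worked arithmetic for the 25-nm binding range, using the values quoted above.
# Names are illustrative; the linker length is inferred, not measured.

HELIX_PITCH_NM = 0.54            # alpha-helix pitch per turn
RESIDUES_PER_TURN = 3.6
MEAN_MTS_RESIDUES = 31           # mean of 20 yeast MTS lengths
MAX_MTS_RESIDUES = 70
RIBOSOME_RADIUS_NM = (13.0, 14.0)
BINDING_RANGE_NM = 25.0

rise_per_residue_nm = HELIX_PITCH_NM / RESIDUES_PER_TURN   # 0.15 nm per residue
mean_mts_length_nm = MEAN_MTS_RESIDUES * rise_per_residue_nm
max_mts_length_nm = MAX_MTS_RESIDUES * rise_per_residue_nm

# Length left over for the peptide regions bridging the MTS to the ribosome.
implied_linker_nm = [
    BINDING_RANGE_NM - radius - mean_mts_length_nm
    for radius in RIBOSOME_RADIUS_NM
]

print(f"mean MTS helix length: {mean_mts_length_nm:.2f} nm")
print(f"longest MTS helix:     {max_mts_length_nm:.1f} nm")
print(f"implied linker length: {min(implied_linker_nm):.2f}"
      f" to {max(implied_linker_nm):.2f} nm")
```

      The mean helix length comes out at 4.65 nm, matching the "approximately 5 nm" in the text, and the remainder attributed to bridging peptide regions is about 6 to 7 nm.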

      Reviewer #2 (Significance (Required)):

      My field of expertise is the development of single mRNA imaging methods to quantify translation/decay dynamics in living mammalian systems. Thus, I cannot judge the significance of this work with respect to the modelling that is presented here.

      However, I do appreciate that one of the main conclusions of this work, which is that cells might use different translation dynamics to control mRNA localization, is truly exciting and could be applied to other types of transcripts (this is exactly what SRP does for ER-targeted mRNAs) as well. Because mechanisms that regulate translation in a transcript-specific manner and in different subcellular localizations have only been described for a handful of cases, I think that this observation is worth following up on and should be appreciated by a broad scientific audience.

    1. Reviewer #3 (Public Review):

      In their study, the authors set out to challenge the long-held claim that cortical remapping in the somatosensory cortex in hand-deprived cortical territories follows somatotopic proximity (the hand region gets invaded by cortical neighbors), as classically assumed. In contrast to this claim, the authors suggest that remapping may not follow cortical proximity but instead functional rules as to how the effector is used. Their data indeed suggest that the deprived hand area is not invaded by the forehead, which is the cortical neighbor, but instead by the lips, which may compensate for hand loss in manipulating objects. Interestingly, the authors suggest this is mostly the case for one-handers but not for amputees, for whom the reorganization seems more limited in general (but see my comments below on this last point).

      This is a remarkably ambitious study that has been skilfully executed on a strong number of participants in each group. The complementary state-of-the-art uni- and multivariate analyses serve the research question well, and the paper is clearly written. The main contribution of this paper, relative to previous studies including those of the same group, resides in the mapping of multiple face parts all at once in the three groups.

      In the winner-takes-all approach, the authors only include 3 face parts but exclude from the analyses the nose and the thumb. I am not fully convinced by the rationale for not including the nose in univariate analyses - because it does not trigger reliable activity - while keeping it for representational similarity analyses. I think it would be better to include the nose in all analyses or demonstrate this condition is indeed "noisy" and then remove it from all the analyses. Indeed, if the activity triggered by nose movement is unreliable, it should also affect the multivariate analyses.

      The rationale for not including the hand is maybe more convincing as it seems to induce activity in both controls and amputees but not in one-handers. First, it would be great to visualize this effect, at least as supplemental material to support the decision. Then, this brings the interesting possibility that enhanced invasion of hand territory by lips in one-handers might link to the possibility to observe hand-related activity in the presupposed hand region in this population. Maybe the authors may consider linking these.

      The use of the geodesic distance between the center of gravity in the Winner Take All (WTA) maps between each movement and a predefined cortical anchor is clever. How the Center Of Gravity (COG) was computed on spatially disparate regions deserves more explanation, however. Moreover, imagine that for some reason the forehead region extends both dorsally and ventrally in a specific population (eg amputees): the COG would stay unaffected but the overlap between hand and forehead would increase. The analyses on the surface area within hand ROI for lips and forehead nicely complement the WTA analyses and suggest higher overlap for lips and lower overlap for forehead but none of the maps or graphs presented clearly show those results - maybe the authors could consider adding a figure clearly highlighting that there is indeed more lip activity IN the hand region.

      In addition to overlap analyses between hand and other body parts, the authors may also want to consider doing some Jaccard similarity analyses between the maps of the 3 groups to support the idea that amputees are more alike controls than one-handers in their topographic activity, which again does not appear clear from the figures.
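
      For readers unfamiliar with the Jaccard index suggested here, a minimal sketch follows. It treats each group's map for one body part as a set of cortical surface vertex indices and computes intersection over union. The vertex sets are invented purely for illustration and carry no information about the study's data; how maps would be thresholded and aligned across participants is left open.

```python
def jaccard(map_a, map_b) -> float:
    """Jaccard index of two maps given as collections of vertex indices:
    |A ∩ B| / |A ∪ B|. Two empty maps are treated as identical (1.0)."""
    a, b = set(map_a), set(map_b)
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)

# Hypothetical lip maps (vertex indices) for the three groups.
controls = {1, 2, 3, 4}
amputees = {2, 3, 4, 5}
one_handers = {4, 5, 6, 7}

print("controls vs amputees:   ", jaccard(controls, amputees))
print("controls vs one-handers:", jaccard(controls, one_handers))
```

      With these made-up sets the amputee map overlaps the control map more than the one-hander map does, which is the kind of pattern such an analysis would be testing for.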

      This brings to another concern I have related to the claim that the change in the cortical organization they observe is mostly observed in one-handers. It seems that most of this conclusion relies on the fact that some effects are observed in one-handers but not in amputees when compared to controls, however, no direct comparisons are done between amputees and one-handers so we may be in an erroneous inference about the interaction when this is actually not tested (Nieuwenhuis, 11). For instance, the shift away from the hand/face border of the forehead is also (mildly) significant in amputees (as observed more strongly in one-handers) so the conclusion (eg from the subtitle of the results section) that it is specific to one-hander might not fully be supported by the data. Similar to the invasion of the hand territory from the lips which is significant in amputees in terms of surface area. All together this calls for toning down the idea that plasticity is restricted to congenital deprivation (eg last sentence of the abstract). Even if numerically stronger, if I am not wrong, there are no stats showing remapping is indeed stronger in one-handers than in amputees and actually, amputees show significant effects when compared to controls along the lines as those shown (even if more strongly) in one-handers. Also, maybe the authors could explore whether there is actually a link between the number of years without hand and the remapping effects.

      One hypothesis generated by the data is that lips remap in the deprived hand area because lips serve compensatory functions. Actually, also in controls, lips and hands can be used to manipulate objects, in contrast to the forehead. One may thus wonder if the preferential presence of lips in the hand region is not latent even in controls as they both link in functions?

    1. The biggest mistake—and one I’ve made myself—is linking with categories. In other words, it’s adding links like we would with tags. When we link this way we’re more focused on grouping rather than connecting. As a result, we have notes that contain many connections with little to no relevance. Additionally, we add clutter to our links which makes it difficult to find useful links when adding links. That being said, there are times when we might want to group some things. In these cases, use tags or folders.

      Most people born since the advent of the filing cabinet and the computer have spent a lifetime using a hierarchical folder-based mental model for their knowledge. For greater value and efficiency one needs to get away from this model and move toward linking individual ideas together in ways that they can more easily be re-used.

      To accomplish this many people use an index-based method that uses topical or subject headings which can be useful. However after even a few years of utilizing a generic tag (science for example) it may become overwhelmed and generally useless in a broad search. Even switching to narrower sub-headings (physics, biology, chemistry) may show the same effect. As a result one will increasingly need to spend time and effort to maintain and work at this sort of taxonomical system.

      The better option is to directly link related ideas to each other. Each atomic idea will have a much more limited set of links to other ideas which will create a much more valuable set of interlinks for later use. Limiting your links at this level will be incredibly more useful over time.

      One of the biggest benefits of the physical system used by Niklas Luhmann was that each card was required to be placed next to at least one card in a branching tree of knowledge (or a whole new branch had to be created.) Though he often noted links to other atomic ideas there was at least a minimum link of one on every idea in the system.

      For those who have difficulty deciding where to place a new idea within their system, it can certainly be helpful to add a few broad keywords of the type one might put into an index. This may help you in linking your individual ideas as you can do a search of one or more of your keywords to narrow down the existing ones within your collection. This may help you link your new idea to one or more of those already in your system. This method may be even more useful and helpful for those who are starting out and have fewer than 500-1000 notes in their system and have even less to link their new atomic ideas to.

      For those who have graphical systems, it may be helpful to look for one or two individual "tags" in a graph structure to visually see the number of first degree notes that link to them as a means of creating links between atomic ideas.

      To have a better idea of a hierarchy of value within these ideas, it may help to have some names and delineate this hierarchy of potential links. Perhaps we might borrow some ideas from library and information science to guide us? There's a system in library science that uses a hierarchical setup using the phrases: "broader terms", "narrower terms", "related terms", and "used for" (think alias or also known as) for cataloging books and related materials.

      We might try using tags or index-like links in each of these levels to become more specific, but let's append "connected atomic ideas" to the bottom of the list.

      Here's an example:

      • broader terms (BT): [[physics]]
      • narrower terms (NT): [[mechanics]], [[dynamics]]
      • related terms (RT): [[acceleration]], [[velocity]]
      • used for (UF) or aliases:
      • connected atomic ideas: [[force = mass * acceleration]], [[$$v^2 = v_0^2 + 2a\Delta x$$]]

      Chances are that within a particular text, one's notes may connect and interrelate to each other quite easily, but it's important to also link those ideas to other ideas that are already in your pre-existing body of knowledge.


      See also: Thesaurus for Graphic Materials I: Subject Terms (TGM I) https://www.loc.gov/rr/print/tgm1/ic.html

  5. Apr 2022
    1. Author Response

      Reviewer #1 (Public Review):

      Kwon, Huxlin and Mitchell compared motion perception and oculomotor responses in eight patients with post-stroke lesions in the primary visual cortex (V1). Motion perception was measured as peripheral motion discrimination thresholds (NDR) separately in the affected and the intact visual field. Due to restoration training, the NDR thresholds were below chance even in the affected visual field, indicating that some residual motion discrimination was possible. Oculomotor responses were measured as the gain of eye drifts (PFR) after saccades to dot patterns that are coherently drifting inside peripheral, stationary apertures. The authors distinguish between a predictive, open-loop component up to 100 ms after the saccade that is entirely based on presaccadic motion processing in the peripheral visual field and a visually-driven component from 100 ms after the saccade that is based on postsaccadic motion processing in the fovea. While the PFR gain of patients in the intact field was comparable to the data of healthy control subjects from a previous study (Kwon et al., 2019), the predictive, open-loop PFR gain of patients in the affected field was close to zero. This was not the case for the visually-driven PFR. The authors interpret their findings in terms of a dissociation between residual motion perception and absent predictive oculomotor control in patients with V1 lesions.

      Strengths:

      The study contains a rare and valuable set of perceptual and oculomotor data from eight patients with lesions in V1, who underwent restoration training. The direct comparison between peripheral motion discrimination and predictive oculomotor responses is interesting and innovative. Also, the distinction between the predictive, open-loop and the closed-loop component of PFR is important. A potential dissociation between motion perception and oculomotor control would be very relevant for the understanding of different pathways of motion processing for perception and oculomotor control and also for the understanding of the effects of restoration trainings after lesions of V1.

      Weaknesses:

      The dissociation between perception and oculomotor control in the affected field is primarily based on two results: first, the combination of low PFR gain (Figure 4A) on the one hand and low to medium NDR thresholds (Table 1) on the other hand; second, the absence of a correlation between NDR thresholds and PFR gain (Figure 4B). However, the data are not as clear-cut. The regression of PFR gain on NDR thresholds in the intact-field predicts that there should be a substantial PFR gain only at NDR thresholds below about 0.3. For the affected field this applies only to three data points, of which one shows a substantial PFR and is fully compatible with the data in the intact-field. Hence, the evidence of a dissociation between motion perception and oculomotor control is based on a very small number of data points. This also allows for a different interpretation: instead of assuming separate pathways for motion perception and oculomotor control in patients, the results might also be explained by a different read-out of the same motion signal for perception and oculomotor control, where oculomotor control applies a more conservative threshold and requires a higher internal signal strength than motion perception.

      The comparison of the patients' data to the data in the previous study (Kwon et al., 2019) is not very informative. First, the patients were considerably older than the participants in the previous study, and an age-matched control group would be favourable. That being said, the fact that the PFR gain was comparable for the intact-field of the patients and the previous study renders age-effects rather unlikely.

      Second, there is no control data for the motion discrimination task, so we don't know what the NDR thresholds and even more importantly what the relationship between NDR thresholds and PFR gain in healthy observers would be.

      We thank the reviewer for their evaluation. We have attempted to address concerns about sufficient sampling from blind-fields with recovery that reached the normal range by collecting additional data, doubling our sample size within that range. This is discussed above in “Essential revisions”, along with the alternative interpretation that perception and oculomotor control might rely on a different threshold in readout. The role of age differences was considered in the original manuscript, but this remains an unlikely factor, as the reviewer notes. With regard to normative NDR threshold data, surprisingly, these have not been published for visually-intact controls in a manner that is identical to that in the present study. However, prior work has established that performance in CB patients’ intact visual fields is normal across a wide range of behavioral measures that include luminance contrast sensitivity, processing of form, color and motion, as well as spatial and temporal frequencies (e.g. Barbur et al., 1980; Morland et al., 1999; Sahraie et al., 2006; Huxlin et al., 2009; Das et al., 2014; Levi et al., 2015). In the present study, we have thus used the intact-field as an internal control for blind-field performance in the same participant, as is standard in the field, expecting that intact-field NDR thresholds should be within the normal range. Verifying this is outside the scope of the present paper, but is now planned for our subsequent studies. Other detailed responses appear below, point by point, for the reviewer’s “Recommendations for authors”.

      Reviewer #2 (Public Review):

      This study addresses the oculomotor behaviour of cortically-blind patients (with lesions in V1) who are instructed to perform a saccade toward a cued target placed either in their intact or in the blind visual field. The saccadic target consists of an aperture containing random-dot motion at 75% direction discrimination threshold ("NDR"), and is presented with iso-eccentric similar distractor apertures: with this kind of stimulus, the gaze of normally-sighted participants drifts smoothly in the direction of the target random dot motion immediately after the end of the saccade. Importantly, for some patients, perceptual training had led to a good recovery of perceptual performance in the blind-field, as documented by the reduction of motion direction discrimination threshold to levels similar to the control healthy participants. Cortically-blind (CB) patients are shown to perform very similarly to control participants in terms of saccade accuracy, but they have longer latency. As for the postsaccadic ocular following response ("PFR"), the eye velocity component projected on the random-dot motion direction is comparable to controls when the saccade was directed to the intact-field, but the mean PFR is significantly lower for saccades directed toward the blind-field. The authors conclude that V1 lesions result in a previously ignored selective impairment of the automatic transsaccadic transmission of visual information that drives the ocular following response. In the supplementary information, it is also shown that the shift of saccadic landing position which is induced by the presaccadic target motion is strongly reduced (yet different from zero) for saccades to the blind-field locations in CB patients.

      The manuscript is very well written and illustrated, and the addressed question is novel and highly interesting. The inclusion in the experiment of locations of the patients' blind-field for which some perceptual abilities had been recovered is particularly interesting. However, some major weaknesses undermine part of the results and their interpretation (see below). I also list a series of other minor issues to be clarified or improved.

      Main weaknesses:

      1) Unfortunately, the present data do not strongly support the conclusion that the reduced PFR gain in patients is decorrelated from the motion discrimination performance. As a matter of fact, in Figure 4B the function describing the relation between PFR gain and NDR is reasonably linear only in a very limited interval of NDR values (say <0.3), and it should rather be described as a decreasing exponential, or similar, approaching 0 already for NDR~0.3. On the other hand, it is presumably hard to appropriately fit a similar exponential function to the blind-field datapoints, as the majority of the latter lie in the range of NDR thresholds (say > 0.4) where the PFR gain would in any case be flat and close to 0. In other words, in my view there aren't enough blind-field datapoints with low NDR threshold to assess a quantitative difference in the relation between PFR and NDR between CB patients and control participants.

      Finally, and probably just a misunderstanding of mine, shouldn't the empty circles in Figure 4A and 4B have the same y-coordinate (the PFR gain value)? It does not seem so when looking at these figures.

      2) A second weak point, in my opinion, concerns the interpretation of the results and in particular the exclusion of a role for presaccadic attentional mechanisms. The authors claim (lines 356-358): "That the FEF and its projections to area MT are intact in V1-stroke patients suggests preservation of presaccadic planning and attention selection for the saccade target even when visual input is weak or abnormal in a blind-field" and this is definitely a valuable point. However a number of other physiological mechanisms involving V1 could play a role in the spatially-selective processing of motion and the argument that (lines 368 and ff) "other aspects of saccade pre-planning related to perceptual shifts in the position of motion targets, remain in the blind-field" is not very robust here, considering that the reduction in the angular deviation is very strong in the blind-field (Supplementary Figure 2).

      Here is a speculative alternative interpretation: V1-lesioned patients suffer among others of a specific impairment for spatially-selective motion processing. Unfortunately, the training in peripheral motion discrimination does not test this particular possibility, if I understand correctly, as there was no other distractor aperture containing distracting motion information (see Fig 2A). In contrast, in the main experiment, a lack of spatial selectivity for motion integration may have strongly affected the presaccadic motion discrimination (being more global than local) as well as PFR and postsaccadic landing position shift (although the latter was partly spared). According to this possibility, a simple prediction is that depending on the (randomly determined) motion direction in the distracting apertures, the PFR (the true eye movement, not the projection according to the stimulus motion axis) should be deviated in different directions, coherent with a global integration of motion. Do the available data allow to verify this possibility? In general, I think that it would be interesting to analyse post-saccadic smooth eye velocity beyond the "projected" velocity.

      We thank the reviewer for their evaluation, several parts of which overlap with those of Reviewers 1 and 3. In particular, the concerns about sufficient sampling from blind-fields that recover motion integration (NDR < 0.35) have been addressed by collecting additional data and performing new analyses, and we have also addressed possible impairments to spatial attention (see above in “Essential revisions”). The discrepancy noted in the y-coordinate between Figures 4A and 4B is related to those analyses being by subject (4A) versus by visual field location (4B), which we already addressed above, in response to Reviewer 1. Other detailed responses appear below.

      Reviewer #3 (Public Review):

      The human visual system comprises a tangle of neural pathways that subserve different perceptual, cognitive, and motor functions. Unfortunate cases of brain damage can reveal surprising dissociations between the functions of damaged and spared tissue. Perhaps the most famous example is blindsight, when damage to visual regions of occipital cortex leads to subjective blindness in parts of the visual field while sparing some visually-guided actions. Kwon, Huxlin and Mitchell had a rare opportunity to study eight individuals with that type of cortical blindness due to stroke, and put them through a carefully designed regimen of visual training and oculomotor testing.

      The main focus was a particular oculomotor behavior that they term the "post-saccadic following response": when a neurotypical person makes a saccade to an object moving in the periphery, their eyes immediately begin smoothly following the stimulus motion, due to an oculomotor plan made before the saccade began. In this case, the stroke patients were able to regain their ability to discriminate stimulus motion in the "blind" parts of the visual field, but upon saccading to those stimuli they did not show the immediate post-saccadic following response. This surprising result shows yet another splintering dissociation between perception and action, demonstrating that the effects of stroke can be very specific to certain motor actions.

      Strengths:

      - The authors masterfully combined several techniques in a rare and carefully chosen sample of participants: neuropsychiatric evaluations, rehabilitation training, psychophysics and eye-movement analyses.
      - The analyses that link all those measures together, while complicated and precise, are elegantly and clearly presented.
      - The study provides a twist on blindsight that is interesting philosophically, while also constraining our models of neural circuitry and informing approaches to rehabilitation after stroke.

      Weakness:

      - The unique nature of this study is a strength but also potentially limits its impact: the authors studied one particular type of eye movement with a complicated, unnatural stimulus arrangement. For example, the stimuli were groups of random moving dots windowed through static apertures. These stimuli, which move but also don't, are quite different from real moving objects that people track with their eyes (flying birds, for example). A related issue, which the authors briefly acknowledge, is that the training was specifically directed towards explicit perceptual reports. We therefore don't know if the oculomotor behavior (the PFR) could also be trained.
      - The authors rely on traditional null-hypothesis tests (t-tests and correlations) to make binary judgements of whether each effect or difference is "significant" (p<0.05). Some of the conclusions would be more convincing if supplemented with power analyses, bootstrapped confidence intervals, and Bayes factors to evaluate the strength of evidence.

      We thank Reviewer 3 for their evaluation. The choice of stimuli/task and their “naturalness” is addressed in our point by point responses to the “Recommendations for authors” below. We have also revised the manuscript to include boot-strapped confidence intervals, along with other statistics suggested by other reviewers, as noted under “Essential revisions for authors”. Other detailed responses appear below point by point.
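
      To illustrate the kind of bootstrapped confidence interval discussed above, the following is a minimal sketch of a percentile bootstrap for a correlation coefficient. It is not the authors' analysis: the function names (`pearson_r`, `bootstrap_ci`), the resampling scheme (resampling paired observations with replacement) and the example values are all assumptions made for illustration, and the numbers are invented rather than taken from the study.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one variable is constant
    return sxy / sqrt(sxx * syy)


def bootstrap_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the correlation, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson_r([xs[i] for i in idx], [ys[i] for i in idx])
        if r == r:  # drop NaN from degenerate resamples
            stats.append(r)
    stats.sort()
    low = stats[int((alpha / 2) * len(stats))]
    high = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return low, high


# Hypothetical, made-up values (NOT data from the study):
# one discrimination threshold and one following-response gain per location.
thresholds = [0.15, 0.22, 0.30, 0.41, 0.55, 0.63, 0.78, 0.90]
gains = [0.42, 0.35, 0.20, 0.12, 0.08, 0.05, 0.06, 0.02]

r_obs = pearson_r(thresholds, gains)
ci_low, ci_high = bootstrap_ci(thresholds, gains)
print(f"r = {r_obs:.2f}, 95% bootstrap CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

      With only a handful of observations the interval is typically wide, which is the reviewer's point: an interval conveys the strength of evidence more directly than a binary significance decision.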

    1. Author Response

      Reviewer #3 (Public Review):

      Phillips and colleagues present results obtained by generating loss-of-function mutations in the YAP/TAZ ortholog of the unicellular holozoan Capsaspora owczarzaki. In previous work published collaboratively by the Pan and Ruiz-Trillo labs, the authors had shown that Capsaspora has orthologs of yorkie (yki) and hippo (hpo) and that when these genes were expressed in Drosophila they functioned in a way that was consistent with the well-characterized function of the Hippo pathway in regulating cell proliferation.

      Characterizing the role of the pathway in Capsaspora required the ability to manipulate gene expression in that organism. In this manuscript, the authors describe remarkable progress in that area. They generate lines that stably express fluorescent proteins. Excitingly, they are able to use CRISPR/Cas9 and generate loss-of-function alleles using a donor-template strategy. These accomplishments pave the way for the study of Capsaspora using molecular tools.

      The authors then use these technologies to generate biallelic loss of function mutations in Capsaspora. They find no evidence of defects in cell proliferation either when these cells are cultured by themselves or when they are mixed with wild-type cells. However, they do find evidence of abnormalities in the cytoskeleton. They find that the cells themselves, and the multicellular aggregates that they form are more irregular in shape. The cells appear to adhere to substrates better than wild-type cells. They show surface blebbing that changes in the cell cortex with evidence for altered actin dynamics.

      From these experiments, the authors conclude that the ancestral function of the Hippo pathway is to regulate the cytoskeleton and that its ability to regulate cell proliferation was acquired more recently in evolution.

      The technical achievements are impressive, the experiments are well designed and executed, and are presented clearly. I have no issues with them. However, I feel that two of the main conclusions that the authors make are not justified by the results.

      1) The authors seem convinced that CoYki functions as a transcriptional regulator. They seem to suggest that it is primarily a regulator of cytoskeletal genes. There is a body of work from the Fehon laboratory that Yki has a function at the cell cortex in Drosophila that is independent of its function as a transcriptional regulator. See the work by Xu et al. 2018; PMID30032991 (not cited in this paper). In the absence of data that shows the localization of CoYki, I don't see how the authors can tell where it is working (in the nucleus or at the cell cortex) to regulate the cytoskeleton.

      To provide support for asserting that coYki is a transcriptional regulator, we have done the following:

      • We have cited previous results showing that coYki and its binding partner coSd can, when expressed together in the Drosophila eye, induce transcription of Hippo pathway genes, indicating a role for coYki in transcriptional regulation

      • We have examined the localization of fluorescent fusions of coYki and a coYki mutant (coYki 4SA) predicted to be nonphosphorylatable by upstream Hippo pathway kinases. Enrichment of coYki at the cell cortex was not detected. However, the 4SA mutant showed increased localization in the nucleus relative to the WT coYki protein, arguing for a nuclear function of coYki.

      These data are therefore consistent with the prevailing view of Yki/YAP/TAZ as a transcriptional regulator in other species. Nevertheless, we cannot formally exclude the possibility that coYki may also affect the cytoskeleton in a non-transcriptional manner as described by Xu et al., which we have now stated in the Results section of our manuscript.

      2) Capsaspora and animals such as ourselves are equally separated by time from our last common ancestor. There is no reason to think that the function of signaling pathways in the Capsaspora lineage has been frozen in time while ours have evolved. Indeed, the amazing diversity of protists is consistent with lots of evolution in every lineage. One could easily argue from the same data that the ancestral function of the Hippo pathway was to regulate cell proliferation and that this was lost in the lineage that led to Capsaspora. As we learn more about the function of the Hippo pathway in diverse organisms, we will be in a better position to guess what the ancestral function was.

      We agree that the function of signaling pathways in modern protists and their ancestors may not necessarily be identical, and that studies of Hippo signaling in other organisms, especially unicellular holozoans, may clarify which functions may have been ancestral, as we make a point to state at the end of our discussion. However, given that in animals Hippo signaling regulates the cytoskeleton and proliferation, and we find that in Capsaspora coYki affects the cytoskeleton but apparently not proliferation, it seems reasonable to us to suggest a model where cytoskeletal regulation was an ancient function, and the pathway was later co-opted for regulation of proliferation. We have added a section in the Discussion pointing out that we cannot, from our results, definitively conclude an ancestral Hippo pathway function.

      In summary, this manuscript describes technological innovations that will have a big impact on those who want to study this organism. They also provide convincing data to show that the Capsaspora Yorkie ortholog regulates cytoskeletal dynamics and not cell proliferation. However, as described above, the authors would need to tone down some of their conclusions.

    1. Attribution Theory. Attribution theory (a process theory of motivation holding that people are motivated according to what they believe underlies other people’s actions and attitudes) holds that people’s behavior is motivated by how they interpret the behavior of others around them. For instance, we may think that what’s causing others to act as they do is a combination of internal, personal factors. On the other hand, we may think that their behavior is a product of environmental variables.

      impacts

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the referees for their valuable suggestions. We have revised the text accordingly and already conducted most of the requested experiments.

      Reviewer #1

        1. The authors state that the addition of mannan increases the length of Birbeck granules; however, no data are presented. The claim would be more convincing if the length were compared between conditions with and without mannan (as in Fig 4, where the condition without mannan is lacking).

      Reply: Thank you for pointing out the missing data. We added an EM image of Birbeck granules and quantification of Birbeck granules formation in the absence of mannan (Figure 4A-D).

      • Supp, fig 1B perhaps as a panel in main figure as this is an important control to show that Birbeck granules are isolated.

      Reply: We moved the supplemental figure 1B to main figure 1D.

        1. Only the (total) length of Birbeck granules is taken into account, but not the number of Birbeck granules. Is it possible to quantify the number of Birbeck granules?

      Reply: We added Figure 4D to show the number of Birbeck granules. Note that the difference in the number of Birbeck granules was less significant than that of total length because there were numerous short fragments in the mutant specimen.

      • Fig 5. Only the condition (ARGK) in which there is virtually no Birbeck granule formation is included. Is virus still internalized in the other conditions (MRGD or MRGK), where Birbeck granule formation was less effective but still present? It would be interesting to include those mutants. A more specific quantification would be by p24 ELISA. Is there a reason why immunoblotting was chosen? For the supernatant condition, please explain why the virus p24 signal seems lower in the control condition, whereas one would expect the maximum concentration in that condition.

      Reply: Thank you for suggesting the use of ELISA. We chose immunoblotting because of its higher sensitivity and lower cost, but ELISA is advantageous when it comes to comparing a large number of samples. We performed p24 ELISA and quantified the virus internalization in all the mutants available (Figure 5C). As you pointed out, the transfer efficiency of the immunoblot in Figure 5A was not uniform across the membrane; Pr55 bands became denser toward the right, while p24 bands had a gradient in the opposite direction. The immunoblots and ELISA showed that about 1% of the viruses were attached or internalized and about 99% did not interact with the cells. Thus, the attached/internalized viruses did not affect the amount of virus in the supernatant. The ELISA results also showed that the amount of virus in the supernatant was nearly equal among the samples (Figure S3B).

      • Abstract
        First sentence: not mucosal tissue but mucosal epithelium.
        Last sentence: "Virual" should be "viral".

      Reply: We corrected the typo. Thank you.

      • Discussion: The last section comparing DC-SIGN and langerin is not clear and some overstatements are made. "Considering that DC-SIGN serves as an attachment receptor for viruses but not as an entry receptor, the possible structural coupling of lateral ligand binding and internalization implies that langerin functions as a more efficient entry receptor for viruses than DC-SIGN or other C-type lectins." It is not correct that langerin but not DC-SIGN can function as an entry receptor. DC-SIGN has been shown to facilitate infection by different viruses such as DENV and ZIKV. In contrast, langerin can restrict viruses such as HIV-1 but can also facilitate infection by, for example, influenza A and DENV. So attachment or entry is more likely a consequence of the internalization and the dependence on pH changes for fusion, as some viruses such as DENV fuse in acidic vesicles. This needs to be discussed more clearly.

      Reply: Thank you for pointing out our incorrect statement. We replaced it with a weaker one, as below:

      Page 13, line 213: “The difference in the ligand-binding manner between langerin and DC-SIGN may contribute to their different carbohydrate recognition preferences (Valverde et al., 2020; Takahara et al., 2004).”

      Reviewer #2

      1) Langerin can exist on the cell surface and in Birbeck granules. They should examine langerin cell surface expression in the 3 states, wildtype, mutated and lectin - . Do the mutations change cell surface expression?

      Reply: We performed surface labeling experiments and showed that those mutations did not affect surface expression of langerin (Figure S3A).

      2) Birbeck granules are present in the absence of mannan and pathogens (see Pena-Cruz JCI 2018, PMID: 29723162). Thus, this suggests that Birbeck granules are present even without langerin clathrin coated pit internalization from the cell surface. How does their model account for this observation?

      Reply: We think there are two possibilities:

      1. Birbeck granules were shown to stem from the endoplasmic reticulum (Valladeau et al Immunity 2000; Lenormand et al PlosONE 2013). Since the rER is the site of glycosylation, langerin is likely to capture the oligo-mannose-glycosylated proteins within the rER and form Birbeck granules.
      2. Blood plasma proteins such as immunoglobulin D, immunoglobulin E, and apolipoprotein B-100 are reported to carry high-mannose glycans (Clerc et al Glycoconj J. 2016). Those glycoproteins in the cell culture media can induce Birbeck granule formation.

        3) Different cell types can have varied langerin levels (see Pena-Cruz JCI 2018, PMID: 29723162). Does Birbeck granule formation depend on a certain level of langerin expression? Do Birbeck granules form when langerin is present at low as compared to high levels?

      Reply: In the course of the experiments, we isolated a cell line stably expressing langerin. However, langerin expressing cells were extremely slow in proliferation and the expression levels were low. To answer this question, we recovered this “failed” stable cell line and found that the low langerin-expressing cells can form Birbeck granules, but with lower efficiency (Figure S3C-E).

      4) Authors use immunoblots to show that HIV is present in intra-cellular Langerin structures. It would be ideal to visualize HIV with presumably internal Birbeck granules using imaging techniques such as cryo-electron micrography or another form of high resolution imaging.

      Reply: We are currently working on ultra-thin section electron microscopy of HIV-infected langerin-expressing cells. Visualization of HIV-containing Birbeck granules using cryo-electron microscopy is highly challenging because the current precision of the cryo-FIB-SEM milling technique is too low to target a specific intracellular structure. We believe conventional electron microscopy will provide sufficiently convincing evidence that HIV is present within Birbeck granules.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We are grateful for the referees' rigorous review of our manuscript and for their overall positive reception of our work. We have pasted below the entirety of the reviewers’ comments, interleaved with our responses.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Gama et al. use a biophysical assay DAmFRET, structural analysis, and optogenetic tools to uncover the nucleation mechanism of CBM signalosome. They performed experiments first in yeast cells that lack death folds or related signaling networks, then confirmed their discoveries in human cells. The results presented here are clear and convincing. The paper is very well presented and clearly written.

      They found it is the CARD domain of BCL10 that acts as a molecular switch that drives all-or-none activation of NF-kB. Monomeric BCL10 possesses an unfavorable conformation and serves as a nucleation barrier, keeping BCL10 in a supersaturated inactive state that allows for binary activation upon stimulation.

      They also characterized CARD9 CARD domain and a coiled-coil region. They reasoned that CARD9CARD functions as a polymer seed to nucleate BCL10, and that the coiled-coil region has multimerization ability to facilitate nucleation. Furthermore, they characterized that MALT1 activation doesn't depend on BCL10 polymers but its own proximity. And MALT1 induces graded NF-kB activation, thus further demonstrating the binary activation is conferred by BCL10.

      Major comments:

      1. Fig S1D and E, the authors used TNF-a to activate NF-kB independent of CBM signalosome and found the activation in each cell increased with dose. In contrast, CBM activation led to bimodal cell activation. The authors claim that this is evidence of positive feedback upstream of NF-kB. We do not believe this claim can be made from this comparative experiment alone. We agree that positive feedback is important for activating an NF-kB response, but the comparison between CBM and TNFa is inaccurate and glosses over published data. Specifically, there are published data showing that TNF-a does activate a 'switch-like' or digital response, as defined by the translocation of p65 (see (Tay et al. 2010) among other studies that have examined p65 translocation at the single-cell level). The difference in T-sapphire expression between CBM and TNF activation is most likely due to TNFa-induced oscillations of p65 translocation (although this is speculation on our part). Therefore we suggest to the authors that the TNF-a data (Fig S1D and E) should be omitted, as the claim of switch or not-switch as pertains to TNF signaling is more complex and nuanced than presented here. We believe omitting these data will strengthen the manuscript and avoid confusion in the field. The bimodal expression of the T-sapphire NF-kB reporter driven by the CBM signalosome activation is sufficient to claim an all-or-none response.

      We thank the reviewer for this suggestion. We acknowledge that the activation of NF-κB by TNF-ɑ is more complex than we had presented, and agree that the differences in T-Sapphire reporter output could be attributed to p65 oscillations. We had not previously considered this interesting possibility, which is not addressed by the present data, and believe it is worth future investigation. As suggested by the reviewer, we have now omitted the TNF-ɑ data, and agree that this change does not impact the overall claims of the paper.

      Fig 3B, the authors introduced CARD9CARD-µNS as a stable condensed seed for BCL10. However, considering CARD9CARD can form polymers at high concentration (Fig 3B and S3D), are these high expression levels of CARD9CARD able to induce BCL10-mEos3.1 assembly (as measured by DAmFRET in yeast cells)? Can the authors examine BCL10 FRET at these high expression levels of CARD9CARD? We assume that BCL10 will be assembled in these cells. This would provide a valuable control experiment and support the authors' conclusions.

      Indeed, this question is amenable to DAmFRET. Accordingly, we have now performed DAmFRET of yeast cells expressing BCL10-mEos3.1 in the presence of either CARD9CARD-mCardinal or mCardinal itself (see new Fig S6A and B, and associated results section). We confirmed that cells with high CARD9CARD-mCardinal expression had higher FRET on average than cells with low expression. Importantly, cells expressing high or low levels of mCardinal itself had the same FRET level (Fig S6).

      Fig 3C, the text said "Whereas WT CARD9CARD assembled into polymers at high concentration, the pathogenic mutants R18W, R35Q, R57H, and G72S failed to do so (Fig 3C and S7B,C), explaining why they cannot nucleate BCL10". This claim that these mutants can not nucleate BCL10 does not have a figure call out or a reference. The authors then show the results in Fig 3E which supports this claim. Even though they were done in the context of full-length CARD, all proteins contain the I107E mutation that releases autoinhibition. For clarity, the authors should consider rearranging the text to avoid explaining a phenomenon and making conclusions before showing the results.

      We have now rearranged this section to match the figures and claims.

      Fig 4D, E and Video 1, the authors showed that the nucleation of BCL10 into puncta within live cells is followed by p65 translocation to the nucleus. The authors claim that 'this result suggests that BCL10 is indeed supersaturated prior to stimulation' (paragraph 2 of the section titled 'BCL10 is endogenously supersaturated'). We fail to understand how this live-cell experiment leads to the conclusion that BCL10 is supersaturated before stimulation. We think this statement should be deleted from the text, or put into context with the DAmFRET data that lead the authors to make this claim. It would be interesting for the authors to define in the discussion what the golden criteria are for claiming that a protein exists in a supersaturated state in live cells (by microscopy or other methods). Adaptor protein assembly into puncta and the subsequent nuclear translocation of transcription factors is a common phenomenon across signalling pathways. Not all these pathways rely on signaling adaptors existing in a supersaturated state. The field of cell signaling (and cell biology in general) would benefit from a detailed definition of how these physical-chemical definitions of proteins are supported by experimental data. We believe that this paper will become a seminal paper in the field, and future work will benefit from a clear definition of how a claim of supersaturation is derived from the data.

      We appreciate that the concept of supersaturation will be foreign to many biologists, and welcome this opportunity to elaborate. We have now rephrased the corresponding results section for figure 4D, E, and have added new evidence to support our claim that BCL10 is supersaturated, as had been requested by reviewer 2 (see below in response to point 1). Supersaturation, as we (correctly) use the term, occurs when the concentration of a protein in solution exceeds its equilibrium solubility for the given conditions. The term is also sometimes used to describe global protein “concentrations” in excess of the solubility limit, even if a dense phase has already formed and potentially depleted the effective concentration (in solution) to the solubility limit. This is a key distinction, as only the former implies a high-energy out-of-equilibrium scenario that predetermines a future change: release of the excess energy via phase separation.

      How does one experimentally determine if a protein is supersaturated? In theory, one may conclude that a protein is supersaturated if its assembly causes a net loss of energy from the system (i.e. exothermic). Unfortunately, it is likely not yet possible to perform such measurements with sufficient sensitivity inside a living cell. However, it is possible to infer that a protein is supersaturated if assembly can be shown to occur without a net input of energy to the system, i.e. without any change in thermodynamic control parameters such as temperature, pH, post-translational modifications, concentration of the protein, or concentration of any interacting factor. To do this, one introduces a substoichiometric amount of pre-assembled protein to the system. This manipulation will trigger assembly if the protein is supersaturated. If the protein is instead subsaturated, assembly will not occur and the exogenously added assemblies will simply dissolve. This phenomenon, known as “seeding” in the prion field, is considered a golden criterion sufficient to conclude that a protein has prion behavior. However, because bona fide prions additionally require a means for dissemination between cells, seeding analyzed at the cellular rather than population level is more appropriately considered a sufficient criterion for supersaturation (which is a prerequisite for classical prion behavior (Khan et al. 2018)). Our CARD9CARD-Cry2 experiment was designed to test this criterion. Specifically, it allowed us to introduce a seed independently of receptor activation, thereby precluding any orthogonal cellular response that might lower Bcl10 solubility through e.g. a post-translational change. That the seeds were substoichiometric is evidenced by the fact that Bcl10 polymerized homotypically following stimulation (i.e. it didn’t just bind to the CARD9CARD puncta, but went on to deposit onto itself).
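
      The seeding criterion described above can be summarized as a small decision rule. The following is only an illustrative sketch and not an analysis from the manuscript: the function name, the arbitrary concentration units, and the idealization that an unseeded protein stays soluble on the timescale of the experiment are assumptions made here for clarity.

```python
def seeding_outcome(solution_conc: float, solubility: float, seeded: bool) -> str:
    """Toy decision rule for the seeding test (hypothetical helper).

    solution_conc: concentration of soluble protein (arbitrary units)
    solubility:    equilibrium solubility under the same conditions
    seeded:        whether a substoichiometric pre-assembled seed was introduced
    """
    if solubility <= 0:
        raise ValueError("solubility must be positive")
    supersaturated = solution_conc > solubility
    if seeded and supersaturated:
        # The seed templates assembly without any change in control parameters.
        return "assembles"
    if seeded:
        # Not supersaturated: the exogenous seed cannot propagate.
        return "no assembly"
    # Idealization: without a seed, the nucleation barrier keeps the protein soluble.
    return "remains soluble"
```

      In this simplified picture, only the combination of a seed and a solution concentration above solubility yields assembly, which is why a positive seeding result is read as evidence of pre-existing supersaturation.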

      How does assembly under this scenario differ in principle from the many examples of puncta formed by other signaling proteins that occur upon stimulation of their respective pathways? Puncta formation that is induced by a thermodynamic change in the cell cannot be said to have resulted from pre-existing supersaturation. Rather, the stimulus may have caused some change that either increases the effective concentration of the protein (e.g. upregulates its expression, induces a post-translational change that activates it, or releases an inhibitory factor) or reduces solvent activity (e.g. change in pH).

      An additional requirement (necessary but not sufficient) is that the assembly must be regular with respect to some order parameter. That is to say, it must be a bona fide “phase”. At a minimum, this implies a uniform density. Additionally, for supersaturation to persist over biological timescales under physiological conditions and confinement volumes, the assembly (once formed) must also have structural repetition in at least two dimensions, i.e. crystallinity (Rodríguez Gama et al. 2021; Zhang and Schmit 2016). We know this to be true for Bcl10.

      Rodríguez Gama A, Miller T, Halfmann R. 2021. Mechanics of a molecular mousetrap-nucleation-limited innate immune signaling. Biophys J 120:1150–1160. doi:10.1016/j.bpj.2021.01.007

      Khan, T., Kandola, T.S., Wu, J., Venkatesan, S., Ketter, E., Lange, J.J., Rodríguez Gama, A., Box, A., Unruh, J.R., Cook, M., et al. (2018). Quantifying nucleation in vivo reveals the physical basis of prion-like phase behavior. Mol. Cell 71, 155-168.e7.

      Zhang L, Schmit JD. 2016. Pseudo-one-dimensional nucleation in dilute polymer solutions. Phys Rev E 93:060401. doi:10.1103/PhysRevE.93.060401

      Regarding the supersaturated state of BCL10, the authors convincingly use optogenetics to show how transient assemblies of CARD-Cry2 can template BCL10 assembly. This is a convincing experiment that shows templated nucleation of BCL10. To strengthen the claim that BCL10 is supersaturated endogenously, we suggest the authors quantify the expression of BCL10-mScarlet and CARD-Cry2 and ideally show that this phenomenon can be observed at expression levels equivalent to endogenous levels.

      As stated above, that BCL10-mScarlet formed polymers that we observed to elongate homotypically off of the CARD9CARD seeds indicates that the protein was supersaturated under the conditions of the experiment. The concentration of CARD9 is not a relevant parameter in this case. We had already compared the expression of BCL10-mScarlet to endogenous BCL10 in 293T, THP-1, and human fibroblast cells by quantitative immunodetection (Fig. S10D), revealing that the expression level of our BCL10-mScarlet constructs matched that of endogenous BCL10, which was approximately the same in all cell lines. We also compared the distribution of expression levels of BCL10-mScarlet versus that of endogenous BCL10 using antibody staining followed by flow cytometry, which confirmed that the range of expression levels of BCL10-mScarlet falls within that of endogenous BCL10 in 293T cells (Fig. S10F). Hence, we believe our data suffice to conclude that Bcl10 is supersaturated at endogenous levels of expression.

      Minor comments:

      1. Special character "delta" is not displayed in the text (instead only a space).

      This error occurred upon exporting the manuscript from our text editor to a PDF. We now have made sure all special characters are present in the PDF version.

      Several cell lines including mouse, human, and yeast lines were used across this manuscript. It would be clearer and more helpful if the exact cell type of the line could be indicated. Such as, "BCL10-mEos3.1 yeast cells" instead of "BCL10-mEos3.1 cells", "BCL10-mScarlet HEK293T cells" instead of "BCL10-mScarlet cells".

      We have now modified all instances to indicate the origin of the cell lines tested.

      Fig 5B, the authors indicated that BCL10 colocalized with CARD9CARD, then please show the merged image as well.

      We have now included the merged image to indicate colocalization in the inset images.

      Fig 6E, authors claimed that cells were stimulated with blue light for the indicated durations. The longest duration is 12 hours. Please specify if it was continuous exposure or several rounds of exposure in the indicated durations.

      We have now specified in the figure legends, text, and methods section, that this specific experiment used a continuous exposure of blue light.

      Reviewer #1 (Significance (Required)):

      This work used a combination of FRET and optogenetic tools to engineer CBM signaling and visualize the effects. They incorporated knowledge from structure biology, together with their results from mutations and truncations, dissected the significance of each protein in CBM signalosome, and demonstrated in detail how higher-order assemblies make all-or-none cellular decisions. We believe this paper will be a seminal paper in the field of cell signalling and cytoplasmic organization. It defines a new paradigm of macromolecules assembly of signalling complexes as being dependent on protein existing in a supersaturated state. Importantly this paper opens up new questions regarding macromolecular signaling complexes (found in many innate immune signaling pathways): How is protein supersaturation maintained and used throughout evolution to construct biochemical signalling switches?

      This paper will be of particular interest to scientists working on immunity and cell signalling, especially in the field of higher-order assemblies. However, we feel the impact of this paper goes beyond these fields, and we believe this manuscript will be of broad interest to the cell biology and biophysics communities. For reference, our expertise is in innate immunity and cell biology.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In their manuscript entitled "A nucleation barrier springloads..." Rodriguez-Gama et al. dissect the assembly mechanism of the signalosome, composed of the proteins CARD9, BCL10 and MALT1, using a novel in-cell biophysical approach (DAmFRET). They first overexpressed fluorescently tagged versions of the proteins to promote their assembly in yeast and mammalian cells, finding that CARD9 forms higher order assemblies across a wide range of concentrations with no discontinuity in the DAmFRET profile. In contrast, the DAmFRET profile of BCL10 showed a clear separation between monomers and higher order assemblies, which started to form spontaneously only at higher BCL10 concentrations. Furthermore, the two states of the protein co-exist at all concentrations. These observations imply that there is a nucleation barrier to forming BCL10 assemblies. MALT1 showed no change in FRET regardless of its expression level. These observations, alongside fluorescence microscopy of the assemblies, and previous structural studies, suggest that BCL10 forms self-templating polymers that act as a switch for an all-or-nothing immune response, assayed in this case by monitoring the nuclear translocation of the NF-kB subunit p65. The authors also assessed the effects of known disease-causing mutations on the nucleation barrier, showing that changes in the strength of the nucleation barrier can have major effects on signalosome function. Finally, they used optogenetic methods to trigger assembly of individual signalosome components, providing insight into the minimal components/conditions required for signalosomes to work.

      Major comments

      Overall, the experiments by Rodriguez-Gama et al. offer convincing evidence that there is a nucleation barrier to BCL10 polymerisation, and that a CARD9 template is sufficient to overcome the barrier. Although the existence of a nucleation barrier had already been postulated, based on structural and other studies (referenced by the authors), it had lacked a rigorous demonstration. This work provides that demonstration, which is important for the signalosome field and more broadly applicable to researchers studying cellular decision making. The study further demonstrates that DAmFRET is an excellent tool to study protein assembly processes in their native environment, allowing the authors to tackle a question that would have been technically very difficult to address otherwise. The optogenetic experiments are a nice sufficiency test for their ideas.

      We feel there are a few key points to address before publication.

      1) One of the main conclusions is that spring-loading the nucleation barrier with high super-saturating BCL10 concentrations allows a decisive response. Although much of the data strongly imply this conclusion, the dependence of the immune response on BCL10 concentration was not tested directly. A key prediction of the nucleation barrier is that at concentrations below saturation, BCL10 should not be able to induce an all-or-nothing response when stimulated. At saturated/super-saturated concentrations BCL10 should be able to induce a response. At deeply super-saturated concentrations the response should start to be activated spontaneously in the absence of an external stimulus. These predictions could be tested using the doxycycline-inducible BCL10 system (Figure S2D), without establishing major new experimental avenues. We feel that such an experiment would strengthen the main conclusion. It might also help to shed light on whether being highly supersaturated enables a more decisive response than being just saturated.

      This is a great idea. As the reviewer suggested, our Doxycycline-inducible BCL10 system enables us to induce and track the state of BCL10 over time. We have now performed the requested experiments (Fig. S9D, E) and incorporated the results into the relevant section of the text. In short, our new analyses show that BCL10 indeed has a concentration threshold for activation by stimulation, and that it can also nucleate spontaneously when overexpressed. Note that our original analyses in Fig. 4B and C also demonstrate spontaneous BCL10 activation at high concentrations. With this new evidence and the orthogonal approaches used in Fig. 5, we believe our data definitively support our conclusion that BCL10 is supersaturated.

      2) Intuitively, readers might expect that if BCL10 is supersaturated then, once nucleated, it would rapidly assemble at the nucleation sites. In Figure 5B, CARD9CARD-miRFP670nano-Cry2 assemblies are optically induced throughout the cell. However, BCL10 appears to nucleate at just a few sites with a few minutes delay. More widespread nucleation and growth of BCL10 polymers seems to take longer (20-40 minutes, Figures 5B and 5C), after CARD9CARD-miRFP670nano-Cry2 has disassembled. Furthermore, in Figures 4D and 4E, very few BCL10 assemblies are visible/quantifiable after 70 minutes PMA exposure, but p65 has clearly entered the nucleus. It looks like BCL10 assembly slightly lags behind p65 nuclear entry. Can the authors provide a more detailed explanation of these kinetics?

      We do note that the number of CARD9CARD clusters formed upon opto-stimulation exceeds the apparent number of BCL10 nucleation sites. We believe this is consistent with nucleation-limited kinetics, where the clustering of CARD9-CARD increases the local probability of nucleation. As nuclei form and grow, they lower the probability of subsequent nucleation elsewhere in the cell. Additionally, it is possible that our artificial seeds do not perfectly mimic the native CARD9 seeds that form upon natural stimulation (e.g. due to potential steric interference from the fluorophore and Cry2). We also acknowledge that there is a slight delay in the visible appearance of BCL10 polymers relative to p65 nuclear translocation. We expect that MALT1 activates already when the polymers are still too small to see (sub-resolution), whereas the polymers only become microscopically visible once they’ve grown quite a bit more.
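
      The argument that early nuclei suppress later nucleation elsewhere in the cell can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not a model fitted to our data: the function name, the rate constants, the number of seed sites, and the linear dependence of both nucleation probability and growth on the excess concentration are all hypothetical choices made for illustration.

```python
import random


def simulate_seeded_nucleation(n_sites=50, total=1000.0, solubility=200.0,
                               k_nuc=0.002, k_grow=0.05, steps=200, seed=0):
    """Toy model of nucleation-limited assembly at many seed sites.

    Each not-yet-nucleated site nucleates with a per-step probability that
    scales with the relative supersaturation. Growth of existing polymers
    consumes soluble protein, which lowers that probability for all other sites.
    Returns a list of (nucleated_sites, soluble_protein) tuples, one per step.
    """
    rng = random.Random(seed)
    nucleated = 0
    polymer = 0.0  # protein incorporated into polymers (arbitrary units)
    history = []
    for _ in range(steps):
        # Soluble protein in excess of the solubility limit.
        excess = max(total - polymer - solubility, 0.0)
        p_nuc = min(k_nuc * excess / solubility, 1.0)
        # Stochastic nucleation at the remaining sites.
        new = sum(1 for _site in range(n_sites - nucleated) if rng.random() < p_nuc)
        nucleated += new
        # Growth is proportional to the number of nuclei and never exceeds the excess.
        polymer += min(k_grow * nucleated * excess, excess)
        history.append((nucleated, total - polymer))
    return history


if __name__ == "__main__":
    final_nuclei, final_soluble = simulate_seeded_nucleation()[-1]
    print(f"sites nucleated: {final_nuclei}, soluble protein: {final_soluble:.1f}")
```

      With these arbitrary parameters, depletion of the soluble pool by the first few polymers rapidly lowers the nucleation probability at the remaining sites, so the number of nucleation events stays well below the number of seed sites.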

      3) Related to point 2 above, in Figure 5D, the leftmost cell in the field of view clearly contains CARD9CARD assemblies but there are no BCL10 assemblies and p65 is not imported into the nucleus (in contrast to the central cell in the field of view). How often does CARD9CARD optogenetic assembly lead to BCL10 assembly? In other words, can the authors quantify the cell-to-cell variability in this experiment?

      Throughout our experiments, whether analyzing BCL10 puncta formation, NF-kB transcriptional activity, or p65 translocation, we observed a persistent nonresponsive fraction of cells even at saturating levels of stimulation. Specifically, approximately 30% of THP-1 cells failed to acquire T-Sapphire fluorescence or form BCL10-mEos3.2 puncta when stimulated with high levels of β-glucan (Fig 1B and E, respectively), and approximately 25% of 293T cells failed to acquire T-Sapphire fluorescence or exhibit p65 nuclear translocation when stimulated with high levels of PMA (Fig 1C and Fig 4E, respectively). Because these numbers did not depend on whether BCL10 was endogenously or exogenously expressed, we know that the underlying cell-to-cell heterogeneity involves factors upstream of BCL10. Indeed, the fraction of recalcitrant cells drops to 10% in our optogenetic experiments that bypass upstream factors (Fig S11E). Possible sources of heterogeneity include different physiological states of the cells or fluctuations in the expression levels of any upstream factor in the signaling pathway. We believe that this phenomenon is not unique to the CBM signalosome, as we (unpublished) and others (Fernandes-Alnemri T et al, 2009, Dick M et al, 2016) have similarly observed a fraction of non-responding cells upon activation of the inflammasome, which involves nucleation-limited polymerization of the adaptor protein ASC. While this phenomenon is interesting and may be important to our understanding of the full complexity of signalosomes in vivo, we believe that identifying the source of heterogeneity would be outside the scope of the present manuscript. We now describe this phenomenon in the final paragraph of the “Endogenous BCL10 is constitutively supersaturated” section.

      Fernandes-Alnemri, T., Yu, JW., Datta, P. et al. AIM2 activates the inflammasome and cell death in response to cytoplasmic DNA. Nature 458, 509–513 (2009). https://doi.org/10.1038/nature07710

      Dick, M., Sborgi, L., Rühl, S. et al. ASC filament formation serves as a signal amplification mechanism for inflammasomes. Nat Commun 7, 11929 (2016). https://doi.org/10.1038/ncomms11929

      Minor comments

      While the work is scientifically well done, the text reads as though it is meant for experts rather than a broad audience. This is a pity because it risks alienating readers. We suggest that some adjustments to the text (mainly additional explanations and not ruling out alternative interpretations of the data) would widen the audience and increase the impact of this important study. Below are some suggestions that might help.

      1. In the first results section, the authors write: 'This suggests that Bcl10 but not CARD9 assembly occurs in a highly cooperative fashion that could, in principle (Koch, 2020), underlie the feed forward mechanism.' It isn't obvious how Figure 1 leads to this statement. Could the authors give a more detailed explanation?

      We have now revised the text to elaborate on this interpretation.

      One limitation of DAmFRET is that it can only detect a nucleation barrier where there is a difference in FRET between the monomer and the assembled form of the protein. However, it can't necessarily detect when there is not a nucleation barrier i.e. if there's no difference in FRET. The text seems to suggest that CARD9 and MALT1 don't have nucleation barriers to their assembly. While this might not be intentional, it would be helpful to explicitly state that CARD9 and MALT1 could also possess such barriers that are not detectable by this method. This wouldn't detract from the finding that BCL10 has a barrier that plays an important function.

      The reviewer is correct that DAmFRET would not be able to detect a nucleation barrier if the assembled phase does not condense the fluorophore to a sufficiently high density for FRET to occur. In our experience, this is only a concern for very large proteins whose bulk “dilutes” the fluorophores within the assembly. Death domains, on the other hand, are only ~ 3 nm in diameter, and FRET occurs within a range of ~10 nm; hence we think it very unlikely that the death domains could be forming cryptic polymers that escape our detection. In any case, when assembly does produce a change in FRET, we can with confidence determine how strongly that form of assembly is governed by concentration. Hence, for CARD9, which does produce a FRET signal upon assembly, we can say that assembly has a smaller intrinsic nucleation barrier than that of BCL10. We further eliminated the possibility of multi-step nucleation (which would reduce the apparent nucleation barrier relative to the one-step ideal case) for CARD9 by showing that artificial condensates of the protein expressed in trans do not influence the concentration-dependence of FRET (Fig. 4 B). Finally, under all conditions where CARD9 lacked FRET, it also lacked signaling activity, suggesting there is not a cryptic functional assembly that evades our assay. Likewise MALT1, which lacked FRET at all concentrations, was entirely unable to activate NF-kB upon overexpression (Fig. S8 A and B), suggesting that it too is not forming a cryptic functional assembly that evades our assay. We therefore feel confident in our conclusion that CARD9 and MALT1 lack nucleation barriers of a magnitude comparable to that of BCL10. Note that our claim is not that they entirely lack a nucleation barrier (CARD9 after all does form a multi-dimensionally ordered polymer), but rather that we fail to observe a nucleation barrier and hence any barrier that may exist is insufficient to manifest at the cellular level.
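
      The distance argument above follows from the standard Förster relation, in which transfer efficiency falls off with the sixth power of donor-acceptor separation. The sketch below is illustrative only: the function name is hypothetical, and the default Förster radius of 5 nm is an assumed, typical order of magnitude for fluorescent protein pairs, not a measured value for the fluorophores used here.

```python
def fret_efficiency(distance_nm: float, forster_radius_nm: float = 5.0) -> float:
    """Forster transfer efficiency E = 1 / (1 + (r / R0)^6).

    distance_nm:       donor-acceptor separation r, in nanometres
    forster_radius_nm: Forster radius R0 (assumed value; depends on the fluorophore pair)
    """
    if distance_nm < 0 or forster_radius_nm <= 0:
        raise ValueError("distance must be non-negative and R0 must be positive")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


if __name__ == "__main__":
    # Compare a separation on the scale of a death domain with one near the FRET limit.
    for r in (3.0, 5.0, 10.0):
        print(f"r = {r:4.1f} nm -> E = {fret_efficiency(r):.3f}")
```

      Under this assumed Förster radius, subunits spaced a few nanometres apart transfer efficiently, whereas efficiency becomes negligible near 10 nm, which is the basis for expecting polymers of small domains to be detectable.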

      In the final results section, the idea that MALT1 activation doesn't depend on BCL10 polymer structure doesn't necessarily follow from the data. An alternative interpretation is that optogenetic clustering of MALT1 causes it to recruit BCL10 and form BCL10-MALT1 filaments (structure solved by Schlauderer et al., 2018). Also, the optogenetic clustering of MALT1 may mimic some structure found in the BCL10 cluster. Therefore, we are neither convinced that the data unambiguously show that MALT1 activation strictly depends on multi-valency rather than an ordered structure of BCL10 polymers nor that this conclusion is truly necessary for the paper.

      We agree that the reviewer’s alternative interpretation of this experiment is possible. However, we consider it unlikely because we performed the experiment with MALT1 lacking its Death Domain (residues 126-824), which mediates its interaction with BCL10 (Schlauderer et al., 2018). Our experiments then suggest that MALT1 clustering is sufficient for activation independent of any structuring mediated by BCL10. Nevertheless, we have now performed an additional control in which we treated these cells with PMA to induce BCL10 polymerization. As expected, the NF-kB transcriptional reporter utterly failed to activate in this condition, indicating that MALT1 does not interact with BCL10 polymers when it lacks its death domain. This aspect has been further elaborated in our response to reviewer 3 point 5.

      What optical density do the yeast cells reach during the 16h induction in galactose? If they are in stationary phase, this could affect the assembly status of the proteins being expressed, as the cytoplasm becomes glassy when cells are starved, and this coincides with widespread protein aggregation/assembly (Joyner et al., 2016; Munder et al., 2016).

      In our DAmFRET strategy, we first dilute an overnight culture and regrow the cells to log phase prior to resuspending them in galactose media. Our strain is engineered to undergo cell cycle arrest upon protein induction, hence exponential growth is prevented and the cells do not deplete galactose during the 16 hr induction. We have also performed many time courses of DAmFRET following induction and generally find no qualitative difference between early and late times (unpublished). Early time points simply have lower expression and correspondingly fewer cells in the high FRET state. Importantly, all comparisons between proteins are made with the same 16 hr induction.

      Although these experiments show that thermodynamically lowering the BCL10 nucleation barrier (e.g. by post-translational modifications or protein expression levels) isn't required for a response, they don't rule it out. It would be good to state this in the discussion, as cells may have multiple mechanisms of switching on the signalosome.

      We thank the reviewer for this suggestion and have now explicitly stated in the discussion that our experiments do not argue against possible thermodynamic tuning of the nucleation barrier.

      The discussion compares signalosomes with condensates formed by liquid-liquid phase separation. This is an interesting comparison but it suggests that disordered assemblies would not be capable of performing signalosome-like functions. This needs to be explained more clearly. For example, non-amyloid prions seem to form gel-like assemblies with a high nucleation barrier that are capable of driving heritable traits, likely through self-templating (Chakravarty et al., 2020). Such examples could represent disordered assemblies with signalosome switch-like behaviour. Furthermore, there are examples of condensates that are induced by environmental changes e.g. Pab1 and Ded1 condensates (Riback et al., 2017; Iserman et al., 2020). This potentially allows the proteins to reach high concentrations and remain un-condensed until a change in heat or pH overcomes a nucleation barrier required for condensate formation. Although the condensates aren't self-templating, they seem to require energy for their disassembly. Combined, this also allows switch-like behaviour, where the switch is flipped back to the uncondensed off state once conditions return to normal. In general, crossing a phase boundary can represent a switch-like response. Finally, recent electron-tomography experiments show that ASC puncta comprise clusters of filaments (Liu et al., 2021, biorxiv). CARD9/BCL10 assemblies may have similar ultrastructures and liquid-liquid phase separation may well play a role in their assembly.

      Indeed, we explicitly maintain that liquid phases cannot themselves perform signalosome-like functions. Chakravarty et al. 2020 did not observe amyloids associated with their phenomena, but the relevant experiments were not designed to exhaustively exclude an underlying ordered phase. To the extent that gelation is involved, their observations are fully consistent with ours. IUPAC defines a “gel” as a colloidal network involving a solid phase and a dispersed phase. The existence of a solid phase necessarily implies an underlying disorder-to-order transition, even if limited to small length scales. In the case of gelation associated with liquid-liquid phase separation, nucleation of the ordered phase simply occurs in two steps (first condensation, then ordering). Note also that a liquid phase could in principle give rise to a heritable phenotype if it activates a positive feedback in a molecular biological process involving the protein of interest (e.g. upregulation of its expression or a change in interacting factors). Chakravarty et al. did not exclude such phenomena (it would be very difficult to do so); hence it cannot be concluded that phase separation is responsible for the sustained phenotypic changes.

      We do not fully follow the reviewer’s logic concerning the relevance of Pab1 and Ded1 condensates. These proteins only condense when their respective phase boundaries fall below the endogenous protein concentration, as upon thermal stress. The proteins are not supersaturated in the absence of such conditions (for example, they cannot be seeded), and it is incorrect to characterize the change in heat or pH as overcoming a pre-existing nucleation barrier. The concept of a nucleation barrier only applies under conditions where a phase is thermodynamically favored. It is also misleading to state that the Ded1 and Pab1 condensates require energy for disassembly. Rather, they require energy to disassemble rapidly. Unless the assemblies have accessed a more ordered phase as described above (two step nucleation), involving a lower phase boundary, they will inevitably dissolve after the conditions return to normal.

      We have much prior experience with ASC. Although supersaturation has not been explicitly shown for ASC, the fact that it forms ordered polymers and can behave as a prionoid in vivo suggests that it very likely operates the same way as BCL10 (i.e. is physiologically supersaturated). That full-length ASC forms clusters of filaments is not relevant (in our view) to the mechanism shown here, which only requires that filaments are indeed formed. Formally, the size of the relevant nucleus determines the minimum length scale at which ordering must manifest in our mechanism. Based on the structure of death domain filaments, this could be as small as tetramers or hexamers (a minimal but structurally complete “polymer”).

      As stated above, and now elaborated in the discussion, our data do not exclude a role of thermodynamic regulation, as could lead to liquid-liquid phase separation, in tuning the nucleation barrier of Bcl10. What they do exclude is that such changes are required for Bcl10 to activate in the first place.

      Can the authors comment on the loss of BCL10 in Echinodermata, Arthropoda, Nematoda? Is there another protein that plays a similar role? Could a CARD or PCASP protein possess self-templating properties? Could other methods of control be at play e.g. protein expression?

      This is a very interesting question! We think the reviewer’s suggested explanations for the loss of BCL10 in those lineages are valid and worthy of future exploration. Nematodes such as C. elegans have lost multiple components of innate immunity. They have very few pathogen recognition receptors and also lack NF-kB! They do, however, have other adaptor proteins that the literature and our unpublished data suggest may have self-templating ability, such as TIR-1. Drosophila also encodes multiple TIR-containing proteins that are essential for innate immunity. In short, it is possible that other proteins have acquired the hypothetically essential role of supersaturation and nucleation-limited signaling in these organisms.

      Figures 1B/1C: Can the authors comment on why the active cells plateau at about 70-75%? This is a striking feature of the plots, but the explanation may not be obvious to readers.

      See our response to major point 3, above.

      Figures 1D/1E: What was the concentration of β-glucan used in this experiment? This could be included in the figure legend. If greater than 1 μg/ml this means that the % of active cells in Figure 1B matches the % of cells with BCL10 assemblies in Figures 1D/1E, which is potentially an important point.

      We thank the reviewer for bringing this point to our attention. We have now indicated in the figure legend the concentration of β-glucan used in this experiment (10 μg/ml). That the percentage of active cells in Fig. 1B matches that of cells containing BCL10 polymers in Fig. 1D and E indeed strengthens the stated relationship between BCL10 assembly and NF-kB activation in THP-1 cells subjected to a relatively physiological stimulus. Additionally, we have performed experiments to measure the levels of p65 translocation in THP-1 cells treated with β-glucan that express BCL10-mEos3.2. These data are shown in Figs. S1D and E in response to reviewer 3.

      Use of both 'BCL10' and 'Bcl10' when referring to the protein.

      We have now replaced all instances where Bcl10 was used to follow guidelines for gene and protein name conventions.

      Bruford EA, Braschi B, Denny P, Jones TEM, Seal RL, Tweedie S. Guidelines for human gene nomenclature. Nat Genet. 2020;52(8):754-758. doi:10.1038/s41588-020-0669-3

      In the supplementary figures there are some formatting problems/missing words in the figure legends. In Figure S11 there is a black box covering the lower part of the figure.

      We have now fixed these instances.

      References used in this review

      Chakravarty, A.K. et al. (2020) "A Non-amyloid Prion Particle that Activates a Heritable Gene Expression Program," Molecular Cell, 77(2), pp. 251-265.e9. doi:10.1016/j.molcel.2019.10.028.

      Iserman, C. et al. (2020) "Condensation of Ded1p Promotes a Translational Switch from Housekeeping to Stress Protein Production," Cell, 181, pp. 818-831.e19. doi:10.1016/j.cell.2020.04.009.

      Joyner, R.P. et al. (2016) "A glucose-starvation response regulates the diffusion of macromolecules," eLife, 5. doi:10.7554/eLife.09376.

      Munder, M.C. et al. (2016) "A pH-driven transition of the cytoplasm from a fluid- to a solid-like state promotes entry into dormancy," eLife, 5. doi:10.7554/eLife.09347.

      Riback, J.A. et al. (2017) "Stress-Triggered Phase Separation Is an Adaptive, Evolutionarily Tuned Response," Cell, 168(6), pp. 1028-1040.e19. doi:10.1016/j.cell.2017.02.027.

      Schlauderer, F. et al. (2018) "Molecular architecture and regulation of BCL10-MALT1 filaments," Nature Communications, 9(1), pp. 1-12. doi:10.1038/s41467-018-06573-8.

      Reviewer #2 (Significance (Required)):

      Although the existence of a nucleation barrier had already been postulated, based on structural and other studies (referenced by the authors), it had lacked a rigorous demonstration. This work provides that demonstration, which is important for the signalosome field and more broadly applicable to researchers studying cellular decision making. The study further demonstrates that DAmFRET is an excellent tool to study protein assembly processes in their native environment, allowing the authors to tackle a question that would have been technically very difficult to address otherwise.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The study by Rodriguez Gama et al. addresses the molecular function of CBM complex-forming proteins CARD9, BCL10 and MALT1 in the activation of myeloid cells, using optogenetic tools, transcriptional reporters and biochemical approaches. It is known from previous studies that Bcl10 oligomerizes into filamentous oligomeric structures incorporating Malt1, and that these structures are nucleated by receptor-induced activation of CARD proteins such as CARD11 (in lymphocytes) or CARD9 (in myeloid cells), but the mechanism underlying the assembly of the resulting CBM complexes remains incompletely understood.

      The authors develop beautiful optogenetic tools to address this question, and convincingly demonstrate that CARD9-mediated nucleation of BCL10 triggers a binary cellular NF-kB response in a spring-load-like fashion, and identify mutants of BCL10 and CARD9 that impact this capacity. Unfortunately, however, the authors do not do a good job of simplifying this complex problem so it can be easily understood. In particular, the choices of mutants, models and experiments are not consistent between figures, and some data seem to be arbitrarily added or omitted. Complex hybrid constructs are also used, without assessing whether these are indeed functional in the corresponding ko cells. The paper would therefore benefit from a major overhaul. We also noticed that the literature is often not cited adequately and have included a (non-exhaustive) list of examples of wrong, incomplete, or erroneous citations below.

      1. The initial observations of binary signaling are derived from a reporter system. Although there are controls to show that the reporter used does not function intrinsically cooperatively, it would be nice to see additional data to show that cooperativity occurs also at the level of endogenous response systems, for instance by qPCR-based assessment of a natural NF-kB target gene (induced for example by TNFa versus B-glucan in THP-1 cells, and by TNFa versus PMA in 293T cells).

      As detailed in the introduction, NF-kB has been shown by multiple labs to activate in a binary fashion. Our manuscript shows that NF-kB activation occurs in a binary fashion both at the level of transcription and at the level of nuclear translocation (upstream of any transcriptional output). While we do agree that additional data could further illustrate the biological significance of our findings, we do not feel it is necessary for our conclusions. Note also that because NF-kB activation occurs in a binary fashion per cell, a simple qPCR experiment would not suffice to extend our findings to the broader NF-kB regulon. Instead, one would have to use e.g. RNA-FISH or single cell RNA-seq, nontrivial experiments that would take months to complete.
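
      The point about bulk versus single-cell readouts can be illustrated with a minimal sketch. The per-cell values, the threshold, and the function names below are hypothetical and are not taken from the manuscript; the sketch only shows that a population average cannot distinguish a graded response from a binary one, whereas a per-cell readout can.

```python
from statistics import mean

# Hypothetical per-cell reporter levels (arbitrary units), for illustration only.
graded_population = [0.4] * 100              # every cell responds partially
binary_population = [1.0] * 40 + [0.0] * 60  # 40% of cells fully on, the rest off


def bulk_signal(per_cell_levels):
    """Population average, which is what a pooled-cell assay effectively reports."""
    return mean(per_cell_levels)


def fraction_on(per_cell_levels, threshold=0.5):
    """Fraction of cells at or above an assumed activation threshold."""
    on = sum(1 for level in per_cell_levels if level >= threshold)
    return on / len(per_cell_levels)


if __name__ == "__main__":
    for name, population in (("graded", graded_population),
                             ("binary", binary_population)):
        print(name, bulk_signal(population), fraction_on(population))
```

      In this toy case both populations give the same bulk signal, but only the binary population contains a distinct subset of fully active cells, which is why a per-cell method would be needed to test binary activation.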

      The cell lines in Figures 1D-E (and also some of the BCL10 mutants used later on) would have been better run in the assays in the early parts of Figure 1. The final conclusion prior to the section "The adaptor protein BCL10 is a nucleation-mediated switch" is otherwise not justified. This is a central tenet of the paper, that is referred to again, with some other ancillary data to support it. These mutants reappear later in the paper, but it would have been better, and easier to make rescue lines of BCL10 KO in Figure 1, otherwise the logic is lost, and the models seem chosen arbitrarily.

      The choice of experiments in different panels of Fig. 1 resulted from a chronological progression of reagent construction as the project evolved. We do appreciate that switching between the assays may lead readers to doubt one or the other. Therefore, we have now immunostained for endogenous p65 in the same experiment as for Fig. 1D and confirmed that p65 translocated to the nucleus only in THP-1 BCL10-KO cells that have been reconstituted with WT BCL10-mEos3.2, but not E53R. We think this additional evidence along with our orthogonal measurements in other reporter systems confirms our findings that BCL10 nucleation determines NF-kB activity.

      Expression with microNS is not well controlled and gives little real evidence for what is occurring. It is unclear what the concentration of the protein expressed was, but certainly the relative expression of the CARD9(CARD) and the microNS version should be assessed.

      We believe these concerns result from a misunderstanding. We assume the reviewer is referring to the experiment in Fig. 3B. Expression of muNS on its own has no effect on the DAmFRET of other proteins, and we have previously used it in exactly the same way as here (Holliday M et al. 2019 and Kandola T et al. 2021). Please note that muNS fusion proteins in our experiment have an orthogonal fluorescent protein whose spectra do not significantly overlap with those of mEos3.1. The experiment evaluates a protein’s ability, when condensed via its fusion to muNS, to nucleate an mEos3.1-fused protein that is expressed in trans. Fusion of proteins to muNS does not affect their expression levels, as we now show for CARD9CARD-muNS-mCardinal versus CARD9CARD-mCardinal (Fig. S6D).

      Also, the AmFRET profile of CARD9CARD looks very weird, it cannot be compared to BCL10.

      We are unsure in what way the AmFRET profile of CARD9CARD is “weird”. It is fully consistent with expectations and has been thoroughly explained in the text. We suspect the reviewer was bothered by the sharp acquisition of FRET at approximately 100 μM. As explained in the text, this represents the phase boundary, also known as the solubility line, for CARD9CARD polymers, which we previously showed in vitro (Holliday M et al. 2019). Above this concentration, the protein self-assembles without a nucleation barrier, hence the sharp but continuous change in FRET. BCL10 plots, in contrast, show a discontinuous acquisition of FRET, which indicates a nucleation barrier. In order to highlight that the CARD9CARD transition is understood and expected, we have also now added a line to the plot to demarcate the phase boundary.
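
      The distinction between a continuous and a discontinuous acquisition of FRET can be sketched as follows. This is not the DAmFRET analysis pipeline; the bin width, FRET cutoff, minimum fraction, and the toy data are all assumptions chosen for illustration. The idea is that a nucleation barrier shows up as concentration bins in which FRET-negative and FRET-positive cells coexist, whereas a barrier-free transition gives bins that are uniformly one or the other.

```python
def bins_with_coexistence(cells, bin_width, fret_cutoff, min_fraction=0.1):
    """Return the start of each concentration bin holding both cell states.

    cells: iterable of (concentration, amfret) pairs, one pair per cell.
    A bin counts as mixed when low-FRET and high-FRET cells each make up
    at least min_fraction of the cells in that bin.
    """
    counts = {}
    for concentration, amfret in cells:
        key = int(concentration // bin_width)
        low, high = counts.get(key, (0, 0))
        if amfret >= fret_cutoff:
            high += 1
        else:
            low += 1
        counts[key] = (low, high)

    mixed = []
    for key in sorted(counts):
        low, high = counts[key]
        total = low + high
        if low / total >= min_fraction and high / total >= min_fraction:
            mixed.append(key * bin_width)
    return mixed


# Toy data (arbitrary units): below the solubility line no cell has FRET.
below = [(50, 0.0)] * 10
# No barrier: every cell above the line has assembled.
no_barrier = below + [(150, 0.5)] * 10
# Barrier: at the same concentration some cells have nucleated, others not.
barrier = below + [(150, 0.0)] * 5 + [(150, 0.5)] * 5

if __name__ == "__main__":
    print(bins_with_coexistence(no_barrier, bin_width=100, fret_cutoff=0.25))
    print(bins_with_coexistence(barrier, bin_width=100, fret_cutoff=0.25))
```

      With these toy inputs, only the second data set yields a mixed bin, corresponding to the discontinuous profile described for BCL10.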

      We are not convinced of the usefulness of the introduction of a slew of disease-causing CARD9 mutations that may or may not be relevant to the authors' point. The fact that they do or do not function in a specific sub portion of an assay that may or may not be relevant to biological activity seems to be of interest but without biochemical understanding, little is clear.

      While several reports have shown the clinical importance of these CARD9 mutations on susceptibility to fungal infections, little was known about the molecular mechanism underlying their effects. The inclusion of the disease-causing mutants to this paper is justified for the following reasons. First, they demonstrate the relevance of our work to disease. Second, they build off our findings to provide an otherwise unknown molecular mechanism of these mutants. We showed using independent methods that CARD9CARD mutations disrupt the ability to nucleate BCL10, via two different mechanisms. Finally, validating the disease-causing mutations allowed us to use them as controls for subsequent experiments demonstrating that BCL10 is supersaturated.

      The optogenetic experiments are interesting, but difficult to interpret without evidence that these MALT1 constructs are indeed still functional when expressed in MALT1-deficient THP-1 cells. We do not therefore think that this experiment shows a necessity for clustering to signal, just a sufficiency, and in a highly artificial construct.

      We welcome the opportunity to elaborate on the optogenetic experiments. Since BCL10 and MALT1 are expressed ubiquitously across cell types, the validity of our findings should not depend on the cell type used. Indeed, much of what we already know about innate immunity signalosomes comes from work in HEK293T cells. Our optogenetic experiments using MALT1 were performed in 293T MALT1-KO cells in Figures 6E and F, and employed two distinct functional assays (p65 nuclear translocation and a transcriptional reporter). While our approach employs light to control clustering, similar approaches using (no less artificial) chemically induced dimerization domains have been used to study caspase activation (Oberst A et al, 2010, Boucher D et al, 2018). Our use of light affords higher specificity, reversibility, and spatial and temporal control over MALT1 assembly than does chemically induced dimerization.

      To demonstrate the necessity of clustering, we have now performed an experiment with MALT1(126-824)-miRFP670-Cry2 expressed in 293T MALT1 KO cells that contain a transcriptional reporter of NF-kB, as in Figures 6E and F. We added PMA to the cells and found that it failed to activate NF-kB (Fig. 6), confirming that the interaction of MALT1 (via its death domain) with polymerized BCL10 is required for activation. Note that MALT1 and BCL10 exist as a soluble heterodimer prior to BCL10 polymerization; hence it is polymerization, rather than the interaction itself, that activates MALT1. That artificial clustering rescues this defect strongly suggests that the effect of polymerization can be attributed to increased proximity rather than some allosteric effect communicated from BCL10 polymers through the MALT1 DD to its caspase-like domain.

      Oberst, A., Pop, C., Tremblay, A.G., Blais, V., Denault, J.-B., Salvesen, G.S., and Green, D.R. (2010). Inducible dimerization and inducible cleavage reveal a requirement for both processes in caspase-8 activation. J. Biol. Chem. 285, 16632–16642.

      Boucher, D., Monteleone, M., Coll, R.C., Chen, K.W., Ross, C.M., Teo, J.L., Gomez, G.A., Holley, C.L., Bierschenk, D., Stacey, K.J., et al. (2018). Caspase-1 self-cleavage is an intrinsic mechanism to terminate inflammasome activity. J. Exp. Med. 215, 827–840.

      In the introduction and other parts of the paper, there are numerous instances where the previous literature in the field is not adequately cited. Examples include:

      • In the introduction, it is weird to cite one original paper (a MALT1 ko study by Ruland et al., 2001; there are several other studies of ko papers for CBM components that would merit being cited along with this study) together with two reviews on that topic (Ruland and Hartjes 2019 and Gehring et al. 2018)

      • In the introduction, the original study by Wang et al., 2002 should be cited together with Rebeaud et al., 2002; the two studies on the same topic were published back-to-back

      • In the introduction, the statement "CARD10 and CARD14 are expressed in nonhematopoietic cells including intestinal and skin epithelia, respectively" should be supported by citations.

      • Still in the introduction, the 2 references for the statement "... CARD14 gain of function mutations cause psoriasis (Howes et al., 2016; Jordan et al., 2012)" are not appropriate. There are several reports of patients with CARD14 mutations (the study by Jordan et al is only one of them) and several CARD14 mouse models that provoke a psoriasis-like phenotype, which would merit being cited.

      • In the following sentence: "Point mutations and translocations involving BCL10 and MALT1 cause immunodeficiencies (Ruland and Hartjes, 2019), testicular cancer (Kuper-Hommel et al., 2013), and lymphomas (Zhang et al., 1999).", the citation style also seems completely random, combining the citation of a single original paper for lymphomas (Zhang et al. 1999) (there are several other important original studies on that topic or recent reviews that could be cited instead), together with a review on immunodeficiencies (Ruland and Hartjes, 2019) and then another single example for a role of BCL10 and MALT1 in carcinoma (the study by Kuper-Hommel et al. is one, but several other original publications exist on the latter topic, showing for example a role in breast carcinoma or glioblastoma).

      • In the first section of the results, the reference cited for endogenous CARD10 expression in 293T cells (Ruland et al., 2001) is wrong; no endogenous CARD10 expression was assessed in that study

      We have now revised the citations mentioned above and other instances to ensure adequate citations in each case.

      Reviewer #3 (Significance (Required)):

      The paper deals with a complex question, namely how the CBM signalosome assembles and functions to stimulate NF-kB signaling. This question is important to the understanding of pro-inflammatory immune responses and basic life sciences in general. As the focal point of the paper is complex, and tools to study such phenomena are at the limit of technical capabilities, this further increases the potential impact of the work.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      The characterization of open-ended signalosomes in a number of innate-immunity and cell-death pathways, in particular formed by domains from the death-fold family, has led to the suggestions that these complexes allow a switch-like signalling response suitable for these pathways. It appears that this has been widely accepted. However, these suggestions are based largely on indirect observations and speculation.

      Rodriguez-Gama and coworkers have decided to test these suggestions more directly. Their results confirm the suggestions. Based on my own experience, papers that validate widely adopted suggestions are often not considered seriously by top journals, who are looking for hot topics/paradigm-changing/surprising type results. I would urge the editors to consider seriously work such as in this paper, which directly tests important suggestions and does so at a technically high standard. The authors use a range of ingenious approaches, both with recombinant proteins and in cells, and including proteins from organisms from different parts of the evolutionary tree, to support their interpretations, so it is an extensive and high-quality study. I am impressed that so many different fusion proteins with fluorescent tags continued to function as expected, but I guess the authors controlled for this as much as they could.

      Having said all this, I do get the feeling the authors are "over-selling" the nucleation barrier aspect of these signalling mechanisms. It is clearly an important and critical aspect of signalling in many systems, but then it is not the only important aspect; a number of other regulatory inputs play a role in different systems. So the statement "Our findings introduce a novel structure-function paradigm" in my view is overstretching things somewhat. Further in the Discussion section, the authors state "Existing explanations for the preponderance of ordered polymers in immune cell signalosomes have centered on the functions of multivalency at steady state, such as scaffolding and sensitivity enhancement resulting from the cooperativity of homo-oligomerization". They cite a small (and non-exhaustive) number of papers discussing this topic; all these include "seeding" or "nucleation" as an important part of the proposed mechanism. So I suggest the authors provide a more balanced discussion of this aspect. Different pathways appear to display a different level of switch-like behaviour, and one thing that the current version of the manuscript is missing is more discussion of other death fold-based systems and how the results on the CBM signalosome apply to these, and also other systems such as TIR domain-based ones, which currently get no mention whatsoever. In the CBM system, there seems to be one main nucleation barrier; can there be more than one in others?

      We appreciate the reviewer’s perspective and have now acknowledged in the introduction and discussion additional prior literature that has paved the way for our study. Nevertheless, we maintain -- as now stated in the abstract -- that “our results defy the usual protein structure/function paradigm, and demonstrate that protein structure can evolve via selection for energetic maxima in addition to minima”. We have elaborated in the introduction and discussion how immune signaling provides the functional context in which such a paradigm can evolve, and how our findings uniquely support the paradigm.

      One other aspect I need to express some criticism about is attention to detail - especially with a paper focusing on the physics behind biological processes, I would expect a higher standard of getting the terminology and units correct - see specific examples below. This can obviously be fixed easily.

      Specific points are listed below. No page or line numbers are provided so I have done my best to make it clear what the comments refer to.

      1. Abstract line 6 and throughout: in "NF-kB", the "k" is supposed to be "kappa" (Greek letter) - it stands for "nuclear factor kappa-light-chain-enhancer of activated B cells", not fully defined in the manuscript as far as I can see. Occasionally, small k is also used instead of the small cap K or whatever the authors used most of the time, but I don't think any of them use the Greek letter.

      We had indeed used a version of the small “kappa” κ. We have now fixed the cases where we mistakenly used k instead of κ.

      Page 2 (Introduction) paragraph 2 line 9: period missing at the end of sentence. Same Page 4 (Results: Assembly) paragraph 4 line 3.

      This is now fixed.

      Page 2 (Introduction) paragraph 2 line 15 and throughout: in long sentences, more commas can help readability, for example before "leading" here. Similar page 15 paragraph 2 line 3 after "Additionally", paragraph 4 line 2 before "which".

      We have now included more commas and tried to improve readability throughout.

      Page 4 (Results: Assembly) paragraph 2 line 2: is "positive feedback" different from "cooperativity"? Is it a broader term that includes cooperativity, nucleation and other mechanisms? It may be useful to introduce some of these terms to avoid confusion by the readers.

      “Positive feedback” is the broadest term as it is agnostic to mechanism. “Nucleation” refers to the initiation of a first order phase transition, which is one mechanism of positive feedback. Nucleation involves “cooperativity”, in that a higher order species is more stable than smaller species. However, cooperativity can occur for oligomers of finite size, whereas nucleation is reserved for phase transitions to species of infinite size. We appreciate that the use of so many related terms may have created more confusion than necessary. Hence, we have now revised the text to omit the more general terms -- “positive feedback” and “cooperativity” where possible.

      Page 4 (Results: Assembly) paragraph 2 line 3: please define "TNF".

      We have now fixed this and other acronyms.

      Page 4 (Results: Assembly) paragraph 3 line 2: the use of size-exclusion chromatography to follow the size of complexes would assume that they are irreversible or very stable. It appears this may be the case here, but some discussion may be warranted.

      We have now explained that SEC is appropriate for this experiment because large nucleation barriers generally imply stable assemblies.

      Page 4 (Results: Assembly) paragraph 3 line 4 and throughout: the symbol for "kilodalton" is "kDa".

      We have now fixed this mistake.

      Page 4 (Results: Assembly) paragraph 3: I am not sure how the results discussed in this paragraph demonstrate that assembly occurs in cooperative fashion - just that there is a change in oligomeric states upon stimulation.

      Cooperativity is implied by the absence of oligomer sizes between monomer and the large assembly. Nevertheless, we realized this can only be concluded in the case of homotypic assembly, which we cannot yet assume at this point in the paper. Therefore, we have revised this paragraph to say that the distribution is “consistent with” an underlying phase transition (which we then go on to prove).

      Page 4 (Results: Assembly) paragraph 4 line 2: "WT" is not defined. Wild-type what? I presume "protein"?

      We refer here to the wild-type protein. We have now fixed this mistake.

      Page 4 (Results: Assembly) paragraph 4: it may be worth pointing out here the wild-type and mutant proteins expressed at similar levels; clearly the outcomes will depend on protein concentration in the cell. I believe the supplementary figure shows this to a large extent.

      Indeed, our supplementary figure shows that the WT and mutant protein express to comparable levels. We have now pointed this out in the text.

      Page 4 (Results: The adaptor) paragraph 1 line 4: "CARD domain" would stand for "caspase activation and recruitment domain domain". Please check throughout (including Supplementary Material).

      We have fixed this mistake.

      Page 4 (Results: The adaptor) paragraph 1 line 9: "expressed over a range of concentrations in cells" - this would imply the authors controlled expression - please rephrase to explain what exactly was done.

      We have now rephrased this sentence to indicate that the range of expression results from the use of a genetic construct with cell-to-cell variation in copy number.

      Page 5 (Results: The adaptor) paragraph 2 line 3 and throughout (including Supplementary Material): please use the Greek letter rather than "u" for micro.

      We have now fixed this mistake.

      Page 5 (Results: The adaptor) paragraph 3: this analysis is rather simplistic, it is not just the RMSD value, it is the nature of conformational change that is important? Please elaborate, I would think the papers presenting structural work have already discussed this to some extent?

      The reviewer is correct; it is the nature of the conformational change that is most important. We are unsure how to accurately estimate the energy barrier separating the two conformations for each protein. However, we have now undertaken a collaboration to attempt to do so via FAST molecular simulations (Zimmerman and Bowman 2015). In lieu of the results of these ongoing studies, we have modified the text to acknowledge that RMSD does not necessarily relate to nucleation barriers.

      Maxwell I. Zimmerman and Gregory R. Bowman. Journal of Chemical Theory and Computation, 2015, 11 (12), 5747-5757 DOI: 10.1021/acs.jctc.5b00737

      Page 5 (Results: The adaptor) paragraph 4 line 5 and further in this section: some symbol(s) do not show in the pdf - before "(delta)", next page line 3-5 after "higher" and "both".

      We have fixed this issue that resulted from exporting to a PDF file from our text editor.

      Page 6 (Results: The adaptor) paragraph 4: interface IIa and IIIb are not introduced, and there is not even any reference provided here.

      We have now added a reference for these mutations and elaborated on the interfaces IIa and IIIb.

      Page 6 (Results: Pathogenic) paragraph 1 line 12: "FL" is not introduced.

      We have now fixed this mistake.

      Page 8 (Results: Pathogenic) paragraph 7: the text "absent the pathogenic mutations" is missing something.

      We have now reworded this section.

      Page 10 (Results: BCL10) paragraph 3: why does CARD9 CARD clustering peak and then disassemble? (I guess "clustering" doesn't disassemble, please rewrite as well.)

      We have now fixed this mistake.

      Page 11 (Results: MALT1) paragraph 1: I presume dimerization doesn't achieve the same level of proximity as higher-order multimerization?

      Our interpretation here is that for MALT1, activation requires close proximity of more than two molecules. Although our dimerization module did not activate the caspase-like domain of MALT1, we know that it achieves close enough proximity to activate the caspase domain of CASP8. Hence we believe the MALT1 mechanism has a stoichiometry requirement in addition to a proximity requirement. This is, of course, consistent with the fact that activation normally occurs in the context of polymers rather than dimers.

      Page 11 (Results: Ancient) paragraph 1 line 4: is this AlphaFold2?

      That is correct, we used AlphaFold2. We have added that detail.

      Page 12 (Discussion) paragraph 4: not sure if "molecular examples of evolutionary spandrels" will be clear to most readers.

      We have now explained what evolutionary spandrels are, and elaborated on the relationship to our findings.

      Page 14 (Materials: Plasmid) line 2 and throughout: "Golden Gate" is usually capitalized. Similar for "Gibson" further in the paragraph. The English in this paragraph is not up to standard in general; for example "Then placing..." is not a complete sentence, and a number of sentences ending with "via gibson" need to be rewritten.

      We have now rewritten this paragraph.

      Page 16 (Materials: Cell) line 4 and throughout: "2" in "CO2" should be subscripted.

      This is now fixed.

      Page 16 (Materials: Transient) line 6 and throughout (including Supplementary Material): please use a space between number and unit ("35 mm").

      This is now fixed.

      Page 16 (Materials: Generation) line 4 and throughout: to distinguish from "gram", please italicize "g" and/or use "x g".

      We have now fixed this.

      Page 17 (Materials: Yeast) line 3: please specify which table is "table X".

      We have now fixed this mistake.

      Page 17 (Materials: Mammalian) line 1: please provide full reference. Same next paragraph line 2.

      We have now fixed this.

      Page 17 (Materials: DAmFRET) line 3: "SSC" and "FSC" are not defined.

      We have now fixed this.

      Page 18 (Materials: Fluorescence) line 10: "Coefficient" does not have to be capitalized. It does not have to be defined again in the next paragraph.

      We have now fixed this.

      Page 19 (Materials: Optogenetic) line 1: "performed" rather than "made"?

      We have now fixed this.

      Page 19 (Materials: Protein) line 12: the Compass software doesn't have a reference?

      We have now added the reference to the software.

      References: please make format consistent: article titles in sentence or title case.

      We have now formatted all references to be consistent.

      Legend to Fig. 1: I suggest "Schematic diagram"; and "h" rather than "hrs"; please check throughout (including Supplementary Material).

      We agree with this suggestion.

      Legend to Fig. S1: is "TNF-a" supposed to be "TNF-alpha"?

      We have fixed this.

      Legend to Fig. S7: please capitalize "Figure 2H".

      We have fixed this.

      Legend to Fig. S10F: please move "Dox" behind the concentration.

      We have fixed this.

      Fig. S14B: the colours in the superposition make it difficult to see the differences.

      We have used a different color now.

      Legend to Fig. S14: I suggest "structure...predicted by AlphaFold" (2?) and include the reference.

      We agree with this suggestion.

      Reviewer #4 (Significance (Required)):

      As argued above, the significance of this paper is that it tests directly important hypotheses proposed or assumed previously, and does so at a technically high standard. No published report has done so to a similar extent.

      The paper should be of interest to a broad audience from cell biologists and immunologists to biochemists, biophysicists and structural biologists.

      My expertise is in structural biology of systems similar to the one studied here.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      The study by Rodriguez Gama et al. addresses the molecular function of CBM complex-forming proteins CARD9, BCL10 and MALT1 in the activation of myeloid cells, using optogenetic tools, transcriptional reporters and biochemical approaches. It is known from previous studies that Bcl10 oligomerizes into filamentous oligomeric structures incorporating Malt1, and that these structures are nucleated by receptor-induced activation of CARD proteins such as CARD11 (in lymphocytes) or CARD9 (in myeloid cells), but the mechanism underlying the assembly of the resulting CBM complexes remains incompletely understood.

      The authors develop beautiful optogenetic tools to address this question, and convincingly demonstrate that CARD9-mediated nucleation of BCL10 triggers a binary cellular NF-kB response in a spring-load-like fashion, and identify mutants of BCL10 and CARD9 that impact this capacity. Unfortunately, however, the authors do not do a good job of simplifying this complex problem so it can be easily understood. In particular, the choices of mutants, models and experiments are not consistent between figures, and some data seem to be arbitrarily added or omitted. Complex hybrid constructs are also used, without assessing whether these are indeed functional in the corresponding ko cells. The paper would therefore benefit from a major overhaul. We also noticed that the literature is often not cited adequately and have included a (non-exhaustive) list of examples of wrong, incomplete, or erroneous citations below.

      1) The initial observations of binary signaling are derived from a reporter system. Although there are controls to show that the reporter used does not function intrinsically cooperatively, it would be nice to see additional data to show that cooperativity occurs also at the level of endogenous response systems, for instance by qPCR-based assessment of a natural NF-kB target gene (induced for example by TNFa versus B-glucan in THP-1 cells, and by TNFa versus PMA in 293T cells).

      2) The cell lines in Figures 1D-E (and also some of the BCL10 mutants used later on) would have been better run in the assays in the early parts of Figure 1. The final conclusion prior to the section "The adaptor protein BCL10 is a nucleation-mediated switch" is otherwise not justified. This is a central tenet of the paper that is referred to again, with some other ancillary data to support it. These mutants reappear later in the paper, but it would have been better, and easier, to make rescue lines of BCL10 KO in Figure 1; otherwise the logic is lost, and the models seem chosen arbitrarily.

      3) Expression with microNS is not well controlled and gives little real evidence for what is occurring. It is unclear what the concentration of the protein expressed was, but certainly the relative expression of the CARD9(CARD) and the microNS version should be assessed. Also, the AmFRET profile of CARD9CARD looks very weird; it cannot be compared to BCL10.

      4) We are not convinced of the usefulness of the introduction of a slew of disease-causing CARD9 mutations that may or may not be relevant to the authors' point. The fact that they do or do not function in a specific sub portion of an assay that may or may not be relevant to biological activity seems to be of interest but without biochemical understanding, little is clear.

      5) The optogenetic experiments are interesting, but difficult to interpret without evidence that these MALT1 constructs are indeed still functional when expressed in MALT1-deficient THP-1 cells. We therefore do not think that this experiment shows a necessity for clustering to signal, just a sufficiency, and in a highly artificial construct.

      6) In the introduction and other parts of the paper, there are numerous instances where the previous literature in the field is not adequately cited. Examples include:

      • In the introduction, it is weird to cite one original paper (a MALT1 ko study by Ruland et al., 2001; there are several other ko studies of CBM components that would merit being cited along with this study) together with two reviews on that topic (Ruland and Hartjes 2019 and Gehring et al. 2018).
      • In the introduction, the original study by Wang et al., 2002 should be cited together with Rebeaud et al., 2002; the two studies on the same topic were published back-to-back
      • In the introduction, the statement "CARD10 and CARD14 are expressed in nonhematopoietic cells including intestinal and skin epithelia, respectively" should be supported by citations.
      • Still in the introduction, the 2 references for the statement "... CARD14 gain of function mutations cause psoriasis (Howes et al., 2016; Jordan et al., 2012)" are not appropriate. There are several reports of patients with CARD14 mutations (the study by Jordan et al is only one of them) and several CARD14 mouse models that provoke a psoriasis-like phenotype, which would merit being cited.
      • In the following sentence: "Point mutations and translocations involving BCL10 and MALT1 cause immunodeficiencies (Ruland and Hartjes, 2019), testicular cancer (Kuper-Hommel et al., 2013), and lymphomas (Zhang et al., 1999).", the citation style also seems completely random, combining the citation of a single original paper for lymphomas (Zhang et al. 1999) (there are several other important original studies on that topic or recent reviews that could be cited instead), together with a review on immunodeficiencies (Ruland and Hartjes, 2019) and then another single example for a role of BCL10 and MALT1 in carcinoma (the study by Kuper-Hommel et al. is one, but several other original publications exist on the latter topic, showing for example a role in breast carcinoma or glioblastoma).
      • In the first section of the results, the reference cited for endogenous CARD10 expression in 293T cells (Ruland et al., 2001) is wrong; no endogenous CARD10 expression was assessed in that study.

      Significance

      The paper deals with a complex question, namely how the CBM signalosome assembles and functions to stimulate NF-kB signaling. This question is important to the understanding of pro-inflammatory immune responses and basic life sciences in general. As the focal point of the paper is complex, and tools to study such phenomena are at the limit of technical capabilities, this further increases the potential impact of the work.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Two reviewers commented on the smeared appearance of Tae1 bands in our Western blot analyses (Figure 4F and 5B) and asked us to improve their technical quality.

      -We agree and will repeat these experiments with more careful attention to lysate preparation, using a higher percentage SDS gel for better separation of low molecular weight proteins as suggested.

      Reviewer 2 requested that we assess how Tae1 variants impact interbacterial competition outcomes.

      -We agree that this would be interesting to examine. While this will not be feasible for every variant we examine in the paper, we can conduct comparative interbacterial assays between P. aeruginosa and E. coli using P. aeruginosa strains carrying the tae1 point mutation encoding C110S. Given that our biochemical experiments show that this hyperactive variant evades inhibition by the cognate immunity protein, we expect that this may decrease P. aeruginosa fitness, even in the context of competition.

      More generally, we think that examining Tae1 variants in the context of interbacterial competitions would be a critical orthogonal approach in order to validate that the DMS results have any bearing on competition outcomes. However, we feel that the major focus of this paper is on the more molecular and biophysical insights that our approach can offer. Our study tests our assumptions about the kinds of features and surfaces that are important for proteins that engage with non-canonical complex substrates. It is, of course, interesting to think about the implications of this for physiological phenotypes and the drivers of toxin evolution. It is also exciting to imagine how this kind of information could be used to one day engineer certain interbacterial outcomes. We hope that others in the field will push our efforts into these directions, but we do not feel that these directions are essential for our conclusions. However, our conclusions on the molecular and biophysical aspects have helped generate interesting hypotheses in microbial ecology that could be largely followed up on by others.

      In order to conduct well-controlled P. aeruginosa:E. coli competition assays for more Tae1 variants, we would need to generate a significant number of new P. aeruginosa strains encoding point mutations for each of our variants across several genetic backgrounds. The competitions themselves also require a considerable amount of work to optimize and quantify. We are able to do this for one of the variants as previously mentioned (C110S). It’s important to note that the first author of this paper, who was the primary driver of this work, is no longer in my lab or in academia. As for myself, I am also in the middle of a transition out of academia and am actively ramping down my lab at UCSF. I no longer have the space or appropriate set-up to support this longer-term effort.

      Reviewer 2 asked that we examine Tae1 (WT and C110S) expression levels in vivo to more precisely examine whether increased self-intoxication by Tae1C110S in P. aeruginosa was due to differences in toxin activity or toxin levels.

      We agree with this suggestion and will look at toxin protein levels by Western blot analysis in the context of P. aeruginosa cells grown 1) alone on solid media and 2) together with E. coli on solid media during interbacterial competition using conditions that match our other competition assays.

      All 3 reviewers asked us to provide more experimental evidence addressing the hypothesis that differential peptidoglycan (PG) affinity across Tae1 variants could explain variation in toxic activity.

      -We agree that this is an interesting point to follow up on further. To be clear, we also do not know whether this hypothesis is true at this stage, and the answer is not necessarily critical for our central advance, but we would like to give it a try! We have devised an approach to ask the question experimentally across a subset of our deep mutational scanning (DMS) variants.

      Reviewer 1 suggested that we quantify in vitro binding affinities for PG using isothermal titration calorimetry (ITC). However, given that ITC requires high concentrations of well-defined homogeneous substrates, which we are not able to generate for more complex higher order structures of cell wall PG, we propose a pull-down based approach.

      Briefly, we plan to conduct pull-downs using insoluble, purified cell wall sacculi from our E. coli grown under the two conditions as bait for recombinant Tae1 proteins. Given that intact sacculi are inherently insoluble, we can simply collect bound Tae1 through centrifugation of sacculi pellets and examine the amount of Tae1 associated by Western blot analysis. These analyses will need to be conducted across a titration of Tae1 concentrations and also with catalytic activity inhibited to avoid solubilization of sacculi. We will block Tae1 hydrolysis by carrying out pull-downs in the presence of a general commercially-available cysteine hydrolase inhibitor, E64. If there is indeed differential affinity for PG underlying lytic differences across Tae1 variants, we would expect to see greater relative association of Tae1 variants with the type of cell wall sacculi that they more effectively lyse in our DMS screen. We would expect the reverse trend to also be true (lower affinity for less active variants).

      Reviewer 1 asked whether we have done lysis experiments with any E. coli mutants that impact only PG density but not PG polymer structure and, if we haven’t tested any E. coli mutants, whether we have done lysis experiments using drugs that have a similar impact on PG. Even if we don’t include these data in the paper, the reviewer would like us to comment on the trends we have observed.

      We have not done experiments in any mutants or chemical backgrounds known to only impact PG density but not polymer structure. We think this would be a very interesting angle! But unfortunately this is outside the scope of this study. It would require that we first experimentally confirm that the restrictive effect on only density is clearly demonstrated using a variety of techniques, including microscopy, chemical analyses, and biophysical probing of sacculi.

      Reviewer 1 asked for additional DMS screens in more conditions.

      We love this idea! In fact, we hope that others are motivated to adopt our workflow to run many more DMS screens for T6S toxins, as we believe these screens provide a lot of useful and sometimes surprising insights that could be of great interest to others. However, we believe that the primary goal of this paper is to establish this methodology as a compelling approach for studying toxins and, more generally, proteins with complex cellular substrates. It does not necessarily fall within the scope of this paper to fully assess the mechanistic implications of cell wall diversity across a wide range of conditions.

      In our experience, rigorously conducting DMS screens requires a significant amount of effort and resources to establish consistent experimental conditions. Also, a non-trivial number of costly sequencing-based experiments are required across controls and variables for the results to be statistically sound and meaningful. Furthermore, experimental validation of results is ultimately important for our ability to confidently generate hypotheses stemming from these datasets. As stated above, the first author of this paper, who was the primary driver of this work, is no longer in my lab or in academia. As for myself, I am in the middle of a transition out of academia and am actively ramping down my lab at UCSF. I no longer have the space or appropriate set-up to support this longer-term effort.

    1. Design is hope made visible. You can live your life as the result of history and what came before, or you can live your life as the cause of what’s to come. You choose. When talent doesn’t hustle, hustle beats talent. But when talent hustles, watch out. When you work only for money, without any love for what you do in and of itself, your work will lack energy. People will feel that. So give every project everything you’ve got, at every moment, every time. A good philosopher will say: “Know thyself.” A good shopkeeper will say: “Know thy customer.” A good designer will say: “Know both.” Listen for when someone is dismissing your ambitions. Only the petty do that. Avoid them. Instead, seek out those much better than you; they’ll make you feel that you can achieve your dreams, as theirs are probably even larger. They’ll wave you on to the finish line. A brand is always answering two questions. The first one internally facing: What do we believe? The second, externally: How do we behave? You must remain authentic to yourself, your core values, and what you stand for. If you’re not, people will sniff you out. But your brand must maintain cultural congruence — remaining relevant to the times, always evolving to inspire people at large. The answers to these two curiosities must always be aligned. Find a way to connect every project to something much bigger: a higher order value, a truth, a courageous goal, or a larger question. Then, if your efforts start to lag or feel mundane, return to that larger ideal that inspired you in the first place. It works. Put this over your desk: “You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete.” Buckminster Fuller knew stuff. A good designer will help a company get to where they want to go. A great designer will push a company to where they should go. Are you going to tell a story? Then tell a big story. An enormous story. An epic story. 
Or tell no story at all. The role of creative leadership is to create more leaders — not more followers. This view is more uncommon than I’d like. I’ve learned that there are only two kinds of people: 1.) People who do exactly what they say they’ll do. 2.) People who are full of shit. Form follows fantasy. Every good idea comes from a spark of imagination, not pragmatism. Facts are important. But possibility creates futures. Never take an unpaid internship. Ever. It is unethical to be offered one, and in many places, it is illegal. But more importantly, what kind of people would refuse to pay you? Oh yeah, really shitty people. If you lose the desire to be silly, the power to laugh, and the ability to poke fun at yourself, you will lose the power to think. All work and no play makes Jack a dull boy for one reason: It kills off his imagination. Stuck on a problem you can’t solve? Go bigger. Expand it. Make it giant. Do not try to contain it, or simplify it, or reduce it. Make it so large that you can begin to see a new pattern. Solve the larger problem and the smaller one will get solved along the way. Always begin in mythology. It’s good fuel. Fables and fantasy don’t age or grow stale for one reason: They are a step into a dimension beyond the reach of time itself. Build with them. When I turned 35, I shifted my desire to be happy to a desire to be useful. It made all the difference. There are only two kinds of leaders: 1.) Those in the engine room helping the crew shovel the coal. 2.) Those who sit on top of the train and wave at the crowds as they pass by. Learn from ad agencies. They say yes to everything, even when they can’t do it. But they try. Designers say no all too often: “Oh, no. We don’t do that!” That’s shortsighted. Instead, say yes to everything. But always add “yes…if.” Then define your terms. I was on a board with the esteemed educator Sir Ken Robinson. 
At one meeting where a pompous guest was droning on, he turned to me and whispered, “What we do for ourselves dies with us when we leave this planet. What we do for other people can live on forever.” The opposite of courage is not cowardice. The opposite of courage is conformity. Ubiquity = Invisibility. What we’re overly familiar with, what becomes common, we stop seeing. One function of design is to restore our perception, renew our understanding, and invite us to be more alert. Seek simplicity only on the far side of complexity. Do the work, the research, the understanding, and discover the unseen, surprising, unanticipated insight before you start crafting your solution. A celebrated designer I admire once said “Style = Fart.” I disagree. I believe “Style = Accuracy.” It gives focus and timely relevance to ideas. If you want to make people like things, work in advertising. If you want to make things people like, work in design. Both are valid ways to build a brand, but the second way pays off better in the long run. You can always pull a good story out of a successful product or service. You can’t always pull a good product out of a story. Hire gifted people your clients would never let in their front door. Give them influence. Clear the runway. Provide sandwiches. And stand back. When designers get overwhelmed we can retreat into passivity. We pull back. This gives us an illusion of control. The less we try, the less our chances to fail. We make it look like we’re not responsible for what happens to us. But never give up. Move in closer, instead. Try. Make a mistake. Apologize quickly. And keep trying. Never be boring. Be ridiculous. Absurd. But never be boring. (Yes, this rule will get you in trouble.) Push. Push harder. The goal is to make the complicated simple — not the other way around. The best ideas are often expressed as simple ideas. They’ll have power because they’ll feel inevitable. 
Looking backward from the end of a project, it will have the appearance of inevitability. But when you began, you had no idea you’d end up there. What dullards suggest at this point is dangerous: “This creative process is too messy and too complicated. It needs efficiency since this solution was so logical. We should apply more logic throughout the process!” That’s the beginning of the end of creativity. Resist this urge. It destroys spontaneity, originality, serendipity, and unintentionality, which is where the biggest ideas are waiting for you. Do you find yourself surrounded by people who whine that “clients don’t understand what we do”? Those people will never have good clients. A designer’s first job is to articulate the tangible value we bring to every situation. It’s not the clients’ job to try to guess it. Average designers hit the brakes when they feel fear. But when the talented get frightened, they hit the pedal, accelerate, and drive headlong into the unknown. I’ve taught students for 20 years. In that time I’ve seen self-confidence, persistence, and desire play a much larger role in growth and achievement than talent. Passive? Whining? Waiting for orders? You won’t get off the ground. Energized? Enthused? Curious? The sky’s your limit. If you want to teach design, first read “Teaching to Transgress” by bell hooks. Your whole mindset will change. If it doesn’t, please do not teach. Seeking mastery in design means being comfortable with making your own path. Forge the new road. Others will question it and doubt it. But that path will eventually come to fit your soul. It will not only lead you into deeper parts of your craft, but to hidden parts of yourself. There may come a time when someone publicly attacks you or your work. If that happens, remember this: Those who attack are the ones who fear you the most. They’ll suspect that your talents might be greater than theirs. They, in fact, become your most sincere believers. 
It’s a proof point when they start showing up. Watch for them. Then thank them when they arrive. “Always think with your stick forward.” Amelia Earhart painted that on her plane. She meant, I imagine, to seize the moment when it arrives. Refuel as necessary. Don’t wait for any damn kind of “inspiration.” Punch the throttle. Get back in the air. Keep flying. Are you at an agency that habitually recruits outside industry hotshots to lead instead of promoting potential hotshots from the ranks? Run. Now. It will never become what it wants to become. Separate talkers from doers. For someone to score an interview, I suggest a good book — on anything — to read in advance. “After you finish it, call me, and we’ll schedule some time.” 90% drop off. There are exceptions, but I hire from the remaining 10%. Be careful of doing too much work that copies the people you admire. Start out that way to see what feels right. But aim to seek what they were seeking instead of doing what they were doing. Stay away from people who confuse pomposity for profundity. Articulate incompetency is contagious. When you’re out-gunned, out-staffed, and out-equipped in a competition, what are the things you’ve got left to use? Kindness and imagination. When someone disagrees with you, do not defend yourself. Instead, listen. Ask them to explain, validate their concern, expand on it, and affirm their point of view. Only then will anyone listen to anything you have to say. I wish someone had told me this in my teens. We don’t create fantasy worlds to escape reality. We create them so we can better see, understand, and reshape reality. Seek ambition. Hire character. Train talent. When I hear the word “iterate” more than three times in three minutes, I fear there will be a Post-It® fiesta within three minutes. Fair warning. A story is not just a tale of conflict. It can be a well of shared values. 
If you shift the story people tell about themselves and their communities, you can not only shift those people, you can shift an entire culture. Build a library for yourself, and read John Milton. He had profound respect for books and human thought. “For books are not absolutely dead things, but do contain a potency of life in them to be as active as that soul whose progeny they are; nay, they do preserve as in a vial the purest efficacy and extraction of that living intellect that bred them.” A better definition about the sanctity of books was never written. Notice someone doing something cruel for the first time? Never wait for a second time. Address it fast, or cut them out. Either way, do not “wait and see.” It leaves you and your team vulnerable. What they showed you is who they are. Move fast. Mastery is not gained from intellect. Mastery is not gained from talent. Mastery is not gained from ambition. Mastery is only gained from time and focus applied to your craft over many, many years. Do not conflate it with fame. Try absolutely everything. Then try it all again. And then, one more time. Accept compliments gracefully. Treat flatterers with suspicion. Listen to your complainers and cynics — not because you might learn from them, but because they secretly care. Design ain’t what the thing looks like. Design is what the thing does. Smartphoning has supplanted daydreaming. Fixated on our little, lit-up screens, dusty old thoughts no longer slip out of our brains as easily, so no new, silly, absurd thoughts slip back in. And all good ideas start out as silly, absurd thoughts. Turn off your phone. Daydream. Fart around. Ponder. Let something odd fly in that’s floating around, hoping for an open mind to land in. If an idea doesn’t scare you in some way, it’s not really a good idea. A strong, sincere voice is like a clear bell—when rung, it travels far, across fields, mountains, and rivers. Ring it. And teach others to. 
Ignore those who tell you to “only focus on your strengths.” Nonsense. Your strengths never go. Build them, hone them, and add muscle to them. But also focus on what you need to move into new and larger worlds. Become a shocking triple threat, not just a shiny, one-trick pony. Failures are not always mistakes. It just might have been the best you could do at that point. Okay, fine. Apologize quickly. The real failure is to beat yourself up and not take the opportunity to learn. Never hire people for “cultural fit.” What a pernicious term. Instead, hire insanely talented people for their “cultural contribution.” For how unique they are. For why they are different from you. For what they will add that you do not have. People who use the word “lifestyle” don’t have one. Big agency order of importance: Clients –> Work –> People. Ours: People –> Work –> The client’s customers –> Clients. It’s easy. Good people do good work that customers love so clients succeed. T’was ever thus. Don’t work with clients to help them become the best. Work with clients to help them become the only. Hire Tigger. Never Eeyore. Surround yourself with optimists. They will build futures into existence. Read a good book every week. After a year your brain will be fueled like a rocket and your mind will naturally start going to new places, connecting new ideas, and thinking in ways you never have before. Never create and edit at the same time. Get all the sloppy, ugly roughs and first drafts out. Quantity is more important than quality at the start. Mess is more. All ideas are bad ideas. They only become good through craft and love. Clients want you to succeed like crazy. That’s why they hired you. Show them how. That’s your damn job. Do it. We perceive through images. We think in metaphors. We learn by stories. We create with fantasy. When you find yourself on the horns of a dilemma, always do the honest thing. This will shock people. And you’ll come out better, anyway. Perhaps. Maybe. Possibly. 
Someday. These are among the most damaging words a creative person can use. Lose them. Everybody starts out with good intentions. Not everybody finishes with them. This has been the most painful thing I’ve ever learned. People already know what advice they need to hear. They just need to hear it told to them by someone else. There is no such thing as “The Future.” There is only and always “The Futures“—and they are all in competition with each other, fighting for dominance. Which future will you feed? When asked for a definition of “brand,” I use this: A brand is a promise performed consistently over time. It’s held up for a while now. Brands are mentors of things to come. The best ones anticipate, create, and move us into tomorrow. Companies are no longer in competition with each other. They are—we all are—in competition with the future itself. The era of human-centered design is now gone. Our existence was never human-centered, anyway. Covid-19 proved that to be nonsense. It’s time for environment-centered. Not sustainability. Regeneration Design where we create not apart from Nature but as a part of Nature. It is never about winning. It is never about losing. It is only about contributing. It is only about learning. I’m tired of talks from “designers” who never design anything beyond their keynotes. I’m tired of talks from “entrepreneurs” who never build anything beyond themselves. I’m tired of talks from “thought leaders” who lead nothing but the perpetuation of their own fame. When you submit a fee for your work, someone will always ask, “Is this negotiable?” Answer with this: “Yes. Up.” In the end, it’ll not be what you took. It’ll be what you gave away. Do not worry about your competition. You’re not in competition with them anymore. You’re only in competition with the future itself. So don’t look over your shoulder. Look two, three, five years down the road and invent backward from there. 
Design is the bridge that gets us from where we are to where we should be. It is future-making. And it’s our job to get our clients into the best futures for themselves as quickly and effectively as possible. Skip the whole “Minimal Viable Product” thing. It leads to incrementalism. Try “Maximum Fucking Love.” It leads to something that someone else might actually care about. Be aware that every choice you make comes down to two options: Feeding grievance or creating hope. In the end, it is that simple. The era of problem-solving is gone. It’s too reactive in a world where the future arrives too fast. Designers must now be problem seekers, finding and anticipating problems before they arrive on our desks, because at that point, it’s already too late. We must now all build bridges, not walls. The rest is detail. Design Thinking gives a definition of romantic expression as found in timely historical contexts. Design says: “I’ll be upstairs.” In first creative presentations, to ensure your creative work has time and space to land, ban all of the devil’s advocates from the room before you show a thing. Then say this: “We are here to create something new. New ideas can be fragile because they are unfamiliar. You may not like something you see here, but you are not allowed to say that for now. We’ll have to edit and remove some of this work later, but for now, everything will be in play. So find something, anything, you like in every idea. A color. A word. An image. A sentence. Anything. In the end, we find what we look for. And today we are going to look for the new.” In the end, there are only two key questions the world asks of us: 1.) Who are you? 2.) Where are you going? These questions are the same ones we ask our clients. The first is about authenticity; the second is about relevance. Asking them will keep the world wide open in front of you. 
Whether you like it or not, your brand’s story already exists, so you should manage it as you would any other powerful company asset. After your product, your means to deliver it, and your audience, your story will be the most potent tool you have to build with. Be very. For a very long time, it took a very long time for anything to change. If you found an answer that worked, you could count on it being the answer for ages. But those days are over. Being an answer is not the answer. Or even an option. Unless, of course, you’re very curious. Or very focused. Very gay. Very straight. Very caring. Very prickly. Very visual. Very verbal. Very brash. Very funny. Very heady. Very anything. Everyone at COLLINS is very something. If I took any lessons from Ogilvy, it was these two: 1) Think bigger. And then, think bigger still. 2) Take every chance while you can. Grab them. And go all in. You never know if they’ll ever come again. Experience. Don’t observe. Inhale. Don’t read. Transfigure. Don’t shift. Advocate. Don’t ponder. Prove. Don’t promise. Encourage. Don’t cut. Imagine. Don’t worry. Do. Don’t analyze. Hear. Don’t listen. Show. Don’t tell. Give. Don’t take. Design is not what we make. Design is what we make possible.

      Some great design principles and wisdom.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors sought to investigate the diet of the early fossil bird Jeholornis and its implications for bird-plant interactions in early bird evolution.

      Major strengths were: 1) an exquisite near-complete cranial reconstruction of the early fossil bird Jeholornis from the Early Cretaceous of China, 2) a large sample of extant bird skulls (160) for the geometric morphometric analysis, and, 3) qualitative description of alimentary contents of extant birds.

      Major weaknesses were: 1) restriction of diet consideration to only granivory and frugivory, 2) under-detailed comparisons between the extant and extinct alimentary contents, 3) unclear explanation of the connection between early fossil birds and seed dispersal.

      Thanks for the summary of our work! To briefly reply to the weaknesses mentioned here (more details are provided in the following reply to the reviewer’s comments and suggestions):

      1) We have added supplementary analyses according to the reviewer’s suggestions, so this should now have been addressed. Our morphometric analyses attempt to explain the presence of seeds in the gut contents of some individuals of Jeholornis. We believe there are only two possible explanations of the presence of these seeds: granivory or frugivory. Therefore, we were initially motivated by the need to rigorously rule out a granivorous explanation of the presence of seeds in the gut of Jeholornis, which then would demonstrate the partially frugivorous diet of Jeholornis - it doesn’t have to be a specialist frugivore and its supplementary diet components don’t influence the inference that the presence of seeds results from fruit-consumption. Fruit-consumption is the key mechanism that we provide evidence of for the first time in early birds, and is central to the potential for mutualisms between plants and early birds. However, our supplementary geometric morphometric analyses do indicate some clues about its supplementary diets that are useful. In particular, they rule out some other diets e.g. piscivory or a probing diet.

      2) Our work is the first work we know of to provide comparative data on the seed-containing gut contents of extant birds, as a tool to interpret fossil gut contents. For granivores and frugivores, we have done detailed 3D comparisons among several species. We think this is important, and we have done our best to document them clearly. However, for now, we have further clarified the images that we have presented, in response to a comment by referee 3 (see below). We hope that this also addresses the concerns of referee 1 here.

      3) By providing direct evidence of fruit-consumption in early birds, we provided evidence of the mechanism for potential bird-plant co-evolutionary mutualism during the Early Cretaceous. We are not showing direct evidence of the mutualism, although note that plants invest energy in fruit production specifically to attract fruit-eating animals to act as seed dispersers. Therefore, the inference of mutualism is not far-fetched and is very likely, even if direct evidence is almost impossible to preserve in fossils - so that we tend to tone down this statement rather than making it too strong. More detailed analyses based on new fossil discoveries in the future are expected to further explore the role of birds in the Cretaceous Terrestrial Revolution. However, our study is the first step to evidence and discuss this ecological topic and the furthest we could go based on the current fossil discoveries. Nevertheless, this seems important and will be the base of future studies.

      The authors did not yet achieve their full aims because their methods limited the scope of their conclusions. Specifically, a third hypothesis that Jeholornis was neither granivorous nor frugivorous was not addressed in the study. This is especially poignant as the PCA data show overlap between the granivory and frugivory data points and the 'other diet' data points. If it is assumed that Jeholornis must be a granivore or a frugivore, then the results support frugivory over granivory for Jeholornis. However, as explained above, this assumption is not supported by the data provided so the third hypothesis needs to be tested.

      Thank you very much for stating the concern of our study. It seems that there is some misunderstanding here about our study. Our analyses attempt to explain how seeds entered the gut content of Jeholornis, not to predict diet in the absence of evidence from gut content. That is why we tested between just two alternative explanations of the gut contents in our original analyses: (1) That seeds entered the gut through granivory (seed-consumption); and (2) That seeds entered the gut through frugivory (fruit-consumption). Based on this combined evidence of seeds in the gut, comparative study of the gut contents of extant birds, plus morphometrics of the skull and mandible, we claimed partial (possibly seasonal) frugivory - a form of facultative frugivory for this lineage. Therefore, we are not claiming specialised frugivory in Jeholornis as the reviewer might think. However, we acknowledge that the word 'frugivorous' might be misleading to some readers, who could interpret it as meaning 'specialised frugivorous'. To avoid this misunderstanding, we did consistently use adjectives such as 'partial', 'seasonal' and 'opportunistic' in our initial submission. And we have tried to reinforce this in our revised manuscript. For example, we converted some instances of ‘frugivory’ to ‘fruit-consumption’ to indicate the act of consuming fruit rather than a perceived idea of specialised frugivory.

      We may also need to emphasize here that the seed dispersal and frugivore ecology studies of modern taxa show that, for most frugivores, fleshy fruits are a non-exclusive food resource, which is supplemented with other foods like animal prey and plants (Howe, 1986; Corlett, 1998; Jordano, 2000; Wilman et al., 2014). In addition, plants usually bear fruits only in certain seasons rather than throughout the year, which makes strictly specialized frugivores very rare. Therefore, avian frugivores occupy a wide range of diet space that is highly overlapping with some other diets. However, to reply to the comment from the reviewer and also make this clearer to other readers, we conducted supplemental analyses by dividing 'other diets' further to test which diets Jeholornis could or could not have had as supplements to frugivory. The results are now shown in Figure 2 - figure supplements 3, 4 and 5. We revised the manuscript and added the following text to describe the supplemental analyses:

      “Our main analysis is intended to test why seeds entered the gut of Jeholornis by distinguishing between two hypotheses, either (i) fruit consumption or (ii) seed consumption (Figure 2, Figure 2 - figure supplements 2).”

      “Our supplemental analysis includes a further split of “Other diets”, separating the “Other diets” category into: (1) Probing for invertebrates; (2) Grabbing/pecking for invertebrates (Figure 2 - figure supplements 3); (3) Piscivores; (4) Animal-dominated omnivores; (5) Carnivores (Figure 2 - figure supplements 4); (6) Nectarivores; (7) Omnivores; (8) Plant-dominated omnivores (Figure 2 - figure supplements 5). Our prior expectation is that these analyses will not provide an unambiguous classification of the diet of Jeholornis on their own, because craniomandibular shape data does not completely differentiate among diets in birds (Navalon et al., 2019), but that they may be capable of ruling out the occurrence of some diets.”

      The results of these supplemental analyses are described in the text we added to the manuscript:

      “Our supplemental analyses exclude Jeholornis from possessing a probing diet, which occupy negative PC1 values (Figure 2 - figure supplements 3), as well as being a piscivore, which occupy positive PC2 values (Figure 2 - figure supplements 4). However, it cannot be distinguished from other diets such as the grabbing/pecking for invertebrates and omnivory (Figure 2 - figure supplements 3, 4, 5). Euclidean distances in the full multivariate shape space suggest that the mandible of Jeholornis is relatively similar to those of various omnivorous (e.g. Podica), seed-grinding (e.g. Calandrella), frugivorous (e.g. Crax), and invertebrate pecking (e.g. Picus) birds (Figure 2 - Source data 3).

      “Similar to the results of the mandible analyses, the results of the supplemental analyses of cranial shape also exclude Jeholornis from possessing a probing diet, which occupy negative PC1 values (Figure 2 - figure supplements 3), as well as being a piscivore, which occupy positive PC2 values (Figure 2 - figure supplements 4). The other diets are also undistinguishable in the supplemental analyses of cranial shape (Figure 2 - figure supplements 3, 4, 5). Euclidean distances in the multivariate shape space, excluding PC3 (which describes the large-scale differences between stem- and crown-group birds) suggest that the cranium of Jeholornis is relatively similar to those of various frugivorous (e.g. Manucodia), seed-grinding (e.g. Pedionomus) and invertebrate pecking (e.g. Hymenops) birds (Figure 2 - Source data 4).”
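
      A minimal sketch of the kind of nearest-neighbour comparison described in the quoted passage: it ranks reference taxa by Euclidean distance to a target specimen in PC space, optionally dropping an axis (as the cranial comparison drops PC3). The function names, taxon labels and all PC scores below are hypothetical placeholders, not values from the study, and this is not claimed to be the authors' actual pipeline.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length score vectors."""
    if len(a) != len(b):
        raise ValueError("score vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(target, reference, exclude_axes=()):
    """Return (name, distance) pairs sorted from most to least similar.

    target       -- PC scores of the specimen of interest
    reference    -- dict mapping taxon name to its PC scores
    exclude_axes -- zero-based indices of PC axes to ignore (e.g. 2 for PC3)
    """
    excluded = set(exclude_axes)
    keep = [i for i in range(len(target)) if i not in excluded]
    reduced_target = [target[i] for i in keep]
    distances = []
    for name, scores in reference.items():
        reduced_scores = [scores[i] for i in keep]
        distances.append((name, euclidean_distance(reduced_target, reduced_scores)))
    return sorted(distances, key=lambda pair: pair[1])


# Hypothetical scores on PC1..PC3 (illustrative only).
fossil_scores = [0.0, 0.0, 1.0]
extant_scores = {
    "taxon_a": [0.3, 0.4, -5.0],  # close on PC1-PC2, far on PC3
    "taxon_b": [1.0, 1.0, 1.0],
}

for name, dist in rank_by_similarity(fossil_scores, extant_scores, exclude_axes=(2,)):
    print(f"{name}: {dist:.3f}")
```

      Dropping an axis changes which taxa appear closest, which is why a comparison that excludes PC3 can give a different ranking from one in the full shape space.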

      These results are briefly merged into the discussion part:

      “Mandibular and cranial shape excludes Jeholornis from having a probing/piscivorous diet, and is consistent with omnivory, grabbing/pecking for invertebrates, or processing foliage (using the gastric mill).”

      The existing main morphometric analyses show that a seed-cracking diet can be ruled out as an explanation of the presence of seeds in the gut of Jeholornis, which is their primary goal. In addition, the intention of this study is to show evidence for at least seasonal fruit consumption in some of the earliest birds (not specialised frugivory), which all three reviewers seem to agree is a well-founded conclusion, and the bigger picture insights of our paper arise from that. With the new supplementary analyses inspired by the reviewer, the diet of Jeholornis is described in more detail in our study, which may interest more readers concerned with the diet components of early birds.

      The cranial reconstruction of Jeholornis and the alimentary content data for extant birds would be invaluable to the community. The geometric morphometric data are presented in a way that obscures how much overlap there is between dietary categories (non-frugivore and non-granivore diets are grouped as 'other diets'), so the utility of these data is unclear. This aspect has hampered the ability of the authors to reconstruct diet in Jeholornis and, thus, the bigger picture insights that can be drawn from these results, limiting the likely impact of the work.

      Thank you very much for the positive comments about our cranial reconstruction of Jeholornis and the alimentary content data for extant birds.

      It was not our intention to obscure the overlaps between the mandible/cranial shape of frugivorous birds and those of other birds. In fact, we believed that it was clear from the plots, and from the way we described results in the text, that various birds with ‘other diets’ could have similar mandible/cranial shape to Jeholornis. This degree of overlap is also expected based on recent studies that found evidence for only quite diffuse relationships between cranial form and diet in birds (Navalón et al., 2019). However, we also see the point that some readers might be curious about the nature of particular datapoints and it would be useful to clarify this. We therefore added supplementary analyses according to the reviewer’s comment/suggestion by dividing the 'other diets' category into several much more detailed categories, so the reviewer's concern that “the non-frugivore and non-granivore diets are grouped as 'other diets'” should now have been addressed.

      Jeholornis is one of the earliest fossil birds, so understanding its diet and ecological role is important for understanding Mesozoic ecosystems and the emergence of modern ones.

      Thank you very much for this good explanation of the importance of this study; it is also what we believed when we wrote the manuscript. We hope that the referee will be satisfied with the efforts we made to address their initial comments and that our paper on the ecology and morphology of Jeholornis can be published in an appropriate venue.

      Reviewer #3 (Public Review):

      Hu et al. reported on a new specimen of the early bird Jeholornis, including a nearly complete skull. Using geometric morphometrics data collected from 3D and 2D retro-deformed reconstructions of its skull, the authors convincingly dismiss a seed-cracking feeding strategy for the taxon. They then use comparisons of 3D reconstructions of ingested seeds to extant birds with known feeding strategies to convincingly argue that Jeholornis was likely at least partially frugivorous. As such, this study provides the strongest evidence yet that early birds such as Jeholornis may have played a role in bird-mediated seed dispersal strategies in the Mesozoic.

      Generally, the data presented in this paper support the authors' interpretations. The specimen at the core of this study is truly spectacular, and the authors' retro-deformation of its skull is skilled. The results of the authors' geometric morphometric analyses support their inference that Jeholornis was likely not a seed-cracker. Their comparisons of ingested seed shapes also convincingly supported a partially frugivorous diet. I especially applaud the authors' detailed description of their process of retro-deformation of the fossil skull (an example many should follow, including myself) as well as making both their raw data and their reconstructed surfaces available online.

      Thank you very much for the summary of our work!

      However, there are a few major and several minor issues that I believe need to be addressed.

      1. The implications for possible bird-mediated seed dispersal are clear in this study, but they are not conclusive. Rather, the authors (convincingly) demonstrate that Jeholornis was at least partially frugivorous -- a necessary component of such a mutualistic interaction. The authors do not demonstrate that such an interaction actually occurs. These results are nonetheless exciting and important, but I think certain statements in the paper are too strong. A notable example is the title - "Earliest evidence for frugivory and seed dispersal by birds." I would strongly urge the addition of a single word to better reflect the data presented: "Earliest evidence for frugivory and possible seed dispersal by birds." Similarly, in lines 328-329 -- "Strong indications for at least seasonal frugivory in Jeholornis provides direct evidence of [specialised seed-dispersal by animals during the Early Cretaceous] for the first time" -- is not true. This paper does not provide direct evidence for this, but does provide a mechanism consistent with this. There are a handful of other statements in the paper that I think should be toned down to account for this.

      Thanks for the helpful suggestions! We have revised the title to be “Earliest evidence for frugivory and potential seed dispersal by birds”, and revised this sentence to be “Evidence for at least seasonal frugivory in Jeholornis provides direct evidence of fruit-consumption by early birds, long before the origin of the bird crown-group. This provides an important indication of the likelihood that birds were recruited by plants for seed-dispersal very early in their evolutionary history, during the Early Cretaceous” now. We also revised the manuscript throughout to tone down some similar statements about seed dispersal, such as “…indicating that birds may have been recruited for seed dispersal during the earliest stages of the avian radiation.”

      2. Much more information should be given about the new Jeholornis specimen. In the supplement, the authors state that "a few post cranial elements" (p. 17, line 352) are preserved along with the skull -- which elements? They should be figured and briefly described in the supplement. This is of relevance to the core assumption of the paper, namely that this individual belonged to Jeholornis -- the taxonomic assignment is based partially on the tail morphology -- which I assume means that, minimally, a complete tail is preserved. The authors also mention the pelvic morphology of the new specimen, so I assume at least some part of the pelvis is preserved. These should all be figured. Most anatomical discussion is limited to the skull (and especially the palate), which is understandable, given the focus of the paper. However, with that in mind, more attention should be paid to the retro-deformation of the skull. Figure 1 is quite attractive, but I'm confused by the differences in depicted preservation between the 3D (Fig. 1C, D) and 2D (Fig. 1E, F) reconstructions. For example, the braincase is not shown in panel C but is in panel E -- why? Is its shape inferred from other specimens for panel E? Again, I very much appreciate the inclusion of near step-by-step description of how the rostrum was retro-deformed. Minimally, a few comments on what isn't preserved would be useful.

      1) We added the photograph of the whole slab of Jeholornis STM 3-8 as Figure 1 - figure supplements 1 here (the eLife format for supplementary figures), and revised this sentence to be “…and a few postcranial elements including the vertebral column, the pelvic girdle and fragmentary hindlimbs.” now. As you can see from the photograph, very little valid information can be extracted from the incompletely preserved postcranial elements. Considering that this paper focuses on the skull, we only mentioned the relatively better-preserved tail and pelvis in the taxonomic part.

      2) We added “Dashed-lines indicate the elements not preserved but suspected to exist.” to the legend of Figure 1, and added the details of the reconstructions of unpreserved elements at the end of the “CT scans and digital reconstructions” section of Materials and Methods: “However, since the braincase is too flattened to be used as the reference for 3D retrodeformation, it was omitted in Figure 1C and reconstructed according to its common shape in early birds in Figure 1E. The ectopterygoid is not preserved but suspected to exist as discussed in the Cranial Anatomy part, therefore it was reconstructed according to the shape of this element among other stem birds e.g. Archaeopteryx and Sapeornis (Elzanowski and Wellnhofer, 1996; Hu et al., 2019).”

      3. The figures are visually attractive but I found some of them confusing or unclear. See my comments above regarding Figure 1. Despite the red arrows in Figure 4 and the supplemental figure, I was hard pressed to understand precisely what set the indicated seeds apart from the rest. In some cases I could see slight "dents" where one or two of the arrows indicated, but it was hard for me to see, even when I zoomed in on my screen. I think inset panels featuring zoom-ins on the indicated regions would be very useful in making the point the authors intend. Also, I don't know if the supplemental image naming/number scheme was imposed by the journal or is a choice by the authors, but I found it baffling. Something more traditional (like "Fig. S1" or "Supplemental Figure 1") would be much more efficient.

      1) We have clarified the confusing points in Figure 1 as suggested. For Figure 4 and related supplementary figures, the 3D reconstructed seeds are pretty clear, such as the broken ones in Figure 4B. The broken seeds in the scanning slices are more difficult to observe, as the reviewer said, since the seed husks are very thin so that they are only slightly brighter, and that’s why we put the red arrows indicating the breakages there. To help readers observe them more easily, we have now added some zoom-in panels and line drawings for the representative ones (not all of them, since otherwise there would be too many), as suggested by the reviewer;

      2) The supplementary image naming/number scheme was imposed by the journal, and it will be clearer when the paper is digitally published, since these supplementary images will be connected to links in the legends of the main figures.

    1. Author Response:

      Reviewer #2 (Public Review):

      This is an interesting and scientifically rigorous report documenting atypical, dendritic locations for the emerging axon of pyramidal neurons. This is not an entirely new observation (the authors cite relevant publications, including Kole and Brette, 2018 and Mendizabal-Zubiaga et al., 2007), but still important, as a relatively overlooked fact with functional implications. A main feature of the present report is an exceptionally thorough cross-species survey, from which the authors conclude that, as compared with non-primates, the macaque and human brains have a lower proportion of neocortical pyramidal neurons with axon carrying dendrites. The results might be further supported by additional experiments, especially ultrastructural data, or by including more extensive developmental data. There is a section on Development, but there is hardly any Discussion. However, these matters are raised and adequately treated by reference to the existing literature.

      We cannot do EM with frozen material or DEPEX-cleared sections. The developmental aspects have now been discussed more extensively, but we refrained from speculating too much, since we do not have physiological data.

      Reviewer #3 (Public Review):

      The authors used neuroanatomical techniques to study neocortical pyramidal neurons from several different mammalian species. Their message is that primate neocortex differs from that of other mammals in having substantially fewer cells with axons emanating from dendrites, rather than the canonical route from the soma. The authors employed a range of standard methods, ranging from tracer injection to Golgi impregnation to immunocytochemistry. The feature the authors report is undeniable; there clearly are axons that emanate from dendrites of neocortical pyramidal neurons. Prior studies have reported that these axons are more excitable, thus leading to the intriguing possibility of a fundamental architectural (and thus presumably functional) feature in how primate neocortex operates.

      This is a provocative narrative, that leads to a number of interesting questions. However, I have reservations that the authors must address before I believe the claim that primates are really fundamentally different from other mammals in this respect. A strength but also a central limitation of this study is that different species were compared using different methods, and different areas were studied in different species. The authors make the implicit assumption that the prominence of this feature does not differ among cortical areas.

      We initially considered it a strength of the study – looking into many areas with many methods in many species. However, it seemed a bit like cherry-picking, and we have now enlarged the data sets for a more systematic analysis. Please note, we assessed archived material. We are bound to what we have available. We have now delivered areal comparisons, and I am afraid the answer is NO: no remarkable differences in the areas that we assessed in monkey and cat.

      However, it is entirely plausible that the proportion of neurons with axon-carrying dendrites does differ among cortical areas. The authors also group neurons into 2 large populations: infra- and supragranular. But again, layers 2 and 3 differ from one another (as do layers 5 and 6) in the specific populations of pyramidal cells they contain (morphological and neurochemical types, inputs and outputs, etc.). Certainly many studies do group neurons into these broad populations, but for this kind of comparison relevant differences or similarities could have been lost. Comparisons among species ideally would have all been in the same layer and area.

      As said, we are bound to what we have available. And this is more than what has ever been published on this question so far. The graph and the Tables to Figure 3B allow species to be compared across the layers.

      We are aware that pyramidal cells in the layers can differ. Looking into RNA seq papers, up to 19 types exist in mouse. How many could potentially then exist in human? There is no way of pulverizing our kind of analysis down to the level of 19 pyramidal cell types differing by some unexplained RNA signatures which so far exist only for mouse. The SMI-32 staining already “selects” for one subtype in that it stains preferentially so-called type 1 pyramidal cells (Molnar et al., 2006).

      Another limitation is that the same method was not employed in different species. The reader needs to know that different methods reveal the same proportion of axon-carrying dendrites in a given area of a certain species. This should have been stated more clearly and earlier in the text; it took examination of the data tables to see this. The tables show that measurements were made in several different cortical areas. Can the authors provide any evidence that the proportion of neurons with axon-carrying dendrites does not differ in any one species among cortical areas?

      We now provide areal comparisons for 5 fields in monkey (new Figure 4A) and visual fields in cat (new Fig. 4B), both with the same methods. We can even provide a within-individual comparison of brain areas and of methods. Another three areal values for the infant macaque have been plotted in Figure 3B.

      Figure 3 description and/or legend needs to state clearly that different species' neocortex was studied in different areas (and if all Fig3 samples shown are from same layers).

      Figure 3A is total cortex, Figure 3B is by layers. Counting strategies are now described in detail in methods.

      Supplementary Excel file suggests that for humans Golgi-Kopsch reveals fewer infragranular AcD-cells than Golgi-Cox (4.43 vs 1.39), while for adult macaques Golgi-Kopsch revealed fewer than biocytin injection or SMI-32/BetaIV-spectrin immunofluorescence (13.34 vs 7.98 vs 6.29). Since the human data relies on Golgi methods, the authors must reassure the readers that the comparison of species is validated by direct comparison of different methods.

      The message that primates have fewer cells with axon-carrying dendrites than other mammals might therefore certainly be interesting but far less compelling. The message might be that primate neocortex is not qualitatively different from that of other species; instead they simply have somewhat fewer AcD-bearing neurons than other mammalian species. But even that more modest conclusion is suggested but not fully proven by the data here.

      The referee was right on this point. Having doubled our data sets with more human data we now agree: the Golgi method underestimates the AcD neurons simply because of optical limitations. We now discuss the issue extensively and we no longer perform statistical analysis on the human data. The issue needs further investigation with more methods.

      I was puzzled by Fig 4 not including primate tissue. If the message is that spine density does not differ in dendrites with and without axons, surely it would be important to include primate tissue in this comparison; the comparison between primates and non-primates is after all the core message of this study. I also do not think the values for each species for non-AcD and shared root should be connected by a line; I suggest instead there should simply be a scatter of values for each group with a large symbol indicating mean or median value of each group. This would facilitate comparison.

      First to the graph on spines, now Figure 6. You have to connect the individual neurons by line, otherwise the major point can no longer be seen: the dendrites differ in spine counts; sometimes the AcD has a higher count than the other basal dendrites of the very same neuron, and in the next cell the AcD has a lower count. Statistics did not even suggest a trend. We agree that things may differ in immature neurons. Possibly, during early development the AcD gains advantages by means of its higher excitability.

      Please read the methods part on this point: eligible neurons had to fulfil a number of criteria. We fully exploited the available material of rat and ferret; there were no more eligible neurons. We indeed tried the same in macaque. Section thickness was 50 µm. We found exactly two neurons which fulfilled the criteria. We had no chance with this material given the enormous dimension of the pyramidal cell dendritic trees in monkey. They were simply cut. For this type of classical tracing study, non-alternating section series were prepared and submitted to different types of staining. Section spacing was several hundred µm in each individual. There was no chance to “reconstruct” dendrites from adjacent sections, since there were no adjacent sections.

      The core message of the study is still valid, also without the spine analysis in monkey.

    1. solo thinking is rooted in our lifelong experience of social interaction; linguists and cognitive scientists theorize that the constant patter we carry on in our heads is a kind of internalized conversation. Our brains evolved to think with people: to teach them, to argue with them, to exchange stories with them. Human thought is exquisitely sensitive to context, and one of the most powerful contexts of all is the presence of other people. As a consequence, when we think socially, we think differently—and often better—than when we think non-socially.

      People have evolved as social animals and this extends to thinking and interacting. We think better when we think socially (in groups) as opposed to thinking alone.

      This in part may be why solo reading and annotating improves one's thinking because it is a form of social annotation between the lone annotator and the author. Actual social annotation amongst groups may add additional power to this method.

      I personally annotate alone, though I typically do so in a publicly discoverable fashion within Hypothes.is. While the audience of my annotations may be exceedingly low, there is at least a perceived public for my output. Thus my thinking, though done alone, is accelerated and improved by the potential social context in which it's done. (Hello, dear reader! 🥰) I can artificially take advantage of the social learning effects even if the social circle may mathematically approach the limit of an audience of one (me).

    2. Humans’ tendency to “overimitate”—to reproduce even the gratuitous elements of another’s behavior—may operate on a copy now, understand later basis. After all, there might be good reasons for such steps that the novice does not yet grasp, especially since so many human tools and practices are “cognitively opaque”: not self-explanatory on their face. Even if there doesn’t turn out to be a functional rationale for the actions taken, imitating the customs of one’s culture is a smart move for a highly social species like our own.

      Is this responsible for some of the "group think" seen in the Republican party and the political right? Imitation of bad or counter-intuitive actions outweighs scientifically proven better actions? Examples: anti-vaxxers and coronavirus no-masker behaviors? (Some of this may also be about or even entangled with George Lakoff's (?) tribal identity theories relating to "people like me".)

      Explore this area more deeply.

      Another contributing factor for this effect may be the small-town effect as most Republican party members are in the countryside (as opposed to the larger cities which tend to be more Democratic). City dwellers are more likely to be more insular in their interpersonal relations whereas country dwellers may have more social ties to other people and groups, which may therefore make them more tribal in their social interrelationships. Can I find data to back up this claim?

      How does this link to the thesis put forward by Joseph Henrich in The WEIRDest People in the World: How the West Became Psychologically Peculiar and Particularly Prosperous? Does Henrich have data about city dwellers to back up my claim above?

      What does this tension have to do with the increasing (and potentially evolutionary) propensity of humans to live in ever larger and denser cities versus maintaining their smaller historic numbers from the pre-agricultural time period?

      What are the biological effects on human evolution as a result of these cultural pressures? Certainly our cultural evolution is affecting our biological evolution?

      What about the effects of communication media on our cultural and biological evolution? Memes, orality versus literacy, film, radio, television, etc.? Can we tease out these effects within the socio-politico-cultural sphere on the greater span of humanity? Can we find breaks, signs, or symptoms at the border of mass agriculture?


      total aside, though related to evolution: link hypercycles to evolution spirals?

    1. Author Response

      Reviewer #1 (Public Review):

      In the present study, the authors first analyzed simultaneously recorded human EEG-fMRI data and found the fMRI signatures of burst-suppression. Then, they reported such burst-suppression fMRI signatures in the other three species examined: macaques, marmosets, and rats. Interestingly, their results indicated an inter-species difference: the entire neocortex engaged in burst-suppression in rats, whereas most of the sensory cortices were excluded in primates. The fMRI signatures of burst-suppression were confirmed in several species, suggesting that such signature is a robust phenomenon across animals. These findings warrant further investigation into its neural mechanisms and functional implications.

      Major Issues

      1) One of the major findings is that burst-suppression in primates appeared to largely spare sensory cortices, especially V1. However, as seen in the tSNR maps for macaques and marmosets (Figures 3 and 4–figure supplement 4), the tSNR around the primary visual cortex was much weaker than in other cortices. Moreover, in marmosets, the EPI slices did not cover the entire brain and actually left most of V1 uncovered, as seen in Figure 4. If so, the authors should draw their conclusions very carefully when talking about the differences in V1 across species. It would be better to analyze and discuss how the tSNR differences affect their findings. For example, the authors may consider including the tSNR as a covariate in their map analysis.

      The tSNR in the occipital cortex—especially in the macaque V1—is indeed lower than in more anterior parts of the brain. The higher noise in V1 may have obscured the burst-suppression signal and hindered its detection. That said, we think that burst-suppression would still be detectable at such low tSNR values. We base this claim on our analysis of another macaque brain region—area TE of the inferior temporal cortex (see our additions to Figure 3–figure supplement 4). The tSNR in areas TE and V1 is comparably low, and yet TE is significantly correlated with asymmetric PCs while V1 is not. Therefore, if the burst-suppression fluctuation was present in V1 we should have still detected it.

      Regarding the marmoset data, part of V1 was indeed left out of our field of view, as explicitly shown in our figures (Figure 4 and Figure 4–figure supplement 3). Though we cannot exclude the possibility that the omitted posterior V1 engages in burst-suppression, we think that it is unlikely to behave any differently to more anterior visual areas. We sought more support for this view by obtaining full-brain fMRI data in one additional marmoset. We present this analysis in a new paragraph of the relevant Results section and in the new Figure 4–figure supplement 5. The asymmetric PC map in this individual showed widespread correlation across the neocortex, extending slightly further caudally compared with the group map presented in Figure 4. However, nearly all of V1—including the occipital pole—was still uncorrelated. Considering both the new full-brain marmoset data and the results from area TE in macaques, we think that our conclusion about the uncoupling of primate V1 during burst-suppression is still justified. That said, we have now explicitly included the relevant concerns in the manuscript text.

      2) To confirm their findings, it would be great to look into the EEG signals around the sensory cortex (e.g., V1) to see whether the findings in fMRI could be also confirmed with EEG.

      EEG signals around V1 were already examined during the previous analysis of the human dataset (Golkowski et al., 2017). As reported there, the EEG signal of the occipital electrodes did contain bursts, which could not be differentiated from bursts detected by more anterior electrodes in terms of onset timing, duration, or spectral content. This might mean that the BOLD signal in V1 is truly uncoupled from electrical activity. However, we should also consider that EEG may lack the spatial resolution to detect a different activity originating from V1. As seen in the human map (Figure 3), the external cortical surface is almost exclusively covered with areas engaging in burst-suppression, whereas the ‘uncoupled’ V1 represents a small patch by comparison. Therefore, EEG cannot safely determine the nature of electrical activity in V1. We have added the above arguments to the last section of Results. We expect a conclusive answer to come from future electrophysiological recordings in nonhuman primates. The larger proportional size of visual areas in macaques and marmosets as well as the possibility of invasive intra-cranial recordings make these animals attractive models for addressing this question.

      3) As seen in Figure 2-figure supplement 2, there was a significant anticorrelation with burst-suppression at the ventricular borders. It is unclear whether the authors have done physiological or white matter/CSF/global nuisance regression as most of the rest-fMRI studies did. Please make it clear. If not, please explain why and discuss whether it would affect their results.

      We chose to analyze the data without CSF or global signal regression. CSF regression typically requires extracting the signal of a few voxels within the ventricles. Accurately placing such voxels is feasible in the human brain but challenging in small animal brains, especially in rodents. Rodent ventricles are very thin, making it difficult to place a CSF voxel that will not overlap with surrounding brain tissue. Since we had prioritized making the analysis as similar as possible across species, we decided to also forgo CSF regression in humans. While this was our original motivation for omitting CSF regression, we later came across an even more important concern. As we show in Figure 2–figure supplement 2, the CSF signal is not ‘noise’; rather, it is directly related to burst-suppression, and most likely caused by it. Regressing it out would remove much of the variance explained by burst suppression. The coherence between neural, hemodynamic, and CSF oscillations that we see in burst-suppression likely also occurs in other states characterized by global synchrony, as has been shown for non-rapid eye movement sleep (Fultz et al., 2019).

      We think that global signal regression makes no sense in our case, given that our goal was to study a nearly global signal fluctuation. Global signal regression relies on the assumption that neuronal activity is variable across brain regions while many non-neuronal sources contribute globally to the brain signal (Murphy and Fox, 2017). This assumption does not hold true in cases where the neuronal activity itself is global.
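The argument against global signal regression can be illustrated with a minimal sketch. It assumes synthetic data in which every voxel carries the same slow fluctuation plus independent noise; the array sizes, noise level, and the function name `regress_out_global_signal` are hypothetical choices for illustration, not part of the authors' pipeline.

```python
import numpy as np

def regress_out_global_signal(bold):
    """Remove the across-voxel mean time course from every voxel.

    bold: array of shape (n_voxels, n_timepoints).
    """
    centered = bold - bold.mean(axis=1, keepdims=True)
    global_signal = centered.mean(axis=0)
    # Least-squares slope of each voxel on the global signal
    betas = centered @ global_signal / (global_signal @ global_signal)
    return centered - np.outer(betas, global_signal)

rng = np.random.default_rng(0)
t = np.arange(300)
# Hypothetical slow fluctuation shared by all voxels (stand-in for burst-suppression)
shared = np.sign(np.sin(2 * np.pi * (t + 0.5) / 60))
bold = shared + 0.5 * rng.standard_normal((100, t.size))
cleaned = regress_out_global_signal(bold)

# Median voxel-wise correlation with the shared fluctuation, before and after
corr_before = np.median([np.corrcoef(shared, v)[0, 1] for v in bold])
corr_after = np.median([np.corrcoef(shared, v)[0, 1] for v in cleaned])
print(round(float(corr_before), 2), round(float(corr_after), 2))
```

In this toy setting the shared fluctuation dominates the global signal, so regressing it out removes the very component of interest, which is the situation the response describes.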

      4) Three different concentrations of the anesthetic sevoflurane were chosen for human participants. The authors found that the high concentration (3.9-4.6%) induced burst-suppression much better than the other two lower concentrations, as expected. However, in rats, almost all asymmetric PCs were found at an intermediate concentration (2%) of isoflurane, with fewer at the low (1.5%) or high (2.5%) concentration in Rat 1. At the same time, all fMRI runs from Rat 2 with a 1.3% concentration of isoflurane had a prominent asymmetric PC. That is, it seems that only the high concentration of isoflurane could not induce burst-suppression well in rats, which was opposite to the findings in humans. The authors may explain what reasons may cause such differences and whether such differences may affect the major findings in differences between primates and rodents.

      The three sevoflurane concentrations (‘high’, ‘intermediate’, ‘low’) used in humans do not necessarily correspond to the three isoflurane concentrations used in rats (2.5%, 2.0%, 1.5%). Comparing anesthetic concentrations across our datasets is challenging, since anesthetic potency is expected to vary depending on the drug (sevoflurane or isoflurane), animal species, age, and the co-administration of other drugs. Nevertheless, we may estimate equivalent concentrations across species by expressing them as multiples of the minimum alveolar concentration (MAC), i.e. the concentration that produces immobility in 50% of subjects undergoing a standard surgical stimulus.

      For humans, we can use available age-related MAC charts (Nickalls and Mapleson, 2003) to express the three sevoflurane levels as follows: ~1 MAC (2%), 1.5 MAC (3%), 2.2–2.3 MAC (3.9–4.6%). For rats, we can rely on the previously reported isoflurane MAC value of 1.35% (Criado et al., 2000) to derive the following levels: 1.2 MAC (1.5%), 1.6 MAC (2%), 1.9 MAC (2.5 %), and ~1 MAC (1.3%, Rat 2 dataset). According to these conversions, fMRI-detectable burst-suppression occurred in humans at ~2 MAC (with some cases at 1.5 MAC), in the Rat 1 dataset at 1.2–1.6 MAC, and in the Rat 2 dataset at 1 MAC. There seems to be a difference between rats and humans as well as a discrepancy between the two rat datasets. The latter discrepancy could have arisen from differences in the calibration of isoflurane vaporizers at the two research sites (direct measurements of end-tidal anesthetic concentration were not obtained in rats).
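The basic conversion behind these figures is a ratio of the delivered concentration to the MAC. The sketch below shows only that ratio, using the Rat 2 values quoted above; the function name `mac_multiple` is hypothetical, and any age or drug adjustments applied to the other reported values are not modelled here.

```python
def mac_multiple(concentration_pct, mac_pct):
    """Express an inhaled anesthetic concentration as a multiple of MAC."""
    if mac_pct <= 0:
        raise ValueError("MAC must be positive")
    return concentration_pct / mac_pct

# Rat 2 dataset: 1.3% isoflurane against the rat MAC of 1.35% cited above
rat2 = mac_multiple(1.3, 1.35)
print(f"{rat2:.2f} MAC")  # prints "0.96 MAC", i.e. roughly 1 MAC
```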

      In order to better interpret the observed human-rat difference we tried to also compute the multiples of MAC values for our nonhuman primate data, but this proved to be hard. For common marmosets, we are not aware of any published isoflurane MAC values. For long-tailed macaques, a value of 1.28% has been reported (Tinker et al., 1977), which gives a range of 0.7 – 1.2 MAC for our macaque dataset. However, that probably underestimates the actual depth of anesthesia in our experiments, since many of our macaques were old and MAC is known to decrease with age (Nickalls and Mapleson, 2003). Moreover, the administration of medetomidine during anesthesia induction may have further reduced the MAC (Ewing et al., 1993). Consequently, we cannot provide good MAC estimates for the nonhuman primate data and thus have no reference for comparison with other species.

      Even if we knew the correct MAC value in all cases, it may be an inappropriate means of standardizing anesthetic concentrations for burst-suppression. The endpoint measured by MAC—immobility—is mainly mediated by anesthetic effects on the spinal cord and may not be a good predictor for effects on the brain (Rampil et al., 1993). In fact, burst-suppression itself has been proposed as a more appropriate endpoint for measuring anesthetic potency. The proposed metric (MACBS) is defined as the concentration that produces suppressions longer than 1 s in 50% of subjects and is not linearly related to MAC (Pilge et al., 2014).

      In conclusion, if we reference anesthetic concentrations against the MAC, humans and rats indeed seem to exhibit burst-suppression at different concentration ranges. We are unable to perform the same referencing for non-human primates, due to lack of accurate MAC values. Moreover, it is unclear whether MAC is the appropriate reference to begin with. Discussing all these nuances would make the manuscript too long. That said, we have now added a new paragraph to the Discussion section, drawing attention to the fact that anesthetic concentrations are not standardized across species.

      Reviewer #2 (Public Review):

      The strong point in their manuscript is the originality of their results. Using the fMRI's spatial resolution, they can successfully reveal that not all brain areas are synchronized during the burst suppression. Furthermore, they can find that the difference is the most obvious when comparing primates with rats, which makes sense considering the distance on the phylogenetic tree. As far as I know, this manuscript first reports these points.

      On the other hand, there is a weak point in their method. As they've already discussed this point, they needed to use arbitrary thresholds to evaluate whether there is burst suppression or not. Furthermore, this study cannot reject the possibility of spatial inhomogeneity and/or anesthesia-specific modulation in hemodynamic response. If there is such a mechanism, one can find different results from those obtained through electrical measurements.

      1) The authors found that some sensory areas in primates are excluded from those highly synchronized during the burst suppression. While it is true, I wonder if each voxel in such areas shows burst suppression-like activity that is not synchronized with others. If this is the case, burst suppression can still be a global phenomenon. Though authors seem to investigate this point, they used in-ROI averaged time-series so that it cannot reject the possibility that each voxel inside the ROI is not synchronized but shows burst suppression in its manner. I recommend the authors look into each voxel if this is the case or not.

      The reviewer raises an interesting point by proposing that it is possible for sub-regions within the excluded areas—e.g. within V1—to exhibit burst-suppression out-of-phase with each other, thus cancelling out in the mean V1 BOLD signal. We do not think this is the case, for several reasons. Firstly, we can exclude the possibility that any part of V1 exhibits burst-suppression in-phase with the rest of the cortex. The original first-level GLM analysis was a voxel-based univariate analysis. If any voxels within V1 were correlated with the global burst-suppression pattern, we would have seen it on the maps. We saw no such effect, except for some subjects in which a subset of V1 voxels was anti-correlated with the asymmetric PC (the effect was not significant in our group analysis). This anticorrelation was mostly located close to the ventral horns of the two lateral ventricles, and thus could have arisen by the same cycle of ventricular shrinkage-expansion that we describe in Figure 2–figure supplement 2. Secondly, no large clusters of V1 voxels exhibited burst-suppression out-of-phase with the dominant asymmetric PC. If this was the case, we would have seen a phase-shifted version of the fluctuation on the carpet plots. This still leaves the theoretical possibility that individual V1 voxels (or a few at a time) exhibit transitions between burst and suppression epochs out-of-phase with each other. In our response to the next point, we will explain why there is no way of detecting this with fMRI and we discuss whether such a possibility would even fit the label of burst-suppression.

      2) The other but similar point is about their way to detect burst suppression. Why did they use the principal component? By definition, burst suppression should be defined by the existence of burst and suppressed periods. I cannot understand why they did not simply use this definition to check whether each voxel shows such an intermittent activity to evaluate whether it is a global phenomenon or not.

      Burst-suppression on EEG is characterized by quasi-periodic suppressions of activity, during which the EEG signal drops close to being isoelectric. We cannot apply the same definition to fMRI, because the BOLD signal only represents relative changes and thus has no natural baseline equivalent to isoelectricity. Hence there is no way of telling whether a BOLD signal decrease corresponds to a complete activity cessation (suppression) or simply a relative decline. Therefore, we instead decided to rely on another defining feature of burst-suppression—synchrony. We knew that burst-suppression appears simultaneously across EEG electrodes, which means that large parts of the cortex (the major contributor to EEG signal) would have to be synchronized. Moreover, we knew that transitions between burst and suppression epochs occur on a very slow timescale and would be resolvable at a TR of 2 s. PCA allowed us to isolate the large slow synchronous component in the cortical BOLD signal, though this is hardly the only approach that would work. We chose PCA because it is a simple, deterministic, and easily interpretable algorithm.
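A rough sketch of how PCA can isolate a large, slow, synchronous component is given below. It is not the authors' pipeline: the synthetic data, the function name `pc_correlation_medians`, and the use of the median voxel-wise correlation as the asymmetry measure are assumptions made for illustration, the last one paraphrasing how the reviewers summarise the method.

```python
import numpy as np

def pc_correlation_medians(bold, n_components=3):
    """Return temporal PCs and the median voxel-wise correlation with each.

    bold: array of shape (n_voxels, n_timepoints).
    """
    centered = bold - bold.mean(axis=1, keepdims=True)
    # Rows of vt are the temporal principal components, ordered by variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pcs = vt[:n_components]
    medians = np.array([
        np.median([np.corrcoef(pc, voxel)[0, 1] for voxel in centered])
        for pc in pcs
    ])
    return pcs, medians

rng = np.random.default_rng(1)
t = np.arange(300)
# Hypothetical slow fluctuation present in 160 of 200 voxels
slow = np.sign(np.sin(2 * np.pi * (t + 0.5) / 60))
bold = 0.5 * rng.standard_normal((200, t.size))
bold[:160] += slow

pcs, medians = pc_correlation_medians(bold)
# The sign of a PC is arbitrary, so the magnitude of the median is what matters
print(np.round(np.abs(medians), 2))
```

In this toy case the first PC correlates with most voxels in the same direction, so its median correlation lies far from zero, whereas later PCs give roughly symmetric correlation distributions centred near zero.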

      On a related note, even if we could identify complete cessation of activity in the BOLD signal of a single voxel, it is unclear whether that would qualify as burst-suppression per the EEG definition. EEG electrodes pick up activity from areas much larger than a voxel, and thus the very presence of an EEG fluctuation presupposes synchrony on a larger spatial scale. If individual voxel-sized brain areas engaged in burst-suppression out-of-phase, that would probably not register as burst-suppression on an EEG electrode.

      3) Why is there no synchronization during the slow-wave states under light anesthesia? During the slow-wave sleep, it is shown that the entire cortical network is decomposed into a modular-like network structure. Is there synchronization inside each module while no synchrony between modules?

      We do not claim that there is no synchrony in the slow-wave state. We simply state that this state lacks the nearly global cortex-wide fluctuation that is produced by the abrupt transitions between burst and suppression epochs. In fact, the very presence of slow waves on EEG requires synchrony. However, this slow-wave synchrony occurs at a timescale too fast for fMRI to capture, and thus would not directly translate into a global BOLD fluctuation, as burst-suppression does.

      Though the slow-wave state lacks global synchrony on fMRI, it may well exhibit within-module synchrony, as the reviewer suggests. Modules resembling the resting-state networks of wakefulness and sleep have been detected during isoflurane anesthesia in primates (Hori et al., 2020; Hutchison et al., 2011). These experiments were presumably conducted during the slow-wave state: burst-suppression would generate a global network, while the isoelectric state would erase any modular structure. We suspect that functional networks during the anesthetized slow-wave state resemble those present in slow-wave sleep. However, we have not assessed that in our study, since our primary goal was to map burst-suppression.

      Reviewer #3 (Public Review):

      The authors present a multicenter, multimodal rs-fMRI study of the spatial signature of burst suppression in the brain of humans, non-human primates and rats. They have used EEG to identify burst suppression activity in human data from simultaneous EEG-rs-fMRI measurements of subjects under sevoflurane anesthesia. After having identified a (neurovascular) rs-fMRI representation of burst activity, the authors show that bursts can equally be identified from MR data alone. After a principal component analysis, bursts and their spatial signature were identified by an asymmetry of the correlation coefficients. Across species the authors identified similar spatial signatures, which were conserved for all (investigated) primates, but differed for rats. While rats showed a pan-cortical involvement, signatures in primates were more complex, e.g., not including the visual cortex.

      In this study, the authors have presented a novel purely MR-based method to identify burst suppression and its spatial signature. Their method may be used to readily identify burst suppression in fMRI data. However, no general threshold for the median of the cortex-wide correlation could be identified. The authors also establish a conserved signature of burst suppression for primates and reveal subtle but important differences to rodents. Both achievements are novel and represent a major advance in the field of neuroimaging.

      The study was well designed, including important control data to rule out artefacts as source of the observed burst suppression patterns. The particular strengths of this study are: (1) including multicentre data (although only rats were scanned at two different sites); and (2) including four species from humans to rats.

      The manuscript was very carefully and well written (I did not even notice a single typo) and the figures were carefully devised, comprehensively illustrating the large amount of data. The authors further provide a comprehensive account of the relevant literature. Towards the end of their discussion they also clarify the difference in terminology used for burst suppression in some recent rodent studies.

      The only (and in my opinion notable) weakness, is the lack of a general threshold for the asymmetry of the median of the cortex-wide correlation coefficients. With such a threshold, rs-fMRI could be readily used to automatically detect burst suppression across species. However, the authors clearly state this shortcoming and openly discuss its implications. I do not think that an altered experimental design or additional data could provide further remedy.

      To conclude: This very comprehensive study was very well designed, extremely carefully performed, presents a novel tool for identification of burst suppression, and provides insight across species. It has clearly translational potential, which however, is limited by the lack of a general threshold for burst suppression detection.

      I congratulate the authors for this very nice piece of work, and the most typo-free manuscript I have ever read.

      We thank the reviewer for the positive and detailed feedback.

    1. Author Response

      Reviewer #1 (Public Review):

      When theta phase precession was discovered (O'Keefe & Recce, 1993; place cell firing shifting from late to early theta phases as the rat moves through the firing field, averaged over many runs), it was realized that, correspondingly, firing moves from cells with firing fields that have been run through (early phase) to those whose fields are being entered (late phase), with the consequence that a broader range of cells will be firing at this late phase (Skaggs et al., 1996; Burgess et al., 1993; see also Chadwick et al., 2015). Thus, these sweeps could represent the distribution of possible future trajectories, with the broadening distribution representing greater uncertainty in the future trajectory.

      Using data from Pfeiffer and Foster (2013), they examine how neurons could encode the distribution of future locations, including its breadth (i.e. uncertainty), testing a couple of proposed methods and suggesting one of their own. The results show that decoded location has increasing variability at later phases (corresponding to locations further ahead), and greater deviation from the actual trajectory. Further results (when testing the models below) include that population firing rate increased from early to late phases; decoding uncertainty does not change within-cycle, and the cycle-by-cycle variability (CCV) increases from early to late phases more rapidly than the trajectory encoding error (TEE).

      They then use synthetic data to test ideas about neural coding of the location probability distribution, i.e. that: a) place cell firing corresponds to the tuning functions on the mean future trajectory (w/o uncertainty); b) the distribution is represented in the immediate population firing as the product of the tuning functions of active cells or c) (DDC) the distribution is represented by its overlap with the tuning curves of individual neurons; d) (their suggestion) that different possible trajectories are sampled from the target distribution in different theta cycles.

      The product scheme has decreasing uncertainty with population firing rate, so would have to have maximal firing at early phases (corresponding to locations behind the rat), contradicting what was observed in the data, so this scheme is discarded.
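The link between firing rate and uncertainty in a product scheme can be made concrete with a small sketch. It assumes Gaussian tuning curves of equal width and ignores any prior or normalisation terms; the function name `product_of_gaussians` and the numbers are hypothetical, and the paper's actual model may differ in detail.

```python
def product_of_gaussians(centers, spike_counts, sigma):
    """Mean and variance of the (unnormalised) product of Gaussian tuning curves.

    Each spike contributes one factor exp(-(x - c)^2 / (2 sigma^2)), so
    precisions add and the variance shrinks as sigma^2 / (total spikes).
    """
    total = sum(spike_counts)
    if total == 0:
        raise ValueError("at least one spike is needed to decode")
    mean = sum(c * n for c, n in zip(centers, spike_counts)) / total
    variance = sigma ** 2 / total
    return mean, variance

# Same two cells, low versus high firing within a decoding window
mean_low, var_low = product_of_gaussians([0.0, 1.0], [1, 1], sigma=0.2)
mean_high, var_high = product_of_gaussians([0.0, 1.0], [5, 5], sigma=0.2)
print(mean_low, var_low, var_high)
```

Under these assumptions more spikes always mean a narrower decoded distribution, which is why a product code would need its highest firing at the phases with the least uncertainty.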

      The DDC scheme has an increased diversity of cells firing as the target distribution gets wider within each cycle, whereas the mean and sampling schemes do not have increasing variance within-cycle (representing a single trajectory throughout). The decoding uncertainty in the data did not vary within-cycle, so the DDC scheme was discarded.

      The mean and sampling schemes are distinguished by the increase in CCV vs TEE with phase, which is consistent with the sampling scheme.

      The analyses are well done and the results with synthetic data (assuming future trajectories are randomly sampled from the average distribution) and real data match nicely, although there is excess variability in the real data. Overall, this paper provides the most thorough analyses so far of place cell theta sweeps in open fields.

      We thank the Reviewer for the accurate summary and the encouragement.

      I found the framing of the paper confusing in a way that made it harder to understand the actual contribution made here. As noted in the discussion, the field has moved on from the 1990s and cycle-by-cycle decoding of theta sweeps has consistently shown that they correspond to specific trajectories moving from the current trajectory to potential future trajectories, consistent with continuous attractor-based models (in which the width of the activity bump cannot change, e.g. Hopfield, 2010). Thus it seems odd to use theta sweeps to test models of encoding uncertainty - since Johnson & Redish (2007) we know that they seem to encode specific trajectories (e.g. either going one way or the other at a choice point) rather than an average direction with variance covering the possible alternatives.

      We thank the reviewer for emphasising the connections to earlier work on theta sweeps during decision making, which suggests that alternative options before a decision point are assessed individually by hippocampal neuron populations in a simple maze. However, as also noted by the reviewer below, previous analysis of theta sweeps in the hippocampus were limited to discrete decisions in a linear maze, which only permits a limited exploration of the alternative hypotheses an animal might experience in a planning situation.

      In particular, the dominant source of future uncertainty in a binary decision task is the chosen option (left or right), providing a distinctly bimodal predictive distribution. Bimodal distributions cannot be easily approximated by variational methods (that includes the DDC or product schemes) but can be efficiently approximated by sampling. In contrast, in an open field the available options (changes in direction and speed) are not restricted by the geometry of the environment and the predictive distribution is relatively similar to a Gaussian distribution, which can be efficiently approximated by all of the investigated encoding schemes.

      Moreover, it has been widely reported that the hippocampal spatial code has somewhat different properties in linear tracks, where the physical movement of the animal is restricted by the geometry of the environment, than in open field navigation. Specifically, in linear tracks most neurons develop unidirectional place fields and the hippocampal population uses different maps to represent the two opposite running directions, whereas a single map and omnidirectional place fields are used in open fields (Buzsaki, 2005). In terms of representing future alternatives, it remains an open question whether the scheme that is compatible with planning in a 1D environment generalises to 2D environments. Our detailed comparison of the alternative encoding schemes provides an opportunity to demonstrate that a sampling scheme can be applied as a general computational algorithm to represent quantities necessary for probabilistic planning, while also demonstrating that alternative schemes are incompatible with it.

      Moreover, these previous studies did not rule out the possibility that, in addition to alternating between discrete options, specific features of the population activity might also represent uncertainty (conditional to the chosen option) instantaneously as in the product or the DDC schemes.

      We added a new paragraph (lines 74-88) to the introduction to clarify that one of the novel contributions of the paper is the generalisation of previous intuitions, largely based on work on binary decision tasks in mazes, to unrestricted open field environments.

      The point that schemes that assume a varying-width activity distribution might be unfit for modelling hippocampal theta activity is an interesting insight. Let us note that new results have pointed out that the fixed-width activity bump is not a necessary feature of attractor networks. It has recently been shown that in continuous attractors (modelling head direction cells in the fly) the amplitude of the bump can change and the changes can be consistent with the represented uncertainty (Kutschireiter et al., 2021 bioRxiv; https://doi.org/10.1101/2021.12.17.473253). We believe that similar principles also apply to higher-dimensional continuous attractor networks and therefore it is entirely possible to represent uncertainty via the amplitude of the bump (equivalent to the population gain) in the hippocampus.

      Thus, the main outcomes of the simulations could reasonably be predicted in advance, and the possibility of alternative neural models of uncertainty explaining firing data remains: in situations where it is more reasonable to believe that the brain is in fact encoding uncertainty as the breadth of a distribution.

      Having said that, most previous examples of trajectory decoding of theta sweeps have not been for navigation in open fields, and the analysis of Pfeiffer and Foster (2013; in open fields) was restricted to sequential 'replay' during sharp-wave ripples rather than theta sweeps. This paper provides the nicest decoding analyses so far of place cell theta sweeps in open field data. However, there are already examples of theta sweeps in entorhinal cortex in open fields (Gardner et al., 2019) showing the same alternating left/right sweeps as seen on mazes (Kay et al., 2020). Such alternation could explain the additional cycle-by-cycle variability observed (cf random sampling).

      We thank the reviewer for encouraging us to more directly test the idea that alternating left/right sweeps could explain the increased cycle-to-cycle variability in the data. We thoroughly analysed the data (see our answer to essential revisions 1.) and found that trajectories at subsequent theta cycles are strongly anticorrelated (Fig. 7, Fig. S11, lines 375-415).

      Reviewer #2 (Public Review):

      This study investigates how uncertainty about spatial position is represented in hippocampal theta sequences. Understanding the neural coding of uncertainty is an important issue in general, because computational and theoretical work clearly demonstrates the advantages of tracking uncertainty to support decision-making, behavioural work in many domains shows that animals and humans are sensitive to it in myriad ways, and signatures of the neural representations of uncertainty have been demonstrated in many different systems/circuits.

      We thank the reviewer for the comment.

      However, the question of whether and how uncertainty is signalled in the hippocampus has remained understudied. The question of how spatial uncertainty is represented is already interesting, but recent interest in interpreting hippocampal sequences as important for planning and decision-making provides additional motivation.

      A variety of experimental paradigms such as recordings in light vs. darkness, dual rotation experiments in which different cues are placed in conflict with another, "morph" and "teleportation" experiments and so on, all speak to this issue in some sense (and as I note below, could nicely complement the present study); and a number of computational models of the hippocampus have included some representation of uncertainty (e.g. Penny et al. PLoS Comp Biol 2013, Barron et al. Prog Neurobiol 2020). However, the present study fills an important gap in that it connects a theory-driven approach of when and how uncertainty could be represented in principle, with experimental data to determine which is the most likely scheme.

      The analyses rely on the fundamental insight that states/positions further into the future are associated with higher uncertainty than those closer to the present. In support of this idea, the authors first show that in the data (navigation in a square environment, using the wonderful data from Pfeiffer & Foster 2013), decoding error increases within a theta sequence, even after correcting for the optimal time shift.

      The authors then lay out the leading theoretical proposals of how uncertainty can be represented in principle in populations of neurons, and apply them to hippocampal place cells. They show that for all of these schemes, the same overall pattern results. The key advance of the paper seems to be enabled by a sophisticated generative model that produces realistic probability distributions to be encoded (that take into account the animal's uncertainty about its own position). Using this model, the authors show that each uncertainty coding scheme is associated with distinct neural signatures that they then test against the data. They find that the intuitive and commonly employed "product" and "DDC" schemes are not consistent with the data, but the "sampling" scheme is.

      The final conclusion that the sampling scheme is most consistent with the data is perhaps not surprising, because similar conclusions have been reached from showing alternating representation of left and right at choice points cited by the authors (Johnson and Redish 2007; Kay et al. 2020; Tang et al. 2021) and "flickering" from one theta cycle to the next (Jezek et al. 2011). So, the most novel parts of the work to me are the rigorous ruling out of the alternative "product" and "DDC" schemes.

      We thank the reviewer for helping us to clarify the main novelty of our work compared to previous studies. We have updated the introduction (lines ~74–88) to state more clearly how our analysis extends previous work largely restricted to binary decision tasks in mazes and not explicitly considering alternative probabilistic representations.

      Overall I am very enthusiastic about this work. It addresses an important open question, and the structure of the paper is very satisfying, moving from principles of uncertainty encoding to simulated data to identifying signatures in actual data. In this structure, the generative model that produces the synthetic data is clearly playing an important role, and intuitively, it seems the conclusions of the paper depend on how well this testbed maps onto the actual data. I think this model is a real strength of the paper and moves the field forward in both its conceptual sophistication (taking into account the agent's uncertainty) and in how carefully it is compared to the actual data (Figures S2, S3).

      We thank the reviewer for the encouraging words.

      I have two overall concerns that can be addressed with further analyses.

      First, I think the authors should test which of the components of this model are necessary for their results. For instance, if the authors simply took the successor representation (distribution of expected future state occupancy given current location) and compressed it into theta timescale, and took that as the probability distribution to be encoded under the various schemes, would the same predictions result? Figuring out which elements of the model are necessary for the schemes to become distinguishable seems important for future empirical work inspired by this paper.

      The crucial part of our generative model is its probabilistic nature. Explicit formulation of the generative model under different coding schemes enables us to quantitatively account for the different factors contributing to the variability in the data. Specifically, when we compared sampling and mean codes, we partitioned variability of the represented locations across theta cycles into specific factors related to 1) decoding error; 2) difference between the true position of the animal and its own location estimate; 3) the animal’s own uncertainty about its spatial location; 4) updating this estimate in each theta cycle. This enabled us to derive quantities (CCV, TEE and EVindex) that can discriminate between sampling and mean schemes, and that could be directly measured experimentally. This would not be possible in a simpler model lacking an explicit representation of the animal’s internal uncertainty.

      We believe that the assumptions of the model are rather general and do not limit its scope. Here we list the specific features of the model for clarity (Fig S1a):

      1) Planned position (Fig S1a, left): the planned position is required to guide movements in the model. The specific way we generated the planned position was not essential for the simulations but we tuned the movement parameters to generate trajectories matching the real movement of the animal. It is defined as a random walk process for velocity which is the simplest model for smooth trajectories.

      2) The inference part (Fig S1a, middle) is crucial for the model since we believe that hippocampal population activity is driven by the animal’s own beliefs about its position, which tells our approach apart from earlier studies (see paragraph around line 466). If the animal represents its predictions optimally then the predictions should be consistent with its movement within the environment. Thus, the consistency of the inference is a critical statistical property of the model, which can be guaranteed if the predictions are generated by the same model that is used for inferring the animal’s position. The simplest model that can be used for inference and predictions is the Kalman filter, which we opted for in our simulations.

      3) The assumptions of the encoding model (Fig S1a, right and Fig 1b) are solely determined by the representational scheme being tested. All of the schemes rely on encoding the result of inference in population activity during theta cycles and the scheme determines how this encoding happens. This part of the model is clearly necessary for the analysis.
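
      As an illustration of the inference step in item 2, the following is a minimal sketch and not the authors' simulation code. It assumes a scalar random walk on position (the response describes a random walk on velocity), and the noise variances `q` and `r` are arbitrary. It shows how one and the same model yields both the filtered estimate and predictions whose variance grows with lookahead.

```python
import random

def kalman_step(mean, var, obs, q, r):
    """One predict/update step of a scalar Kalman filter."""
    # Predict: random-walk dynamics add process-noise variance q.
    var_pred = var + q
    # Update: weight the observation by the Kalman gain.
    gain = var_pred / (var_pred + r)
    mean_new = mean + gain * (obs - mean)
    var_new = (1.0 - gain) * var_pred
    return mean_new, var_new

def predictive_variances(var, q, horizon):
    """Variance of the k-step-ahead prediction for k = 0..horizon."""
    return [var + k * q for k in range(horizon + 1)]

def run_demo(n_steps=200, q=0.5, r=4.0, horizon=5, seed=0):
    """Filter a simulated random walk, then predict ahead (toy parameters)."""
    rng = random.Random(seed)
    true_pos, mean, var = 0.0, 0.0, 10.0
    for _ in range(n_steps):
        true_pos += rng.gauss(0.0, q ** 0.5)           # hidden position
        obs = true_pos + rng.gauss(0.0, r ** 0.5)      # noisy observation
        mean, var = kalman_step(mean, var, obs, q, r)
    return mean, var, predictive_variances(var, q, horizon)

if __name__ == "__main__":
    mean, var, pred = run_demo()
    print("posterior variance:", round(var, 3))
    print("predictive variance by lookahead:", [round(v, 3) for v in pred])
```

      In this toy setting the predictive variance increases linearly with the number of steps predicted ahead, which is the qualitative property the analyses rely on.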

      Alternatively, we could use the above-mentioned successor representation (SR) framework (Dayan 1993) to represent possible trajectories and their associated uncertainty in our models of hippocampal population activity. However, this option introduces extra challenges. First, in the SR framework (Stachenfeld et al., 2017) neuronal firing rates are proportional to the discounted expected future number of times a particular location is going to be visited given the current policy and position. Thus, the SR sums over all possible future visits and does not specify when exactly a particular state might be reached in the future, which is inconsistent with the idea that trajectories are represented during theta sequences. Second, the SR represents the probability of occupying all future states in parallel without providing possible trajectories defining specific combinations of future state visits. This property is consistent with the product and the DDC encoding schemes but not with the other two. These two properties of the SR imply that this framework per se does not provide a fine-scale temporal description of how expected future state probabilities are related to the dynamics of the hippocampal population activity during theta oscillation.
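
      The first property can be made concrete with a small sketch. It assumes a hypothetical 8-state ring with an unbiased random-walk policy and a discount of 0.9; none of these values come from the manuscript. It computes the SR as the discounted sum of transition-matrix powers, so each entry pools visits over all future times and carries no information about when a state is reached.

```python
def ring_transition(n):
    """Transition matrix of an unbiased random walk on a ring of n states."""
    T = [[0.0] * n for _ in range(n)]
    for s in range(n):
        T[s][(s - 1) % n] = 0.5
        T[s][(s + 1) % n] = 0.5
    return T

def matmul(A, B):
    """Plain square-matrix product."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def successor_representation(T, gamma, n_terms=300):
    """Truncated series M = sum_t gamma^t T^t (discounted future occupancy)."""
    n = len(T)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [[0.0] * n for _ in range(n)]
    discount = 1.0
    for _ in range(n_terms):
        for i in range(n):
            for j in range(n):
                M[i][j] += discount * power[i][j]   # add the step-t contribution
        power = matmul(power, T)
        discount *= gamma
    return M

if __name__ == "__main__":
    M = successor_representation(ring_transition(8), gamma=0.9)
    print([round(v, 3) for v in M[0]])
```

      Each row of `M` sums to 1/(1 - gamma) regardless of the order in which states are visited, so additional modelling choices would be needed to turn it into time-resolved trajectories.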

      Taken together, implementing theta time-scale dynamics using the SR framework would also require several additional model choices to generate consistent temporal trajectories from the expected future state occupancies, and even in this case the subjective uncertainty of the animal would not be consistently represented in the simulated data. Representing the animal’s subjective uncertainty in our model was an important component in contributing to the EV-index and had profound implications on the signatures of generative cycling in a two dimensional arena.

      We have to note that on a slower time scale (calculating the average firing rate over multiple theta cycles) all of our encoding schemes are consistent with the SR framework (line 548).

      Second, the analyses are generally very carefully and rigorously performed, and I particularly appreciated how the authors addressed bias resulting from noisy estimation of tuning curves (Figure S7). However, the conclusion that the "sampling" scheme is correct relies on there being additional variance in the spiking data. This is reminiscent of the discussions about overdispersion and how "multiple maps" account for it (Jackson & Redish Hippocampus 2007, Kelemen & Fenton PLoS Biol 2010), and the authors should test if this kind of explanation is also consistent with their data. In particular, the task has two distinct behavioral contexts, when animals are searching for the (not yet known) "away" location compared to returning to the known home location, which extrapolating from Jackson & Redish, could be associated with distinct (rate) maps leading to excess variance.

      We thank the reviewer for this constructive comment. We note that the signature of the sampling scheme is variability in the decoded trajectory across subsequent theta cycles while overdispersion is usually defined as the supra-Poisson variability in the spiking of individual neurons evaluated across multiple runs or trials. Nevertheless, we tested the existence of multiple maps corresponding to the two distinct task phases and found that the maps representing the two task phases are very similar (Fig S11).

      Such an analysis could also potentially speak to an overall limitation of the work (not a criticism, more of a question of scope) which is that there are no experimental manipulations/conditions of different amounts of uncertainty that are analyzed. Comparing random search (high uncertainty, I assume) to planning a path to a known goal (low uncertainty) could be one way to address this and further bolster the authors' conclusions.

      We agree with the reviewer that the proposed framework provides additional insights into the way the population activity should change with specific experimental manipulations and can therefore inspire further experiments. In particular, a hallmark of probabilistic computations is that experimental manipulations that control the uncertainty of the animal should be reflected in population responses. In visual processing such manipulations are indeed reflected in changing response variability, as predicted by sampling (Orban et al, Neuron 2016). In the current experimental paradigm there was no direct manipulation of uncertainty (we discuss this around lines 573-576). While one might argue that there are differences in the planning strategy in trials where the animal was heading for away reward and in those heading for home, this is not a very explicit test of the question. Still, to check if we can find traces of changes in uncertainty in the two conditions, we analysed the EV-index separately on home and away trials (Fig. S11e). We did not find systematic differences in the EV-index across these trial types.

      Reviewer #3 (Public Review):

      Summary of the goals:

      The authors set out to test the hypothesis that neural activity in hippocampus reflects probabilistic computations during navigation and planning. They did so by assuming that neural activity during theta waves represents the animal's location, and that uncertainty about this location should grow along the path from the recent past to the future. They next generated empirical signatures for each of the main four proposals for how probabilities may be encoded in neural responses (PPC, DDC, Sampling) and contrasted them with each other and a non-probabilistic representation (scalar estimate of location). Finally, the authors compared their predictions to previously published neural activity and concluded that a sampling-based representation best explained neural activity.

      Impact & Significance: This manuscript can make a significant impact on many fields in neuroscience from hippocampal research studying the functions and neural coding in hippocampus, through theoretical works linking the representation of uncertainty to neural codes, to modeling experimental paradigms using navigation tasks. The manuscript provides the following novel contribution to cognitive neuroscience:

      • It exploits the inherent change in uncertainty about a parsimonious internal variable over time during planning to test hypotheses about probabilistic computations.
      • A full model comparison of competing hypotheses for the neural implementation of probabilistic beliefs. This is a topic of wide interest and direct comparisons using data have been elusive.
      • The study presents substantial empirical evidence for a sampling-based neural representation of the probability distribution over trajectories in the hippocampus, a finding with potential implications for other parts of neural processing.

      Strengths:

      • Creative exploitation of a naturally occurring change in uncertainty over a parsimonious latent variable (location).
      • Derivation of three empirical signatures using a combination of analytical and numerical work.
      • Novel computational modelling & linking it to neural coding using 4 existing implementational models
      • Comprehensive and rigorous data analysis of a large and high-quality neural dataset, with supplemental analyses of a second dataset
      • Mostly very clear and high quality presentation

      We thank the Reviewer for the summary and for the positive feedback on the manuscript.

      Weaknesses:

      • It is unclear to what degree the "signatures" depend on the details of the numerical simulation used by the authors to generate them. At least two of them (gain for the product scheme and excess variability for the sampling scheme) appear very general, but the degree of robustness should be discussed for all three signatures.

      The generality of the signatures follows from the fact that we derived them from the fundamental properties of the encoding schemes. We tested their robustness using both idealised test data (Fig. S6c-d, Fig. S7b) and our simulated hippocampal model (Fig. 4c, Fig. 5b-c, Fig. 6b-g).

      The reviewer is right that sensitivity and robustness are potential issues. These schemes were originally proposed to encode static distributions, i.e., the neuronal activity was supposed to encode a specific probability distribution for an extended period of time. Therefore, when we test the signatures we make the simplifying assumption that a static distribution is encoded in the three separate phases of the theta cycle. It is currently unknown whether during theta sequences the trajectories are represented via discrete jumps in positions or as continuously changing locations. Therefore we used our numerical simulations to test whether the proposed signatures are sufficiently sensitive to discriminate the encoding schemes using the limited amount of data available and in the face of biological noise, but also robust to the parameter choices and modelling assumptions.

      Regarding the product code, the inverse relationship between the gain and the variance has been previously derived analytically for special cases (Ma et al., 2006). In the manuscript we show numerically that the same relationship holds for general tuning curve shapes (Fig. S6d). Finally we demonstrate that the gain is a robust signature that changes systematically along the theta cycles in the case of a product coding scheme.
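
      The inverse relationship between gain and decoded variance can be checked numerically in a special case. The sketch below is an illustration under stated assumptions, not the analysis of Fig. S6d: it assumes independent Poisson neurons with dense, evenly spaced Gaussian tuning curves, a flat prior, and noise-free expected spike counts in place of sampled counts. The tuning width, spacing and grid are arbitrary.

```python
import math

def log_tuning(x, center, gain, width):
    """Log of a Gaussian tuning curve with peak rate `gain`."""
    return math.log(gain) - 0.5 * ((x - center) / width) ** 2

def expected_counts(x_true, centers, gain, width):
    """Noise-free expected spike counts at the encoded position (assumption)."""
    return [math.exp(log_tuning(x_true, c, gain, width)) for c in centers]

def posterior_variance(counts, centers, gain, width, grid):
    """Variance of the grid posterior under Poisson spiking and a flat prior."""
    logp = []
    for x in grid:
        total = 0.0
        for r, c in zip(counts, centers):
            lf = log_tuning(x, c, gain, width)
            total += r * lf - math.exp(lf)   # Poisson log-likelihood up to a constant
        logp.append(total)
    peak = max(logp)
    weights = [math.exp(v - peak) for v in logp]
    z = sum(weights)
    mean = sum(w * x for w, x in zip(weights, grid)) / z
    return sum(w * (x - mean) ** 2 for w, x in zip(weights, grid)) / z

# Hypothetical population: 81 cells spaced 0.5 apart, tuning width 2.
CENTERS = [-20.0 + 0.5 * i for i in range(81)]
GRID = [-5.0 + 0.01 * i for i in range(1001)]
WIDTH = 2.0

def decoded_variance_for_gain(gain, x_true=0.0):
    """Return (decoded posterior variance, total expected spike count)."""
    counts = expected_counts(x_true, CENTERS, gain, WIDTH)
    return posterior_variance(counts, CENTERS, gain, WIDTH, GRID), sum(counts)

if __name__ == "__main__":
    for gain in (1.0, 2.0, 4.0):
        var, total = decoded_variance_for_gain(gain)
        print(gain, round(var, 4), round(WIDTH ** 2 / total, 4))
```

      Under these assumptions the decoded variance equals the squared tuning width divided by the total spike count, so multiplying the gain by four divides the variance by four.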

      Second, in the case of the DDC code we used the decoded variance of the posterior as the signature. Since the DDC code relies on the overlap between the target distribution and the neuronal basis functions, potentially the most important source of error is overestimating the size of the encoding basis functions. To control for this factor, we first explored this effect in an idealised setting (Fig S7) and found that the decoded variance correlates with the encoded uncertainty whether we used the estimated basis functions or the empirical tuning curves for decoding. Next we performed the analysis in our simulated dataset in 4 different ways - either using empirical tuning curves (Fig 5c-d) or the estimated basis functions (Fig S8a-b), focusing on high spike count theta cycles or including all theta cycles. The fact that all these analyses led to similar results confirms the robustness of this signature.

      Our third measure, the EV-index, quantifies the variability of the encoded trajectories across theta cycles. The cycle-to-cycle variability (CCV) is also affected by factors independent of whether a randomly sampled trajectory or the posterior mean is encoded. In particular, the encoded trajectory can start at different distances in the past and can be played at different speeds in different theta cycles. These factors are probably present in the data and all inflate the CCV. Another factor is the start and end time of the trajectories, which we may not be able to find accurately in the real data; confusing the end of a previous trajectory with the start of a new one can also inflate the CCV. In our simulations we tested how these potential errors influence our analysis, and found that the EV-index is surprisingly robust to such changes (Fig 6f-g). An additional factor that the EV-index is sensitive to is the specific sampling algorithm used to sample the posterior: an algorithm that produces correlated samples is hard to distinguish from the MAP scheme. Our newly introduced analysis (Fig 7b) demonstrates this and explores the level of correlation between subsequent trajectories, providing evidence that trajectories decoded during exploration reflect the properties of anticorrelated samples, also a signature of efficient inference.
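
      The intuition behind excess cycle-to-cycle variability can be sketched with a toy simulation. This is not the authors' definition of CCV, TEE or the EV-index: it assumes a static one-dimensional Gaussian posterior, independent samples, and additive Gaussian decoding noise, with arbitrary parameter values. It only shows why encoding a sample rather than the mean inflates the variability between consecutive cycles.

```python
import random

def decoded_positions(scheme, n_cycles, mu, sigma, noise_sd, rng):
    """Decoded position per theta cycle under a 'mean' or 'sampling' scheme."""
    out = []
    for _ in range(n_cycles):
        # The mean scheme encodes the posterior mean; sampling encodes one draw.
        encoded = mu if scheme == "mean" else rng.gauss(mu, sigma)
        out.append(encoded + rng.gauss(0.0, noise_sd))   # add decoding noise
    return out

def cycle_to_cycle_variability(positions):
    """Mean squared difference between consecutive cycles (toy CCV)."""
    diffs = [(b - a) ** 2 for a, b in zip(positions, positions[1:])]
    return sum(diffs) / len(diffs)

def compare_schemes(n_cycles=20000, mu=0.0, sigma=3.0, noise_sd=1.0, seed=1):
    """Toy CCV for both schemes with identical posterior and decoding noise."""
    rng = random.Random(seed)
    return {
        scheme: cycle_to_cycle_variability(
            decoded_positions(scheme, n_cycles, mu, sigma, noise_sd, rng))
        for scheme in ("mean", "sampling")
    }

if __name__ == "__main__":
    print(compare_schemes())
```

      In this toy model the expected value is 2·noise_sd² for the mean scheme and 2·(sigma² + noise_sd²) for sampling, so the difference between the two reflects the encoded uncertainty rather than the decoding error.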

      • The claims about "efficiency" lack a definition of what exactly is meant by that, and empirical support.

      We thank the reviewer for pointing out this inconsistency in our terminology. What we generally meant by efficiency was a claim that pertains to the computational level in Marr’s classification, i.e. that computations are probabilistic: the representation in the hippocampus takes uncertainty into account by representing a full posterior distribution. We performed an additional test, which concerns the algorithmic-level efficiency of the computations. We explored the efficiency of the sampling process by assessing a signature of efficient sampling, the expected number of sampled trajectories required to represent the distribution of possible future locations. We found that subsequent samples tended to be anti-correlated, which is a signature of efficient sampling algorithms (Fig 7). In the revised manuscript we thus use the word efficient solely when we refer to the anticorrelated samples.
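
      Why anti-correlation is associated with efficiency can be seen from a standard statistical argument, sketched here under simplifying assumptions (pairs of unit-variance Gaussian samples with correlation `rho`; this is not the analysis behind Fig 7). The variance of the average of two samples is sigma²·(1 + rho)/2, so negatively correlated samples estimate the posterior mean with fewer draws.

```python
import math
import random

def pair_mean_variance(rho, n_pairs=20000, sigma=1.0, seed=2):
    """Empirical variance of the mean of two samples with correlation rho."""
    rng = random.Random(seed)
    c = math.sqrt(1.0 - rho * rho)
    means = []
    for _ in range(n_pairs):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + c * rng.gauss(0.0, 1.0)   # correlated with z1 by rho
        means.append(0.5 * sigma * (z1 + z2))
    m = sum(means) / n_pairs
    return sum((x - m) ** 2 for x in means) / n_pairs

if __name__ == "__main__":
    for rho in (-0.5, 0.0, 0.5):
        print(rho, round(pair_mean_variance(rho), 3), (1 + rho) / 2)
```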

    1. Author Response:

      Reviewer #2:

      The authors investigated changes in the unstressed and stressed oligomeric states of the mammalian endoplasmic reticulum (ER) stress sensor, IRE1a. Previous biochemical and microscopy studies in mammalian cells and studies of the related protein Ire1 in yeast, describe an increase in oligomerization of the stress sensor upon treatment of cells with chemical agents that impair the ER protein folding environment. The general view has been that IRE1 in unstressed cells is a monomer and varying degrees of misfolded protein stress stimulate dimerization, activation, and higher order oligomerization. Distinguishing between monomers and dimers, as well as tetramers or other small oligomers is technically challenging, especially for integral membrane proteins. To address this challenge, the authors turned to single particle tracking fluorescence microscopy of Halo-tagged endogenous IRE1. Using a clever combination of random labeling with two fluorescent dyes and oblique angle illumination to visualize single molecules, as well as dimers, the authors surprisingly find that their endogenous IRE1 reporter appears to be dimeric in homeostatic cells. This observation challenges the predominant model in which IRE1 is monomeric in unstressed cells and that even dimerization represents a switch into an active state. The authors claim to detect evidence for higher order oligomers following treatment with stressors. The authors then use a series of IRE1 mutants to identify how oligomerization is regulated and present a new model to reconcile the different models of IRE1 activation in the literature.

      The authors have extensively characterized their novel experimental system in terms of protein expression levels, functionality, and ability to distinguish monomers and dimers. The data are well presented and the authors are clearly familiar with the arguments that have surrounded the IRE1 oligomer question. That the authors observe the characteristic XBP1 mRNA splicing activity in the absence of visible large IRE1 clusters may suggest that the large clusters reported by others may have distinct roles, perhaps in more permissive mRNA cleavage.

      The present study is undermined by two major weaknesses. First, while the authors persuasively demonstrate that they can detect IRE1a dimers, a major claim of the manuscript rests upon detection of tetramers and possibly higher order oligomers. Unfortunately, the authors provide no independent controls to show what tetramer or higher order oligomer data would look like. Thus, the authors can only infer that higher order oligomers are detected, based on modest shifts in the percent of correlated particle trajectories observed in some cells. More robust evidence is needed to make claims of oligomerization. Tools have been developed by others that can induce reversible oligomerization of proteins. Application of these tools would provide powerful controls for tetramers or even higher order oligomers in this study.

      The second, deeper concern, is the discrepancy between the Halo Tag clustering results in this study and studies by this lab and several other labs that report a distinct stress phenotype. In mammalian cells and yeast, IRE1 and Ire1, tagged with different fluorescent proteins or even a small HA peptide epitope tag, undergo quantitative visible formation of puncta or clusters upon treatment with stressors. The small number of bright clusters that form effectively deplete the rest of the ER of IRE1 signal. In the present study, the authors observe no visible change in IRE1-Halo localization in stressed cells. The authors do not investigate the cause of this difference. While one might argue that the presence of stress-inducible IRE1 activity is sufficient to argue that the reporter in this study is functional, IRE1 reporters (that do cluster) described in previous studies by the Walter lab and other groups are also demonstrably functional. Does IRE1 normally cluster? Is it cell-type dependent? Tag-dependent? Notably, the Pincus et al. PLoS Biology paper from the Walter lab used two different fluorescent protein tags that do not heterozygously dimerize. Robust colocalization and FRET signals were detected upon treatment of cells with stressors and clustering was subsequently observed. A 2007 Journal of Cell Biology study from Kimata et al. reported clustering in yeast with an Ire1 tagged with an HA epitope peptide. The HA peptide seems unlikely to be prone to any oligomerization propensities that GFP tagged reporters might experience. Importantly, a 2020 PNAS paper from the Walter lab (Belyy et al.) studied clustering of a robustly monomeric mNeonGreen-tagged IRE1 in U2-OS cells and mouse embryonic fibroblasts and this construct readily clustered following stress induction.

      When evaluated against the backdrop of the extensive literature describing the visual behavior of IRE1a in live cells, the absence of stress-induced clustering is both puzzling and disconcerting. Given the focus of this study is to use visual techniques to study IRE1a interactions, the burden of proof is on the authors to resolve this significant discrepancy with the rest of the IRE1a literature. One can easily imagine that incorporation of the majority of the pool of IRE1a into 10-100 clusters could produce very different correlated trajectory behavior. Until the authors can determine why their reporters behave differently from other IRE1a reporters and establish which version accurately reflects physiologic IRE1a behavior, the potential impact of the findings of this manuscript are of unknown value.

      We thank the reviewer for this detailed assessment of our work. We agree that the question of apparent discrepancy in the formation of observable IRE1 clusters between this manuscript and earlier work is important. We have now addressed this issue both in the revised version of the manuscript and in specific point-by-point responses to reviewers’ comments. As a brief summary, we addressed the reviewer’s first concern (lack of controls larger than dimers) by cloning and validating a tetrameric HaloTag construct, the measurements from which were entirely consistent with the model we presented in the original version of the manuscript. To address the reviewer’s second concern, we present several lines of evidence showing that the discrepancy between the formation of microscopically visible IRE1 clusters in earlier studies and the absence of such clusters in the present work almost certainly results from differences in expression levels. First, our IRE1-HaloTag construct is perfectly capable of forming stress-induced clusters, as we show in the new Figure 1 – Figure Supplement 3. Second, we point to a parallel study by Gómez-Puerta et al., who demonstrate that a more “conventional” IRE1-GFP construct does not form visible stress-dependent puncta when it is expressed at a low level comparable to that of untagged IRE1 in HeLa cells, despite being fully active. Third, our earlier work in the 2020 PNAS paper referenced by the reviewer actually showed that even in the overexpression context, IRE1-mNeonGreen only forms visible puncta in just over half of all cells, despite the fact that XBP1 processing is nearly 100% effective in bulk assays. Furthermore, in the same paper we show that, rather than all IRE1 molecules being sequestered in clusters, only a small fraction (~5%) of IRE1-mNeonGreen assembles into large puncta while the remaining 95% of IRE1 stays uniformly distributed throughout the ER.

      Taken together, we believe that IRE1 does have the propensity to assemble into larger clusters when its expression levels are high (regardless of the tag used), but that these clusters are not strictly required for its activation. We have made significant changes to the discussion section of the manuscript to clarify the above points and directly address the apparent discrepancy between the present work and earlier studies.

      Reviewer #3:

      In this paper, the authors' aim was to test how IRE1's oligomerization state relates to its activation status without relying on ectopic overexpression. The principle underlying the work is a rather simple one, which is that, if the population of IRE1 can be labeled stochastically with either of two different fluorescent probes, then if the protein dimerizes, presuming single molecules can be visualized, correlated migration of a spot of each fluorophore should be observed for some of those dimers. Any correlated migration, maintained for long enough, will by necessity be some sort of dimer or multimer. In principle, if my math is right, the correlation should be 50% of spots of each color, assuming all the molecules are in a dimer, all molecules are labeled with one fluorophore or the other, and the koff of the fluorophores is very low. In practice, the correlation appears closer to 10%, which the authors establish using a control molecule that should not dimerize except by chance, and another for which pseudo-dimerization is enforced due to the two HALO domains used to bind the fluorophores being conjugated to the same molecule in cis. Much of the paper is devoted to establishing the fundamentals of the system. For these experiments, the authors replaced endogenous IRE1 with the HALO-tagged version to generate near-normal expression and show that the IRE1-HALO behaves similarly to endogenous. They also show that correlated migration is observed in the dimer control to a much greater extent than in the monomer.
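
      The 50% figure can be reproduced with a toy labeling model, sketched here under explicit assumptions that are not taken from the manuscript: every protomer is labeled independently with dye A (probability `p_a`), dye B (probability `p_b`) or left dark, all molecules sit in complexes of size `n`, and the count is per A-labeled molecule rather than per diffraction-limited spot. The names and parameter values are hypothetical.

```python
import random

def expected_fraction(n, p_b):
    """P(an A-labeled protomer has at least one B-labeled partner in an n-mer)."""
    return 1.0 - (1.0 - p_b) ** (n - 1)

def simulate_fraction(n, p_a, p_b, n_complexes=100_000, seed=0):
    """Monte Carlo estimate of the same fraction by labeling random complexes."""
    rng = random.Random(seed)
    a_total = 0     # all A-labeled protomers
    a_with_b = 0    # A-labeled protomers sharing a complex with a B label
    for _ in range(n_complexes):
        labels = []
        for _ in range(n):
            u = rng.random()
            labels.append("A" if u < p_a else "B" if u < p_a + p_b else None)
        n_a = labels.count("A")
        if n_a and "B" in labels:
            a_with_b += n_a
        a_total += n_a
    return a_with_b / a_total

if __name__ == "__main__":
    print("dimer, full labeling:", simulate_fraction(2, 0.5, 0.5))
    print("tetramer, full labeling:", expected_fraction(4, 0.5))
    print("dimer, sparse labeling:", expected_fraction(2, 0.1))
```

      With complete 50/50 labeling the dimer case gives one half, as the reviewer states. In this toy model, larger complexes raise the fraction and incomplete labeling lowers it, which is why monomer and dimer controls are needed to calibrate the measured value.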

      Using these findings, they demonstrate, in my mind quite conclusively, that IRE1 exists as a dimer even in the unstimulated state. During ER stress, the authors observe a state that is more highly ordered. Mathematical modeling suggests a transition from predominantly dimers to a mix of dimers and something more highly ordered, with tetramers being the simplest explanation. Satisfyingly, a mutation that breaks the known dimer interface causes the protein to exist solely in monomers, as does deletion of the IRE1 lumenal domain, while disrupting the oligomerization interface keeps the protein as dimers. Mutation or deletion of the kinase and RNase domains does not affect higher order status, suggesting that activation of these domains is not a prerequisite for assembly. It is clear from this that the central claim of the paper, which is that IRE1 exists in a dimer in the basal state and transitions to a higher ordered structure in the activated state, is supported. Moreover, the general approach is likely to be appealing to the study of other molecules activated by multimerization.

      We thank the reviewer for this thoughtful and helpful analysis of our work.

      The principal advance of the paper is the technological approach for tracking IRE1 (and, presumably, other molecules whose activity is regulated by dimerization). The approach is quite elegant for that purpose. Its impact in terms of conclusions about IRE1 is perhaps less clear. The authors rationalize their endogenous-replacement approach by describing how their previous efforts and those of others relied on ectopic overexpression of GFP-tagged IRE1. The authors take great pains to claim that the observed multimerization status of the IRE1-HALO constructs is not a function of expression level, which would imply then that expression level alone is not responsible for the previously observed IRE1 oligomeric puncta. It is not clear why exactly the authors' results differ from this group's previous studies on the topic nor where the truth lies, including whether something inherent to the GFP-tagged overexpression approach favors non-physiologic structures, whether the difference is fundamentally one of cell type, or whether multimerization and activation are correlated but not causally related, with multimer-breaking mutations killing IRE1 by some other mechanism.

      The question of reconciling our present data with earlier work (including work from our group) is clearly and understandably a central question for all three reviewers. As we detailed above in our responses to reviewers 1 and 2, we are convinced that the formation of large IRE1 clusters is largely dependent on expression level rather than the differences between fluorescent protein tags and the HaloTag. We added new supplementary figures and substantially revised the text of the manuscript to address this question directly.

      Interpreting the data is also complicated by the fact that, while the authors point out that the percent of correlated trajectories (i.e., the measurement of multimerization state) does not itself correlate with expression level (using trajectories-per-movie as a proxy), the proper conclusion from that lack of correlation is not that variance in expression level does not account for the changes in apparent multimerization status, but instead that it cannot be the only factor. In some sense, the authors are attempting to play the argument both ways, by arguing that expression level matters for IRE1 activation (from previous studies) and that it doesn't (from this study). I think to address this the authors will need to better account, one way or another, for why the findings presented here differ from their previous findings and why these are the more salient (if in fact they are).

      This is a very important point, and we thank the reviewer for raising it. We are not arguing that expression levels do not matter for the formation of oligomers; quite the contrary, as detailed above and in the revised version of the text, we believe that the formation of massive IRE1 oligomers observed in previous studies and in the new Figure 1 – Figure Supplement 3 is mainly a function of elevated concentration. What we do claim is that our approach can reliably pick out oligomeric differences within the relatively narrow range of concentrations used for single-particle tracking experiments in this paper. We are using the very weak truncated CMVd3 promoter in all transient transfection experiments, and we are only analyzing data from cells that have a comparable density of single-molecule spots to the density we observe in endogenously tagged IRE1-HaloTag cells. In fact, the metric of “trajectories per movie” used as a proxy for expression levels in Figure 5 – Figure Supplement 1 is an overestimation of the true variability of expression levels, since each movie only covers a small fraction of each cell’s area and the number of observed molecules varies depending on cell morphology. Practically speaking, all cells that we image have expression levels that are clustered together rather narrowly, roughly within differences of no more than a factor of 3. These levels, in turn, are significantly lower than the expression levels used in earlier papers by our group and others.

      The other somewhat substantial issue is that there is no control for what higher order structures look like. The authors give no sense for the dynamic range of the multimerization assay. I would presume that tetramers would show a higher percentage of correlated trajectories than dimers, and octamers higher still, and that the mathematical model accounts for this theoretical possibility in calculating an average protomer number of 2.7 in the stress condition, but it would be better to see that in practice; at first glance it would seem that engineering a tetrameric and/or higher order control and validating it would be straightforward.

      This is another great point raised by all reviewers. In the revised version of the manuscript, we engineered a new tetrameric control construct (see Figure 2 – Figure Supplement 1), the results from which agree remarkably well with the mathematical model we developed in the original version of the manuscript (see Figure 2 – Figure Supplement 3).
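
      The reviewer's expectation that larger oligomers should give a higher percentage of correlated trajectories can be illustrated with a toy calculation. This is not the mathematical model from the manuscript. It is a minimal sketch that assumes each protomer independently carries a detectable partner label with probability `p_partner_label` (a hypothetical parameter) and that a trajectory counts as correlated when at least one other protomer in the same complex is labeled. Chance co-localization and detection losses are ignored.

```python
def correlated_fraction(n_protomers: int, p_partner_label: float) -> float:
    """Toy estimate of the fraction of trajectories with a co-moving partner.

    Assumption (not from the manuscript): each of the other n - 1 protomers
    is independently labeled and detected with probability p_partner_label.
    """
    if n_protomers < 1:
        raise ValueError("n_protomers must be at least 1")
    if not 0.0 <= p_partner_label <= 1.0:
        raise ValueError("p_partner_label must lie in [0, 1]")
    # Probability that at least one of the remaining protomers is labeled.
    return 1.0 - (1.0 - p_partner_label) ** (n_protomers - 1)


if __name__ == "__main__":
    # Hypothetical labeling probability, chosen only for illustration.
    for n in (1, 2, 4, 8):
        print(n, round(correlated_fraction(n, 0.3), 3))
```

      Under these assumptions the correlated fraction rises monotonically with oligomer size, which is why a tetrameric control is informative about the dynamic range of the assay.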

      Lastly, the data analysis lacks statistical justification for its conclusions. I presume given the high number of readings that the observed changes are all statistically significant, but that should be indicated, as in most cases the 95% confidence intervals shown are overlapping.

      This is another excellent point. The reviewer is correct that all relevant conclusions are statistically supported by the data, and our analysis code immediately calculates pairwise p-values for every plot using one of several relevant tests. Our preferred test is the permutation test, since it makes no assumptions about the underlying distributions being compared. To avoid cluttering the main plots, we have included tables of pairwise p-values for each plot in the revised version of the manuscript.
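
      For readers unfamiliar with the permutation test mentioned above, a minimal sketch follows. It is not the authors' analysis code. The test statistic (absolute difference in means), the number of permutations, and the function name are assumptions chosen for illustration.

```python
import random


def permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in sample means.

    Returns an estimated p-value; no distributional assumptions are made
    beyond exchangeability of the observations under the null hypothesis.
    """
    rng = random.Random(seed)
    n_a = len(a)
    pooled = list(a) + list(b)
    n_b = len(pooled) - n_a
    observed = abs(sum(a) / n_a - sum(b) / n_b)
    hits = 0
    for _ in range(n_perm):
        # Reassign group labels at random by shuffling the pooled data.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

      A pairwise table like the one described in the response could be built by calling such a function on every pair of conditions shown in a plot.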

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Responses to reviewers’ comments are in blue text, original reviewers’ comments in black text.

      Response to Reviewer 1.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): In this manuscript Neiro et al. aim to expand our knowledge on the regulation of gene expression in stem cells of the planarian model organism. As a first step the authors used published available data to expand the repertoire of the planaria transcriptome. By combining 183 RNAseq datasets the authors were able to identify thousands of new coding and non-coding transcripts. They then screened for TF motifs in the new annotations, identifying 551 putative TFs, of which 248 were already described in the planarian literature. The most substantial contribution of this work to the field of stem cells and planaria biology is the characterization of new putative enhancers that were identified by performing H3K27ac ChIP-seq and ATAC-seq and combining these data with previously published H3K4me1 ChIPseq dataset.

      We thank the reviewer for their careful assessment of our work, and we agree that the genome-wide identification of likely enhancers is a substantial contribution. Equally, the improved annotation of all genes, including the transcription factors we chose to focus on here, is a substantial step forward for the planarian research community.

      By overlapping H3K27ac and H3K4me1, the authors find 5,529 new enhancers, for which they report a higher chromatin accessibility than random points in the genome as assessed by ATAC-seq. By using ATAC-footprints Neiro et al. refined the subset of TFs that have binding motifs in the predicted enhancer-like regions and present a list of 22,489 such factors. The manuscript is well written and organized and overall, the reported data will provide an important resource to study gene expression regulation in planaria's stem cells. However, this manuscript would greatly benefit from some functional validation to support the predicted gene regulatory networks. One option would be to use a CRISPR-dCas9-KRAB system to silence the putative enhancers identified in the manuscript and check by qPCR the expression of nearby genes.

      Currently, mis-expression technologies that would allow enhancer elements to be tested directly for their ability to drive expression are still not available in planarians. This also precludes us from using the suggested silencing system, which is used in mammals and other animals with robust mis-expression tools.

      If this type of experiment is not feasible in planaria (I am not an expert in this model organism) another simple but key experiment would be to perform a knockdown of one (or more) putative enhancer-bound TFs identified in this study followed by RNA-seq. This would allow the authors to verify what are the target genes of the putative enhancer-bound TFs and if they correspond to the predicted gene networks they identified. Simultaneously, this experiment would allow the authors to verify if there are any changes in the expression of differentiation/pluripotency markers as a result of the knockdown of the putative enhancer-bound TF.

      These experiments are possible, but this would be the work of many labs in the future that are expert in studying those TFs and their roles in planarian stem cells and regeneration. However, what we can do is analyze existing RNA-seq data further. There are a number of studies in which TFs have been studied and RNA-seq performed after RNAi. Although these studies were performed in specific experimental regenerative contexts, and not specifically in stem cells, it will be possible to look at expression changes of genes with predicted enhancers bound by these TFs. We propose to execute this analysis and add it to the manuscript, rather than perform further TF RNAi experiments. This analysis is feasible within a 3-month revision time. We would add that currently there are no genes implicated in controlling pluripotency in the same way as we might consider, for example, OSKM in mammals. Our identification of the TFs enriched in stem cell expression and implicated in binding predicted enhancers suggests future candidates.

      Minor revision: • The authors have mostly focused on the identification of enhancer-bound TFs. However, it would be interesting to look at differential enrichment of TFs in promoters versus enhancers and identify if there are specific factors that are enriched specifically at the planarian newly identified enhancer regions.

      We have not looked at potential TF binding sites near promoters/transcriptional start sites. We will try to add an analysis that considers this in our revision.

      • All tornado plots are missing a colorbar (Fig3 and FigS2)

      We will fix this error.

      • There is a typo in the discussion: "the combined use of chip-seq data, RNAi of a histone methyltransferase combines with chip-seq" should be changed to "combined".

      We will fix this and other typographical errors.

      Reviewer #1 (Significance (Required)):

      The manuscript is well written and organized and overall the reported data will provide an important resource to study gene expression regulation in planaria's stem cells.

      We thank the reviewer for their appreciation of our work

      **Referees cross-commenting**

      I agree with the other reviewers that additional functional data should be added to support the authors' claims (such as knockdown of potential TFs that are identified by computational analyses and assessing the impact on gene expression).

      See response above, with regard to adding further analysis for testing this possibility.

      In addition, as noticed by the third reviewer, all data should be made publicly available to the scientific community.

      We have made all data publicly available and will submit all relevant data to public database repositories in advance of final publication after final peer review.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      This manuscript aims at identifying enhancers in the planarian Schmidtea mediterranea. The authors start with the integration of transcriptome with genome sequencing data to more precisely annotate the genome of the planarian Schmidtea mediterranea. The second part of the manuscript actually then deals with the identification of potentially active enhancer elements in adult stem cells of this regenerating organism using genomic techniques like ATAC-seq and ChIP-seq of histone marks combined with motif searches and in silico footprint analysis. Using these data, the authors predict regulatory interactions potentially critical for pluripotency and regeneration in planarian adult stem cells.

      MAJOR COMMENTS:

      • Are the key conclusions convincing? 1) The authors claim (already in the abstract) that their study identifies enhancers regulating adult stem cells and regenerative mechanisms. This is an over-statement found throughout the manuscript, as none of these enhancers are functionally tested nor is it shown that target gene expression changes when transcription factors predicted to interact with such enhancers are knocked down.

      We agree, and it was not our intention to overstate our results; this is why we have tried to refer to putative enhancers, enhancer-like elements, etc. in the manuscript from the title onwards. Only once we have demonstrated a set of elements with key conserved and widely supported characteristics do we suggest we have a set of higher confidence enhancers to study. However, we will adjust the manuscript to reflect that our claims await direct testing, as is the case for all enhancers implicated with the approaches used here.

      Another example is at the end of paragraph 1 of section 2.4. Here the authors claim that identifying many fate-specific transcription factor genes in the vicinity of potential enhancers is a further proof that the identified regions represent "real enhancers". It strongly supports this hypothesis, but no evidence for real enhancer activity.

      We agree the total body of evidence strongly supports that we have identified enhancer elements, but as above will adjust the language to suggest further directed functional work will follow from many groups.

      Thus, although the authors state that the regulatory interactions and networks they predict from their data can be studied now in future, they should be more careful with their wording and correct these over-statements. Therefore, the key conclusion is that they identified by various techniques potential enhancers, which are close to genes controlling adult stem cells and potentially controlling these genes, which has to be shown by further analyses.

      We agree.

      Thus, also the title needs to be changed.

      We propose changing “enhancer-like” to “predicted enhancers” in the title, and “defines” to “predicts”, as well as broadly adjusting the text to add the caveat that further work will clarify their functions and roles.

      The authors have no proof that the networks are active in planarian adult stem cells, as they do not show that the predicted networks are active in the presented way.

      We agree; see comments above. It was not our intention to claim that we are showing pathways that are definitely active, but rather pathways predicted by our experiments and by our analyses of the data from these experiments.

      2) Similarly, the identification of TF motifs within these potential motifs strongly suggests but does not show that these factors are binding, even when these sites were found to be bound by a protein using the ATAC-seq footprinting analysis. Thus, the authors need to be careful with their wording. One example is in the second paragraph of section 2.5, where the authors write that "We found that numerous FSTFs were binding to putative intronic enhancers ... ". The motif suggests that these factors bind; however, they have no experimental confirmation that these sequences are indeed bound by the planarian TFs.

      We agree. We will clarify that ATAC footprinting is the only data suggesting that these motifs are bound and that further experiments will be required for more evidence. We will state this in the results section and add this explicitly to the discussion.

      In sum, this manuscript uses existing genomic tools to define potential enhancer regions in the planarian Schmidtea mediterranea. The manuscript is informative yet descriptive, as it presents no functional evidence for any of the predictions. If further toned down, the key conclusions are valid.

      Future functional experiments to test the roles of all TFs and enhancers are now possible due to our work. The combination of data and analyses provides strong support for enhancer element activity in stem cells across the genome.

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? The experiments performed are well designed and in line with what is known in the field about enhancer architecture. However, as this model system is not very well characterized on that level and the authors do not provide real experimental evidence that any of the identified regions has really enhancer activity and that any of the identified motifs binds indeed the predicted TF, the authors need to be very careful with their statements. The authors should maybe emphasize even stronger that all the GRNs predicted under section 2.6 are really preliminary and need to be validated.

      Yes, we are happy to be even clearer about this, as the reviewer suggests.

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. One experiment that could provide more evidence for their predicted regulatory interactions is to knock down one of the FSTFs for which motifs have been identified in potential enhancer regions and to study expression of associated genes (to confirm that the enhancers potentially bound by these TFs control the expression of associated genes) or by analyzing the chromatin status of selected chromatin regions (by Q-PCR). These experiments would strongly support the claims of the authors. However, it also depends strongly on the journal whether I would consider these experiments essential or "nice to have".

      This suggestion of possible extra experiments is very similar to that of Reviewer 1. We are copying our earlier comment as this also addresses this point.

      “These experiments are possible, but this would be the work of many labs in the future that are expert in studying those TFs and their roles in planarian stem cells and regeneration. However, what we can do is analyze existing RNA-seq data further. There are a number of studies in which TFs have been studied and RNA-seq performed after RNAi. Although these studies were performed in specific experimental regenerative contexts, and not specifically in stem cells, it will be possible to look at expression changes of genes with predicted enhancers bound by these TFs. We propose to execute this analysis and add it to the manuscript, rather than perform further TF RNAi experiments. This analysis is feasible within a 3-month revision time. We would add that currently there are no genes implicated in controlling pluripotency in the same way as we might consider OSKM in mammals. Our identification of the TFs enriched in stem cell expression and implicated in binding predicted enhancers suggests future candidates.”

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. This reviewer is not an expert in Schmidtea mediterranea, thus it is hard to judge how time consuming these experiments would be. Cost-wise they should be feasible, as it would include primarily Q-PCR experiments. And some functional back-up of their claims would be very helpful.

      See previous comment regarding additional analysis.

      • Are the data and the methods presented in such a way that they can be reproduced? For the parts I can judge, yes.

      • Are the experiments adequately replicated and statistical analysis adequate? It is not clear from the manuscript how many replicates of the ChIP-seq experiments were done.

      A description of the ChIP-seq replicate data will be explicitly added to the methods.

      MINOR COMMENTS:

      • Specific experimental issues that are easily addressable.

      • Are prior studies referenced appropriately? For the literature I can judge, yes.

      • Are the text and figures clear and accurate? The figures are clear, the text (besides over-statements) is clear. However, the writing can be improved. A few examples: section 2.2 paragraph 1: "... we found 248 to be described in the planarian literature in some way." In which way described?; same paragraph: "... but significantly we could identify new homologs of ..." what does significantly mean? Which test etc? section 2.2, last paragraph: "Most TFs assigned to the X1 and Xins compartments and the least to the X2 compartment", "Very few TFs had expression in X1s and Xins to the exclusion of X2 expression as would be expected by overall lineage relationships"; what do these sentences mean?

      We thank the reviewer for paying careful attention to the language in our manuscript throughout. We will provide clearer explanation of the sentences indicated. We will better explain terms specific to the planarian model system that are obviously not intuitive.

      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions? No over-statements.

      See previous comments agreeing with the need to carefully adjust our language to avoid this.

      Reviewer #2 (Significance (Required)):

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field. This manuscript identifies genome-wide potential enhancers in adult planarian stem cells, and thus represents a very valuable resource for the community to study these enhancers and the gene regulatory networks they control in the future.

      • Place the work in the context of the existing literature (provide references, where appropriate). As I am not a planarian scientist, it is hard to judge this part.

      • State what audience might be interested in and influenced by the reported findings. In my opinion, this work will be primarily interesting for people working with planarians. When functional data exist, this might also be interesting for researchers working generally on regeneration.

      Given the nature of our data, we also think all groups working on animal stem cells would be interested in our data and analyses.

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. My field of expertise is transcriptional regulation using genomic techniques, however I am not familiar with the model Schmidtea mediterranea.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Neiro et al. capitalize on existing genomic data for the planarian Schmidtea mediterranea and new ChIP-seq and ATAC-seq data to use computational approaches to identify putative enhancers in the planarian genome. They integrate analysis of enhancers with transcription factor binding sites to generate testable hypotheses for the regulatory function of transcription factors active in stem cells or control of cell lineage trajectories. Their work creates an excellent resource for future work to resolve the regulatory logic underpinning stem cell biology and tissue regeneration in planarians.

      We are glad the reviewer likes our research.

      Major: Overall, the work in this manuscript and methodology are well executed and presented. However, the authors should consider the following comments to improve the clarity and accessibility of the data and interpretations.

      1) The new transcriptome does not appear to be publicly accessible. The links to Github resources are broken, and there is nothing on Neiro's Github page. Will the new transcriptome be integrated with Planmine?

      The new annotation has been available for over a year as we wished the community to have access to it ASAP (see Garcia Castro, 2021, Genome Biology https://doi.org/10.1186/s13059-021-02302-5). We tested the links in the paper before depositing our preprint and after review and they seemed to work for us both within and outside our institutional network. We can only apologize if they were broken or have not worked for the reviewer. We are unclear if this new annotation will be included in Planmine, but we will ask the colleagues maintaining this database to consider including it.

      2) Figure 1: Ternary plot in 1F. The legend is not clear or could be explained better. What is the metric? It could be my misunderstanding, but I didn't find the ternary plots insightful or necessary. Perhaps the authors can expand on what they are showing.

      These plots are important in demonstrating the distribution of mRNA expression of all genes across cell-sorted compartments. Given the broad lineage relationship between sorted cell compartments, this analysis allows us to identify genes expressed predominantly in one cell compartment or another, or across a specific transition. For example, genes enriched in X2 cells and Xins, but not X1, are likely to be enriched in post-mitotic differentiating progeny and differentiated cells. In contrast to single-cell data, where expression data can be sparse, this analysis with bulk data allows identification and assignment of lowly expressed genes, like transcription factors. We will provide further explanation of this in the revised text.

      1I is a map of exons, not alternative splicing. So, it isn't clear what the authors intend to show. Are there specific exons that are more likely to be spliced? Is the figure necessary?

      We wish to demonstrate the power of our annotation approach and the richness of the annotation for looking at alternative splicing. We propose to add a more informative figure that indicates the variety of splice forms. We apologize for this oversight.

      3) Figure 2: 2A labels Xins as irradiation responsive. Is this the case (just making sure)?

      The reviewer is correct; this is wrong! This should read “irresponsive” or “irradiation resistant” in Figure 1A. We thank the reviewer for spotting this error. We will fix this.

      2F-G: Ternary plot in F seems redundant with G, but that could be my lack of understanding. In 2G, what is represented on the plots on the right of the hierarchical clusters?

      The ternary plot (2F) and heatmap of hierarchical clustering (2G) are complementary ways to visualize the proportional expression values of transcription factors. The ternary plot (2F) allows an overview of all the proportional expression values, while the heatmap (2G) shows how the proportional values may be grouped into clusters of similar expression profiles and displays the relative size of these clusters. For example, the heatmap shows that the clusters of X1 and Xins are more prominent than X2, suggesting that there are relatively few X2-specific transcription factors. We will add text to better explain this difference.
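
      The "proportional expression values" plotted on a ternary diagram can be sketched as follows. This is an illustrative assumption about the metric, not the authors' code: each gene's expression in the three sorted compartments is divided by its summed expression, and a gene is called compartment-enriched when one proportion exceeds a cutoff. The cutoff of 0.5 and the function names are hypothetical.

```python
def proportions(x1: float, x2: float, xins: float) -> tuple:
    """Convert expression in X1, X2 and Xins into proportions summing to 1."""
    total = x1 + x2 + xins
    if total <= 0:
        raise ValueError("gene has no detectable expression")
    return (x1 / total, x2 / total, xins / total)


def dominant_compartment(x1: float, x2: float, xins: float,
                         threshold: float = 0.5) -> str:
    """Name the compartment holding at least `threshold` of total expression.

    Returns "mixed" when no compartment reaches the (hypothetical) cutoff.
    """
    names = ("X1", "X2", "Xins")
    props = proportions(x1, x2, xins)
    best = max(range(3), key=lambda i: props[i])
    return names[best] if props[best] >= threshold else "mixed"
```

      Each gene then corresponds to one point inside the triangle, and clustering the same three-component vectors gives the groups summarized in a heatmap.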

      4) Figure 3: The heat maps need a legend (i.e., please define the colors). In addition, labeling the figures could help the reader. For example, in G-J, a header about the different experiments above each map, such as "enhancers" and "random," etc., would make the figure more accessible.

      We agree; we will label the figures to be more easily interpretable and provide an independent scale and legend for the heatmaps.

      5) Figure 5: Although it is in the figure legend, the authors could label the 6th track as "RNA-seq in X1."

      We will add this to the figure.

      6) Section 2.6 second page last sentence of the first paragraph "GRN of asexual reproduction is not active in neoblasts" data in the supplement? Is it not shown?

      We apologize for this poorly written sentence. In line with Reviewer 2's comments, this statement needs to be toned down and clarified. The raw information is included in the general table of enhancers (Supplementary Table 2), but the genomic tracks visually highlighting the motifs at the promoters of lox5b and post2b were not included. We will add these to the Supplementary information and clarify Supplementary Table 2.

      7) Discussion: The discussion about pluripotency factors in planarians could be expanded. The authors could contrast the study's findings with Önal et al. 2012.

      We agree; we will expand our discussion to compare with previous studies and also summarize what is available from other animals with pluripotent adult stem cells.

      Minor: The manuscript has no page numbers or line numbers, so I'll provide a general location of the potential issues.

      1) Section 2 - newly identified isoforms are shorter (1656 vs. 1618). Is the order of the median length reversed?

      Yes, we will correct this.

      2) No mention of Figure S1B in the text.

      It is mentioned in the paragraph regarding splicing, but perhaps not in a useful context. We will add a correct reference to this figure in the presentation of transcript diversity.

      3) Figure 1H should be 1I in the text?

      Yes, we will correct this.

      4) The discussion contains some minor typos and grammatical errors.

      We will address these with careful rereading.

      We thank the reviewer for spotting these errors and we will fix them in revision.

      Reviewer #3 (Significance (Required)):

      Neiro et al. provide an excellent resource for the planarian community. The paper is generally very well written and easy to read. The new transcriptome described, which improves the annotation of the planarian genome, should be made readily available. It would be excellent if the transcriptome could be incorporated in Planmine.

      We will ask Planmine and the Rink lab to consider this. The annotation (without broad analysis) has been available since the pre-print for Garcia Castro, 2021, Genome Biology was deposited in BioRxiv.

      Furthermore, the authors provide a comprehensive list of transcription factors in the planarian Schmidtea mediterranea. Their work provides insight into which factors are highly expressed in the stem cell compartment. Their computational identification of transcription factors and putative enhancers will be helpful to the growing community of researchers studying stem cell and regenerative biology using planarians. In addition, the large dataset generated in this study could inform studies in the evolution of regulatory sequences and transcription factor function.

      **Referees cross-commenting**

      The data presented are well supported by previous studies. As noted by the authors, it is not possible to make transgenic planarians, and thus the field needs to rely on indirect methods. The authors focus on using the stem cell population, which can be isolated from the animals. Overall, I don't think additional experiments are necessary. Additional RNAi experiments combined with RNA-seq (using the stem cells) could take 6-12 months to complete. I believe this is a solid contribution that should be framed as a resource paper. The authors should pay close attention to Reviewer #2's suggestions and edit the paper accordingly.

      I have 20 years of experience in the field. It would be unreasonable to ask the authors to do more experiments, especially in this post-pandemic environment. I hope this helps.

      We thank the reviewer for the comments.

    1. Reviewer #3 (Public Review): 

      In this paper, Troendle et al. investigate changes in alpha oscillation across childhood and adolescence. The main goal of this investigation is to examine how alpha oscillations change across these age ranges, by investigating a large open dataset and adopting new methods that should help to address methodological limitations of many previous analyses. In particular, a key goal is to examine changes in periodic alpha power, and control for potential confounds due to changes in peak frequency and/or aperiodic activity. To do so, they employ a novel spectral parametrization method, and systematically compare measures of isolated periodic alpha activity to conventional measures. Overall, they find that they can replicate the age-related decrease of total alpha power when using conventional methods. However, when explicitly measuring and controlling for aperiodic activity, they find that periodic alpha activity actually increases with age. They suggest this discrepancy can be explained by changes in aperiodic activity, as the aperiodic slope and intercept are found to systematically change across age, in a way that likely drives the observed decrease of total alpha power, while the periodic alpha power actually increases. There are also some follow-up analyses, including relating alpha power to anatomical measures of the thalamus, and to performance on an attention task.
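
      The apparent paradox summarized above (total alpha power falling while periodic alpha power rises) can be reproduced with a toy spectrum. The sketch below is not the authors' pipeline. It assumes the common parametrization in which log power is an aperiodic line in log-log space plus a Gaussian peak, and all parameter values are hypothetical and chosen only to illustrate the direction of the effect.

```python
import math


def log_power(freq_hz, offset, exponent, peak_height,
              peak_cf=10.0, peak_sd=1.5):
    """log10 power at freq_hz: aperiodic component plus a Gaussian alpha peak."""
    aperiodic = offset - exponent * math.log10(freq_hz)
    periodic = peak_height * math.exp(
        -((freq_hz - peak_cf) ** 2) / (2 * peak_sd ** 2)
    )
    return aperiodic + periodic


# Hypothetical parameters: the older group has a lower aperiodic offset,
# a flatter exponent, and a LARGER periodic alpha peak.
younger = dict(offset=1.5, exponent=1.8, peak_height=0.4)
older = dict(offset=0.8, exponent=1.5, peak_height=0.6)

# Conventional "total" alpha power, read off at the 10 Hz peak.
total_younger = log_power(10.0, **younger)
total_older = log_power(10.0, **older)

if __name__ == "__main__":
    print("total alpha (log10):", total_younger, total_older)
    print("periodic alpha:", younger["peak_height"], older["peak_height"])
```

      With these illustrative values the total power at 10 Hz is lower in the older group even though its periodic peak is larger, because the drop in the aperiodic component outweighs the gain in the peak.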

      Strengths of this investigation include that it analyzes multiple, large datasets with well motivated methods. I think the goal of this paper addresses an important question, in terms of seeking to clarify some basic patterns of oscillation changes across development, and doing so in a rigorous way, both in terms of employing methods that are robust to estimating different features of the data, and in terms of using multiple, large datasets, including an internal replication of the main findings. I find the main goal and analysis compelling in terms of examining how alpha activity changes across this age range. 

      I also find some limitations to some aspects of this paper and analysis that could be improved, as they do not always clearly describe the context or support the claims that are made for some of the follow-up analyses, as described in the following. 

      1. Framing and prior literature 

      I find some limitations in the organization of this paper and its relationship to prior work that could be improved, as I find that the paper could do better situating the analyses here with prior work, in particular in relation to the methodological issues it is addressing, and prior work on aperiodic activity.

      For example, in the abstract it is stated that "simulations in this study show that conventional measures of alpha power are confounded". Despite this statement, simulations are not a core feature of this study. There are a couple of simulated examples in the supplement, which are referred to in lines 89-95; however, it's worth noting that while this section does not include any citations, the described issues, and related simulations, are very similar to points that have been made previously in the literature, and that seem like they should be cited here:
      - Donoghue, T., Dominguez, J., & Voytek, B. (2020). Electrophysiological Frequency Band Ratio Measures Conflate Periodic and Aperiodic Neural Activity. ENeuro, 7(6), ENEURO.0192-20.2020. https://doi.org/10.1523/ENEURO.0192-20.2020
      - Donoghue, T., Schaworonkow, N., & Voytek, B. (2021). Methodological considerations for studying neural oscillations. European Journal of Neuroscience, ejn.15361. https://doi.org/10.1111/ejn.15361

      The paper also understates previous work on aperiodic activity, and the degree to which it is known to vary with age, in lines 116-117 stating "there is insufficient evidence for the reported significant association between age and aperiodic signal components". This seems to ignore the large number of studies that have replicated this finding, including (some non-exhaustive examples):
      - Thuwal, K., Banerjee, A., & Roy, D. (2021). Aperiodic and Periodic Components of Ongoing Oscillatory Brain Dynamics Link Distinct Functional Aspects of Cognition across Adult Lifespan. Eneuro, 8(5), ENEURO.0224-21.2021. https://doi.org/10.1523/ENEURO.0224-21.2021
      - Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-Related Changes in 1/f Neural Electrophysiological Noise. Journal of Neuroscience, 35(38), 13257-13265. https://doi.org/10.1523/JNEUROSCI.2332-14.2015
      Perhaps this claim is supposed to more specifically reflect the age range analyzed here, in which case recent studies examining this (in relatively large datasets) are also not mentioned here, including, for example:
      - Donoghue, T., Dominguez, J., & Voytek, B. (2020). Electrophysiological Frequency Band Ratio Measures Conflate Periodic and Aperiodic Neural Activity. ENeuro, 7(6), ENEURO.0192-20.2020. https://doi.org/10.1523/ENEURO.0192-20.2020
      - Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A. G., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076. https://doi.org/10.1016/j.dcn.2022.101076

      The notes above do not undermine the utility of examining alpha oscillations in detail, but I think the specific contribution of this work could be better contextualized in terms of other existing work. In the introduction, for example, the following review is an important piece of work that could be cited when introducing aperiodic activity:
      - He, B. J. (2014). Scale-free brain activity: Past, present, and future. Trends in Cognitive Sciences, 18(9), 480-487. https://doi.org/10.1016/j.tics.2014.04.003

      2. Model quality control 

      A limitation to the methods employed in this study is a lack of description of if and how model fit quality was evaluated. For the method of parametrizing neural power spectra that is employed, it is important to validate that models fit the data well, otherwise the estimated parameters may be unreliable. This is especially important in developmental and clinical data, as analyzed here, as this data can be quite noisy, and differences in levels of noise across ages or between clinical groups could plausibly lead to differences in model fit quality. Useful quality checks for this kind of analysis would be to report the average r-squared (or model error) for the parametrized data, and to examine whether model fit quality is significantly related to age, or clinical status. 

      Note that there is also a detailed guide for how best to apply spectral parametrization to developmental datasets, including notes on quality control, that may be useful:
      - Ostlund, B., Donoghue, T., Anaya, B., Gunther, K. E., Karalunas, S. L., Voytek, B., & Pérez-Edgar, K. E. (2022). Spectral parameterization for studying neurodevelopment: How and why. Developmental Cognitive Neuroscience, 54, 101073. https://doi.org/10.1016/j.dcn.2022.101073

      Not reporting any quality control metrics of the model fits also deviates from the analysis of the validation dataset as described in the pre-registered analysis (https://osf.io/7uwy2), which includes the note that the plan is for data to be excluded from the analysis if there is a bad model fit (R-squared < 0.9). It is unclear from the manuscript if this was done at all - and if so, why it was not described, and if not, why this deviates from the pre-registration. Note that though examining and reporting model fit quality is important, it is unclear where the value of 0.9 in the pre-registration came from, and it is unclear if this is an appropriate threshold for these specific datasets. 
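
      The quality checks suggested above could be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the toy values, and the use of a plain Pearson correlation are all assumptions made here, and the 0.9 threshold is taken from the pre-registration only as a default.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def summarize_fit_quality(ages, r_squared, threshold=0.9):
    """Summarize per-subject model fit quality (hypothetical helper)."""
    # Indices of subjects whose model fit falls below the threshold.
    excluded = [i for i, r2 in enumerate(r_squared) if r2 < threshold]
    return {
        "mean_r2": mean(r_squared),
        "excluded_indices": excluded,
        # A non-zero value here would suggest fit quality varies with age.
        "age_r2_correlation": pearson_r(ages, r_squared),
    }


# Toy values for illustration only; these are not data from the study.
ages = [6, 9, 12, 15, 18]
r_squared = [0.88, 0.93, 0.95, 0.97, 0.98]
summary = summarize_fit_quality(ages, r_squared)
print(summary)
```

      The same summary could be computed separately per clinical group to check whether fit quality differs with clinical status.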

      3. The analysis of the relationship between the aperiodic intercept and aperiodic exponent 

      There is an analysis in this paper that attempts to evaluate whether the change in aperiodic intercept that is observed is more than expected due to the measured change in aperiodic exponent. The approach taken for this analysis is ill-posed, and the interpretations made of this analysis are not supported. The issue is that the degree to which the intercept changes due to a change in exponent depends on the rotation frequency, which is not acknowledged or addressed in the analysis employed here.

      For example, for spectra rotated at 0 Hz, there is no measured change in offset from a change in exponent, whereas for a rotation at 100 Hz, there is a large influence of exponent on the change in offset, with different degrees of impact in between. The results of this analysis are therefore heavily influenced by the rotation frequency that is used. The analysis by the authors uses a rotation frequency of 19 Hz, however, there is no justification provided for this value. It is noted as being the middle point of the analyzed range, however, this itself is unrelated to whether it is an appropriate rotation frequency (since which frequency the spectrum rotates at is unrelated to the experimenter's decision of which frequency range to analyze). 

      In real data, we don't know a priori what the rotation frequency is; in general it need not be a single point that is consistent between subjects, and it is difficult to measure. To get a sense of what it might be, anecdotally, we can see in Figure 2C that in this particular subset, the rotation point is not at 19 Hz, and appears to be at a higher frequency. If the rotation point is actually higher than 19 Hz, then the analysis employed will systematically underestimate the impact of the measured exponent change, leading to the conclusion that the intercept is changing over and above the influence of the exponent. However, this conclusion is only valid if the rotation point of 19 Hz is accurate, and we would likely arrive at a different conclusion by picking a different rotation point. This analysis, by itself, is therefore invalid. To be interpretable, such an analysis would need to demonstrate that the correct rotation frequency had been measured.
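
      The dependence on rotation frequency can be made concrete with a small sketch. It assumes the aperiodic component is modeled as log10(power) = offset - exponent * log10(f), so that the offset is read at log10(f) = 0, i.e. 1 Hz; the function name and the example exponent change of 0.2 are illustrative assumptions, not values from the manuscript. Holding power fixed at the rotation frequency gives an offset change of delta_exponent * log10(f_rotation).

```python
import math


def offset_change(delta_exponent, f_rotation):
    """Offset change implied by an exponent change for a given rotation point.

    Assumes log10(power) = offset - exponent * log10(f) and that power at
    f_rotation is unchanged by the rotation.
    """
    return delta_exponent * math.log10(f_rotation)


# Same exponent change, different assumed rotation frequencies (Hz).
for f_rot in (1, 19, 40, 100):
    print(f_rot, offset_change(0.2, f_rot))
```

      Under these assumptions, a rotation at 1 Hz implies no offset change, and the implied change grows with the rotation frequency, so assuming a rotation point that is too low attributes too little of the offset change to the exponent.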

      4. Flanker Analysis 

      Also relating to organization (similar to point 1), it is unclear why the analysis of the Flanker task, which is alluded to in the abstract, is only mentioned in the Discussion section. Given that this appears to be a key analysis, it is unclear why it is not presented in detail in the Results. The Flanker task and analysis are also not described in much detail in the Methods. An issue with the Flanker analysis only being mentioned in the Discussion, with a link to a supplemental table, is that the details of the results are somewhat obscured from the reader. When looking at these results, two key features seem notable: first, that although there is a significant effect of aperiodic-adjusted alpha power, the beta value is very small (many times smaller than the coefficients for age and gender), and second, that although it doesn't quite pass significance, the estimated beta value for the total alpha power has the same magnitude as for the individualized alpha power. Given these two features, it is not clear that the relationship between aperiodic-adjusted alpha power and Flanker performance is of sufficient magnitude to conclude that alpha power is related to attentional performance, and it is not clear that aperiodic-adjusted alpha power is more related to attentional performance than total alpha power (since a difference in significance does not necessarily imply a significant difference in the parameters). I think this analysis, as presented, therefore does not clearly support the claim made in the abstract that alpha power is found to relate to improved attentional performance.

    1. Discussion, revision and decision


      Author response


      To: Adam Marcus, co-founder Retraction Watch & Alison Abritis, PhD, researcher at Retraction Watch

      Major Problems: I found serious deficits in both for this article, and thus I have serious concerns as to the usefulness of this article. Therefore, I have not proceeded in a line-by-line, as I consider the overall problems to be grave enough to require attention and revision before getting to lesser items of clarity.

      I would like to point out that the authors show a marvelous attention to their work, and they have much to contribute to the field of retraction studies, and I do honestly look forward to their future work. However, in order for the field to move ahead with accuracy and validity, we must no longer just rely on superficial number crunching, and must start including the complexities of publishing in our analyses, as difficult and labor-intensive as it might be.

      We do not consider that our article presents serious problems nor that it would be useless.

      It is possible that a different view on the subject, some tendency to forbearance (understandable) for the difficult life of the publishing industry, along with some difficulties in understanding the ideas presented in the article, may have led to a series of points of view that we would like to comment on below.

      We would first like to thank the reviewers for their comments, some of which will allow us to improve and nuance, using objective elements, the analysis of this bumpy field represented by the ecosystem of retracted publications. Because we have based our study on data from freely accessible sources of information, we will not insist too much on commenting on this issue.

      The authors stated that they used the search protocol (and therefore presumably the same dataset) as described in Toma & Padureanu, 2021, and do not indicate any process to compensate for its weaknesses. In the referenced study, the authors (same as for this article) utilized a PubMed search using only “Retracted Publication” in Publication Type. This search method is immediately insufficient, as some retracted articles are not bannered or indexed as retracted in PubMed. This issue is well-understood among scholars who search databases for retractions, and by now one would expect that these searches would strive to be more comprehensive.

      A better method, if one insists on restricting the search to PubMed, would have been to use Publication Type to search for “retracted publication,” and then to search for “retraction of publication,” and to compare the output to eliminate duplications. There are even more comprehensive ways to search PubMed, especially since some articles are retitled as “Withdrawn” – Elsevier, for example, uses the term instead of “Retracted” for papers removed within a year of their publication date – but do not come up in searches for either publication type. Even better would have been to use databases with more comprehensive indexing of retractions.

      In an ideal world, if any effort were to be made, it would be aimed at better indexing and managing existing databases, not at generating query strategies to make up for their shortcomings.

      Thank you very much for the suggestions on the search strategy. We do not consider that the use of "Retracted Publication [PT]" should be compensated in any way but, if it should be compensated, we wouldn't want to add "Retraction of publication". We consider that using a search protocol more specific to systematic reviews is not very useful in our case: data are added/updated continuously (sometimes late), incorrect indexing can be corrected, and the number of retracted articles increases from month to month; the same strategy can give different results at different times regardless of its complexity. Putting extra effort into detecting problematic articles, without knowing the benefit but expecting it, only highlights issues that can be improved at the publisher/editor level (content delivery) and the database level (indexing).

      The dataset analyzed is a snapshot of a particular time interval and nothing more. Even during the analysis we found, in the case of one publisher, the addition of details to the initially incomplete retraction notes. Hence the need for follow-up studies. Therefore in the case of retractions, unlike the reviewer, we prefer an approach based on simple and easily reproducible strategies, widely accessible sources of information, and several steps. The first step in this strategy is the "number crunching" stage which includes this article.

      1. The authors are using the time from publication to retraction based on the notice dates and using them to indicate efficacy of oversight by publishers. However, this approach is seriously problematic. It takes no notice of when the publisher was first informed that the article was potentially compromised. Publishers who respond rapidly to information that affects years/decades old publications will inevitably show worse scores than those who are advised upon an article’s faults immediately upon its publication, but who drag their heels a few months in dealing with the problem.

      Indeed, the article uses the time between publication and retraction (exposure time – ET) as one of the SDTP score components for assessing editorial/publisher performance. Data on when a publisher or editor has been informed of problems with an article, apart from being relatively rare, is not a substitute for a retraction note. Moreover, the use of such information may induce a risk of bias.

      We mention in the article the need to use reporting standards for retraction notes, and one element that might be useful is, indeed, the date on which the publisher or editor was informed of problems with an article. Unfortunately, as the author of this review knows very well, information precedes investigation; the retraction note contains (or should contain) much more data than the initial information about the quality problems of an article.

      Our article aims to suggest a score for measuring publication performance in the context of retracted articles that would also allow an assessment of the dynamics of the activity of correcting the scientific record and, more importantly, how publishers engage in post-publication quality control. ET is only one component of this score.

      It is quite clear from the data presented in the article that a publisher/journal that emphasizes systematic back-checking will have an increasingly longer average lifespan of retracted articles, logically higher than one that does not do this type of checking. We don't see precisely where the reviewer thinks there is a problem: once the checking is done, the ET will decrease, and a publisher that takes concrete steps to correct the literature will ultimately have a better reputation. This does not mean that a higher ET is laudable; it suggests that there is post-publication quality control, but also that the peer review process has let problematic articles through and that the control of these articles has been carried out late. This is an argument for more active involvement of publishers (as potential generators of editorial policies) in post-publication control.

      Second, there is little consistency in dealing with retractions between publishers, within the same publishers or even within the same journal. Under the same publisher, one journal editor may be highly responsive during their term, while the next editor may not be. Most problems with articles quite often are first addressed by contacting the authors and/or journal editors, and publishers – especially those with hundreds of journals – may not have any idea of the ensuing problem for weeks or months, if at all. Therefore, the larger publishers would be far more likely to show worse scores than publishers with few journals to manage oversight.

      It is exactly this inconsistency that we highlight in the article. Differing policies, attitudes, and responsiveness do not mean that a publisher cannot/should not ask questions about the effectiveness of internal processes and resources used for post-publication quality control or the implementation of uniform measures across journals in its portfolio.

      Third, the dates on retraction notices are not always representative of when an article was watermarked or otherwise indicated as retracted. Elsevier journals often overwrite the html page of the original article with the retraction notice, leaving the original article’s date of publication alone. A separate retraction notice may not be published until days, weeks or even years after the article has been retracted. Springer and Sage have done this as well, as have other publishers – though not to the same extent (yet).

      Historically, The Journal of Biological Chemistry would publish a retraction notice and link it immediately to the original article, but a check of the article’s PDF would show it having been retracted days to weeks earlier. They have recently been acquired by Elsevier, so it is unknown how this trend will play out. And keep in mind, in some ways this is in itself not a bad thing – as it gives the user quicker notice that an article is unsuitable for citation, even while the notice itself is still undergoing revisions. It just makes tracking the time of publication to retraction especially difficult.

      We used the same date for all articles in our study (the one listed in PubMed), thus ensuring a uniform criterion for all publishers. If this date was not in PubMed, we used the date from the retraction notes on the journal website, but this was the case for only a small number of articles. How different publishers handle retraction processes, or the delay with which these are published, is primarily related to internal editorial procedures, and these delays are reflected in the ET. In our experience, most articles retracted by Elsevier are available online, supplemented, and not replaced by retraction notes, which we think is an excellent policy.

      1. As best as can be determined, the authors are taking the notices at face value, and that has been repeatedly shown to be flawed. Many notices are written as a cooperative effort between the authors and journal, regardless of who initiated the retraction and under the looming specter of potential litigation.

      Shown to be flawed by who? Indeed, in our study, we refer to the retraction notes published by the journals. The fact that they are incomplete or formulated under the threat of litigation only supports our view that publishers and editors need to make a more significant effort to correct the biomedical literature, including avoiding litigation when the retraction note clearly describes the reasons for retraction. The way the retraction note is worded should be an editorial prerogative and should primarily aim at correcting scientific literature, not at appeasing egos, careers, or financial interests.

      Trying to establish who initiated a retraction process strictly by analyzing the notice language is destined to produce faulty conclusions. Looking just at PubPeer comments, questions about the data quality may be raised days/months/years before a retraction, with indications of having contacted the journal or publisher. And yet, an ensuing notice may be that the authors requested the retraction because of concerns about the data/image – where the backstory clearly shows that impetus for the retraction was prompted by a journal’s investigation of outside complaints. As an example, the recent glut of retractions of papers coming from paper mills often suggests the authors are requesting the retraction. This interpretation would be false, however, as those familiar with the backstory are aware that the driving force for many of these retractions was independent investigators contacting the journals/publishers for retraction of these manuscripts.

      Once again, the author of this review does not seem to fully understand our study, apparently favouring information published on third-party websites over that the journals officially assumed. The retraction notes represent the material available to a researcher doing documentation on a particular topic. The clarity and information contained in the note is the editor's or publisher’s responsibility, reflecting their performance and concern for the integrity of the science. Interpretation of a retraction note/analyzing an article occurs in this context. Not everyone has time for further investigation or to search third-party sites for information that is, with a notable exception, the result of a selection bias.

      Assigning the reason for retraction from only the text of the notice will absolutely skew results. As already stated, in many cases, journal editors and authors work together to produce the language. Thus, the notice may convey an innocuous but unquestionable cause (e.g., results not reproducible) because the fundamental reason (e.g., data/image was fabricated or falsified) is too difficult to prove to a reasonable degree. Even the use of the word “plagiarism” is triggering for authors’ reputations – and notices have been crafted to avoid any suggestion of such, with euphemisms that steer well clear of the “p” word. Furthermore, it has been well-documented that some retractions required by institutional findings of misconduct have used language in the notice indicating simple error or other innocuous reasons as the definitive cause.

      We understand your point of view and the situations presented may be accurate. However, from our point of view, the only valid reference remains the retraction note published on the journal's website. The wording difficulties and various other problems that may arise more likely have to do with a tendency of the reviewer to make excuses for journals reluctant to indicate precisely what the reasons for retracting the article are. There are plenty of retraction notes in which the images with problems (including whether they were plagiarized, reused, manipulated, fabricated, etc.) are indicated with great precision, and there are equally plenty of notes in which the word plagiarism is used without hesitation, indicating the sources, how they were informed, and what was plagiarized. No matter how many hesitant publishers/editors there are, it should not be forgotten that there are many journals/publishers who take their role seriously, acknowledge and learn from their mistakes, thus providing a real service to the scientific community.

      The authors also discuss changes in the quality of notices increasing or decreasing in publishers – but without knowing the backstory. Having more words in a notice or giving one or two specific causes cannot in itself be an indicator of the quality (i.e., accuracy) of said notice.

      "Knowing the backstory" is not part of our objectives, and neither is assessing the quality of the retraction notes. This is also very difficult to do due to the lack of an accepted standard format. We are trying to propose a score composed of several parameters resulting from existing (or non-existing) data in the retraction notes so that we can have a picture of retractions at publisher level. Knowing the backstory is not relevant, reading and interpreting the official retraction note is relevant.

      1. The authors tend to infer that the lack of a retraction in a journal implies a degree of superiority over journals with retractions. Although they qualify it a bit (“Are over 90% of journals without a retracted article perfect? It is a question that is quite difficult to answer at this time, but we believe that the opinion that, in reality, there are many more articles that should be retracted (Oransky et al. 2021) is justified and covered by the actual figures.”), the inference is naive. First, they have not looked at the number of corrections within these journals. Even ignoring that these corrections may be disproportionate within different journals and require responsive editorial staff, some journals have gone through what can only be called great contortions to issue corrections rather than retractions.

      We believe that this is a case of reviewer confusion generated either by the insufficiently precise wording of the text or a lack of understanding of our study objectives. We are trying to point out that more than 90% of the journals in the NLM catalogue-PubMed subset have not retracted a single article. We are not trying to say that journals without retracted articles are superior to the others. As explained in the article, we referred to retraction notes, not corrections.

      Second, the lack of retractions in a journal speaks nothing to the quality of the articles therein. Predatory journals generally avoid issuing retractions, even when presented with outright proof of data fabrication or plagiarism. Meanwhile, high-quality journals are likely to have more, and possibly more astute, readers, who could be more adept at spotting errors that require retraction.

      Of course, the quality level of articles in a journal is not determined by the number of articles removed.

      Third, smaller publishers/journals may not have the fiscal resources to deal with the issues that come with a retraction. As an example, even though there was an institutional investigation finding data fabrication, at least one journal declined to issue a retraction for an article by Joachim Boldt (who has more than 160 retractions for misconduct) after his attorneys made threats of litigation.

      Threats of lawsuits instead reflect a failure of a publisher/journal to adapt to the realities of the publishing business or to the risk of misconduct. This is something that needs to change.

      Simply put, the presence or lack of a retraction in a journal is no longer a reasonable speculation about the quality of the manuscripts or the efficiency of the editorial process.

      We have not attempted to suggest this; we have only analyzed the retracted articles and their associated retraction notes. On the other hand, the way a journal/publisher handles the retraction of problematic articles still reflects, to some extent, the quality/performance of the editorial processes.

      1. I am concerned that the authors appear to have made significant errors in their analysis of publishers. For example, they claim that neither PLOS nor Elsevier retracted papers in 2020 for problematic images. That assertion is demonstrably false.

      This is wrong. In our dataset, there are eleven PLOS articles related to human health with the publication years 2019 and 2020. None of these have images as retraction reasons.

      Regarding the 21 Elsevier articles published in 2020, there is nothing in the retraction notes to indicate that the article was retracted because of the images. In 2 retraction notes there is mention of the comments made by Dr. Bik (The Tadpole Paper Mill - Science Integrity Digest) but the text of these (retraction notes) stops at the authors' inability to provide the raw data underlying the article.

      Our study is based only on the content of the retraction notes published and assumed by the journal, not on opinions/comments appearing on other sites, which, for unknown/unmentioned reasons, are not officially assumed in the retraction note. Therefore, we consider the statement in the review to be questionable at best, as the use of material other than the retraction notes has severe implications for the internal and external validity of the study, and the suggestion to use such methods is, in our opinion, wrong. We would also like to draw attention to the fact that many retraction notes explicitly mention the request to provide raw images and the authors' inability to provide them.

      Anyway, as far as images are concerned, our article suggested that there are publishers which seem to adopt image analysis technologies faster than others. The numbers are not really relevant in this case but the trend is: it describes the publishing activity complexity better than the numbers.

      Reviewer response

      We appreciate the authors’ zeal in standing by their work.

      In regard to the deficits in the search process, the authors state, “We do not consider that the use of ‘Retracted Publication [PT]’ should be compensated in any way but, if it should be compensated, we wouldn't want to add ‘Retraction of publication’.”

      There is a lack of appreciation for the complexities of indexing retracted materials in an indexing site such as PubMed. To have a comprehensive search, one should not be choosing to use either “Retracted Publication [PT]” OR “Retraction of Publication [PT].” One would use both, and then filter out the duplicates, because some retractions are indexed by retraction notices, some only have “Retracted” added to the indexed title and the publication type changed to “Retracted Publication.” Use of only one or the other guarantees that the search is far less comprehensive than it should be.
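
      As a rough sketch of the "use both, then filter out the duplicates" step described above: the two query strings use PubMed's publication-type tag, while the helper name and the PMIDs are hypothetical placeholders standing in for the output of whatever PubMed client is used to run the searches.

```python
# The two publication-type queries to run; executing them requires a
# PubMed client, which is deliberately left out of this sketch.
QUERIES = (
    '"Retracted Publication"[PT]',
    '"Retraction of Publication"[PT]',
)


def merge_unique(*result_lists):
    """Combine several lists of record IDs, keeping first-seen order."""
    seen = set()
    merged = []
    for results in result_lists:
        for record_id in results:
            if record_id not in seen:
                seen.add(record_id)
                merged.append(record_id)
    return merged


# Hypothetical PMIDs standing in for the results of the two searches.
retracted_publications = ["1001", "1002", "1003"]
retraction_notices = ["1003", "2001"]
combined = merge_unique(retracted_publications, retraction_notices)
print(combined)
```

      A retraction notice is typically indexed as its own record, so matching a notice to the article it retracts would need an additional linking step that this sketch does not attempt.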

      The authors state, “In an ideal world, if any effort were to be made, it would be aimed at better indexing and managing existing databases, not at generating query strategies to make up for their shortcomings.”

      There is at least one database (http://retractiondatabase.org) that has a far more comprehensive indexing of retractions and is publicly available for use.

      In Item 3, where it is pointed out that retraction notices themselves are inaccurate and cannot be taken at face value as to the reason behind the retraction, the authors responded, “Shown to be flawed by who?” — By an article cited in the manuscript:

      Fang, Ferric C.; Steen, R. Grant; Casadevall, Arturo (2012): Misconduct accounts for the majority of retracted scientific publications. In Proceedings of the National Academy of Sciences of the United States of America 109 (42), pp. 17028–17033. DOI: 10.1073/pnas.1212247109.

      “To understand the reasons for retraction, we consulted reports from the Office of Research Integrity and other published resources (7, 8), in addition to the retraction announcements in scientific journals. Use of these additional sources of information resulted in the reclassification of 118 of 742 (15.9%) retractions in an earlier study (4) from error to fraud.” Followed by “These factors have contributed to the systematic underestimation of the role of misconduct and the overestimation of the role of error in retractions (3, 4), and speak to the need for uniform standards regarding retraction notices (5).”

      The authors then choose to state that it is the “editorial prerogative” – and that when notices “are incomplete or formulated under the threat of litigation [it] only supports our view that publishers and editors need to make a more significant effort to correct the biomedical literature, including avoiding litigation when the retraction note clearly describes the reasons for retraction.”

      Following our attempt to explain why understanding the real reason behind a retraction is important to study the publication of notices, the authors respond: “Once again, the author of this review does not seem to fully understand our study, apparently favouring information published on third-party websites over that the journals officially assumed.”

      First, yes, we do understand the study. We read a lot of these. Second, the “third-party websites” we prefer include the Office of Research Integrity and the Retraction Watch blog, where background investigations into the causes of retraction notices are described. If the authors are challenging the reference to PubPeer, keep in mind that journals initiate investigations based on comments on that website, and have taken to citing the website in their notices.

      Had the authors not chosen to categorize the reasons for retraction, their reasoning may have had more support – but they did, and in doing so, by just using the notice with no further review, their findings address only the notice itself, with no context.

      We recommend that the manuscript be substantially revised with strong attention to the comments we made in our original review.

    2. Discussion, revision and decision


      Author response


      To: Adam Marcus, co-founder Retraction Watch & Alison Abritis, PhD, researcher at Retraction Watch

      Major Problems: I found serious deficits in both for this article, and thus I have serious concerns as to the usefulness of this article. Therefore, I have not proceeded in a line-by-line, as I consider the overall problems to be grave enough to require attention and revision before getting to lesser items of clarity.

      I would like to point out that the authors show a marvelous attention to their work, and they have much to contribute to the field of retraction studies, and I do honestly look forward to their future work. However, in order for the field to move ahead with accuracy and validity, we must no longer just rely on superficial number crunching, and must start including the complexities of publishing in our analyses, as difficult and labor-intensive as it might be.

      We do not consider that our article presents serious problems nor that it would be useless.

      It is possible that a different view on the subject, some tendency to forbearance (understandable) for the difficult life of the publishing industry, along with some difficulties in understanding the ideas presented in the article, may have led to a series of points of view that we would like to comment on below.

      We would first like to thank the reviewers for their comments, some of which will allow us to improve and nuance, using objective elements, the analysis of this bumpy field represented by the ecosystem of retracted publications. Because we have based our study on data from freely accessible sources of information, we will not insist too much on commenting on this issue.

      The authors stated that they used the search protocol (and therefore presumably the same dataset) as described in Toma & Padureanu, 2021, and do not indicate any process to compensate for its weaknesses. In the referenced study, the authors (same as for this article) utilized a PubMed search using only “Retracted Publication” in Publication Type. This search method is immediately insufficient, as some retracted articles are not bannered or indexed as retracted in PubMed. This issue is well-understood among scholars who search databases for retractions, and by now one would expect that these searches would strive to be more comprehensive.

      A better method, if one insists on restricting the search to PubMed, would have been to use Publication Type to search for “retracted publication,” and then to search for “retraction of publication,” and to compare the output to eliminate duplications. There are even more comprehensive ways to search PubMed, especially since some articles are retitled as “Withdrawn” – Elsevier, for example, uses the term instead of “Retracted” for papers removed within a year of their publication date – but do not come in searches for either publication type. Even better would have been to use databases with more comprehensive indexing of retractions.
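      The combined search described above can be sketched as follows. This is a minimal illustration, not the procedure used by either the authors or the reviewers: the `Record` layout, its field names, and the sample identifiers are hypothetical, and the sketch assumes the search hits have already been downloaded and parsed. Only the `[pt]` publication-type tag in the query string is standard PubMed syntax.

```python
from typing import Iterable, Optional, TypedDict


class Record(TypedDict):
    """Hypothetical, already-parsed search hit (not a real PubMed schema)."""
    pmid: str
    pub_type: str                 # "Retracted Publication" or "Retraction of Publication"
    retracts_pmid: Optional[str]  # for notices: PMID of the retracted article, if known


def build_query() -> str:
    # Search both publication types so that either indexing route is caught.
    return '"Retracted Publication"[pt] OR "Retraction of Publication"[pt]'


def retracted_article_ids(records: Iterable[Record]) -> set[str]:
    """Collapse flagged articles and retraction notices into one set of article PMIDs."""
    ids: set[str] = set()
    for rec in records:
        if rec["pub_type"] == "Retracted Publication":
            ids.add(rec["pmid"])
        elif rec["pub_type"] == "Retraction of Publication" and rec["retracts_pmid"]:
            # A notice points at the article it retracts; count the article, not the notice.
            ids.add(rec["retracts_pmid"])
    return ids


# Toy data with made-up identifiers.
sample: list[Record] = [
    {"pmid": "111", "pub_type": "Retracted Publication", "retracts_pmid": None},
    {"pmid": "900", "pub_type": "Retraction of Publication", "retracts_pmid": "111"},  # duplicate of 111
    {"pmid": "901", "pub_type": "Retraction of Publication", "retracts_pmid": "222"},  # article never flagged
    {"pmid": "333", "pub_type": "Retracted Publication", "retracts_pmid": None},       # no separate notice
]
unique_ids = retracted_article_ids(sample)
```

      In this toy data, article 222 would be missed by a search restricted to "Retracted Publication", and article 333 would be missed by a search restricted to "Retraction of Publication"; taking the union and keying on the retracted article removes the double count for article 111.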

      In an ideal world, if any effort were to be made, it would be aimed at better indexing and managing existing databases, not at generating query strategies to make up for their shortcomings.

      Thank you very much for the suggestions on the search strategy. We do not consider that the use of "Retracted Publication [PT]" should be compensated in any way but, if it should be compensated, we wouldn't want to add "Retraction of publication". We consider that using a search protocol more specific to systematic reviews is not very useful in our case: data are added/updated continuously (sometimes late), incorrect indexing can be corrected, the number of retracted articles increases from month to month; the same strategy can give different results at different times regardless of its complexity. Putting extra effort into detecting problematic articles without knowing the benefit but expecting it only highlights issues that can be improved at the publisher/editor (content delivery) and database (indexing) level.

      The dataset analyzed is a snapshot of a particular time interval and nothing more. Even during the analysis we found, in the case of one publisher, the addition of details to the initially incomplete retraction notes. Hence the need for follow-up studies. Therefore in the case of retractions, unlike the reviewer, we prefer an approach based on simple and easily reproducible strategies, widely accessible sources of information, and several steps. The first step in this strategy is the "number crunching" stage which includes this article.

      1. The authors are using the time from publication to retraction based on the notice dates and using them to indicate efficacy of oversight by publishers. However, this approach is seriously problematic. It takes no notice of when the publisher was first informed that the article was potentially compromised. Publishers who respond rapidly to information that affects years/decades old publications will inevitably show worse scores than those who are advised upon an article’s faults immediately upon its publication, but who drag their heels a few months in dealing with the problem.

      Indeed, the article uses the time between publication and retraction (exposure time – ET) as one of the SDTP score components for assessing editorial/publisher performance. Data on when a publisher or editor has been informed of problems with an article, apart from being relatively rare, are not a substitute for a retraction note. Moreover, the use of such information may induce a risk of bias.

      We mention in the article the need to use reporting standards for retraction notes, and one element that might be useful is, indeed, the date on which the publisher or editor was informed of problems with an article. Unfortunately, as the author of this review knows very well, information precedes investigation; the retraction note contains (or should contain) much more data than the initial information about the quality problems of an article.

      Our article aims to suggest a score for measuring publication performance in the context of retracted articles that would also allow an assessment of the dynamics of the activity of correcting the scientific record and, more importantly, how publishers engage in post-publication quality control. ET is only one component of this score.

      It is quite clear from the data presented in the article that a publisher/journal that emphasizes systematic back-checking will have an increasingly longer average lifespan of retracted articles, logically higher than one that does not do this type of checking. We don't see precisely where the reviewer thinks there is a problem: once the checking is done, the ET will decrease, and a publisher that takes concrete steps to correct the literature will ultimately have a better reputation. This does not mean that a higher ET is laudable, it suggests that there is a post-publication quality control but also that the peer review process has let problematic articles through and that the control of these articles has been carried out late. This is an argument for more active involvement of publishers (as potential generators of editorial policies) in post-publication control.
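      The exposure time (ET) discussed in this exchange can be illustrated with a short sketch. It assumes ET is simply the number of days between the publication date and the retraction-note date; the unit, the tuple layout, and the publisher names and dates are all made up for illustration, and the other SDTP score components are not modeled here.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def exposure_days(published: date, retracted: date) -> int:
    """Exposure time (ET) in days: retraction-note date minus publication date."""
    return (retracted - published).days


def mean_et_by_publisher(rows):
    """rows: iterable of (publisher, published, retracted) tuples (hypothetical layout)."""
    buckets = defaultdict(list)
    for publisher, published, retracted in rows:
        buckets[publisher].append(exposure_days(published, retracted))
    return {publisher: mean(values) for publisher, values in buckets.items()}


# Made-up dates for two fictional publishers.
rows = [
    ("Publisher A", date(2010, 1, 1), date(2020, 1, 1)),  # old article caught by back-checking
    ("Publisher A", date(2019, 6, 1), date(2019, 7, 1)),  # recent article retracted quickly
    ("Publisher B", date(2019, 1, 1), date(2019, 5, 1)),
]
et = mean_et_by_publisher(rows)
```

      With these invented dates, a single decade-old retraction dominates Publisher A's mean ET even though its other retraction took only a month, which is the situation both the reviewers and the authors describe for publishers that check older articles.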

      Second, there is little consistency in dealing with retractions between publishers, within the same publishers or even within the same journal. Under the same publisher, one journal editor may be highly responsive during their term, while the next editor may not be. Most problems with articles quite often are first addressed by contacting the authors and/or journal editors, and publishers – especially those with hundreds of journals – may not have any idea of the ensuing problem for weeks or months, if at all. Therefore, the larger publishers would be far more likely to show worse scores than publishers with few journals to manage oversight.

      It is exactly this inconsistency that we highlight in the article. Differing policies, attitudes, and responsiveness do not mean that a publisher cannot/should not ask questions about the effectiveness of internal processes and resources used for post-publication quality control or the implementation of uniform measures across journals in its portfolio.

      Third, the dates on retraction notices are not always representative of when an article was watermarked or otherwise indicated as retracted. Elsevier journals often overwrite the html page of the original article with the retraction notice, leaving the original article’s date of publication alone. A separate retraction notice may not be published until days, weeks or even years after the article has been retracted. Springer and Sage have done this as well, as have other publishers – though not to the same extent (yet).

      Historically, The Journal of Biological Chemistry would publish a retraction notice and link it immediately to the original article, but a check of the article’s PDF would show it having been retracted days to weeks earlier. They have recently been acquired by Elsevier, so it is unknown how this trend will play out. And keep in mind, in some ways this is in itself not a bad thing – as it gives the user quicker notice that an article is unsuitable for citation, even while the notice itself is still undergoing revisions. It just makes tracking the time of publication to retraction especially difficult.

      We used the same date for all articles in our study (the one listed in PubMed), thus ensuring a uniform criterion for all publishers. If this date was not in PubMed, we used the date from the retraction note on the journal website, but this applied to only a small number of articles. How different publishers handle retraction processes or the delay with which these are published is primarily related to internal editorial procedures, and these delays are reflected in the ET. In our experience, most articles retracted by Elsevier are available online, supplemented, and not replaced by retraction notes, which we think is an excellent policy.

      1. As best as can be determined, the authors are taking the notices at face value, and that has been repeatedly shown to be flawed. Many notices are written as a cooperative effort between the authors and journal, regardless of who initiated the retraction and under the looming specter of potential litigation.

      Shown to be flawed by who? Indeed, in our study, we refer to the retraction notes published by the journals. The fact that they are incomplete or formulated under the threat of litigation only supports our view that publishers and editors need to make a more significant effort to correct the biomedical literature, including avoiding litigation when the retraction note clearly describes the reasons for retraction. The way the retraction note is worded should be an editorial prerogative and should primarily aim at correcting scientific literature, not at appeasing egos, careers, or financial interests.

      Trying to establish who initiated a retraction process strictly by analyzing the notice language is destined to produce faulty conclusions. Looking just at PubPeer comments, questions about the data quality may be raised days/months/years before a retraction, with indications of having contacted the journal or publisher. And yet, an ensuing notice may be that the authors requested the retraction because of concerns about the data/image – where the backstory clearly shows that impetus for the retraction was prompted by a journal’s investigation of outside complaints. As an example, the recent glut of retractions of papers coming from paper mills often suggests the authors are requesting the retraction. This interpretation would be false, however, as those familiar with the backstory are aware that the driving force for many of these retractions was independent investigators contacting the journals/publishers for retraction of these manuscripts.

      Once again, the author of this review does not seem to fully understand our study, apparently favouring information published on third-party websites over that the journals officially assumed. The retraction notes represent the material available to a researcher doing documentation on a particular topic. The clarity and information contained in the note is the editor's or publisher’s responsibility, reflecting their performance and concern for the integrity of the science. Interpretation of a retraction note/analyzing an article occurs in this context. Not everyone has time for further investigation or to search third-party sites for information that is, with a notable exception, the result of a selection bias.

      Assigning the reason for retraction from only the text of the notice will absolutely skew results. As already stated, in many cases, journal editors and authors work together to produce the language. Thus, the notice may convey an innocuous but unquestionable cause (e.g., results not reproducible) because the fundamental reason (e.g., data/image was fabricated or falsified) is too difficult to prove to a reasonable degree. Even the use of the word “plagiarism” is triggering for authors’ reputations – and notices have been crafted to avoid any suggestion of such, with euphemisms that steer well clear of the “p” word. Furthermore, it has been well-documented that some retractions required by institutional findings of misconduct have used language in the notice indicating simple error or other innocuous reasons as the definitive cause.

      We understand your point of view and the situations presented may be accurate. However, from our point of view, the only valid reference remains the retraction note published on the journal's website. The existence of wording difficulties and various other problems that may arise more likely has to do with a tendency of the reviewer to make excuses for journals reluctant to indicate precisely what the reasons for retracting the article are. There are plenty of retraction notes in which the images with problems (including whether they were plagiarized, reused, manipulated, fabricated, etc.) are indicated with great precision; there are equally plenty of notes in which the word plagiarism is used without hesitation, indicating the sources, how they were informed, and what was plagiarized. No matter how many hesitant publishers/editors there are, it should not be forgotten that there are many journals/publishers who take their role seriously, acknowledge and learn from their mistakes, thus providing a real service to the scientific community.

      The authors also discuss changes in the quality of notices increasing or decreasing in publishers – but without knowing the backstory. Having more words in a notice or giving one or two specific causes cannot in itself be an indicator of the quality (i.e., accuracy) of said notice.

      "Knowing the backstory" is not part of our objectives, and neither is assessing the quality of the retraction notes. This is also very difficult to do due to the lack of an accepted standard format. We are trying to propose a score composed of several parameters resulting from existing (or non-existing) data in the retraction notes so that we can have a picture of retractions at publisher level. Knowing the backstory is not relevant, reading and interpreting the official retraction note is relevant.

      1. The authors tend to infer that the lack of a retraction in a journal implies a degree of superiority over journals with retractions. Although they qualify it a bit ( “Are over 90% of journals without a retracted article perfect? It is a question that is quite difficult to answer at this time, but we believe that the opinion that, in reality, there are many more articles that should be retracted (Oransky et al. 2021) is justified and covered by the actual figures.”), the inference is naive. First, they have not looked at the number of corrections within these journals. Even ignoring that these corrections may be disproportionate within different journals and require responsive editorial staff, some journals have gone through what can only be called great contortions to issue corrections rather than retractions.

      We believe that this is a case of reviewer confusion generated either by the insufficiently precise wording of the text or a lack of understanding of our study objectives. We are trying to point out that more than 90% of the journals in the NLM catalogue-PubMed subset have not retracted a single article. We are not trying to say that journals without retracted articles are superior to the others. As explained in the article, we referred to retraction notes, not corrections.

      Second, the lack of retractions in a journal speaks nothing to the quality of the articles therein. Predatory journals generally avoid issuing retractions, even when presented with outright proof of data fabrication or plagiarism. Meanwhile, high-quality journals are likely to have more, and possibly more astute, readers, who could be more adept at spotting errors that require retraction.

      Of course, the quality level of articles in a journal is not determined by the number of articles removed.

      Third, smaller publishers/journals may not have the fiscal resources to deal with the issues that come with a retraction. As an example, even though there was an institutional investigation finding data fabrication, at least one journal declined to issue a retraction for an article by Joachim Boldt (who has more than 160 retractions for misconduct) after his attorneys made threats of litigation.

      Yielding to threats of lawsuits is instead a failure of a publisher/journal to adapt to the realities of the publishing business or to the risk of misconduct. This is something that needs to change.

      Simply put, the presence or lack of a retraction in a journal is no longer a reasonable speculation about the quality of the manuscripts or the efficiency of the editorial process.

      We have not attempted to suggest this, we have only analyzed the retracted articles and their associated retraction notes. On the other hand, the way a journal/publisher handles the retraction of problematic articles still reflects, to some extent, the quality/performance of the editorial processes.

      1. I am concerned that the authors appear to have made significant errors in their analysis of publishers. For example, they claim that neither PLOS nor Elsevier retracted papers in 2020 for problematic images. That assertion is demonstrably false.

      This is wrong. In our dataset, there are eleven PLOS articles related to human health with the publication years 2019 and 2020. None of these have images as retraction reasons.

      Regarding the 21 Elsevier articles published in 2020, there is nothing in the retraction notes to indicate that the article was retracted because of the images. In 2 retraction notes there is mention of the comments made by Dr. Bik (The Tadpole Paper Mill - Science Integrity Digest) but the text of these (retraction notes) stops at the authors' inability to provide the raw data underlying the article.

      Our study is based only on the content of the retraction notes published and assumed by the journal, not on opinions/comments appearing on other sites, which, for unknown/unmentioned reasons, are not officially assumed in the retraction note. Therefore, we consider the statement in the review to be questionable at best, as the use of material other than the retraction notes has severe implications for the internal and external validity of the study and the suggestion to use such methods is, in our opinion, wrong. We would also like to draw attention to the fact that many retraction notes explicitly mention the request to provide raw images and the authors' inability to provide them.

      Anyway, as far as images are concerned, our article suggested that there are publishers which seem to adopt image analysis technologies faster than others. The numbers are not really relevant in this case but the trend is: it describes the publishing activity complexity better than the numbers.

      Reviewer response

      We appreciate the authors’ zeal in standing by their work.

      In regard to the deficits in the search process, the author states, “We do not consider that the use of ‘Retracted Publication [PT]’ should be compensated in any way but, if it should be compensated, we wouldn't want to add ‘Retraction of publication’”

      There is a lack of appreciation for the complexities of indexing retracted materials in an indexing site such as PubMed. To have a comprehensive search, one should not be choosing to use either “Retracted Publication [PT]” OR “Retraction of Publication [PT].” One would use both, and then filter out the duplicates, because some retractions are indexed by retraction notices, some only have “Retracted” added to the indexed title and the publication type changed to “Retracted Publication.” Use of only one or the other guarantees that the search is far less comprehensive than it should be.

      The authors state, “In an ideal world, if any effort were to be made, it would be aimed at better indexing and managing existing databases, not at generating query strategies to make up for their shortcomings.”

      There is at least one database (http://retractiondatabase.org) that has a far more comprehensive indexing of retractions and is publicly available for use.

      In Item 3, where it is pointed out that retraction notices themselves are inaccurate and cannot be taken at face value as to the reason behind the retraction, the authors responded, “Shown to be flawed by who?” — By an article cited in the manuscript:

      Fang, Ferric C.; Steen, R. Grant; Casadevall, Arturo (2012): Misconduct accounts for the majority of retracted scientific publications. In Proceedings of the National Academy of Sciences of the United States of America 109 (42), pp. 17028–17033. DOI: 10.1073/pnas.1212247109.

      “To understand the reasons for retraction, we consulted reports from the Office of Research Integrity and other published resources (7, 8), in addition to the retraction announcements in scientific journals. Use of these additional sources of information resulted in the reclassification of 118 of 742 (15.9%) retractions in an earlier study (4) from error to fraud.” Followed by “These factors have contributed to the systematic underestimation of the role of misconduct and the overestimation of the role of error in retractions (3, 4), and speak to the need for uniform standards regarding retraction notices (5).”

      The authors then choose to state that it is the “editorial prerogative” – and that when notices “are incomplete or formulated under the threat of litigation [it] only supports our view that publishers and editors need to make a more significant effort to correct the biomedical literature, including avoiding litigation when the retraction note clearly describes the reasons for retraction.”

      Following our attempt to explain why understanding the real reason behind a retraction is important to study the publication of notices, the authors respond: “Once again, the author of this review does not seem to fully understand our study, apparently favouring information published on third-party websites over that the journals officially assumed.”

      First, yes, we do understand the study. We read a lot of these. Second, the “third-party websites” we prefer include the Office of Research Integrity and the Retraction Watch blog, where background investigations into the causes of retraction notices are described. If the authors are challenging the reference to PubPeer, keep in mind that journals initiate investigations based on comments on that website, and have taken to citing the website in their notices.

      Had the authors not chosen to categorize the reasons for retraction, their reasoning may have had more support – but they did, and in doing so, by just using the notice with no further review, their findings address only the notice itself, with no context.

      We recommend that the manuscript be substantially revised with strong attention to the comments we made in our original review.

    1. Author Response

      Reviewer #1 (Public Review):

      Liu et al investigated the role of Wnt/β-catenin pathway in the genesis of thermogenic adipocytes. Their study shows that some adipocytes exhibited Wnt/β-catenin signaling ("Wnt+ adipocytes") in interscapular brown adipose tissue (iBAT), inguinal white adipose tissue (iWAT), epididymal WAT (eWAT), and bone marrow (BM). There was a different level of the possession of Wnt+ adipocytes between the different depots with iBAT expressing 17%, iWAT expressing 6.9%, and eWAT expressing the least at 1.3%. Expression of these adipocytes was noted on embryonic day 17.5 and was present in a higher percentage in female mice compared to male mice and in younger mice compared to older mice, which aligns with their observation that Wnt+ adipocytes are thermogenic.

      The authors also noted that Wnt+ adipocytes can differentiate from human stromal cells. In regards to the pathway, Wnt/β-catenin adipocytes are distinct from classical brown adipocytes at molecular and genomic levels. It was noted that Tcf7L2 was largely expressed in Wnt+ adipocytes but other Tcf proteins (Tcf 1, Tcf 3, and Lef1) were not. Wnt- cells showed a reversible delay in maturation with LF3, however, no cell death was noted. Wnt/β-catenin adipocytes seem to depend on AKT/mTOR signaling. It was further shown that insulin is a key factor in mTOR signaling and Wnt+ adipocyte differentiation.

      Upon cold exposure, UCP1+/Wnt- beige fat emerges largely surrounding Wnt+ adipocytes, implicating that Wnt+ adipocytes serve as a "beiging initiator" in a paracrine manner. Lastly, mice with implanted Wnt+ adipocytes had a significantly better glucose tolerance which suggests that Wnt+ adipocytes have a beneficial impact on whole-body metabolism. I found no major flaws in the method and data largely supports their conclusion that Wnt+ adipocytes have (at least some) a significant role in thermogenesis/metabolism, which I think is a very impressive and innovative finding.

      Thanks so much for the outstanding summary of our manuscript. We feel sorry that we somehow did not make it clear in the original manuscript that the percentage of Wnt+ adipocytes is higher in male mice than that in females.

      Reviewer #2 (Public Review):

      Liu et al present evidence for the surprising finding of Tcf/Lef-active, "Wnt+" mature adipocytes. They report that Wnt+ adipocytes arise during embryogenesis and regulate cold-induced beiging in surrounding adipocytes. Tcf/Lef transcriptional activity in these cells is Wnt-ligand independent and instead appears to be stimulated by insulin-dependent AKT/mTOR signaling. Using a diphtheria toxin inducible depletion mouse model, the authors show that Wnt+ cells play an important role in glucose homeostasis.

      As the authors have acknowledged, proper assignment of adipocyte nuclei is a notoriously difficult histological challenge. Mesenchymal cells sit directly adjacent to the adipocyte plasma membrane and their nuclei are often incorrectly assigned to the adipocyte both in vivo and in vitro. Pparg nuclear co-staining is helpful, however, Pparg is very highly expressed by endothelial cells and Col15a1+ committed preadipocytes, which are intercalated throughout the adipose. The authors have made an impressive attempt to address this concern by generating a Tcf/Lef-CreER mouse line to fluorescently label Wnt+ adipocytes, however, it is not entirely clear if the images presented support the conclusion that mature adipocytes are being labeled. Given that Wnt+ mature adipocytes are the core conclusion of this manuscript, and because this hypothesis runs counter to a large body of literature concluding that Wnt signaling inhibits adipogenesis, the authors have assumed a very high burden of proof that these are indeed Wnt+ mature adipocytes in vivo.

      Thanks for the outstanding summary of our manuscript.

      To address these concerns, the authors could utilize the specificity of in vivo single-nuclei RNA-Seq. Several data resources have been published (https://singlecell.broadinstitute.org/single_cell/study/SCP1376/a-single-cell-atlas-of-human-and-mouse-white-adipose-tissue), and the authors should re-analyze these data for subpopulations of mature adipocytes that express a transcriptional signature of active Tcf/Lef signaling. It is unfortunate that the authors were unable to successfully perform single-nuclei analysis of the Wnt+ adipocytes as this would significantly enhance this manuscript. The physiologic relevance of the single-cell analysis of immortalized, in-vitro differentiated clonal cell lines is questionable.

      We took the advice of Reviewer 2 and intersected our scRNA-seq data on Wnt+ adipocytes with the published single-nucleus sequencing (sNuc-seq) dataset of mouse iWAT (Emont et al., 2022). Because the activation of Tcf/Lef signaling in the Wnt+ adipocytes relies on AKT/mTOR signaling rather than the conventional Wnt ligands and receptors, traditional downstream markers of Wnt signaling such as Axins were not found to be specifically enriched in the Wnt+ adipocytes. Therefore, the AKT/mTOR-dependent Wnt signaling in Wnt+ adipocytes appears to regulate the expression of genes distinct from those controlled by the conventional Wnt signaling pathway. This conclusion is supported by our recent studies showing that inhibition of this AKT/mTOR-dependent Wnt signaling by LF3 in Wnt+ adipocytes negatively impacts pathways implicated in “PI3K/Akt signaling”, “insulin signaling”, “thermogenesis”, and “fatty acid metabolism”, among others (see below for details). However, we found that one cluster (mAd3) of the sNuc-seq dataset, which is relatively enriched in Tcf7l2, expresses remarkably high levels of Cyp2e1 as well as Cfd, which encodes Adipsin. These genes, regarded as hallmarks of the mAd3 cluster, are also uniquely or highly expressed in Wnt+ adipocytes. Interestingly, the percentage of mAd3 among the total iWAT adipocytes in the chow-fed male group is about 5%, which is very close to that of Wnt+ adipocytes in vivo (~7%). Thus, mAd3 possibly represents Wnt+ adipocytes in iWAT. These analyses are included in the revision.

      Reviewer #3 (Public Review):

      It is becoming increasingly clear that adipocytes are not homogenous, but rather comprise several distinct subtypes with specific physiological functions. The mechanisms that underlie the development and distinct roles of each adipocyte subtype are of great interest for understanding the biology of metabolic regulation and its impairments in metabolic disease. In this manuscript, the authors describe a previously unknown population of adipocytes in mice, which are characterized by a special form of beta-catenin signaling. They perform a comprehensive series of experiments in cultured cells, in mouse models of in-vivo lineage tracing, and transplantation experiments to define the origin and function of these adipocytes. They find that the formation of these Wnt+ adipocytes is dependent on insulin signaling, and find possible roles in thermogenic adipose tissue development. Overall, the conclusions of this study are very convincing in their identification of a subpopulation of adipocytes displaying non-canonical Wnt signaling. The proposed role of these adipocytes as regulators of thermogenesis is more ambiguous, and their physiological function remains unclear.

      Thank you for the positive comments. To distinguish this AKT/mTOR-dependent intracellular Wnt signaling in Wnt+ adipocytes from conventional non-canonical Wnt signaling, we feel that it would be appropriate to call it atypical Wnt signaling.

      • The new adipocyte types are identified through expression of a reporter for TCF/Lef signaling. This reporter is classically activated by Wnt/beta-catenin and using both siRNA depletion of beta-catenin as well as an allele lacking its transcriptional activation domain, the authors confirm the reporter expression is dependent on the presence of beta-catenin and TCF7L2, but independent of canonical Wnt signaling.

      • The involvement of TCF7L2 is also probed using a specific inhibitor of the beta-catenin/TCF7L2 interactions, LF3, which inhibited reporter expression. Inhibition of canonical Wnt signaling was without effect.

      • The authors isolate clonal lines of precursor cells that give rise to Wnt+ or Wnt- adipocytes from mouse brown adipose tissue. They find that Wnt+ adipocytes are dependent on the Wnt pathway, as inhibition by LF3 induces cell death.

      • To further probe the nature of Wnt+ and Wnt- adipocytes, the authors perform scRNASeq on cells after 7 days of adipose induction and find 2 distinctive cell populations. The finding of 2 distinct populations is expected, given the a priori separation of cells as a function of GFP expression. It is not clear why scRNASeq was chosen over RNASeq on the population, since the fat content of adipocytes may preclude full characterization of the most differentiated cells.

      With scRNA-seq, it is more convincing to identify specific subpopulations of cells, as adipocytes are well known to be heterogeneous.

      Overall, this experiment is less informative on the mechanisms by which Wnt+ adipocytes display Wnt signaling dependency for viability, and what their functional role might be.

      Yes, these are major questions to be addressed in our future studies.

      • The non-canonical nature of Wnt signaling in Wnt+ adipocytes prompted the authors to explore the role of the insulin/PI3K/AKT/MTOR pathway. They find enhanced basal activity of this pathway in Wnt+ adipocytes. It was not explored whether this enhanced activity persists under insulin stimulation; this is relevant as feedback mechanisms within the signaling pathway may result in lower signaling under stimulated conditions.

      • To test the relevance of insulin signaling in-vivo on non-canonical Wnt signaling in adipocytes the authors use the Akita mouse, which lacks the insulin-2 gene and find a marked decrease in reporter activity, confirming the requirement for insulin signaling for expression of this non-canonical Wnt pathway.

      • To determine the functional role of Wnt+ adipocytes, the authors explore their relationship to mitochondrial respiratory activity and thermogenesis. They perform experiments to monitor mitochondrial membrane potential and oxygen consumption rate and find higher overall O2 consumption, and lower membrane potential in adipocyte populations vicinal to Wnt+ adipocytes. Overall these results are not fully convincing: The traces are highly variable from cell to cell, and rigorous quantification of uncoupled respiration is limited by the small number of cell lines analyzed; only one cell line of Wnt- and two Wnt+ adipocytes are analyzed. In situ differences in membrane potential would be more convincing if performed on homogenous collections of Wnt- and Wnt+ adipocytes to better understand stochastic variance.

      Thank you for the suggestions. In fact, the results of the mitochondrial membrane potential assay on mixed adipocyte cultures gave us the initial hint of a potential paracrine effect of Wnt+ adipocytes.

      • To determine the role of Wnt+ adipocytes in-vivo thermogenesis, the authors expose mice to cold temperature and monitor the proportion of UCP1+ adipocytes in relation to Wnt signaling. They find a proportion of Wnt+ adipocytes expressing UCP1. Whether this proportion is higher or lower than that of Wnt- adipocytes is not quantified, so it is unclear whether Wnt+ adipocytes preferentially develop beige characteristics. The authors find that UCP1+, Wnt- adipocytes are topologically close to Wnt+ adipocytes, and hypothesize a paracrine signaling role. However, this correlation may be explained by known topological biases in inguinal fat pad beiging, where adipocytes closer to lymph node preferentially induce UCP1. The Wnt+ adipocyte population may coincidentally be present in this region.

      As shown in Figure 5-figure supplement 1E, while all Wnt+ adipocytes were co-stained with UCP1, the percentage of Wnt+ adipocytes did not increase after cold challenge. As shown in Figure 5-figure supplement 1C, the initial beiging response is closely associated with Wnt+ adipocytes rather than with a topological bias.

      • To functionally determine the role of Wnt+ adipocytes in thermogenesis, the authors ablate the Wnt+ lineage through expression of diphtheria toxin using a Fabp4-Flox-DTA mouse crossed to Tcf/Lef-CreERT2 mice. Less than 50% of these mice displayed impaired thermogenesis upon cold exposure. The authors interpret this finding to signify a partial role for Wnt+ adipocyte beiging in thermogenic regulation. This conclusion is not fully supported, as Fabp4 is expressed in many cells other than adipocytes, and therefore the phenotype of the affected mice is not unambiguously attributable to loss of Wnt+ adipocytes. An additional concern is that diphtheria toxin-induced cell death will lead to tissue inflammation, with potential functional effects on thermogenesis. The degree of cell death and inflammation should be measured and reported.

      While Fabp4 is expressed in some SVFs, the Fabp4-Flox-DTA allele is not activated by the Tcf/Lef-CreERT2 allele in these cells, as the T/L-GFP reporter is not seen in freshly isolated SVFs of iWAT (Figure 2-figure supplement 1A). To avoid potential side effects of DTA-induced cell death on adipose tissues, we compounded the Tcf/Lef-rtTA allele with TRE-Cre and floxed Pparg alleles (PpargF/F) to prevent the differentiation of Wnt+ adipocytes. These new results are included in the revision as supplemental results (Figure 5-figure supplement 2G).

      • The finding that Akita mice lack Wnt+ adipocytes was used to determine whether these mice are susceptible to cold-induced challenges. The authors report a decrease in cold-induced UCP1 expression in these mice. This conclusion, derived from a single immunofluorescence image, is not fully convincing in the absence of additional metrics.

      Additional analyses are included in the revision, as Figure 5-figure supplement 3.

      • To further explore the role of Wnt+ adipocytes in systemic metabolism, the authors conduct implantation studies of Wnt+ adipocytes and measure effects on glucose tolerance. They show a significant difference in glucose excursions in mice harboring fat pads developed from Wnt+ adipocytes. These results are convincing, but the conclusion may be due to enhanced volume of additional functional fat developing from Wnt+ adipocytes.

      In this experiment, unbiased mBaSVF adipocytes were used in parallel as control.

    1. Author Response

      Reviewer #2 (Public Review):

      1. The manuscript seems to claim that the study shows that S4 is the voltage sensor and S4 moves in KCNQ2. This has been repeated in Abstract, Introduction and Results. However, by this time S4 movements as a voltage sensor are well accepted mechanisms. The importance of the work is actually that it defines parameters of the VSD movement in KCNQ2 such as the stretch of S4 in and out of the membrane, and the relationship between VSD activation and pore opening. These points should be brought out as the rationale and significance of this work, rather than the well-known S4 function.

      We thank Reviewer #2 for this important comment, which was also brought up by Reviewer #3. We apologize for overemphasizing that the fourth TM segment is the voltage sensor and that the S4 moves in KCNQ2 channels. This might be the result of the authors' past struggle to convince earlier reviewers that the fluorescence signals at a given position are not an experimental artifact, but reflect S4 moving during channel opening. We are very happy to learn that this is now a well-accepted mechanism.

      In the revised version, we now state:

      Abstract: “Here, we define parameters of voltage sensor movements in wt-KCNQ2 and channels bearing epilepsy-causing mutations using cysteine accessibility and voltage clamp fluorometry (VCF).”

      Introduction: “Similar to that seen in other Kv channels, the fourth transmembrane segment contains several highly conserved positively charged amino acid residues that move in response to changes in membrane voltage and function as the voltage sensor(25-28)[…]Although these studies provided insight into S4 rearrangements, they did not define parameters of S4 movement, such as the dynamic relationship between S4 activation and pore opening during voltage-controlled gating of KCNQ2 channels.”

      Results: We deleted: “Collectively, these close correlations in time (Figure 3) and voltage dependence (Figure 2C) of fluorescence and current suggest that the environmental changes around labeled F192C at the outer end of S4 rendered fluorescence signals that seem to report on S4 motion associated with the opening and closing of the channel gate.”

      And simply state: “The close correlations in time (Figure 3) and voltage dependences (Figure 2G) of S4 motion (fluorescence) and activation gate (ionic current) resemble those observed for homologous KCNQ1 (without KCNE1)(42) and KCNQ3 channels(41, 43)”

      We also rewrote in its entirety the subsection: “Disease-causing mutations differentially affect S4 and gate domains” (Pages 10-11).

      1. The closeness of fluorescence and current traces and FV and GV curves led to the conclusion that the movement of a single VSD could trigger channel opening. The rationale for connecting the experimental observations to this conclusion needs to be well explained when the conclusion is first made. References that have made similar arguments such as Osteen et al PNAS 2010; Westhoff et al PNAS 2019 should be cited. In addition, as the authors recognized in Discussion, the same observations can also lead to an alternative conclusion, namely that the movements of the four VSDs are highly cooperative, so that all activate and then open the pore. However, this alternative mechanism is not mentioned until the end of the manuscript, while "the movement of a single VSD opening the pore" is firmly claimed in Abstract and Results. Some justification needs to be provided for this.

      Thank you for this important observation; the wording we used was clumsy. Since we removed the kinetic model (Figure 6 in the original manuscript), we have also deleted any sentences that discuss concerted or independent S4 movement from the Abstract and Results sections. We now discuss only that these alternatives, concerted or independent S4 movement, might explain our VCF data, which show that both the steady-state voltage dependence of S4 transitions and the kinetics closely follow those of ionic currents. Both references (Osteen et al., PNAS 2010 and Westhoff et al., PNAS 2019) have also been added, as recommended by the reviewer, and we apologize for overlooking them in the original manuscript.

      1. An explanation is needed for how the same covalent MTS modification of N190C at two voltages resulted in different GV relations (Fig 1E).

      Thank you for raising this important point. We have spent a good deal of time since we received the reviews addressing it, as it was also raised as a concern by Reviewer #1. To that end, we have included additional data that support the idea that N190C channels are accessible in both the open and closed states. This is now clearly addressed in Recommendations for the Authors, first Specific Suggestion from Reviewer #1. See above the response to the first Specific Suggestion from Reviewer #1 on Pages 2-5.

      In the original submission, we used only the protocols shown in the old Figure 1. We applied MTSET only at +20 mV for the open state and −80 mV for the closed state. We used −100 mV and −120 mV for the closed state of A193C and S199C, respectively, because, compared to the wt channels, these cysteine mutants shifted the GV relationship to negative voltages.

      In the revised version, to further strengthen our conclusions, we have used a new protocol: For each cysteine mutant, we have designed a protocol in which we first apply MTSET at hyperpolarized voltages (closed) before switching to depolarized voltages (open) on the same cell, in a pairwise manner.

      This is now described in the Results subsection “State-dependent external S4 modifications consistent with S4 as voltage sensor”, Pages 6-8 of the revised manuscript, and in the new Figure 1 and Figure 1-figure supplements 3 and 4.

      We also apologize for the lack of clarity in citing reference 40 in the original submission. This reference is deleted in the revised version, in light of our new data on N190C (new Figure 1 and Figure 1-figure supplements 3 and 4), which strengthen our claim that N190C modification occurs in both states (open and closed).

      1. The model in Fig 6F raises several concerns including vertical transitions having the rates of VSD activation and detailed balance is violated.

      The reviewer raises an important concern about our original Figure 6F (model). Based on the Editor's and reviewers' comments, we have removed Figure 6 from the original manuscript to eliminate any potential misunderstanding about the data presented. In future studies, we will gather additional fluorescence and current data using different protocols and dimer constructs to provide a more in-depth description of KCNQ2 gating.

      1. Discussion. The argument of no intermediate open state based on K/Rb permeability ratio assumes that the pore properties such as ion selection and permeability of KCNQ2 are the same as that of KCNQ1. The evidence for this assumption is not provided or discussed. On the other hand, some evidence suggests that the VSD of KCNQ2 may activate in two steps. For instance, the time course of VSD activation can be fitted with two exponentials, and the fluorescence increases after a plateau at voltages > 0 mV in FV curves (Fig 2C). How these results affect the conclusion should be discussed.

      We agree with the reviewer that the claim of a lack of an intermediate open state in KCNQ2 channels based on the Rb/K data provided in the original submission assumed that the pore properties of KCNQ2 are the same as those seen in KCNQ1 channels. Since we did not show sufficient experimental evidence to prove this point, we have removed Figure 6 (the model) from the revised manuscript. In the future, we will provide more evidence to build stronger support for the potential existence of intermediate and active open states in KCNQ2 channels. Future studies will be performed to refine the KCNQ2 model, including the use of mutations that can lock the S4 in the intermediate or activated states in KCNQ2, as has been performed in the KCNQ1 channel (Zaydman et al.; PMID: 25535795). These experiments will provide more conclusive results regarding the different S4 states.

      We have now re-analyzed the data and concluded that while the time course of the fluorescence appeared to have multiple exponentials, our fluorescence data lacked sufficient resolution to reliably estimate the first (fast) component. This might be because of the low signal-to-noise ratio of our VCF and/or because the filtering might have limited the tau-on from the optical signal (shown to be 20 ms in Figure 3C of the original submission).

      As suggested by Reviewer #3, we have removed the kinetics comparison of fluorescence and current in the revised version of Figure 3, and simply state: “There is a close correlation between the time course of fluorescence signals and ionic currents at all the voltages tested (Figure 3B, D). The close correlations in time (Figure 3) and voltage dependences (Figure 2G) of S4 motion (fluorescence) and activation gate (ionic current) resemble those observed for homologous KCNQ1 (without KCNE1)(42) and KCNQ3 channels(41, 43).”

      As for the last part of the reviewer's comments, the apparent increase in fluorescence after a plateau at voltages > 0 mV has now also been re-examined. We have attempted new VCF recordings at voltages more positive than +40 mV to test whether a putative second fluorescence component develops after the plateau phase or whether it is merely an artifact of the experimental system. To get reliable fluorescence signals, we need very high expression of labeled KCNQ2* channels (often producing currents larger than 100 µA). It is considerably more difficult to properly clamp these high-expressing cells, especially at extreme voltages. This experimental limitation makes it challenging to draw conclusions about the occurrence of a second fluorescent component. It may be possible to perform the cut-open technique coupled with VCF in order to shed light on this issue, but these experiments would require a significant upgrade of our setup, which we do not currently have in place.

      Reviewer #3 (Public Review):

      1. I am convinced that the fluorescence signals reflect the voltage sensor conformation in the system. The authors focus quite a lot of attention on demonstrating that the fluorescence signals are not an experimental artifact, which is fine.

      We thank Reviewer #3 for this observation. We apologize for overemphasizing that the fluorescence signals reflect the voltage sensor conformation in the system. As stated above in response to a similar comment from Reviewer #1, this might be the result of the authors' past struggle to convince earlier reviewers that the fluorescence signals at a given position are not an experimental artifact, but reflect S4 moving during channel opening. This has been amended in the revised version.

      However, I feel the authors could be more cautious in terms of describing how the mutations or dye conjugation may alter some of the gating properties. A place where this may be very important is in the description or characterization of activation kinetics as lacking sigmoidicity, which is part of the argument that these channels may open with only a fraction of voltage sensors activated. This may be correct in the modified (dye-conjugated) channel recordings, but many other recordings of unmodified channels (Figure 1) or WT KCNQ2 or 3 channels exhibit some sigmoidicity. I wonder if this difference may arise because the dye labeling may prevent complete VSD deactivation or interfere with gating in some other way. I would also add that this comment isn't meant to diminish the importance of the findings, I just think it would be wise to qualify some of the description of data with these possible caveats.

      We thank the reviewer for this suggestion, which we believe improves the flow and description of data considering all possible limitations. The reviewer is right. The mutation F192C on its own accelerates the kinetics of activation and causes a leftward shift in the GV curve of KCNQ2 channels. Moreover, labeling F192C with either fluorophore further shifts the GV towards negative potentials.

      In the revised version, we have rewritten the Results subsection ‘Tracking S4 movement of KCNQ2 channels using voltage-clamp fluorometry (VCF)’ almost in its entirety. In this subsection, we now bring to the forefront the changes in measured gating properties caused by the mutations or dye conjugation, which we agree helps with data interpretation. We made a direct comparison of voltage dependence and kinetics between wt, unlabeled KCNQ2-F192C, and labeled KCNQ2-F192C channels (new Figure 2 and Figure 2-figure supplement 1).

      These differences are also discussed on Pages 12-13 of the revised manuscript. See also the response to Recommendations for the authors below:
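      As a rough illustration of what a leftward shift in the GV curve means, the sketch below evaluates a standard two-state Boltzmann function for two sets of parameters. The half-activation voltages and slope factor are hypothetical values chosen only for illustration; they are not taken from the manuscript, and the function and variable names are ours.

```python
import math

def boltzmann(v_mv, v_half_mv, slope_mv):
    """Normalized conductance (0..1) at voltage v_mv for a two-state Boltzmann."""
    return 1.0 / (1.0 + math.exp((v_half_mv - v_mv) / slope_mv))

# Hypothetical parameters, chosen only to illustrate a leftward shift.
WT = {"v_half_mv": -40.0, "slope_mv": 10.0}
SHIFTED = {"v_half_mv": -60.0, "slope_mv": 10.0}

def gv_curve(params, voltages_mv):
    """Evaluate the Boltzmann function over a list of test voltages."""
    return [boltzmann(v, **params) for v in voltages_mv]

voltages = list(range(-120, 41, 20))
wt_curve = gv_curve(WT, voltages)
shifted_curve = gv_curve(SHIFTED, voltages)

for v, g_wt, g_shift in zip(voltages, wt_curve, shifted_curve):
    print(f"{v:>5} mV  wt={g_wt:.3f}  shifted={g_shift:.3f}")
```

      With a more negative half-activation voltage and an unchanged slope, the shifted curve lies above the reference curve at every test voltage, which is what a leftward shift amounts to.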

      1. A brief aside on this point is that a lack of sigmoidicity does not necessarily imply a single transition required for opening - it can also arise if there is a rate-limiting step during a sequence of pre-open transitions.

      Thank you, this is a good point. We will keep this possibility in mind for future studies in which the model will be developed.
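      The reviewer's point about a rate-limiting step can be illustrated with a toy calculation. The sketch below assumes an irreversible two-step scheme, C0 -> C1 -> O, with hypothetical rate constants in arbitrary units. It is not the model from the manuscript. It uses the closed-form solution for the open probability and compares it with a single-exponential time course.

```python
import math

def open_prob_two_step(t, k1, k2):
    """Open probability for irreversible C0 -> C1 -> O, starting in C0 (k1 != k2)."""
    return 1.0 - (k2 * math.exp(-k1 * t) - k1 * math.exp(-k2 * t)) / (k2 - k1)

def open_prob_one_step(t, k):
    """Open probability for a single irreversible C -> O transition."""
    return 1.0 - math.exp(-k * t)

def max_deviation(k1, k2, t_max=10.0, steps=2000):
    """Largest absolute difference between the two-step time course and a
    single exponential with rate k1, sampled on a uniform time grid."""
    worst = 0.0
    for i in range(steps + 1):
        t = t_max * i / steps
        diff = abs(open_prob_two_step(t, k1, k2) - open_prob_one_step(t, k1))
        worst = max(worst, diff)
    return worst

# Hypothetical rates (arbitrary units): one step strongly rate-limiting vs. comparable steps.
print("rate-limiting step:", round(max_deviation(1.0, 100.0), 4))
print("comparable steps:  ", round(max_deviation(1.0, 1.5), 4))
```

      When one step is much slower than the other, the two-step time course is nearly indistinguishable from a single exponential, so the sigmoidal delay largely disappears even though opening still requires two transitions.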

      1. The generation of a quantitative model is a useful application of the data. It was not clear to me whether there was a benefit to using multiple-exponential components to fit the fluorescence signals and generate a more complex model. This may add complexity where it may not be necessary, as it is not clear whether the fluorescence signals require multiple components for an adequate fit.

      Thank you for your comment. We agree with the reviewer that our model is underdeveloped and needs additional VCF data to better describe KCNQ2 gating. Based on all three reviewers' concerns and as suggested by the Reviewing Editor in his summary, we removed the kinetic model from this manuscript and will work to refine this model in our future studies.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper addresses the "origins and drivers of Neotropical diversity." The Neotropics have high diversity of plants and animals relative to other global regions. There are also many hotspots of global biodiversity (species richness) within the Neotropics.

      This paper aggregates 150 time-calibrated phylogenies from different groups of plants and animals that occur predominantly in the Neotropics. They analyze the diversification dynamics of these clades over time primarily using the method of Morlon et al. (2011; PNAS) as implemented in RPANDA (Morlon et al. 2016). The authors find that most clades have constant rates of speciation and extinction over time.

      Thank you for having reviewed our study and for your feedback.

      The strength of the paper is that it aggregates many previously published phylogenies of Neotropical organisms. However, it is unclear whether the method used gives meaningful inferences about diversification dynamics over time (e.g. Burin et al. 2019; Syst. Biol.). Therefore, the overall contribution of the study is somewhat questionable.

      This is a legitimate comment, and we understand the skepticism on a study that relies on macroevolutionary models of questionable robustness (e.g. Kubo & Iwasa 1995 - Evolution; Rabosky & Lovette 2008 - Evolution; Crisp & Cook 2009 - Evolution; Quental & Marshall 2010 - TREE; Burin et al. 2019 - Syst. Biol.; Louca & Pennell 2020 - Nature; Pannetier et al. 2021 - Evolution).

      The methodology used here has been thoroughly tested with both simulations (e.g. Morlon et al. 2011 - PNAS; Lewitus & Morlon 2018 - Syst. Biol.; Condamine et al. 2019 - Ecol. Lett.) and empirical cases (e.g. Lewitus et al. 2018 - Nat. Ecol. Evol.; Condamine et al. 2019 - Ecol. Lett.). We cannot claim that this methodology is free from the issues that affect all birth-death models, which raises the question: are we able to reliably infer the diversification model and identify the parameter values of this model (Louca & Pennell 2020 - Nature)? These concerns are not likely to be resolved in the short term, although many studies are making progress in understanding the behavior of diversification rate functions, showing, for example, that equally likely diversification functions (i.e. the congruent parameter space of Louca & Pennell 2020 - Nature) can share common features, with diversification rate patterns being robust despite non-identifiability (Höhna et al., 2022 - bioRxiv; Morlon et al., 2022 - TREE).

      Being aware of these concerns, we also relied on the recently developed Pulled Diversification Rates method (Louca & Pennell 2020 – Nature; Louca et al., 2018 - PNAS) that is supposed to correct for the identifiability issue raised by recent studies. Hence, applying both traditional and pulled birth-death models to all phylogenies, we have shown a good consistency in the inferred models, which suggests that our study can provide meaningful estimates of diversification. Our empirical study is also one of the first to perform such a large-scale methodological comparison in diversification analyses (pulled vs. traditional birth-death models) while addressing a key question in evolutionary biology. We have now emphasized this point in the conclusions of our study: “To the extent possible, these results are based on traditional diversification rates, and on the recently developed Pulled Diversification Rates method that is supposed to correct for the identifiability issue raised by recent studies associated with traditional diversification rates (71). Hence, applying both traditional and pulled birth-death models to all phylogenies, we have shown a good consistency in the inferred models, which suggests that our study can provide meaningful estimates of diversification”.

      The design of the study is also somewhat problematic. There is no comparison to other regions outside the Neotropics, so the study cannot address why the Neotropics are so diverse relative to other continental regions. Similarly, within the Neotropics, the authors do not find significant differences in diversification rates or dynamics among regions. As far as I can tell, they do not attempt to relate patterns of diversification to patterns of species richness among regions within the Neotropics (and presumably they would find no significant patterns if they did).

      We agree with this remark. We are sorry for this confusion. Our study does not aim at addressing why the Neotropics are more diverse than other regions in the world. We simply wanted to establish that the Neotropics are the richest region in the world based on previous studies, and that we are interested in understanding what are the patterns/drivers behind such a diversity. In the Introduction, we state that such diversity is not evenly distributed within the Neotropics, and that some regions are richer (e.g. Andes) than others (e.g. southern cone of South America). Diversity models, from Stebbins (1974), have long been proposed to explain this unbalanced diversity. Our study has then defined different bioregions within the Neotropics in which we have looked for differences in diversification patterns. In other words, we do “attempt to relate patterns of diversification to patterns of species richness among regions within the Neotropics”, although we were not able to explain the observed differences in species richness by differences in diversification dynamics (i.e. diversification dynamics are similar across regions). Please, see our response to the essential revision point 1 addressing this comment.

      In the revised version, we have changed the title of the study to: “Diversification dynamics of plants and tetrapods in the Neotropics through time, clades and biogeographic regions”. We hope you will find that this new title better fits the content of the article. In addition, to avoid any confusion in light of your comment, we have deleted the following sentence from the introduction: “But such an assessment is required to understand the origin of Neotropical diversity and why the Neotropics are more diverse than other regions in the world”.

      The authors set up their study by claiming that most previous attempts to explain Neotropical diversity relied on two evolutionary models: cradles vs. museums of diversity. The justification cited for this thinking comes mostly from papers from the last century or before. I do not think that this represents the cutting edge of modern thinking about this topic. Many researchers moved on from this dichotomy long ago.

      Thank you for this interesting comment. You are right. The cradle and museum models of diversity are indeed old definitions (Stebbins 1974 - Flowering Plants: Evolution Above the Species Level), but they were convenient to formulate clear and testable hypotheses on the processes underlying the observed patterns of diversity that Stebbins described. We agree that Stebbins’ view is likely outdated, and that is why we took advantage of these models to draw a series of hypotheses relying on evolutionary processes, which has been argued as a “cutting edge of modern thinking about this topic” (Vasconcelos et al. 2022 - Am. Nat.). In the revised version, we have extended the explanation for our rationale to rely on Stebbins’ models and propose process-based hypotheses to explain diversity patterns. We also cite Vasconcelos et al. (2022 - Am. Nat.). We have modified the introduction as follows: “Although the concepts of cradle and museum have contributed to stimulate numerous macroevolutionary studies, a major interest is now focused on the evolutionary processes at play rather than the diversity patterns themselves (23). Four alternative evolutionary trajectories of diversity dynamics could be hypothesized to explain the Neotropical diversity observed today: …”.

      However, we would also argue that some contemporary studies still rely on the cradle and museum framework, for example: McKenna et al. (2006 - PNAS), Couvreur et al. (2011 - BMC Biol.), Condamine et al. (2012 - BMC Evol. Biol.), Moreau & Bell (2013 - Evolution), Dornburg et al. (2017 - Nat. Ecol. Evol.). A search in Google Scholar with "Neotropic AND cradle AND diversif*" returns 1,700 results since 2010. That is why we would like to emphasize that this framework should be abandoned, because it does not rely on evolutionary processes and does not consider the full spectrum of hypotheses explaining Neotropical diversity. In the revised version, we have qualified our assertion that most studies are based on these models, which we agree is not entirely true. We have modified the corresponding paragraph as follows: “Attempts to explain Neotropical diversity traditionally relied on two evolutionary models. In the first, tropical regions are described as a “cradle of diversity”, [...] Although not mutually exclusive (15), the cradle vs. museum hypotheses primarily assume evolutionary scenarios in which diversity expands through time without limits (16). However, expanding diversity models may be limited in their ability to explain the entirety of the diversification phenomenon in the Neotropics. For example, expanding diversity models cannot explain the occurrence of ancient and species-poor lineages in the Neotropics (17–19) or the decline of diversity observed in the Neotropical fossil record (20–22). Although the concepts of cradle and museum have contributed to stimulate many macroevolutionary studies, the major interest is now focused on the evolutionary processes at play rather than the diversity pattern (23)”. We hope you will find that this new paragraph better represents current thinking in the field.

      There are potentially interesting differences in the diversification dynamics of plants and animals, but this depends on whether we can believe the inferences of the diversification dynamics or not.

      Thank you for pointing this out. We understand the concern because of the general (not new) skepticism on macroevolutionary models (e.g. Kubo & Iwasa 1995 - Evolution; Rabosky & Lovette 2008 - Evolution; Burin et al. 2019 - Syst. Biol.; Louca & Pennell 2020 - Nature; Pannetier et al. 2021 - Evolution). Unfortunately, the study of PDR did not help to confirm/reject this particular conclusion.

      We thus remain cautious with our results, and we have acknowledged several caveats that should be kept in mind when interpreting them. Here, the same methodological treatment has been applied to both animals and plants, and yet the results indeed indicate different diversification patterns. In addition, our results remained stable to AIC variations (Figure 5 - figure supplement 1), and regardless of the paleo-temperature curve considered for the analyses. Still, we do not “believe” the inferences made with birth-death models in general are accurate, but as long as these models are applied in a well-defined framework and thoroughly performed with a hypothesis-driven approach, recent studies have shown that one can interpret the results and draw conclusions (Helmstetter et al. 2021 - Syst. Biol.; Morlon et al. 2022 - TREE).
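      For readers unfamiliar with AIC-based model comparison, the sketch below shows the standard calculation of AIC scores, differences from the best model, and Akaike weights. The model names, log-likelihoods, and parameter counts are hypothetical values chosen only for illustration. They are not results from our study, and this is not the fitting procedure itself.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood

def akaike_weights(scores):
    """Convert a {model: AIC} mapping into relative model weights that sum to 1."""
    best = min(scores.values())
    rel = {name: math.exp(-0.5 * (score - best)) for name, score in scores.items()}
    total = sum(rel.values())
    return {name: value / total for name, value in rel.items()}

# Hypothetical fits for one phylogeny: (log-likelihood, number of free parameters).
fits = {
    "constant": (-250.0, 2),
    "time_dependent": (-249.2, 3),
    "temperature_dependent": (-248.9, 3),
}

scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
best_model = min(scores, key=scores.get)
deltas = {name: score - scores[best_model] for name, score in scores.items()}
weights = akaike_weights(scores)

for name in fits:
    print(f"{name:<22} AIC={scores[name]:.1f}  dAIC={deltas[name]:.1f}  w={weights[name]:.2f}")
```

      When the AIC differences are small, as in this made-up example, no model is clearly preferred. That is the reason for checking whether conclusions hold across different AIC thresholds.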

      For this new version of the manuscript, and following the suggestions of reviewer 3, we have conducted new analyses to assess whether the contrasted diversification dynamics found here between plants and tetrapods could be explained by differences in their datasets (i.e. differences in tree size, crown age, or sampling fraction of the phylogenies). We found that the higher proportion of increasing dynamics observed in plants cannot be explained by significant differences in these factors, strengthening our conclusions.

      Reviewer #2 (Public Review):

      In this study, the authors explored the evolution dynamics of Neotropical biodiversity by analyzing a very large data set, 150 phylogenies of seed plants and tetrapods. Furthermore, they compared diversification models with environment-dependent diversification models to seek potential drivers. Lastly, they evaluated the evolutionary scenarios across biogeographic regions and taxonomic groups. They found that most of the clades were supported by the expansion model and fewer were supported by saturation and declining models. The diversity dynamics do not differ across regions but differ substantially across taxa. The data set they compared is impressive and comprehensive, and the analysis is rigorous. The results broadened our understanding of the evolutionary history of the Neotropical biodiversity which is the richest in the world. It will attract broad interest to evolutionary biologists as well as the public interested in biodiversity.

      Thank you very much for your review and the positive input.

      Reviewer #3 (Public Review):

      This manuscript seeks to address a series of questions about lineage diversification in the Neotropics. The authors first fit a range of lineage diversification models to over 150 neotropical seed plant and tetrapod phylogenies to characterize diversification dynamics. Their work indicates that a constant diversification model was most frequently the best fit model, while time-, temperature- and Andean uplift-dependent models were far less frequently favored. The authors then attempted to determine whether distinct biogeographic clusters existed by using clade abundance patterns as a proxy for long-term diversification within regions. They found that while clades were widespread across ecoregions, regional assemblages could be binned into five clusters reflecting clade endemism. Finally, they asked whether diversification dynamics of individual lineages varied by parent clade, by environment (temperature through time, and Andean uplift) and by biogeographic region, finding that diversity trajectories were best explained by environmental drivers and parent clade identity, while no significant association was detected with biogeographic region. I especially appreciated the detailed model-testing procedure, the inclusion of pulled rates, tests for phylogenetic signal in the results, and the acknowledgment of caveats. By using a massive dataset and a battery of cutting-edge analyses, the authors provide new insight into questions that have intrigued biologists for decades.

      Thank you for reviewing our study and for your positive feedback.

      1. The neotropics, as defined here, extends from Tierra del Fuego to Central Florida, rather than from the Tropic of Cancer-Capricorn. I was confused by this broad circumscription, and wondered whether the findings presented here could be biased by the inclusion of these exclusively or primarily extra-tropical regions (such as "elsewhere" and "Chaco+Temperate south America") and lineages.

      Thank you for this comment, which is also in line with the second comment of Reviewer 1. We understand the confusion. The Neotropics, as originally defined by Alfred Wallace, represent a broad region including many types of ecosystems and biomes (not only tropical ones): i.e. the Neotropical realm. It also has a paleobiogeographic significance, as the whole South American continent was isolated for tens of millions of years (Simpson 1983). This definition is well accepted in the field of biogeography and evolutionary biology and we followed it to avoid adding a new definition. A Google Scholar search with keywords “Neotropic AND phylogen AND diversificat*” returns >24,000 hits. Our biogeo-regionalization and clustering results also corroborate the strong connection between South American temperate and tropical biotas: very few clades were restricted or exclusive to a single region, and in most cases, clades comprised species from tropical regions (Cerrado, Caatinga) together with species from the temperate South America zones (Chaco, Temperate South America; Figure 6, Source Data 1).

      That being said, we did not find significant differences in diversification rates (or diversity dynamics) between temperate and tropical regions (indeed, between any regions), even when temperate regions were analyzed separately (Figure 6 - figure supplement 2). This suggests that our results would have been similar had we confined the Neotropics to tropical latitudes, as in a more climatic circumscription. However, had we restricted the Neotropics to tropical latitudes, many of the 150 clades would not have been selected, and our study would have offered fewer insights into the diversification processes explaining Neotropical biodiversity in the broad sense.

      2. Model categories and clade diversification dynamics were also linked to the size and age of the phylogeny, such that small and young clades tended to exhibit constant diversification, while exponential and declining dynamics were linked to more diverse and older clades. As one of the main conclusions is that seed plant diversification is more frequently characterized by constant diversification (relative to that of tetrapods), I cannot help but wonder if seed plant phylogenies tend to also be younger and less diverse than those of tetrapods. Figure S1 shows an overview of the distributions but lacks a formal, statistical comparison.

      This is a very good point. We agree this comparison is relevant to support our conclusions, but it was missing from our results. We have now compared tree size, crown age and sampling fraction across taxonomic groups, and found that the higher proportion of increasing dynamics, characteristic of plants, cannot be explained by significant differences in these factors. As can be seen in the new Figure 2 - figure supplement 2 of the manuscript, tree size does not differ among plants, mammals, birds and squamates. Crown age does not differ among plants, mammals and birds. Groups do differ in sampling fraction: plant (p < 0.01) and squamate (p < 0) phylogenies are significantly worse sampled than the phylogenies of other groups. Yet plants show a higher frequency of increasing dynamics than squamates and other tetrapods (Figure 4). Incomplete taxon sampling has the effect of flattening out lineages-through-time plots towards the present, and thus artificially increases the detection of diversification slowdowns rather than diversification increases (Cusimano & Renner 2010 – Syst. Biol.).

      We have included this important piece of information in the results “In our dataset, amphibian phylogenies are significantly larger than those of other clades (p < 0.05) (Figure 2 - figure supplement 2). Amphibian and squamate phylogenies are also significantly older (p < 0). Groups also differ in sampling fraction: plant (p < 0.01) and squamate (p < 0) phylogenies are significantly worst sampled than phylogenies of other groups.”; and in the discussion section: “Differences in the phylogenetic composition of the plant and tetrapod datasets do not explain this contrasted pattern. On average, plant phylogenies are not significantly younger or species-poorer than tetrapod phylogenies (Figure 2 - figure supplement 2). Yet, the proportion of clades experiencing increasing dynamics is significantly higher for plants (Figure 4). Plant phylogenies are significantly worst sampled than those of most other tetrapods, though, as explained above, incomplete taxon sampling has the opposite effect: flattening out lineages-through-time plots towards the present (83).”

      3. I wondered whether it was possible to disentangle time-dependent decreasing diversification from decreasing temperature in young trees? I raise this because it appears that (generally speaking) most of the clades have diversified over periods in which temperature has generally been declining.

      This is also a very good point. It is common to observe that two different models are equally likely or close in terms of statistical support. Previously, Condamine et al. (2019 - Ecol. Lett.) reported that the ΔAIC between the best and second-best diversification model was often below the threshold of 2, which is typically chosen to statistically distinguish models (see Fig. 3 and Fig. S5 in Condamine et al. 2019). Simulation analyses confirmed that it was not enough to distinguish the best and second-best models with confidence (see Fig. S6 in Condamine et al. 2019). This applies to any kind of clade.

      However, in the case of time-dependent decreasing diversification and temperature-dependent decreasing diversification, one can further test the effect of past temperatures by smoothing the temperature curve more strongly, so that its ups and downs are removed. Previously, Condamine et al. (2019 - Ecol. Lett.) found that smoothing strongly decreased the support for temperature-dependent models (Fig. S13a) to the point where it was lost (Fig. S13b), showing that the support for temperature-dependent models was not simply due to a temporal trend in diversification rates potentially unlinked to temperature.
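      As an illustration of the ΔAIC criterion discussed above, the following minimal sketch ranks candidate models by AIC and reports whether the best model can be separated from the runner-up at the conventional threshold of 2. The function name, the model labels and the AIC values are hypothetical and are not taken from the analyses of the manuscript.

```python
def delta_aic(aic_by_model, threshold=2.0):
    """Compare the best and second-best models by AIC.

    aic_by_model: dict mapping a model label to its AIC (lower is better).
    Returns (best, second, delta, distinguishable), where `delta` is the
    AIC difference between the second-best and the best model.
    """
    # Sort models from lowest (best) to highest (worst) AIC.
    ranked = sorted(aic_by_model.items(), key=lambda item: item[1])
    (best, best_aic), (second, second_aic) = ranked[0], ranked[1]
    delta = second_aic - best_aic
    return best, second, delta, delta >= threshold


# Hypothetical AIC values for three diversification models.
example = {"constant": 100.0, "time-dependent": 101.2, "temperature-dependent": 104.0}
best, second, delta, distinguishable = delta_aic(example)
```

      In this made-up example the two top models differ by less than 2 AIC units, which is the situation in which one would hesitate to favour one over the other.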

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Major comments:

      Are the key conclusions convincing?

      We discuss 4 key conclusions.

      # 1 A PRC of the segmentation clock was constructed.

      Although the authors have produced an interesting phase map, the regulation function F(\phi) of the circle map does not give the phase response curve (PRC) (Hoppensteadt & Keener 1982, Guevara & Glass 1982). This holds only when the system is stimulated with very short pulses (ideally Dirac delta), but the experimental pulses here are a quarter of the intrinsic period.

      There are several definitions of the PRC (Dirac-pulse PRCs, linear PRCs, etc.). We use the general definition from Izhikevich, 2007: “In contrast to the common folklore, the function PRC (θ) can be measured for an arbitrary stimulus, not necessarily weak or brief. The only caveat is that to measure the new phase of oscillation perturbed by a stimulus, we must wait long enough for transients to subside“.

      The corresponding equation from Izhikevich (section 10.1.3) is

      PRC(θ)= θ_new-θ

      which is equivalent to our Equation 1.

      Hence, the key assumption we make is that after perturbing the system, we are back on the limit cycle, as pointed out by Izhikevich. We think this is a reasonable assumption, because the perturbation we impose is relatively weak, despite pulsing for almost one quarter of the intrinsic period. The concentrations of DAPT we used in the current study are just enough to elicit a measurable response, and further lowering the concentration does not result in entrainment within our experiment time (0.5 µM, Figure S7B in the submitted version of the manuscript). Additionally, we previously reported that periodic pulsing with 2 µM DAPT did not result in a change of Notch signaling activity with respect to control samples (Sonnen et al., 2018). Along similar lines, the DAPT concentrations we used are much lower than those used in previous studies aiming to perturb signaling levels, e.g. 100 µM and 50 µM used in studies of the segmentation clock in zebrafish embryos (Özbudak and Lewis, 2008 and Liao et al., 2016, respectively), and 25 µM used in a study of the segmentation clock in mouse PSM cells (Hubaud et al., 2017). Combined, we reason that we apply weak perturbations that allow us to extract the PRC of the segmentation clock during entrainment. Additional evidence that we have indeed revealed a meaningful PRC is provided below; please see our response to point #3.
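      To make the definition PRC(θ) = θ_new − θ concrete, here is a minimal sketch of how a phase response could be computed from a pair of measured phases. It assumes phases in radians and wraps the difference into [−π, π); the function names are illustrative and this is not the analysis code used for the manuscript.

```python
import numpy as np


def wrap_phase(x):
    """Wrap angles (in radians) into the interval [-pi, pi)."""
    return (np.asarray(x, dtype=float) + np.pi) % (2.0 * np.pi) - np.pi


def phase_response(theta, theta_new):
    """PRC(theta) = theta_new - theta, wrapped into [-pi, pi).

    theta:     phase the oscillator would have had without the stimulus.
    theta_new: phase measured after transients have subsided.
    """
    return wrap_phase(np.asarray(theta_new, dtype=float) - np.asarray(theta, dtype=float))


# A phase that moves from 0.1 rad to just below a full cycle is a small delay,
# not a large advance, once the difference is wrapped.
shift = phase_response(0.1, 2.0 * np.pi - 0.1)
```

      Wrapping matters because phases are only defined modulo one cycle; without it, a small delay across the 0/2π boundary would be misread as an advance of almost a full cycle.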

      # 2 Furthermore, in eq. 1 T_ext must be the winding number, and the modulus must be in units of phase, either one or two pi, for the circle map to be correct. Thus, calling the measured response of the system a PRC is not convincing.

      We thank the reviewer for pointing this out. We indeed rescaled everything to express the PRC in units of phase. We made this more explicit and updated equations throughout the text.
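      For readers who want to see the bookkeeping in units of phase, a generic stroboscopic (circle) map can be sketched as below. This is an assumed textbook form, not necessarily identical to Equation 1 of the manuscript: between two pulses the free-running oscillator advances by 2π·T_zeit/T_osc, the pulse adds PRC(φ), and the result is taken modulo 2π. The PRC shape and its strength are hypothetical placeholders; only the periods (a natural period of about 140 min, a zeitgeber period of 150 min) are taken from the ranges discussed here.

```python
import math


def example_prc(phi, strength=0.5):
    """Hypothetical constant-sign PRC (always <= 0), for illustration only."""
    return -strength * (1.0 - math.cos(phi))


def stroboscopic_map(phi, t_zeit, t_osc, prc=example_prc):
    """Phase at the next pulse, in radians, wrapped into [0, 2*pi)."""
    free_run = 2.0 * math.pi * t_zeit / t_osc  # phase advance between two pulses
    return (phi + free_run + prc(phi)) % (2.0 * math.pi)


def iterate_map(phi0, t_zeit, t_osc, n_pulses, prc=example_prc):
    """Phases sampled at successive pulses, starting from phi0."""
    phases = [phi0 % (2.0 * math.pi)]
    for _ in range(n_pulses):
        phases.append(stroboscopic_map(phases[-1], t_zeit, t_osc, prc))
    return phases


# Zeitgeber slower than the oscillator: the delay-only PRC can compensate.
phases = iterate_map(phi0=0.3, t_zeit=150.0, t_osc=140.0, n_pulses=200)
entrainment_phase = phases[-1]
```

      At a fixed point of this map the phase shift exactly cancels the detuning, which is why the entrainment phase moves along the PRC as the zeitgeber period is varied.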

      # 3 The system is being entrained. Technically, it would also be easier to get the stroboscopic maps in the quasi-periodic regime since all the points in the circle will be sampled. Since no quasi-periodic response was demonstrated, the claim of entrainment is not convincing.

      While, in principle, the PRC can indeed be obtained from responses in the “quasi-periodic” regime, such an approach is, in practice, challenging due to the intrinsic noise. The closest approximation to this is the phase response after the first pulse, which we reproduce below and compare to our inferred PRC, and where we indeed clearly see a high noise level. Nevertheless, the PRC based on the first pulse is also in agreement with the PRC we derived from the entrainment data.

      In the entrained regime, one can get a much more reliable estimate of the phase response despite the noise. The level of noise in the stroboscopic map lowers as the samples approach entrainment (Figure S12), and the entrainment phase itself is a reliable statistical quantity that can be used to infer regions of the PRC as the detuning is varied.

      In addition, and maybe even more importantly, we identify several key features characteristic of entrainment, such as the change of entrainment phase as a function of detuning (Figure 7, Figure S6-S7 in the submitted version of the manuscript) and the dependency of the time to entrainment on the initial phase (Figure 6). While additional features can, in theory, be linked with entrainment (i.e. period-doubling, higher harmonics (Figure 5), quasi-periodicity), we do not agree with the reviewer that all of these need to, or in fact can, be found in the experimental data, in particular because of the influence of noise. Conversely, the positive experimental evidence that we provide for the presence of entrainment, combined with the theoretical framework we develop, justifies, in our view, the conclusions we make.
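      As a sketch of how an entrainment phase can be summarised as a statistical quantity across noisy samples, one option is the first circular moment: its angle gives the mean phase and its length (between 0 and 1) measures how tightly the samples cluster. This is a generic circular-statistics recipe with illustrative names and made-up numbers, offered as an assumption about how such a summary could be computed rather than as the procedure used in the manuscript.

```python
import numpy as np


def circular_summary(phases):
    """Mean phase (radians, in [0, 2*pi)) and resultant length of a set of phases."""
    z = np.mean(np.exp(1j * np.asarray(phases, dtype=float)))  # first circular moment
    mean_phase = np.angle(z) % (2.0 * np.pi)
    resultant_length = np.abs(z)  # 1 = identical phases, near 0 = widely scattered
    return mean_phase, resultant_length


# Hypothetical phases of several samples, read out at the time of a pulse.
mean_phase, spread = circular_summary([1.0, 1.2])
```

      A resultant length that rises towards 1 over successive pulses would correspond to the narrowing of the stroboscopic map as samples approach entrainment.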

      # 4 The response of the system to external pulses is compatible with a SNIC. This is compatible, but it is equally compatible with other explanations. Assuming that the PRC is the same as the regulation function F(\phi), the PRC in Kotani 2012 (PRL 2012 fig. 3C) would be a similar shape as that shown by the authors. Similar models to that in Kotani et al., have been studied, but a SNIC has not been found (an der Heiden & Mackey 1982). It is relatively straightforward to construct a phenomenological model with a SNIC, but having underlying biological insight is not guaranteed. No argument for choosing a SNIC is given, so this emphasis of the paper is not convincing.

      It is true that the mapping of PRCs to oscillators is underdetermined, in the sense that many systems could potentially give rise to similar PRCs. That said, there is value in parsimonious models, which often generalize very well despite their simplicity. This explains why, in neuroscience, constant-sign PRCs are generally associated with SNIC. There is a mathematical reason for this: 1-D oscillators with resetting (such as the quadratic integrate-and-fire model) are the simplest models displaying constant-sign PRCs, and are the “normal” form for SNICs. In other words, SNIC bifurcations are among the simplest ones compatible with constant-sign PRCs, and we think it is informative to point this out. In our manuscript, we go one step further by actually fitting the experimental PRC with a simple, analytical model that allows us to compute Arnold tongues for any value of the perturbation (contrary to more complex models).

      Other models such as Kotani 2012 can display similar PRC shapes, but they are of mathematically higher complexity, and furthermore it is not clear how such systems might behave when entrained. For instance, that model in particular uses delay differential equations, and as such contains long-term couplings, so that a perturbation might have effects over many cycles, which is not consistent with the hypothesis we make here of a relatively rapid return to the limit cycle. Furthermore, for more complex models, PRCs are analytical only in the linear regime, while our model is analytical for all perturbations. That said, we agree that other types of oscillators can be associated with constant-sign PRCs, and we have given more details in this part. In particular, we better emphasize Class I vs Class II oscillators as a way to broaden our discussion of the PRC, and emphasize the “infinite period” bifurcation category, which is more intuitive and further includes saddle-node homoclinic bifurcations.
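      The link between the quadratic integrate-and-fire normal form and a constant-sign PRC mentioned above can be sketched with a short, textbook-style calculation. It assumes the idealised case of infinite threshold and reset and a small instantaneous kick of size δ in the state variable; the symbols I, δ and φ are introduced here for illustration and do not correspond to parameters of the model fitted in the manuscript.

```latex
\begin{aligned}
\dot{x} &= I + x^{2}, \quad I > 0
  \quad\Rightarrow\quad
  x(t) = -\sqrt{I}\,\cot\!\left(\sqrt{I}\,t\right), \qquad T = \frac{\pi}{\sqrt{I}}, \\[4pt]
\Delta t &\approx \frac{\delta}{\dot{x}} = \frac{\delta}{I + x^{2}}
  = \frac{\delta}{I}\,\sin^{2}\!\left(\sqrt{I}\,t\right), \\[4pt]
\Delta\varphi &= \frac{2\pi}{T}\,\Delta t
  = \frac{\delta}{\sqrt{I}}\,\bigl(1 - \cos\varphi\bigr),
  \qquad \varphi = \frac{2\pi t}{T} = 2\sqrt{I}\,t .
\end{aligned}
```

      Since 1 − cos φ is never negative, the phase shift has the sign of δ at every phase: a perturbation that pushes the state variable in one direction can only advance, or only delay, the oscillator, in contrast to a sinusoidal PRC that changes sign over the cycle.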

      # 5 The work demonstrates coarse graining of complex systems.

      This conclusion is correct, but coarse graining theory-driven analysis and control of dynamical systems has been established for many years. What is new here is that it is applied specifically to the in vitro culture system of the mouse segmentation clock.

      We agree it is new to successfully apply coarse-graining analysis and, importantly, control, to the in vitro culture system of the mouse segmentation clock. We also agree that such an approach has been pioneered and established for many years, especially in (theoretical) physics, but indeed, the key question is whether and how this can be applied to complex biological systems. Insights coming from theoretical considerations on idealized physical systems might not necessarily apply to biology, as already pointed out by Winfree.

      There are still very few examples in biology with coarse graining similar to what we do here. We think there is immense value in demonstrating that quantitative insights, and control of biological systems, can be obtained without precise knowledge of molecular details, which is still counter-intuitive to many biologists. In this sense, we think our report will be of interest both to colleagues within the field of the segmentation clock and to anyone interested in the question of how theory- and physics-guided approaches can enable novel insight into biological complexity.

      Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      Following on the points above, each of these needs to be corrected or re-done, and/or the conclusions need to be modified accordingly.

      We have modified the manuscript in response to all those points.

      # 6 Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. If the authors wish to make the strong claim of determining a true PRC, Dirac delta-like perturbation needs to be applied, or approximated by short time duration pulses compared to the intrinsic period.

      Please refer to our responses to points #1 and #3.

      # 7 Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      It's not clear to this reviewer if it is feasible to deliver a very short pulse and record a response. But this may not be relevant, see above.

      Please refer to our responses to points #1 and #3.

      Are the data and the methods presented in such a way that they can be reproduced?

      Yes.

      Are the experiments adequately replicated and statistical analysis adequate?

      Yes.

      Minor comments:

      Specific experimental issues that are easily addressable.

      No issues.

      Are prior studies referenced appropriately?

      Yes.

      # 8 Are the text and figures clear and accurate?

      Figure 1D illustrates how a PRC should be obtained, but doesn't show the experimental protocol applied in the paper.

      Figure 1D is a general introduction on the phase description of oscillators and phase response. It demonstrates how a perturbation can change the phase and is not supposed to represent the experimental protocol. We describe how data are analyzed and how phases are extracted in Supplementary Note 1.

      # 9 In Figure 5B, 10 uM DAPT, the traces are already synchronized before the pulse train starts, which makes the subsequent behavior difficult to interpret.

      It appears that, by chance, the samples were already almost synchronized. We note, however, that the establishment of a stable rhythm with the pulses (which here is not a multiple of the natural period) supports entrainment, and is already evident when looking at the timeseries with respect to the perturbation. The temporal evolution of the instantaneous period further confirms this, showing a change in period to close to ½ of the zeitgeber period (which is very different from the natural period of ~140 mins). This also relates to point #35; in reply to both comments, we have further expanded this figure to better show the 2:1 entrainment, adding statistics on the measured period and period evolution for a zeitgeber period of 300 mins.

      # 10 Do you have suggestions that would help the authors improve the presentation of their data and Conclusions? The text includes several paragraphs reviewing broad principles of coarse graining and making general conclusions. This is confusing, because, as mentioned above, there is no new general advance in this paper. The interesting contributions here are specific to the applications to the segmentation clock, and the text should be focused on this aspect.

      As commented above for #3, we respectfully disagree that there is no “new general advance” in this paper. It is far from obvious that a complex ensemble of coupled oscillators implicated in embryonic development would be amenable to such coarse-graining theory. Of note, we still do not have a full understanding of either the core oscillators in individual cells or what slows these down and eventually stops the oscillations, and multiple recent works suggest that both phenomena are under transient nonlinear control (e.g. our own work in Lauschke 2013). It is remarkable that despite this lack of detailed mechanistic insight, general entrainment theory can be applied to the segmentation process at the tissue level. We further show that classical entrainment theory alone is not sufficient to account for the experimental findings. Specifically, we need to account for a period change that we interpret as an internal feedback, an insight that would be impossible without our coarse-graining approach. While the results might of course be specific to the segmentation process, we think our approach, motivated by coarse-graining theory and leading to new insights into the process, is of general interest. We tried to make these points explicit in our conclusion.

      Reviewer #1 (Significance (Required)):

      Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      Description of the complex mouse segmentation clock in terms of a simple model and its PRC is an interesting, original and non-trivial result. The proposal that the segmentation clock is close to a SNIC bifurcation provides a consistent dynamical explanation of slowing behavior that has been recognized for some time, but not fully understood. This proposal also raises a hypothesis about the behavior of the underlying molecular regulatory networks, which may be tested in the future. The increase or decrease of the intrinsic period due to the zeitgeber period is not expected from theory, pointing to structures in internal biochemical feedback loops, an idea which again may be tested in the future. Also surprising from a theoretical perspective, the spatial gradient of period in the system persisted after entrainment. Although the categorization of the generic behavior is interesting, by its nature there is little from this that might give a typical developmental biologist any conclusions about pathways or molecules. The successes and limits of the theoretical description do nevertheless focus future attention on interesting behaviors.

      # 11 Place the work in the context of the existing literature (provide references, where appropriate).

      Such an analysis of the segmentation clock is based strongly on the experimental system and results in Sonnen et al., 2018, and goes well beyond it in terms of the dynamical analysis. It provisionally categorizes the mouse segmentation clock as a Class I excitable system, allowing its dynamics at a coarse-grained level to be compared to other oscillatory systems. In this aspect of simplification, it is similar to the approach of Riedel-Kruse et al., 2007, who used a mean-field model of oscillator coupling to explain the synchrony dynamics observed in the zebrafish segmentation clock in response to blockade of coupling pathways, thereby allowing a high-level comparison to other synchronizing systems.

      It is interesting the reviewer sees similarities with the work of Riedel-Kruse et al, which uses a mean-field variable Z that corresponds to a classical approach, as described in Pikovsky’s textbook, to quantify synchronization of oscillators. In our view, while of course we work in the same context of coupled oscillators in the PSM, our approach based on perturbing and monitoring the system’s PRC in real-time provides a novel strategy to gain insight. This is evidenced by the fact that our quantifications of synchronization and insight into the PRC is the basis to exert precise control of the pace and rhythm of segmentation.

      State what audience might be interested in and influenced by the reported findings.

      Developmental biologists, biophysicists

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      Developmental biology, somitogenesis, dynamical systems theory, biophysics, cell signaling


      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: This is a beautifully elegant study that tests how previously published theoretical predictions about entraining nonlinear oscillators applies to a biological oscillator, the segmentation clock. The authors use a combination of state of the art experimental techniques, signal processing and analytical theory to reach a series of interesting and novel conclusions.

      They show that the segmentation clock period can be entrained through Notch inhibitor (DAPT) pulses acting as an external clock (referred to as zeitgeber) using a previously developed and sophisticated microfluidic perfusion system. Pulsing DAPT every 120 to 180min can change the internal clock period while entrainment beyond this range leads to higher order coupling to the zeitgeber period, i.e. entrainment of every other pulse. They then perform entrainment experiments where the concentration of DAPT is changed to elicit a change in the strength of interaction between the internal clock and the external stimulus (referred to as zeitgeber strength); interestingly at low strength response to entrainment is more variable leading to entrainment occurring in some samples while others remain unaffected (Figure 4A); overall, higher concentration leads to faster entrainment (Figure 4C). The experimental data is then analysed using stroboscopic maps to reveal that a stable entrainment phase shift is achieved between the internal clock and the external zeitgeber. Phase response curve (PRC) analysis indicates that the system response is not sinusoidal but predominantly characterised by negative PRC, a behaviour consistent with saddle-node on invariant cycle (SNIC); it also reveals that the intrinsic period changes in a non-linear way and that this effect is reversible when external stimulation stops. Finally, a theoretical model is proposed to represent the segmentation clock as a dynamical system; this is based upon the Elliptic Radial Isochron Cycle with Acceleration (ERICA), an extension motivated by the PRC analysis results which are incompatible with a Radial Isochron Cycle (RIC); this model has predictive capability and could be used to design new control strategies for entrainment of the segmentation clock.

      This study makes a series of key conclusions which are of particular importance in understanding the dynamic response of biological oscillators. Firstly, given the characteristics of the dynamic response to entrainment, the segmentation clock is likely close to a SNIC bifurcation and this can explain the tendency for relaxation of the period over time. Secondly, the clock period was changed in a non-linear way in the direction of the zeitgeber period, a finding which is interpreted to indicate the presence of feedback of the segmentation clock onto itself, potentially via Wnt. This makes an excellent prediction that, if tested experimentally, would greatly improve the impact of the study. It is also noted that the entrainment of the segmentation clock does not abolish spatial periodicity and phase wave emergence, suggesting that single cell oscillators can adjust to periodic perturbation while maintaining emergent properties. This is also a significant result that would need to be followed up with experiments and computation; however, this would be best suited to a separate study.

      Major comments:

      # 12 The coarse graining is a major point that would need to be clarified since the rest of the analysis and theoretical modelling in the paper flow from this. Firstly, the interpretation of the schematic in Figure 1A on experimental data collection is not immediately obvious to the reader, lacks a clear flow between the different panels or steps (which could be numbered for example) and does not have a legend to indicate the different colour mapping.

      We are grateful to the reviewer for this comment. We have implemented in Figure 1A all the changes suggested by the reviewer: we numbered the different steps and have added a colour mapping. In addition we have rephrased the caption of Fig 1A to better connect the experimental steps.

      # 13 Secondly, Figure 2A which explicitly addresses coarse graining is not clear enough. Is the message here that by excluding the inner parts of the sample with a radial ROI, a similar dynamic response is observed over time?

      Yes, indeed this is the point and we have adjusted the figure and text to explain this better. Our goal is to focus on the quantification of segmentation pace and rhythm. This is best captured by reporters such as LuVeLu, which has maximum intensity in regions where segments form, and whose dynamics are known to be strongly correlated with segmentation (Aulehla et al., 2007; Lauschke and Tsiairis et al., 2013). The global ROI is thus expected to precisely capture these segmentation and clock dynamics, and we have now included more validation data and have also edited the text to make this very important point clearer:

      “To perform a systematic analysis of entrainment dynamics, we first introduced a single oscillator description of the segmentation clock. We used the segmentation clock reporter LuVeLu, which shows highest signal levels in regions where segments form \cite{Aulehla_A_2007}. Hence, we reasoned that a global ROI quantification, averaging LuVeLu intensities over the entire sample, should faithfully report on the segmentation rate and rhythm, essentially quantifying 'wave arrival' and segment formation in the periphery of the sample.”

      Figure 2A indeed shows that the dynamics (from the timeseries) are very similar when considering the entire field of view (global ROI) or when considering only the periphery of the 2D-assay (excluding central regions). We modified Figure 2A to clarify this point by indicating each measurement as either global ROI or global ROI minus the diameter of the excluded circular region (e.g. global ROI - 50px). We also emphasized in the caption that timeseries are obtained using global ROI, unless otherwise specified. We included a link (https://youtu.be/fRHsHYU_H2Q) in the caption to a movie of a 2D-assay subjected to periodic pulses of DAPT (or DMSO) and the corresponding timeseries from global ROI.

      Since the inner part of the sample corresponds to the posterior side how do we interpret similarities and differences between signals with different ROIs?

      As stated above, the global ROI measurements essentially capture the signal at the periphery where segments form and faithfully mirror segmentation rate and rhythm. We have now included a comparison to the center ROI, also in response to the reviewer’s comments, see our response #34.

      The result shows that the period and PRC in the center match those found in the periphery, i.e. global ROI. We have shown previously that center and periphery differ in their oscillation phase by 2pi, i.e., one full cycle (Lauschke et al., 2013). We interpret these findings as confirmation of our analysis strategy, i.e. the global ROI allows a very reproducible, unbiased quantification that reports on segmentation clock and period.

      # 14 A quantitative analysis of essential coarse-grained properties such as period and amplitude should be performed for different ROIs and across multiple samples. As this effectively masks any spatial differences, limitations of this approach should be clearly stated in the Discussion. For example in lines 466-470 where it is difficult to interpret the slowing down tendency and relate back to single cell level.

      As outlined in our response to comment #13 and also #34, we chose an analysis that allows us to determine the segmentation pace and rhythm, i.e. segment formation, which is well captured by LuVeLu signal and a global ROI analysis. We agree that a spatially resolved analysis of dynamic behaviour is important (and indeed a gradient of amplitude might be relevant in such context), but we think this is beyond the scope of the current study focused on the system level segmentation clock behaviour. We have revised the discussion as suggested by the reviewer to make this approach and the need for future studies clearer.

      # 15 The functional characterisation of the sample using LFNG, AXIN2 and MESP2 is unclear. The images included in Figure 2D representing expression observed when tissue explants are grown within the microfluidic chip are difficult to interpret and would require a more detailed description of anterior-posterior, pillars etc; it is also difficult to view the bright-field since it is presented as a merged image.

      It is particularly difficult to see the somite boundaries for the same reason. In lines 113-117 the authors state that the global oscillation period matches the periodic boundary formation. How do we reach this conclusion from these images? What is the variability between samples?

      If these two issues would be addressed it would increase confidence in the coarse graining argument and thus would strengthen the importance of the findings in the study.

      We thank the reviewer for this feedback, and we have added more quantifications to address this point directly in the modified Figure 2. Importantly, we added the quantification of the rate of segmentation in multiple samples based on segment boundary formation (new Figure 2D) and compared this to the global ROI quantifications using the reporter lines LuVeLu. This data provides clear evidence that the quantification of global ROI reporter intensities closely matches the rate of morphological segment boundary formation. In addition, we show that segment formation and also Wnt-signaling oscillations (Axin2-Achilles) and the segmentation marker Mesp2 (Mesp2-GFP) are all entrained to the zeitgeber period. We have also revised the text to clarify this important validation of our quantitative approach.

      In addition, we provide, in the revised Figure Suppl. 2, details of entrained samples, focusing on the segmenting regions. The brightfield and reporter channels were separated, emphasizing the segment boundaries and the expression pattern of the reporters. For ease of visualization, these samples were also re-oriented so that the tissue periphery (corresponding to anterior PSM) is at the top while the tissue center (corresponding to the posterior PSM) is at the bottom. This now additionally better shows the localization of the different reporters with respect to the segment boundary. We also included supplementary movies showing timelapse of samples expressing either Axin2-GSAGS-Achilles or Mesp2-GFP that were subjected to periodic DAPT pulses, with their respective controls.

      Several minor points could be addressed to improve the manuscript and are listed below:

      # 16 Figure 1 A the colormap and axes for the oscillatory traces should be defined

      We thank the reviewer, and we have modified the figure accordingly (related to point # 12). A colormap and axes for the illustrated timeseries are now included.

      # 17 Strength of zeitgeber is not defined and there is no analytical expression provided; how does it relate to DAPT concentration? Is the fact that low DAPT concentration corresponds to weak strength expected or is it a result?

      Zeitgeber strength generally refers to the magnitude of the perturbation periodically applied to an oscillator. With DAPT pulses, our expectation was that both the duration of the pulse and the drug concentration could influence the strength. Practically, the pulse duration was kept constant for all experiments and the concentration was varied. We thus expected that DAPT concentration would indeed be correlated to zeitgeber strength. We have discussed multiple lines of evidence supporting this assumption in the main text, and this is indeed a result. In particular, as explained in the section “The pace of segmentation clock can be locked to a wide range of entrainment periods”, higher DAPT concentration gives rise to faster and better entrainment, as expected from classical theory. In the context of the Arnold tongue, weaker zeitgeber strength corresponds to a narrower entrainment region, which is experimentally observed (Fig 8F, showing regions where the clock is entrained).

      From a modelling standpoint, zeitgeber strength corresponds to parameter A, which is the amplitude of the perturbation. Possible zeitgeber strength was inferred from the model by matching the experimental entrainment phase with that obtained from the model isophases. As explained in Supplementary Note 2, we tested four concentrations of DAPT (0.5, 1, 2, and 3 uM), respectively corresponding to A values of 0.13, 0.31, 0.43, and 0.55. As we can see, those A values are not linear in DAPT concentration, which is expected since multiple effects (such as saturation) can occur.
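
      The non-linearity noted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: it takes the four concentration and A values quoted from Supplementary Note 2 and prints the ratio A/concentration, which would be constant if A were proportional to concentration. The variable names are ours and are not taken from the manuscript.

```python
# DAPT concentrations (uM) and the inferred zeitgeber strengths A quoted above.
dapt_um = [0.5, 1.0, 2.0, 3.0]
strength_a = [0.13, 0.31, 0.43, 0.55]

# Under a proportional relation A = k * concentration, this ratio would be constant.
ratios = [a / c for a, c in zip(strength_a, dapt_um)]

for c, a, r in zip(dapt_um, strength_a, ratios):
    print(f"{c:>4} uM  A = {a:.2f}  A/conc = {r:.3f}")
```

      The ratio falls at the higher concentrations, which is at least compatible with a saturating relation, although four points cannot discriminate between functional forms.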

      # 18 In some figures it looks like the amplitude of oscillations may change with DAPT concentration and hence zeitgeber strength? Is this expected?

      We have not systematically analyzed the amplitude effect and have, intentionally, focused on the period and phase readout as the most robust and faithful parameters to be quantified. Regarding the amplitude of the LuVeLu reporter, we are cautious given that it is influenced, potentially, by the (artificial) degradation system that we included in LuVeLu, i.e. a PEST domain. This effect concerns the amplitude, but not the phase and period, explaining our strategy.

      That said, we agree with the referee that DAPT concentrations might change the amplitude of oscillations. Such change could even play a role in the change of intrinsic period (in fact a similar mechanism drives overdrive suppression for cardiac oscillators, Kunysz et al., 1995). But since the change of period can be more easily measured and inferred, we prefer to directly model it instead of introducing a new hypothesis on amplitude/period coupling, at least for this first study of entrainment.

      # 19 Figure 2A including the black area creates confusion and it is unclear which ROI is used in the rest of the study; consider moving this to a supplementary figure perhaps

      We thank the reviewer for this feedback (related to point #13), and we have modified the figure accordingly. As we responded to point # 13: We modified Figure 2A, by indicating each measurement as either global ROI or global ROI minus the diameter of the excluded circular region (e.g. global ROI - 50px). We also emphasized in the caption that timeseries are obtained using global ROI, unless otherwise specified.

      # 20 What type of detrending is used in Figure 2 and throughout (include info in the figure legend)?

      We used sinc-filter detrending, described and validated in detail previously (Mönke et al., 2020), as specified in Supplementary Note 1: Materials and methods > H. Data analysis > Monitoring period-locking and phase-locking: In this workflow, timeseries was first detrended using a sinc filter and then subjected to continuous wavelet transform. We thank the reviewer for pointing out that this detail is lacking in the figure captions, and we have modified the captions accordingly.

      # 21 Figure 2D merged images are difficult to read/interpret (see major comments)

      We thank the reviewer for this comment, and we have modified the figure accordingly (please see response to related point #15).

      # 22 Kuramoto order parameter is used to quantify the level of synchrony across the different samples however it is not defined in the text. Is it also possible to assess variability in each sample? For example how quickly does entrainment occur in each sample? How faithfully do the peaks of expression beyond 80min (to exclude initial unsynchronised state) match with zeitgeber time? This would help make the point that weak strength leads to a more variable response which is an interesting finding.

      We have now added a mathematical definition of the Kuramoto parameter in Supplementary Note 1.

      A high order parameter corresponds to coherence between samples, as also elaborated in respective figure captions (e.g. in the caption for polar plots in Figure 4D).
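
      For readers unfamiliar with this quantity, the sketch below computes the standard first-order Kuramoto parameter, R = |(1/N) Σ exp(iφ_j)|, for a set of sample phases. It is a minimal illustration under that textbook definition; the definition given in Supplementary Note 1 is the authoritative one and may differ in detail. The function name and the example phases are hypothetical.

```python
import cmath
import math


def kuramoto_order_parameter(phases):
    """Return R = |mean(exp(i*phi))| for a list of phases in radians."""
    if not phases:
        raise ValueError("need at least one phase")
    mean_vector = sum(cmath.exp(1j * phi) for phi in phases) / len(phases)
    return abs(mean_vector)


# Hypothetical phases (radians) of five samples at one time point.
coherent = [0.10, 0.15, 0.05, 0.12, 0.08]
scattered = [2 * math.pi * k / 5 for k in range(5)]  # evenly spread around the circle

print(kuramoto_order_parameter(coherent))   # close to 1: samples are in phase
print(kuramoto_order_parameter(scattered))  # close to 0: no common phase
```

      R depends only on how tightly the phases cluster, not on where on the circle they cluster, which is why a high value by itself does not show that samples follow the zeitgeber.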

      In terms of variability in response to entrainment, we thank the reviewer for the comment, which has prompted us to perform an additional analysis, now included as Figure S13 in the Supplement.

      Briefly, we represent below figures showing how different samples get synchronized with the zeitgeber. To do this, we first represent the zeitgeber signal as a continuous uniformly increasing phase (“zeitgeber time”) with period : . The initial condition for is chosen so that the zeitgeber phase at the moment of last pulse is matching the experimental entrainment phase for each . We plot for each sample (dotted lines) and the zeitgeber phase (magenta line). To quantify how well each sample is following the zeitgeber time, we compute the Kuramoto parameter: . By the end of experiment most samples reach , indicating entrainment. Most samples need zeitgeber cycles to become entrained. For min the entrainment takes much longer (edge of the Arnold tongue). For min there is much variability, which can be explained by the horizontal region in the PRC around the entrainment phase. As suggested by the referee, synchronization is faster for higher DAPT concentration. So those dynamics are indeed consistent with the expectation from classical PRC theory.

      # 23 Do samples change period to Tzeit in similar ways - i.e. patterns over time. It looks like the Kuramoto order parameter and period drop initially - why?

      We do not have a direct answer as to why the Kuramoto first order parameter and the period drop for the condition the reviewer specified. It has to be noted though that because of how wavelet analysis is done (cross-correlation of the timeseries with wavelets), the period and phase determination at the boundaries of the time series are less reliable (edge effects, see Mönke et al., 2020). Because of this, we should be cautious when considering data up to the first pulse and from the last pulse onwards. This was explicitly stated in the generation of stroboscopic maps: “As wavelets only partially overlap the signal at the edges of the timeseries, resulting in deviations from true phase values (Mönke et al., 2020), the first and last pulse pairs were not considered in the generation of stroboscopic maps.”

      # 24 In Figure 4C why is the Kuramoto order parameter already higher in the 2uM DAPT conditions at the start of the experiment?

      Samples can, by chance, start synchronously and this results in a high Kuramoto first order parameter. Because of this likelihood, it is thus important to interpret the entrainment behaviour of multiple samples using various readouts, in addition to a high Kuramoto first order parameter. We investigated entrainment of the samples based on several measures: multiple samples remaining (or becoming more) synchronous (because each sample actively synchronizes with the zeitgeber), period-locking (where the pace of the samples match the pace of the zeitgeber, which can be distinct from natural pace), and phase-locking (where there is an establishment of a stable phase relationship between the samples and the zeitgeber).

      # 25 Figure 3C and Figure S2 require statistical testing between CTRL and DAPT in each condition

      p-values were calculated for the specified conditions and were added in the caption of the figures. These values are enumerated here:

      • Figure 3C
      • 170-min 2uM DAPT (vs DMSO control): p
      • Figure S2
      • 120-min 2uM DAPT (vs DMSO control): p = 0.064
      • 130-min 2uM DAPT (vs DMSO control): p = 0.003
      • 140-min 2uM DAPT (vs DMSO control): p = 0.272
      • 150-min 2uM DAPT (vs DMSO control): p = 0.001
      • 160-min 2uM DAPT (vs DMSO control): p

      To calculate p-values, a two-tailed test for the absolute difference between medians was done via a randomization method (Goedhart, 2019). This confirms that the period of samples subjected to pulses of DAPT is not equal to the controls, except for the 140-min condition (where the zeitgeber period is equal to the natural period, i.e. 140 mins).
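
      A randomization test of this kind can be sketched as follows. This is a generic version written for illustration, assuming simple label shuffling between the two groups; it is not necessarily identical to the implementation of Goedhart (2019), and the function name, the number of shuffles, and the example periods are all hypothetical.

```python
import random
import statistics


def median_difference_pvalue(group_a, group_b, n_shuffles=10000, seed=0):
    """Two-tailed randomization test on the absolute difference between medians."""
    rng = random.Random(seed)
    observed = abs(statistics.median(group_a) - statistics.median(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_shuffles):
        # Reassign group labels at random by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(statistics.median(pooled[:n_a]) - statistics.median(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical periods in minutes, not data from the study.
ctrl = [138, 141, 139, 142, 140, 137]
dapt = [168, 171, 169, 172, 170, 167]
print(median_difference_pvalue(ctrl, dapt))
```

      Using the absolute difference makes the test two-tailed, and working on medians keeps it insensitive to single outlying samples.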

      # 26 Figure 3A gray shaded area not clearly visible on the graph

      We have decided to remove the interquartile range (IQR) in the specified figure as it does not serve a crucial purpose in this case. By removing it in Figure 3A, the timeseries of individual samples are now clearer.

      # 27 Figure 6C colour mapping of time progression is not clearly visible on the graph; the interpretation of this observation is unclear in the text and the figure

      We agree that the low quality of the image is unfortunate, and it seems that our file was greatly compressed upon submission. We have checked the proper quality of figures in the resubmitted version of the manuscript.

      Regarding the interpretation of Figure 6C, we conclude that in our experiments the entrainment phase is an attractor or stable fixed point, in line with theory (Granada and Herzel, 2009; Granada et al., 2009). We had elaborated this in the text (lines 248-252 of the submitted version of the manuscript): at the same zeitgeber strength and zeitgeber period, faster (or slower) convergence towards this fixed point (i.e. entrainment) was achieved when the initial phase of the endogenous oscillation (φinit) was closer to (or farther from) φent.

      # 28 Figure 7A circular spread not clearly visible on the graph

      Similar to point #27, we have provided a high resolution graph for the re-submission and hopefully resolved this issue.

      # 29 Figure S7A difficult to see the difference between colours

      See point #28.

      # 30 Is it possible to compare the PRC and the plots of period over time during entrainment? The PRC is mainly negative (Fig 8A1,A2), in my understanding this means a delay, however the periods seem to decrease over time before entraining to the Tzeit (Fig 3B). Is this reflective of a decrease in Kuramoto parameter and potential de-synchronisation of single cells before re-synchronisation at Tzeit?

      To address this question, we now plot the phase response with colors indicating pulse number in new Supplementary Figure S13. While capturing the entire PRC as a function of time would require many more experiments (in particular to sample the phases far from the entrainment phase), we still clearly see that the PRCs appear to translate vertically as the oscillator is being entrained, i.e. the later time points are shifted up (down) for T_zeit = 120 (170) min, respectively.

      # 31 Fig 8A What is the importance/meaning of the PRC being similar shape between different entrainment periods? Does this reflect that the underlying gene network is the same?

      If one single gene network is responsible for oscillations, we expect from dynamical systems theory that the PRCs are not only of similar shape but actually the same, independent of the entrainment period. What is surprising is that the PRCs for different entrainment periods do not overlap, and the simplest explanation for this is that the intrinsic period changes with entrainment, all things being kept equal (including the underlying gene networks). This relates to the previous point since we indeed observe that the PRC “translates” vertically with the pulse number for longer periods. The change of period might be due to a long-term regulation as detailed in the discussion.

      # 32 The spatial period gradient and wave propagation under DAPT (Figure S8) should be included in the results and not just the discussion.

      We fully agree with the reviewer that both the establishment and the maintenance of a spatial phase gradient is of great interest. However, many more experiments would be required to fully quantify and understand the processes at play here, which we believe to be out of the scope of the current manuscript. To keep the focus of the paper on the global segmentation clock itself, we prefer to keep this figure in Supplement.

      Reviewer #2 (Significance (Required)):

      We currently do not have a detailed understanding of how biological oscillators integrate local signals from their neighbours as well as global external signals to give rise to complex patterning that is important for embryonic development. The main bottlenecks that hinder our understanding are the lack of real-time endogenous dynamic response together with known global inputs, as well as of comprehensive models that can explain emergent behaviour in a variety of tissues.

      This study goes a long way in addressing these bottlenecks in the embryonic tissue responsible for somite formation, a dynamical and oscillatory system also known as the segmentation clock. Firstly, they rely on a state-of-the-art previously developed system to entrain endogenous response in live tissue explants using precise microfluidic control. They test the complete range of exogenous perturbation periods and use an existing live reporter (LuVeLu) to monitor endogenous response. They also identify higher order coupling relationships whereby every other LuVeLu peak is entrained through external stimulation.

      As the stimulation system does not control but rather perturbs the endogenous response, the observations from LuVeLu provide a unique opportunity in understanding input-output relationships and thus describing the dynamic response of the segmentation clock. The authors propose to study the dynamic behaviour of the clock using coarse-graining and focus on describing the overall response over time while amalgamating spatial information. Appropriate coarse-graining is an important strategy in addressing complex problems and is widely used. They use sophisticated methodology such as phase response curves and Arnold tongue mapping to make several important observations. For example the nonlinear shortening and elongation of the period in response to stimulation is particularly interesting since this may indicate a feedback of the clock onto itself, potentially via Wnt. Another key observation is that the spatial periodicity and phase wave activity persist in the perturbed conditions, suggesting that individual single cell oscillators can adjust their behaviour to external input while retaining coordination with their neighbours. Finally, the authors go on to construct a general dynamical model of the segmentation clock and use this to conclude that the intrinsic period of the oscillator is altered and that the oscillator can be considered excitable.

      This work sheds light onto mechanisms of coordination of Notch activity in assemblies of cells observed in living tissue, an area of research that is important not only for somitogenesis but also for understanding gene expression patterning in many other tissues where Notch plays a critical role, for example in the development of the neural system and organs. As a study of a real-world nonlinear oscillator this work is directly of interest to theoreticians and synthetic biology experts interested in understanding complex patterning and emergence.


      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, the authors studied the system-level responses of the somite segmentation clock by a coarse-grained theoretical-experimental approach, applying the theory of entrainment to understand the phase responses of mouse pre-somitic mesoderm (PSM) tissues in the presence of periodic perturbation by the Notch inhibitor DAPT, generated by a microfluidics technique. It was demonstrated that the segmentation clock is responsive to a diverse range of perturbation periods from 120 to 180 min, can be period- and phase-locked, and that the efficiency depends on the DAPT concentration (input strength). The authors also observed two cycles of the segmentation clock ticking in single cycles of 300 or 350 min period perturbation, suggesting higher order (2:1 mode) entrainment. They also applied stroboscopic maps to the analysis and found that entrainment phases depend on the period of DAPT pulses, which recapitulates theoretical predictions. The estimation of the phase response curve (PRC) of the segmentation clock revealed that the inferred PRC is an asymmetrical and mainly negative function, which represents characteristic features of oscillators that emerge after a saddle-node on invariant cycle (SNIC) bifurcation. These results also indicated that the segmentation clock changed its intrinsic period during entrainment.

      Major comments:

      # 33 I have major concerns about the relevance of the global time-series analysis proposed in Fig.2 and conclusion about the changes of the intrinsic period during entrainment. The validity of the global time-series analysis should be carefully analyzed, because it could bring artifacts in estimated values of the intrinsic period. The authors concluded (page 3, line 172) that the period calculated by the global analysis represents similar values with the rate of segment formation, but there is no data about the quantification of the periods of segmentation, such as the frequency of Mesp2 reporter expression.

      We thank the reviewer for this feedback. We have now added the quantification of the period of segment formation (new Figure 2E) and show its strong correspondence to the dynamics of reporters used (Lfng, Axin2, and Mesp2). Please see also our response to point #15 with additional comments regarding the validation of the global time-series analysis.

      # 34 Another related issue is the presence of spatial period gradient as mentioned (page 13, line 524). One possible approach to circumvent this issue would be "local" time-series analysis; for instance, just focusing on the "putative posterior" regions that are close to source-positions of waves. Authors can re-compute and estimate PRCs by using such a method.

      We thank the reviewer for this suggestion and have accordingly now included the analysis of a localized ROI at the center (center ROI) of the 2D-assays (new Figures S5-S6). We also computed the PRC from center ROIs as shown below. We note strong correspondence between the global ROI and the center ROI.

      # 35 I have another major concern about the evidence of higher order entrainment shown in Fig.5. If the 1:2 entrainment is successful, we can expect the observed period to be close to half of the period of the pulses; however, the period shown in Fig.5B looks like 185 min, longer than half of 350 min. Is this gap due to the temporal accuracy of time-lapse movies?

      We do not think the discrepancy comes from a problem of temporal accuracy, as the temporal accuracy is the same for all movies and there is no reason why there would be a specific issue for this set of experiments. In addition, we have re-analyzed the data to calculate the period from the stroboscopic maps. Mathematically speaking, we take the stroboscopic map as (see PDF) and use this to estimate the period of oscillation in entrained samples; in particular, inverting the formula for 1:2 entrainment, we have (see PDF).

      The advantage of this method is that it gives a more “instantaneous” estimation of the period.

      The results are as follows:

      350 10uM: 187 ± 8 min (average across entrained samples from the last zeitgeber period)

      350 5uM: 193 ± 13 min (average across entrained samples from the last zeitgeber period)

      300 2uM: 148 ± 8 min (averaged across entrained samples and from two last periods)

      This additional analysis is in agreement with the wavelet analysis.
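
      The exact formulas are in the PDF and are not reproduced here. As a hedged illustration of how a period can be read off two consecutive stroboscopic phases, the sketch below assumes that, during one zeitgeber period, the oscillator completes a whole number of cycles (two for 1:2 entrainment) plus a small residual phase drift. Under that assumption the period is T = 2π·T_zeit / (2π·m + Δφ). The function names, argument names, and example phases are hypothetical and may not match the authors' convention.

```python
import math


def wrap_to_pi(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def period_from_stroboscopic_phases(phi_n, phi_next, t_zeit, cycles_per_pulse=2):
    """Estimate the oscillation period between two consecutive pulses.

    Assumes the oscillator completes `cycles_per_pulse` full cycles plus a
    small residual phase drift during one zeitgeber period.
    """
    drift = wrap_to_pi(phi_next - phi_n)
    total_phase = 2 * math.pi * cycles_per_pulse + drift
    return 2 * math.pi * t_zeit / total_phase


# Perfect 1:2 locking: the phase at the pulse repeats, so the period is T_zeit / 2.
print(period_from_stroboscopic_phases(1.0, 1.0, t_zeit=300))

# Hypothetical drift of -0.8 rad between pulses: the period is longer than T_zeit / 2.
print(period_from_stroboscopic_phases(1.0, 0.2, t_zeit=350))
```

      Because only one pair of pulses enters each estimate, the value responds to changes from one zeitgeber cycle to the next, at the cost of being more sensitive to phase noise than a wavelet average.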

      The reviewer is right that for 350 minutes, entrained samples show an observed period that is higher than expected, also based on this new additional analysis. The reason for this is not known. One explanation is the relatively short observation time, especially for pulses separated by as much as 350 minutes, i.e. only 3 pulses are applied. [We notice that for 300-minute pulses, the period converges to 150 mins between the 3rd and the 4th pulse.] We have adjusted the text in the results section to reflect that for 350min entrained samples, the observed period ‘approaches’ the predicted value, while for 300min entrained samples, the observed period is very close to it, i.e. 147mins. In addition, we comment that the phase distribution narrows with time, another indication supporting higher order entrainment.

      # 36 Also, authors showed the period evolution towards 1:2 locking with just one condition (350 min). Authors can show the data for multiple conditions as in Fig. 3D, at least for 300 min and 325 min pulses and add the data about final entrained period with statistical analysis that supports the difference between the entrained period and the natural period (140 min).

      We thank the reviewer for this feedback and have modified the figure accordingly. In particular, in Figure 5A, we have added the period evolution plot for samples subjected to 300-min periodic pulses of 2uM DAPT (or DMSO for control). Additionally, we have added Figure 5D, which plots the average period in the 300-min and 350-min conditions. We summarize the median average period here with computed p-values:

      • 300-min pulses of 2uM DAPT (or DMSO for control): p-value = 0.191
      • CTRL: 130.39 mins
      • DAPT: 146.45 mins

      • 350-min pulses of 5uM DAPT (or DMSO for control): p-value = 0.049
      • CTRL: 127 mins
      • DAPT: 174.86 mins

      • 350-min pulses of 10uM DAPT (or DMSO for control): p-value = 0.016
      • CTRL: 142.82 mins
      • DAPT: 185.12 mins

      Minor comments:

      # 37 The authors can draw vertical lines indicating the T_zeit in Fig.3B, Fig.4B and Fig.5B in order to help comparisons between T_zeit and patterns of period (solid lines).

      We thank the reviewer for this comment. We have accordingly added a horizontal line indicating Tzeit in Figures 3B, 4B, S4A, and S5A (figure panel numbers based on the submitted version of the manuscript). We similarly added a horizontal line indicating 0.5Tzeit in the period evolution plots of 300-min and 350-min conditions in Figures 5A and 5B, respectively.

      # 38 In Fig.5A, the authors can show period evolution in the case of 300 min DAPT-pulses as shown in Fig.5B.

      We thank the reviewer for this feedback (related to point #36), and we have modified the figure accordingly.

      # 39 In Fig.6B DAPT panel, the authors can draw the points of phi_ent as shown in Fig.7A.

      We thank the reviewer for this comment, and we have modified the figure accordingly.

      # 40 In Fig. 8F, authors can put the information about DAPT concentration at the right y-axis.

      This is a similar comment as point #17, see above. In brief, we do not know the precise relation between the strength of the perturbation in our model and DAPT concentration; zeitgeber strength was inferred from the model by matching the experimental entrainment phase with that obtained from the model isophases.

      # 41 In Fig. 8G, the PRC in the panel "170 mins" does not have any fixed point (cross sections with horizontal lines of "0" phase response). If entrainment is successful, there should be stable and unstable fixed points, but those are absent, although 170 min pulses succeeded in the entrainment as shown in Fig.3D. Authors can explain where the fixed points are.

      The fixed points are indeed defined by the intersection with a horizontal line, but not with the ‘0’ line. They are found where the phase response compensates for the detuning/period mismatch, not at ‘0’ phase response. (See PDF for more details).

      Note however on Fig 8G that we further observe a vertical shift of the PRC, which prompted us to propose a change of the intrinsic period with (as explained in the text when we introduce Figs 8A1-2).

      Another way to visualize fixed points is offered in Fig 16 D-E, where we plot the inferred corrected PTC and the stroboscopic maps: there, fixed points correspond to intersections with the diagonal.
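
      The fixed-point condition described in this response can be illustrated with a toy calculation. The sketch below is not the model of the manuscript: it assumes a stroboscopic map of the form φ → φ + 2π·T_zeit/T + PRC(φ) (mod 2π), so that fixed points sit where the PRC crosses the horizontal line 2π(T − T_zeit)/T, and it uses an invented, purely negative PRC. The sign convention, the PRC shape, its amplitude, and all function names are assumptions made for illustration only.

```python
import math


def prc(phi, amplitude=0.8):
    """Toy, purely negative phase response curve (hypothetical, not fitted to data)."""
    return -amplitude * (1.0 - math.cos(phi))


def detuning_level(t_natural, t_zeit):
    """Phase shift per pulse that entrainment must supply, in radians."""
    return 2.0 * math.pi * (t_natural - t_zeit) / t_natural


def fixed_points(t_natural, t_zeit, n_grid=2000):
    """Find phases where prc(phi) equals the detuning level, with their stability."""
    level = detuning_level(t_natural, t_zeit)
    g = lambda phi: prc(phi) - level
    step = 2.0 * math.pi / n_grid
    found = []
    for k in range(n_grid):
        lo, hi = k * step, (k + 1) * step
        if g(lo) * g(hi) >= 0:
            continue  # no sign change in this grid cell
        for _ in range(60):  # bisection on the bracketing cell
            mid = 0.5 * (lo + hi)
            if g(lo) * g(mid) <= 0:
                hi = mid
            else:
                lo = mid
        phi_star = 0.5 * (lo + hi)
        slope = (prc(phi_star + 1e-6) - prc(phi_star - 1e-6)) / 2e-6
        # The assumed map is stable at phi_star when |1 + slope| < 1.
        found.append((phi_star, abs(1.0 + slope) < 1.0))
    return found


for phi_star, stable in fixed_points(t_natural=140, t_zeit=170):
    print(f"phi* = {phi_star:.3f} rad, stable = {stable}")

# This toy PRC can only delay, so a shorter zeitgeber period yields no fixed point.
print(fixed_points(t_natural=140, t_zeit=120))
```

      With a 140-min natural period and 170-min pulses the horizontal line lies below zero, so a PRC that never reaches zero can still have one stable and one unstable fixed point.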

      Reviewer #3 (Significance (Required)):

      Although phase analysis has been widely applied to various biological systems, such as circadian clocks, cardiac tissues and neurons, this paper represents the first detailed experimental analysis of the segmentation clock based on the theory of phase dynamics. The major results are in line with theoretical predictions, whereas the suggestion about the SNIC bifurcation is attractive not only to theoretical researchers but also to experimental biologists; it has been believed that the segmentation clock consists of a negative-feedback oscillator that emerges via a Hopf bifurcation, whereas this paper proposes another possibility for the molecular network structure of the clockwork. This issue is related to a recently proposed hypothesis about an excitable system in the segmentation clock based on Yap signaling (Hubaud et al. Cell 171, 668 (2017)). However, unfortunately, discussion about detailed molecular networks is not abundant.

      # 42 Thus, maybe the main readers are computational biologists and systems biologists.

      We thank the reviewer for his/her significance comment. We have added comments on the bifurcation structure of the segmentation clock and on excitable systems in the discussion. While our focus is on coarse-graining so that we do not and cannot infer precise molecular details, we can still infer some properties of the underlying networks. In particular we now cite several papers explaining how systems with tunable periods/excitable are indicative of the interplay between positive and negative feedbacks. We think those considerations are of interest to a broad range of biologists interested in connecting experiments to theory.

    1. SAMSON CARRASCO

      Samson is extremely important to Don Quixote. At first glance we think of him as the antagonist; however, as the story progresses we find that he is trying to help the Don. "[He is] a key figure that fulfills a double function: to cheer up Don Quijote so that he may go out for the third time and also to induce him to return home." This makes him a pivotal part of this story.

      Presence and Sense of Sanson Carrasco | Request PDF. https://www.researchgate.net/publication/298984686_Presence_and_sense_of_Sanson_Carrasco.

    1. “My reasons for marrying are, first, that I think it a right thing for every clergyman in easy circumstances (like myself) to set the example of matrimony in his parish; secondly, that I am convinced that it will add very greatly to my happiness; and thirdly—which perhaps I ought to have mentioned earlier, that it is the particular advice and recommendation of the very noble lady whom I have the honour of calling patroness. Twice has she condescended to give me her opinion (unasked too!) on this subject; and it was but the very Saturday night before I left Hunsford—between our pools at quadrille, while Mrs. Jenkinson was arranging Miss de Bourgh’s footstool, that she said, ‘Mr. Collins, you must marry. A clergyman like you must marry. Choose properly, choose a gentlewoman for my sake; and for your own, let her be an active, useful sort of person, not brought up high, but able to make a small income go a good way. This is my advice. Find such a woman as soon as you can, bring her to Hunsford, and I will visit her.’ Allow me, by the way, to observe, my fair cousin, that I do not reckon the notice and kindness of Lady Catherine de Bourgh as among the least of the advantages in my power to offer. You will find her manners beyond anything I can describe; and your wit and vivacity, I think, must be acceptable to her, especially when tempered with the silence and respect which her rank will inevitably excite. Thus much for my general intention in favour of matrimony; it remains to be told why my views were directed towards Longbourn instead of my own neighbourhood, where I can assure you there are many amiable young women. 
But the fact is, that being, as I am, to inherit this estate after the death of your honoured father (who, however, may live many years longer), I could not satisfy myself without resolving to choose a wife from among his daughters, that the loss to them might be as little as possible, when the melancholy event takes place—which, however, as I have already said, may not be for several years. This has been my motive, my fair cousin, and I flatter myself it will not sink me in your esteem. And now nothing remains for me but to assure you in the most animated language of the violence of my affection. To fortune I am perfectly indifferent, and shall make no demand of that nature on your father, since I am well aware that it could not be complied with; and that one thousand pounds in the four per cents, which will not be yours till after your mother’s decease, is all that you may ever be entitled to. On that head, therefore, I shall be uniformly silent; and you may assure yourself that no ungenerous reproach shall ever pass my lips when we are married.”

      this seems unnecessary

    1. It’s not such a difficult process . . . to start with. . . . If they [Latinos] really wanted to do it, they would just go out and fill out the application and ask the teacher for details. . .

      I don't think it is as easy as she makes it sound. As we learned in last week's article, there are some kids who don't even understand the order of taking pre-algebra before algebra simply because they don't have people in their lives to explain that to them. It may seem easy to her because it is a path a lot of people in her family have taken or that her family holds strong values to school so she knows more about it.

    1. Within the field of instructional design, we have sometimes observed a hesitation to dwell on visual aesthetics (Parrish, 2009). This hesitation may stem from concern that artistically-approached designs will lack the ability to be replicated (Merrill & Wilson, 2006) or that the artistic elements will serve merely as window dressing—or worse, distraction—that provides no educational benefit to the learner.

      I find this to be true in my experience. I have worked with some professors who baulk at the idea of spending time creating or searching for a course banner image. There are other examples related to this, but I personally think that something as simple as finding or creating a course banner image can excite students. Or, if it's a corporate training hosted through Rise or Storyline, this may just be little visual elements and images that add a little something to the visual experience.

    1. Author Response

      Reviewer #1 (Public Review):

      This study provides data suggesting that tonic presynaptic a7 nicotinic receptor activity enhances corticostriatal input-mediated excitation of striatal medium spiny neurons; the data also suggest that tonic a4b2 nicotinic receptor activity on PV-fast spiking GABA interneurons inhibits striatal medium spiny neurons. These data advance our understanding about the complex cholinergic regulation of striatal neuronal circuits.

      The presented data are generally clean and high quality; but there are some problems that require the authors' attention.

      We thank the Reviewer for their insightful comments. We have addressed each point below with additional data and/or text. We believe these revisions have made the manuscript significantly stronger.

      1. In this study, ADP is a key parameter manipulated by several pharmacological treatments. But it is not clearly defined. The authors indicate EPSP and ADP are distinct by stating "LED pulse of increasing intensity generates excitatory postsynaptic potentials (EPSPs), or an AP followed by an after depolarization (ADP)." But the data (e.g. Fig. 1B) indicates that much of the ADP is probably EPSP. Please clarify. If much of the ADP is indeed EPSP, how are the data interpretation and the overall conclusion affected?

      We apologize for the oversight. The main focus of our study is on how tonic nAChR activation controls the timing of striatal output; our justification for including the ADP in our experimental analysis was simply corroborative, in that it represents an additional, easily measured parameter of the postsynaptic response to convergent cortical stimulation that 1) can be modulated by similar local inhibitory circuits that we show to mediate the effect of tonic nAChR activation and 2) is positioned (as opposed to EPSPs) to influence subsequent spiking, should the appropriate synaptic cues be present (which are deliberately omitted in our study). That said, under our experimental conditions EPSPs and ADPs were similar in both their kinetics and modulation by mecamylamine, suggesting that they represent mechanistically similar responses to cortical afferents. The defining difference (besides ADPs exhibiting larger amplitudes) is that they appear either in the absence of or following a spike. For these reasons we ultimately decided that reporting changes in both ADPs and EPSPs would be redundant, and limited our analyses to ADPs. Text has been added to the first paragraph of the results section to address these points.

      In Fig. 1F, ADP is absent. Why? Please clarify.

      Figure 1F shows an example of a SPN held at a mimicked ‘up-state’, achieved by injecting positive somatic current to produce a ‘resting’ membrane potential of -55 to -50 mV. In this scenario, the ‘up-state’ membrane potential is higher than what would be reached during most ADPs evoked from Vrest, preventing the observation of ADPs in many trials. Text has been added to the end of the first paragraph in the results section to clarify this point.

      If ADP is distinct from EPSP here in MSNs, has it been reported in the literature, and how is it generated?

      Under our experimental conditions, we do not see any major differences between EPSPs and what we term ADPs (other than amplitude), at least in terms of kinetics and modulation by mecamylamine. That said, we have added text to the first paragraph of the results section that references previous work (Flores-Barrera et al.) describing suprathreshold depolarizations proceeding SPN spikes, which shaped our reasoning for including this measure in our study.

      1. In Fig. 1F, the holding potential for mecamylamine is a few mV more negative than the control, but the spike latency is shorter under mecamylamine. This is hard to understand because membrane potential (current-injection-induced depolarization + EPSP) determines spike firing and latency. If the holding potential is the same, then it's easy to understand (larger EPSP under mecamylamine).

      Thanks for pointing this out! We agree that this might seem counter-intuitive in terms of Vrest and EPSP amplitude only. Given that mecamylamine reduces GABAergic inputs to SPNs, the reduction in spike latency in this case is consistent with a reduction of GABA receptor mediated shunting. We have added this point to the text in the 3rd paragraph of the results section, which we think strengthens our justification to look at GINs as the potential mediators of mecamylamine’s effect on spike latency.

      1. Data in Fig. 2D, E are weak. The spiking ability of whole-cell recorded neurons often declines over time (evidence: the AP duration for the red trace is longer); recovery/partial recovery from MLA is needed for the data to be reliable. Fig. 2E shows 8 cells: 6 had no response, 2 increased. Sample size needs to increase.

      We appreciate this comment. Our initial justification for this experiment was from previous reports that alpha-7 nAChRs reduce corticostriatal glutamate release probability. We have now added additional data (Figure 2 supplemental data) showing that blockade of tonically activated alpha-7 nAChRs with the more specific antagonist MLA was not sufficient to change corticostriatal synaptic strength or release probability. In parallel, as we began increasing the sample size of the experiment testing the effect of MLA on spike latency, we noticed that the effect size became smaller than what we initially reported, which was already modest. Given the modest effect size of MLA on spike latency (with no presynaptic mechanism to offer), we reason that it would likely have minimal impact compared to the larger effect of mecamylamine. For this reason, we have backed off our conclusion that TONIC activation of presynaptic alpha-7 nAChRs on corticostriatal axon terminals will have a meaningful physiological impact on SPN spike timing. Accordingly, we removed previous figure 2D/E, but supplemented Figure 2A/B/C with new data (figure 2 supplement) demonstrating the lack of effect of tonic nAChR activation on corticostriatal synapse release probability. The title of the manuscript has been altered to reflect this.

      1. Fig. 7: the data on DhbE increasing AP duration is not convincing: no effect in 4 neurons, increase in 4 other neurons, and decrease in other neurons. Data is more important than p<0.05. How do you interpret DhbE increasing AP duration?

      Point taken. We shouldn’t let a statistical calculation dominate the interpretation of a mostly mixed population result. Furthermore, upon revisiting this figure we realized that the main points pertinent to our conclusions (mecamylamine hyperpolarizes PV-FSI Vrest) were obscured by data that were of limited relevance. We have re-focused this figure to highlight data that are directly pertinent to our interpretation. This included removing the AP duration data set in question, which does not add to or inform our conclusions. We have further strengthened our conclusion that PV-FSIs are a primary mediator of the effect of tonic nAChR activation on spike latency by adding new data showing that pharmacologically blocking cortical activation of PV-FSIs occludes the effect of mecamylamine (new figure 8, see comments to Reviewer 2).

      Fig. 7F shows AP duration for PV-FSI is around 1.75 ms (some are over 2 ms, recorded at 35 C). This is unusually long. Also, the AP rise time is around 1.4 ms, very long. 1.75 ms total rise time vs. 1.4 ms for just rise: they do not add up?

      Please see our response to the above point.

      Reviewer #2 (Public Review):

      This manuscript examines one aspect of how acetylcholine influences striatal microcircuit function. While striatal cholinergic interneurons are known to be engaged in key events and tasks related to the basal ganglia in vivo, and pharmacological studies indicate cholinergic signaling is complex and critical to striatal function, the mechanistic details by which acetylcholine regulates individual cell types within the striatum, as well as how these integrate to shape striatal output, remain largely unknown. This work thus addresses an important problem in the basal ganglia field, with likely relevance to both normal function and disease-related dysfunction. The authors used a brain slice preparation in which a large number of excitatory cortical inputs to the striatum are activated, and they could measure the resulting activation of striatal projection neurons (SPNs). Their primary finding was that in this preparation, blocking nicotinic acetylcholine signaling resulted in more rapid activation of SPNs. They then explored some of the potential mechanisms for this phenomenon, and conclude that in their preparation, cholinergic interneurons are engaged both tonically and phasically, resulting in recruitment of local GABAergic interneurons that provide feedforward inhibition onto SPNs. They show that one striatal GABAergic interneuron subclass, PV-FSI, are modestly excited by tonic nicotinic signaling, and suggest this may be one contributor to their primary finding.

      Strengths of the study include the focus on cholinergic signaling across multiple striatal cell types, careful and clearly displayed slice electrophysiology, good writing, and a methodical approach to pharmacology.

      Weaknesses include reliance on the Thy1-ChR2 line to activate excitatory cortical inputs to the striatum (this line may be less specific to cortical pyramidal neurons than a specific Cre recombinase mouse line used with Cre-dependent ChR2, and thus have unintended influences on the results), and despite a strong start, a fairly weak mechanistic exploration of what GABAergic neuron subclasses might contribute to their original phenomenon.

      We thank the Reviewer for their thoughtful and constructive comments. The Reviewer identified two weaknesses of our study, as presented. The first weakness was our reliance on a transgenic mouse line (Thy1-ChR2) to activate cortical inputs to the striatum. Specifically, how a potential lack of specificity/ectopic expression of ChR2 in non-glutamatergic cortical neurons may impact our interpretation of the data. The second is that we did not make an effort to identify the specific subclass(es) of GINs that contribute to the phenomenon we describe. We have addressed both of these comments with new experiments, which we will describe individually below.

      1) Specificity of corticostriatal afferent activation in Thy1-ChR2 mice. As the Reviewer keenly points out, although Thy1-ChR2 mice are often used as a tool to specifically activate excitatory corticostriatal nerve terminals with optogenetic stimuli, there is concern that ChR2 expression is not exclusively limited to glutamatergic cortical neurons. If present, direct optogenetic activation of non-cortical striatal afferents would influence our results and impact our interpretation. We have addressed this issue experimentally by adding two new types of experiments (and related text, pages 7-8).

      We have added new data using immunohistochemical staining to survey for ectopic expression of ChR2 in the cortex. Staining for GAD, to broadly identify GABAergic neurons, displayed no overlap with ChR2-expressing cortical neurons in Thy1-ChR2 mice. Since a population of GABAergic somatostatin-expressing cortical neurons (particularly in the auditory cortex), have been shown to directly innervate the striatum (Rock et al., 2016), we also show that we found no evidence for somatostatin-ChR2 colocalization in our mice. Furthermore, we report no evidence for somatic expression of ChR2 in the striatum. We do report somatic expression of ChR2 in a population of globus pallidus soma, and add text to describe the above data (figure 3 supplement) as well as published data identifying ChR2 in axons of the substantia nigra. Together, these data suggest that cortical expression of ChR2 is limited to non-GABAergic neurons, though do not eliminate the possibility of a direct monosynaptic GABAergic input to the striatum from non-cortical (and extrastriatal) brain regions. We describe newly added experimental data below to address this possibility.

      We have added new data to directly test if the optogenetic stimulation protocol used in this study induces a monosynaptic GABAergic current in SPNs (figure 3 supplement). We report that an optogenetically-evoked monosynaptic GABAergic current is indeed detected in SPNs, though it is unlikely to affect our results or interpretations for two reasons. First, based on the newly added histological data, the source of this GABAergic current is non-cortical and extrastriatal. Second, and more importantly, this input is insensitive to mecamylamine (new data, figure 3 supplement) and as such would not be modulated by the key manipulations presented in this study. Finally, experiments described below – instructed by a suggestion made by Reviewer 2 (see below) – show that blocking glutamatergic synaptic activation of a class of striatal GINs eliminates the effect of mecamylamine on SPN spike latency, ruling out the involvement of a monosynaptic GABAergic input in mediating the phenomenon.

      2) Identification of the key GIN subclass that mediates the phenomenon. Our initial manuscript included data demonstrating the feasibility of PV-FSIs in participating in the phenomenon we described, but we agree with the Reviewer that we stopped well short of identifying the class of GINs that are actually involved. We have added two new data sets to the manuscript that now corroborate both the involvement and necessity of PV-FSIs in mediating this phenomenon. First, we have added data showing that striatal SOM+ interneurons respond to mecamylamine differently than PV-FSIs do: while mecamylamine hyperpolarizes PV-FSIs, it depolarizes the average membrane potential of SOM+ interneurons and has no effect on their spontaneous firing frequency, making them unlikely candidates to mediate the phenomenon we describe. Second, we have added data showing that pharmacologically preventing cortical activation of PV-FSIs both mimics and occludes the effect of mecamylamine on spike latency and ADP amplitude (new figure 8). This data also rules out the involvement of certain other classes of GINs, such as PLTS interneurons, as the pharmacological manipulation we performed (blockade of calcium-permeable GluA2-lacking AMPA receptors) does not affect their response to cortical inputs (Gittis et al., 2010).

      Reviewer #3 (Public Review):

      The manuscript by Matityahu et al. investigated the role of tonic activation of AChRs on the spike timing of striatal spiny projection neurons (SPNs) in acute striatal slices. By selective activation of corticostriatal projections using optogenetic tools (ChR2), they find that pharmacological blockade of presynaptic α7 nAChRs delays SPN spikes, whereas blockade of α4β2 nAChRs on GABAergic interneurons advances SPN spikes. The work is carefully done with proper control experiments, and the main conclusions are mostly well supported by data.

      Although they only constitute ~1% of the total striatal neurons in rodents and humans, cholinergic interneurons (ChINs) are gatekeepers of striatal circuitry because of their extensively arborized axons and varicosities which tonically release ACh. Whereas the role of muscarinic AChRs (mAChRs) in modulating striatal output has been well established, the role of nAChRs (especially the tonic activation) remains to be elucidated. The study is solid and the results are new and convincing. The data suggest that tonic activation of nAChRs may place a "brake" on SPN activity, and the lift of this brake during pauses of ChIN firing in response to salient stimuli may be critical for striatal information processing and learning. The findings from this study will enhance our understanding of the role of tonic nAChR activation in controlling SPNs and striatal output.

      We thank the reviewer for their careful reading of our manuscript and for their kind words and helpful suggestions.

      Unjustified Conclusions and Suggestions:

      1) The change of the SPN spike timing by AChR modulation is on a few milliseconds time scale. To make the current study more significant, the authors should design and perform additional experiments to demonstrate the functional consequence in controlling striatal output and learning. For example, will activation or blockade of nAChRs have effects on striatal STDP?

      We too would be thrilled to see the results of such experiments. Unfortunately our early attempts to perform such tests (e.g., crossing Thy1-ChR2 mice with ChAT-Cre mice to selectively express halorhodopsin in CINs, and combine cortical excitation with silencing of CINs) have been plagued by technical challenges, and would require time and resources that we feel are pragmatically beyond the scope of this study. That said, we’ve included new text (particularly, page 15) discussing how our results may fit with a newly published study on the role of CINs in corticostriatal LTP (Reynolds et al., 2022).

      2) Modulation of striatal circuitry is complex. The addition of a diagram illustrating the hypothesis and key results would help.

      Excellent suggestion. We have added a summary diagram, which is now figure 9.

    1. Rintze December 5, 2011 With regard to broken translators, do the Zotero clients phone home any details on save failures? (there is a preference checkbox "Report broken site translators" which suggests they do) I don't mind fixing up a few more translators, but it would be nice to know which translators fail most often.
       ajlyon December 5, 2011 It does phone home, but I'm afraid those reports are going into a black hole for now; I've noticed the requests in various logs, but I've never been notified of a failing translator by the Zotero team. It'd be great if the translator list / status page integrated explicit tests and such error reports.
       adamsmith December 5, 2011 there is, of course, also a good number of translators who don't trigger any errors, because they don't detect.
       Rintze December 5, 2011 Yes, but I would argue that non-detecting translators are less frustrating to users.
       dstillman December 7, 2011 Here's a start: https://repo.zotero.org/errors The actual error reports aren't public for privacy reasons (and we're not displaying absolute numbers), but we can provide example error strings and URLs on request. We also might be able to have this automatically display error strings that show up across many reports (e.g., "TypeError: scisig is null" for Google Scholar), since short of major site breakages it will probably be hard to debug many of these without examples. Note that the Google Scholar results are greatly skewed by Retrieve Metadata attempts, and DOI is also showing mostly "could not find DOI" errors. I'm hoping detection can be tightened on those (e.g., to remove the folder icon on a Google Scholar search with no results), which would allow this to better show actual error frequency.
       ajlyon December 7, 2011 I'll try to work on detection. Automatic display of common error strings would be very useful, as well as some general idea of how many errors we're talking about-- for something like ScienceDirect, are we talking about 10 errors? 100? 1000? Also, does this filter out data from clients with out-of-date translators or Zotero versions? Thanks for putting this up! It's sure to be useful in the coming weeks and years.
       Rintze December 7, 2011 Like ajlyon, I think some indication of the number of errors per translator would be very useful. And could the list be expanded to show more than the top 10 translators (say the top 50)? Also, would it be possible to create somewhat comprehensive reports with, say, 10 error strings and URLs for each translator to send to ajlyon, adamsmith and me, so we don't have to submit individual requests per translator? I'd hope we have established ourselves as at least somewhat trustworthy (and I assume all three of us would be more than willing to sign any privacy agreement).
       ajlyon December 8, 2011 Thanks for upping the number visible. What's going on with the outdated translators? There are people out there with three different ScienceDirects, two DOIs... Is that just people with updating off? Or something else?
       dstillman December 8, 2011 OK, updated again with absolute numbers and per-error breakdowns. Hover over each segment for error details. I don't think any page data will make it into the errors, but to be safe I'm displaying only errors coming from at least three addresses that don't include the string "http" in them—the rest get lumped together at the end in blue. If you notice anything that shouldn't be in there, let me know. We might be able to display URLs that show up across enough addresses, though there may not be enough of those. What's going on with the outdated translators? Those are all <2.1.9. Not much we can do for those folks.
      • ABOUT property "Report broken translators"
    1. It is not just that trans women are not really women; even females who self-identify as women are not really women.

      I think that Barnes would not agree with this characterization. Barnes's idea is simply that there is no single group corresponding to the term "woman". Instead, there are multiple groups that may be the semantic value of "woman". Some of them are much more gerrymandered. I think the idea does not imply that no one is really a woman. Instead, the upshot is simply that when we consider whether one is really a woman, we must attend to the meaning of "woman".

    1. Author Response:

      We largely agree with the assessment of the Reviewers. Indeed, as noted by Reviewer #2, under the urgent conditions of our experiment, the onset of the cue modulates competing saccade plans that are already ongoing. The reviewer is correct in considering that the initial motor plans are endogenously generated, as they favor one location or the other based simply on the subject's internal bias or preference. We would just note that the endogenous signal that we focus on refers to a later modulation which, based on the perceived cue location and the task rules, directs the motor plans to the correct target location. According to our findings, this endogenous modulation occurs after the exogenous response and acts in the opposite way, boosting the anti-saccade plan and curtailing the activity that would otherwise trigger an erroneous pro-saccade. Thus, three things may happen in each trial: (1) initial, uninformed motor plans are endogenously generated, (2) the cue onset exogenously reinforces the plan toward the cue, and (3) an informed endogenous signal suppresses the plan toward the cue and boosts the plan toward the anti location. We think the novelty here is in being able to characterize these distinct events, which unfold within a few tens of milliseconds of each other.

      Reviewer #3 considered our conclusion that the exogenous response "is entirely insensitive to behavioral context" too strong, and that is a fair point. Conclusions apply to the degree that experimental conditions are valid in general, and furthermore, the deviations from the idealized predictions were small but not zero. However, we do not consider the assumption noted by the reviewer, that saccade-related neural activity ramps up before the saccade goal is known, as a weakness. We have, in fact, recorded such activity in several oculomotor areas using similar urgent-choice designs (Stanford et al., Nat Neurosci 13:379, 2010; Costello et al., J Neurosci 33:16394, 2013; Costello et al., J Neurophysiol 115:581, 2016; Scerra et al., Curr Biol 29:294, 2019; Seideman et al., bioRxiv, 2021, https://doi.org/10.1101/2021.02.16.431470), and the responses in the frontal eye field (FEF) in particular conform quite closely with those assumed by the model (Stanford et al., Nat Neurosci 13:379, 2010; Costello et al., 2013; Salinas et al., Front Comput Neurosci 4:153, 2010). Rather than a potential liability, we think the early ramping activity is a key constraint for any model of urgent choice performance.

    1. Author Response

      Reviewer #1 (Public Review):

      In this article, Miettinen and colleagues exploit the suspended microchannel resonator developed in their lab and optimize the method to be able to record single live mammalian cells for very long periods of time, across several cell division cycles, while performing a double measure of their buoyant mass in media of different densities (H2O and D2O). Because water exchanges fast enough inside the cell, it allows them to define a dry mass and a dry volume, and thus a density of dry material for single cells along the entire cell division cycle. These measures lead them to confirm and clarify some points from previous studies from their lab and others, such as exponential growth also in dry mass and the fact that buoyant mass and this new dry mass are the same thing in interphase cells. They then find that this is not true during mitosis, mostly because dry mass density increases in early mitosis (dry mass decreases and dry volume decreases even more, suggesting that there is a loss of material of density lower than the average dry mass density). The authors rule out a number of potential mechanisms and give evidence for a role of exocytosis, more precisely exocytosis of lysosomal content. Blocking this phenomenon prevents the change in dry mass density but does not affect cell division. They propose some potential function for this phenomenon, including the interesting hypothesis that this helps cleaning the lysosomal content which might contain some toxic components, so that daughter cells are born with 'clean' lysosomes. Cool idea! It is also quite amazing that the precision of their method allows them to detect this event.

      The main question I have concerns the definition of dry mass and dry volume. The authors should discuss in more detail what it represents physically. Technically, this is defined by their equation 1, which relates their measure of buoyant mass to a dry mass and a volume of water as parameters to fit from the buoyant mass data. One gets to this equation by writing the definition of buoyant mass as the mass of the cell minus the mass of the equivalent volume of the surrounding medium. But then, to get what the authors find, one has to write that the cell mass is the sum of the dry mass and the mass of water contained in the cell (which makes the dry mass easy to understand) and then to write that the cell volume is the sum of a volume of water and of a volume of dry material. This then defines a dry volume, as the difference between the volume of the cell and the volume of the water contained in the cell (which is the parameter Vwater in the equation 1). At least this is how I got to this equation. The question I asked myself then is: what is this dry volume? Is it really the volume occupied by the dry mass in the cell? This is probably not the case, since dry mass is solvated in the cell. One can estimate this solvated volume using the van't Hoff/Ponder relation, which can be found changing the osmolarity of the external medium. It defines an excluded volume, which is the total volume excluded by macromolecules (like for a van der Waals gas) - it is usually between 25 and 30% of the cell volume. This volume contains the dry mass plus a certain fraction of the water, so it is not exactly the dry mass volume as defined here by the authors. I am worried that this dry mass volume, which is mathematically defined here and calculated from the fit of the equation, is not a standard physical quantity and so it is not easy to relate it to standard biophysical theories (e.g. equations of state), and its behavior could be very unintuitive even for simple systems.
This makes the variation in this quantity not easy to interpret, and thus also the variation in dry mass density is not easy to interpret in physical terms.

      That being said, it is still clear that whatever this is, it changes in early mitosis, and it seems to be related to exocytosis, so I am not saying that the authors are wrong here. They potentially indeed detect this increase of exocytosis. But they should discuss more what they think this quantity is, either in the methods or in the discussion of the article. In particular, the sentence at the bottom of page 5, line 104, is not clear ('We are not aware of any other single cell methods capable of quantifying this biophysical feature of a cell'), since this measure is not really clearly a biophysical feature of a cell, but is defined a bit artificially from the equation which defines the dry mass volume from the measures of buoyant mass.

      Thank you for the detailed and very constructive feedback. As stated above in the Essential Revisions section, we have now clarified the terminology we use and made the terminology more consistent with existing literature. We have also better defined the concept behind our method. Our updated Measurement Method section now states (page 3) that: “In our approach, we consider the buoyant mass of a cell to be dependent on two distinct physical “sections” of the cell, the dry content and the water content. To measure the cell’s dry content independently of the water content, we measure the cell’s buoyant mass in H2O and D2O-based solutions. Under these conditions, the influence of the water content on buoyant mass can be excluded, because the intracellular water is exchanged with extracellular water, making the intracellular water content neutrally buoyant with extracellular solution. This allows us to detect the cell’s dry mass (i.e. total mass – water mass), dry volume (i.e. total volume – water volume) and dry mass density (i.e. dry mass / dry volume).”
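      The two-fluid measurement model described above can be sketched numerically. The following is a minimal illustration, not the authors' analysis code: it assumes complete water exchange, so that the buoyant mass in each fluid reduces to m_b = m_dry - rho_fluid * V_dry, and it solves the resulting two equations for the two unknowns. The function name, the fluid densities, and the example cell values are all hypothetical.

```python
def dry_mass_and_volume(mb_1, rho_1, mb_2, rho_2):
    """Solve m_b = m_dry - rho_fluid * V_dry for measurements in two fluids.

    Assumes intracellular water is fully exchanged and therefore neutrally
    buoyant in each fluid. Units: masses in pg, densities in g/mL (= pg/fL),
    so the returned volume is in fL.
    """
    if rho_1 == rho_2:
        raise ValueError("the two fluids must differ in density")
    v_dry = (mb_1 - mb_2) / (rho_2 - rho_1)  # difference of the two equations
    m_dry = mb_1 + rho_1 * v_dry             # back-substitute into the first
    return m_dry, v_dry, m_dry / v_dry


# Hypothetical cell: 135 pg of dry mass occupying 100 fL of dry volume.
true_m_dry, true_v_dry = 135.0, 100.0
rho_h2o, rho_d2o = 1.0, 1.1  # illustrative fluid densities, not measured values

# Simulate the two buoyant mass readings, then recover the dry quantities.
mb_h2o = true_m_dry - rho_h2o * true_v_dry
mb_d2o = true_m_dry - rho_d2o * true_v_dry
m_dry, v_dry, rho_dry = dry_mass_and_volume(mb_h2o, rho_h2o, mb_d2o, rho_d2o)
print(m_dry, v_dry, rho_dry)
```

      If some water were not exchanged, it would contribute an extra term that this two-parameter model would absorb into the fitted dry mass and dry volume, which is the interpretive concern raised by the reviewer.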

      The reviewer is also correct that our method measures a dry volume which is, by our model’s definition, the volume occupied by the dry mass independently of water. In other words, our method and measurement model assume that the intracellular water exchange is 100% complete. The reviewer is correct that some water may be retained, and we cannot directly measure the amount of H2O left inside the cell after immersion in D2O-based media. However, our results indicate that our dry volume measurements are not limited by the water exchange time that the cell experiences (Figure 1–figure supplement 2). In other words, in our measurements, cells exchange all the water they can exchange, be that 100% or 98%. This is further supported by our new estimates of the time needed to transport all water in and out of the cell (see above, other comments section #1, and our updated manuscript page 5). Note that, as our method only exchanges H2O for D2O instead of removing all water from the cell, dry mass will always remain solvated in either H2O or D2O, which makes it plausible that 100% of the water content is exchanged.

      As the reviewer keenly points out, our measured dry volume is biophysically distinct from the more classically measured excluded cell volume (or dehydrated cell volume), which still includes some water. Consistent with this, our method measures dry volumes that are smaller (~15% of cell volume) than typical excluded volumes (~25-30%). We do not consider this a limitation of our method, but rather an opportunity for new measurements. That being said, we completely agree with the reviewer that this may confuse readers. To address this point, our Measurement Method section now states (page 4) that: “Importantly, our approach assumes that all water within the cell is exchangeable between H2O and D2O. Accordingly, our dry volume measurement is distinct from the excluded cell volume detected by measuring cell volume following strong hyperosmotic shocks, which does not remove all water from the intracellular space.”

      Finally, we have also changed the sentence “We are not aware of any other single cell methods capable of quantifying this biophysical feature of a cell” (page 5) so that it only refers to a metric that has not previously been quantified at the single-cell level. We believe that this minor change will avoid the suggestion that dry volume is of biophysical importance on its own.

      Reviewer #2 (Public Review):

      The new suspended microchannel resonator (SMR)-based method described in this paper enables high precision and high temporal resolution single-cell measurements of key physical properties: cell dry mass and the density of cell dry mass, which depends on the macromolecular composition of the cell. The validity of the method is rigorously tested with several convincing control experiments. This method will be useful for future studies investigating cell size and growth regulation and the coordination of mass, volume and density in animal cells.

      Using their method, the authors report two important results. First, they confirm that buoyant mass measurement is a valid proxy for cell mass in interphase, an important finding given that SMR measurements have been one of the best and most productive approaches to investigating cell mass growth regulation. Second, they provide evidence that some cell types lose dry mass during metaphase by a mechanism that involves exocytosis, emphasizing how mass, volume, and density dynamics are more complex than during the rest of the cell cycle.

      While this paper presents very interesting results, it would benefit significantly from two main improvements. First, the different physical variables studied here (dry mass, dry density, dry mass density, dry volume) should be better defined, and the terminology revised to provide a more straightforward and intuitive description of their biological meaning. Several sections of the paper (especially the introduction and the discussion of Fig. 2-4) should be re-written to help the reader understand the message. Second, some of the drug treatments require more replicates to provide more conclusive answers.

      Thank you for this constructive feedback. As stated above in the Essential Revisions section, we have now changed our terminology to increase clarity. Our new density measurement in this manuscript (dry mass divided by dry volume) is now defined as ‘dry mass density’. This change has been applied throughout our manuscript, including our manuscript title. In addition, we have added clearer definitions of each term to our Introduction and Measurement Method sections. Furthermore, we have minimized the use of the term ‘dry composition’ throughout our manuscript, as we now realize this may cause confusion to some readers.

      More specifically, our introduction (page 3) now states: “Here, we introduce a new approach for monitoring single cell’s dry mass (i.e. total mass – water mass), dry volume (i.e. total volume – water volume), and density of the dry mass (i.e. dry mass / dry volume), which we will refer to as dry mass density.” These definitions are also repeated in our Measurement Method section (page 4), as many readers may look for the definitions in that section. We have also made many other minor modifications to our main text throughout the manuscript to help readers understand our message.

      In addition, as detailed above in the Essential Revisions section 3, we have adjusted the writing of our manuscript to avoid overly strong claims where our replicate numbers are insufficient. More specifically, we now avoid conclusions where we claim that inhibition of cytokinesis has no influence on dry mass and dry mass density changes in mitosis.

      Reviewer #3 (Public Review):

      In this manuscript, the authors extend the Manalis lab's vibrating cantilever approach by adding the ability to rapidly exchange media with heavy water. This allows the authors to measure dry mass and its density in growing and proliferating cells. This resolves a previous discrepancy between the cantilever approach and quantitative phase imaging and shows that cells in early mitosis likely increase lysosomal exocytosis. This is an interesting piece of work.

      The authors report that: "On average, the FUCCI L1210 cells lost ~4% of dry mass and increased dry density by ~2.5%, and these changes took place in approximately 15 minutes (Figure 3C). In extreme cases, cells lost ~8% of their dry mass while increasing dry density by ~4%". Although these changes may sound small, I believe they would require significant changes to the cell composition. That is, to increase the overall dry mass density by 4% while losing 8% of the cell's dry mass, the cell would need to lose almost exclusively low-density components, which may not be typical for exocytosis. Moreover, even if the lost 8% of cell dry mass consisted exclusively of lipids (or other low-density components), it is not intuitively obvious that such a loss would be sufficient to cause a 4% change in the dry density. To make this more convincing, the authors should provide a simple mathematical model that would roughly estimate how the cell composition (e.g., the content of lipids vs proteins) needs to change, and what the composition of the lost (secreted) components needs to be, to produce the observed changes in dry mass and density, given the existing information on average cell composition and the densities of different biomolecules (lipids, sugars, proteins, etc.).
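      A rough version of the consistency check requested here can be sketched from the reported numbers alone. The following is an illustration under two stated assumptions that are not taken from the manuscript: volumes of components are additive, and the entire density change is caused by the loss of material (nothing is gained). Under those assumptions, the fractional mass loss and fractional density gain fix the density of the lost material relative to the starting dry mass density. The function name is hypothetical.

```python
def lost_material_density_ratio(mass_loss, density_gain):
    """Density of the lost material divided by the initial dry mass density.

    mass_loss:    fraction of dry mass lost (e.g. 0.04 for 4%)
    density_gain: fractional increase in dry mass density (e.g. 0.025)
    Assumes additive volumes and that only loss of material changes density.
    """
    # Work in units where the initial dry mass and dry volume are both 1.
    remaining_volume = (1.0 - mass_loss) / (1.0 + density_gain)
    lost_volume = 1.0 - remaining_volume
    if lost_volume <= 0:
        raise ValueError("these inputs do not correspond to a loss of volume")
    return mass_loss / lost_volume


# The two cases quoted above: the average and the extreme.
for mass_loss, density_gain in [(0.04, 0.025), (0.08, 0.04)]:
    ratio = lost_material_density_ratio(mass_loss, density_gain)
    print(f"loss {mass_loss:.0%}, gain {density_gain:.1%}: ratio {ratio:.3f}")
```

      Under these assumptions the lost material would need a density of roughly 0.63 (average case) to 0.69 (extreme case) times the initial dry mass density, which illustrates why the loss would have to be dominated by low-density components.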

      Thank you for this comment. The reviewer is correct that significant changes to the cell composition are needed to explain the phenotypes we observe. As stated above in the Essential Revisions section, we fully agree that such calculations could be very useful in interpreting our results. Our manuscript now contains a new paragraph (discussion section, page 13), where we state: “The magnitude of dry mass density increase in mitosis was large. We have previously observed similar magnitude changes in dry mass density when perturbing proliferation in mammalian cell (Feijo Delgado et al., 2013). To provide some rough estimates of what kind of compositional changes would be required to achieve the dry mass loss and dry mass density increase, we carried out a back-of-the-envelope calculations. Assuming a typical mammalian cell composition and typical macromolecule dry mass densities (Alberts, 2008; Feijo Delgado et al., 2013), we calculated the degree of lipid loss needed to increase dry mass density by 2.5%. This suggested that cells would have to secrete ~1/3 of their lipid content in early mitosis. This could be achieved via lysosomal exocytosis of lipids. Lipid droplets, the main lipid storages inside cells, are frequently trafficked into and degraded in lysosomes (Singh et al., 2009), and lipid droplets can also be secreted via lysosomal exocytosis (Minami et al., 2022). However, it seems likely that the mitotic dry mass density increase also involves secretion of other low dry mass density components (e.g. lipoproteins, specific metabolites) and/or a minor, transient increase in high dry mass density components (e.g. RNAs, specific proteins) in early mitosis. Indeed, CDK1 activity has been suggested to drive a transient increase in protein and RNA content in early mitosis (Asfaha et al., 2022; Clemm von Hohenberg et al., 2022; Miettinen et al., 2019; Shuda et al., 2015).”
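      The kind of back-of-the-envelope calculation described in this response can be sketched as follows. This is not the authors' calculation: the composition and component densities below are placeholder values chosen only for illustration, volumes are assumed to be additive, lipid is assumed to be the least dense component, and the function names are hypothetical. The result depends strongly on these inputs.

```python
def dry_mass_density(masses, densities):
    """Overall dry mass density for components with additive volumes."""
    total_mass = sum(masses.values())
    total_volume = sum(masses[name] / densities[name] for name in masses)
    return total_mass / total_volume


def lipid_fraction_for_gain(masses, densities, target_gain, tol=1e-10):
    """Fraction of lipid that must be lost to raise density by target_gain.

    Returns None if losing all lipid is still not enough. Uses bisection,
    which is valid because removing the least dense component raises the
    overall density monotonically.
    """
    target = dry_mass_density(masses, densities) * (1.0 + target_gain)

    def density_after(fraction_lost):
        remaining = dict(masses)
        remaining["lipid"] = masses["lipid"] * (1.0 - fraction_lost)
        return dry_mass_density(remaining, densities)

    if density_after(1.0) < target:
        return None
    low, high = 0.0, 1.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if density_after(mid) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


# PLACEHOLDER inputs (mass fractions of dry mass, densities in g/mL).
masses = {"protein": 0.60, "nucleic_acid": 0.10, "lipid": 0.15, "other": 0.15}
densities = {"protein": 1.35, "nucleic_acid": 1.7, "lipid": 1.0, "other": 1.5}

print(lipid_fraction_for_gain(masses, densities, target_gain=0.025))
```

      Replacing the placeholder dictionaries with literature values for a given cell type would turn this into an estimate comparable to the one quoted in the response.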

    1. The illustrations below (pp. 224 ff.) show the course of the reaction time in hysterical individuals. The light cross-hatched columns denote the locations where the test person was unable to react (so-called failures). The first thing that strikes us is the fact that many test persons show a marked prolongation of the reaction time. This would make us think at first of intellectual difficulties, - wrongly, however, as we are often dealing with very intelligent persons of fluent speech. The explanation lies rather in the emotions.

      This makes sense. Some words may lead a person to recall a certain incident, time, or place, which slows their otherwise quick responses. They are distracted and taken back to the thought associated with that word.