  1. Oct 2023
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      __Reviewer #1 (Evidence, reproducibility and clarity (Required)):__

      The manuscript investigates the role of PAT1 gene family in Arabidopsis thaliana. Though the PAT1 protein has been previously investigated and displayed immune-related and developmental phenotypes, the other two members of the family, PATH1 and PATH2, have not been well studied. The authors set out to understand the role of these proteins in relation to the role of PAT1. They thus generated single, double, and triple mutants of the possible combinations of PAT1 genes and examined their phenotypes. As the study focused on the developmental effects of PAT1, the mutants were generated on the background of the summ2 mutant to avoid phenotypes related to immune response. The authors notice a developmental difference between the pat1 mutant combinations, suggesting that PAT1 acts differently than PATH1 and PATH2 and that the PATH proteins serve a redundant function. They also performed RNA-seq analysis to identify differentially-regulated genes in the mutant combinations. The study is interesting and well-executed, yet I believe some questions should still be addressed:

      __Our response:__ We thank the reviewer for acknowledging the significance of our findings. Please see our detailed answers to the reviewer’s suggestions below.

      1. The research mainly focuses on the developmental phenotype of pat mutants but also tests the interaction of PATH proteins with RNA decapping enzymes to check their function and localization during different treatments. I found it a bit confusing since Figure 1 also shows the developmental phenotype of the mutants. I think editing the order of the figures would make the overall story more coherent.

      __Our response:__ We agree with the reviewer and have moved old Fig 1C to new Fig 3A. We believe the new figure order makes the overall story more coherent.

      My main concern is the correlation between the developmental phenotype of the mutants and the gene expression. Leaf samples for RNA extraction were taken when the plants were 6 weeks old, and the developmental phenotype is very evident. It is thus not possible to tell whether the differences in gene expression are a cause or effect of the developmental phenotype. I think performing qPCR of selected candidates at earlier developmental times might help solve this issue, as well as the characterization of younger plants for the developmental phenotypes (such as leaf number).

      __Our response:__ We followed the reviewer’s suggestions and performed qRT-PCR on IAA19, IAA29, SAUR23 and PIL2 in the pat mutants at different developmental stages (Lines 162, 169; Fig S4). We also characterized the leaf number of the pat mutants from younger stages onwards (Line 109; new Fig 3C).
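For readers unfamiliar with how qRT-PCR data of this kind are commonly summarized, the following is a minimal sketch of the widely used 2^-ΔΔCt calculation. The letter does not state which quantification method or reference gene was used, so the method, the function name `relative_expression` and all Ct values below are illustrative assumptions rather than the authors' procedure.

```python
def relative_expression(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Fold change of a target gene relative to a control sample (2^-ddCt).

    ct_target / ct_reference: Ct values in the sample of interest.
    ct_target_ctrl / ct_reference_ctrl: Ct values in the control sample.
    """
    # Normalize the target gene to the reference gene within each sample.
    delta_ct_sample = ct_target - ct_reference
    delta_ct_control = ct_target_ctrl - ct_reference_ctrl
    # Compare the normalized values between sample and control.
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values (not data from the manuscript).
fold = relative_expression(
    ct_target=26.0, ct_reference=18.0,
    ct_target_ctrl=24.0, ct_reference_ctrl=18.0,
)
print(fold)
```

A fold value below 1 indicates lower expression in the sample than in the control; the calculation assumes roughly equal amplification efficiency for the target and reference genes.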

      Overall, the manuscript is missing data regarding replicate numbers in the IP and confocal microscopy experiments.

      __Our response:__ We thank the reviewer for pointing this out. The replicate numbers are now provided in our new figure legends.

      Minor comments:

      1. Figure 1C - the authors should add a picture of Col0 plants as well as the mutants.

      Our response: To be reader friendly, a picture of a Col-0 plant has been added in Fig S1A. For the reviewer’s information, the plant pictures in Fig S1A and old Fig 1C (new Fig 3A) were taken at the same time.

      2. Figure 3 - Calculating the leaf-to-petiole ratio in the different mutants would be good.

      Our response: We have now calculated the PBR (petiole blade ratio) for all pat mutants (Fig 3F; Line 121).
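As a small illustration of the ratio mentioned above, the sketch below computes a per-leaf PBR and averages it per genotype. It assumes PBR is defined as petiole length divided by blade length, which the letter does not spell out; the measurements and the helper name `petiole_blade_ratio` are hypothetical.

```python
from statistics import mean


def petiole_blade_ratio(petiole_length_mm, blade_length_mm):
    """Return petiole length divided by blade length for one leaf (assumed definition)."""
    if blade_length_mm <= 0:
        raise ValueError("blade length must be positive")
    return petiole_length_mm / blade_length_mm


# Hypothetical (petiole, blade) lengths in mm; not measurements from the manuscript.
measurements = {
    "summ2-8": [(12.0, 24.0), (10.0, 20.0)],
    "pat triple": [(3.0, 12.0), (2.0, 8.0)],
}

# Average the per-leaf ratios within each genotype.
mean_pbr = {
    genotype: mean(petiole_blade_ratio(p, b) for p, b in leaves)
    for genotype, leaves in measurements.items()
}
print(mean_pbr)
```

Averaging per-leaf ratios, rather than dividing mean petiole length by mean blade length, keeps each leaf as the unit of observation for later statistical tests.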

      3. Figure 4 - the details in the figure are very unclear, especially in the PCA. It would be good to display the data in 2D for PC1 and PC3 and change the colors a bit.

      Our response: We agree with the reviewer and have remade the PCA plot from the RNA-seq read data in a 2D style, with new colors for each mutant (Fig 4A). We need to point out that the PC values also changed, because the old PCA plot was mistakenly made from expression data.

      Reviewer #1 (Significance (Required)): Both PATH proteins have been less investigated than PAT1, and in that sense, the work is novel. However, it seems that most of the phenotype is attributed to PAT1 rather than the other family members, limiting the interest to the broad plant science community.

      Our response: We appreciate that the reviewer finds our work novel. We agree that PAT1 plays the main role during plant development (old Line 171). However, the pat triple mutant exhibits the most severe dwarfism as well as the largest number of mis-regulated genes compared to any single or double mutant, indicating that all three PATs are essential for development.

      __Reviewer #2 (Evidence, reproducibility and clarity (Required)):__

      Zuo et al. characterize the role of three cytoplasmic mRNA-decay activator proteins PAT1, PATH1 and PATH2 in the context of plant development and leaf morphology in Arabidopsis thaliana and Nicotiana benthamiana. The authors show that the triple pat mutant displays the most severe dwarfism of all combinatorial mutants. Through treatment with different stimulants the authors found that only IAA treatment induces the three homologues to form condensates (possibly PBs), while PAT1 forms condensates upon every tested stimulus. An extensive RNA-seq experiment revealed mis-regulation of several hundred genes in the higher order mutants, several of which were auxin-responsive genes and leaf morphology determinants.

      __Our response:__ We thank the reviewer for the peer review. Please see our detailed answers to the reviewer’s suggestions below.

      Major points: 1. Title is not meaningful as is and, in my opinion, does not reflect the main findings in the manuscript.

      Our response: We have changed our title to “PAT mRNA decapping factors are required for proper development in Arabidopsis”.

      The results section could benefit from improved flow between the paragraphs and more reasoning for the next steps taken to help readers understand the aims of the authors.

      Our response: We followed the reviewer’s suggestion and modified the wording in our Results section (Lines 79, 81, 94, 146-151).

      L46: "So far little is known about the functions of these three PATs in plant development.", The authors themselves have studied these proteins in the context of seed germination and ABA control, as well as apical hook formation and auxin responses. Should at least be mentioned and the results discussed in this context.

      Our response: We thank the reviewer for noticing our other work. We have now included this information in the new Introduction and Discussion sections (Lines 56 and 237).

      What are the expression levels and patterns of PATH1 and PATH2 compared to PAT1? Is anything known about spatial or temporal regulation of these proteins?

      Our response: All three PATs are expressed in roots, stems, leaves, flowers, siliques, and seeds across all developmental stages. PAT1 has a higher expression level in leaves but a lower expression level in petals (Klepikova et al., 2016; https://www.arabidopsis.org/servlets/TairObject?id=138009&type=locus for PAT1; https://www.arabidopsis.org/servlets/TairObject?id=38646&type=locus for PATH1; and https://www.arabidopsis.org/servlets/TairObject?id=128694&type=locus for PATH2).

      Figure 1:

      o I do not agree that the authors have shown that "PATH1 and PATH2 are also mRNA decapping factors", rather that these proteins can co-localize (and possibly interact) with LSM1. Decapping assays for example with the known PAT1 de-capping targets from their previous work and their extensive mutant collection could be used to test this.

      Our response: We thank the reviewer for pointing this out and for reminding us of the characterized mRNA decapping target from our previous work. We now include the decapping assays in new Fig 5 (Line 197).

      From the BiFC experiment (Figure 1B) it looks like PATs are mostly soluble in the cytoplasm (like LSM1) and might be stress-induced components of PBs (like LSM1). Do PATs co-localize with other canonical PB markers that are more prone to condensation, like DCPs or VCS? BiFC could be performed after IAA treatment to confirm that the cytoplasmic foci are indeed LSM1-positive PBs.

      Our response: We agree with the reviewer that PATs behave more like LSM1. Given the time limits of the project, we are unfortunately not able to check the co-localization of PATs with DCPs or VCS. However, we performed BiFC after IAA treatment, and the cytoplasmic foci are indeed LSM1-positive foci (new Fig 1B).

      A: please provide uncropped images of all Western blots in supplemental data.

      Our response: To be reader friendly, we have decided to show the original western blots here (see the file named "RC-Full-revision") instead of in the supplemental data. However, we will leave the final decision to the editor.

      I applaud the authors for establishing this great higher order mutant collection that will be very useful for researchers in the field. However, I am confused about the description of these mutants. If I understood it correctly, these mutants were already used in previous studies by the authors, namely “Zuo, Z., et al., Molecular Plant-Microbe Interactions, 35(2), 125-130.” and “Zuo, Z., et al., (2021). FEBS letters, 595(2), 253-263.” In this study the authors refer to a bioRxiv preprint, “Zuo, Z., et al., (2019)”, as the reference for these Arabidopsis lines. Is this current manuscript a continuation of the bioRxiv preprint? Please elaborate on whether these lines have been used and described in previous studies.

      Our response: We truly appreciate the reviewer’s acknowledgement of the significance of our work. These pat mutants have been used in the FEBS Letters paper (2021), the MPMI paper (2022), and the newly published paper in Life Science Alliance (2023, preprinted on bioRxiv in 2019 and 2022). However, they have not been fully described or characterized in any of these published studies. Characterization of these pat mutants was originally included only in the 2019 preprint, which was cited in the FEBS Letters paper (2021) and the MPMI paper (2022).

      L72: Is the strong developmental phenotype of the higher order mutants persistent under long day conditions? Considering the strong developmental phenotypes of the mutants, the flowering transition and morphology could be an interesting trait to study. Why did you choose short day conditions for this study?

      Our response: The pat triple mutant also has a strong developmental phenotype under long-day conditions and flowers early. We are currently preparing a manuscript on mRNA decay and flowering. We did not “choose” short-day conditions; we simply started with short-day conditions, observed phenotypic differences, and therefore kept these conditions.

      L78: This statement is hard to see in Figure 1C and best described for Figure 3A.

      Our response: We have changed this statement to refer to Fig 3.

      L82: Please include a reasoning for testing PATs localization after hormone treatment. Do you have any indication that other PB proteins behave similar to either PAT1 or the PATHs after hormone treatment to substantiate that these foci observed are indeed PBs? What is known about PBs after hormone treatment in planta?

      Our response: We were interested in investigating whether all three PAT proteins also form PBs in Arabidopsis, and we therefore tested the localization of the PATs with and without hormone treatment (old Line 84, new Line 81). For the reviewer’s interest, we also observed LSM1 localization after hormone treatment (Fig 2). PBs have been reported to respond to light, cold treatment, PAMPs, ABA, ACC and auxin (Lines 39-42).

      Figure 2:

      o How does the localization of LSM1 change under the same treatments? Does it behave like PAT1 or the homologues?

      Our response: Please see our new Fig 2 for LSM1 localization; it behaves more like PAT1.

      Which part of the root was imaged for this experiment? Is it possible that the observed foci are ARF-condensates as reported by Jing et al., 2022? Do you observe a gradual change in numbers or morphology along the root?

      Our response: We used the root elongation zone for this experiment. We do not know whether the foci are ARF condensates, but this could be studied in the future. If the reviewer is interested, we are happy to share our materials. We do observe more foci in the cell division zone and fewer in the mature zone.

      How did the authors decide on the concentrations for the stimulant treatments? Have you tried different doses, and could the responses be dose-dependent?

      Our response: We did not try different doses; we searched for and applied the commonly used concentrations for different hormones.

      A representative image is not sufficient for quantitative responses, like RNA granule condensation. Please provide a quantification of stimulant-induced foci after the different treatments.

      Our response: Please see the quantification in our new Fig 2.

      L91: Does that mean that most co-precipitated signal comes from the soluble fraction and not PB-localized? Would an RNAse treatment step eliminate the co-precipitation (optional)?

      Our response: We believe it means LSM1 and PATs are in the same complex regardless of PB localization.

      L92/93: Or alternatively that PAT1 localizes to PBs independent of the stress, while PATHs are signal-specific PB components?

      Our response: We think PAT1 aggregates upon broad stimuli/stress, while PATHs respond to specific/limited stimuli, for example, auxin.

      Figure 3:

      o I wonder if these results fit better in conjunction with Figure 1, either as a combined figure or move before Figure 2.

      Our response: We agree with the reviewer and have moved old Fig 1C into Fig 3.

      It is interesting that path2/pat1, while being dwarfed, is less serrated compared to pat1 or path1/pat1. Can you find any indications in your RNAseq set which genes might be involved?

      Our response: ANAC016 might be involved, but more research needs to be done to confirm it and this is not the focus of the current project.

      Indicate statistical test used to determine p-value

      Our response: We now indicate the statistical test in the Materials and Methods section (Line 369).

      L116/L117: Doesn't the result in Figure 3E indicate that PATH1 and PATH2 are not fully redundant, but that PATs have specific and narrow roles in leaf development? L116 goes against your statement in L150 & L160. What is known about the expression patterns of PAT1, PATH1 and PATH2?

      Our response: We agree and have modified our statement (Line 137). All three PATs are expressed in roots, stems, leaves, flowers, siliques, and seeds across all developmental stages. Please also see our answer to major comment #4.

      L123: PC3 only explains 0.55% of the variance, so differences along this axis will be overinflated. In my interpretation the pat1/path2 mutant is clustering apart from the other higher order mutants, which is also reflected in the leaf phenotypes. A 2D PCA would be sufficient to describe most of the variation.

      Our response: We agree and have changed the PCA plot to a 2D style. Please also see our response to reviewer 1, minor comment #3.
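To make the reviewer's point about explained variance concrete, here is a minimal sketch of a 2D PCA on a samples-by-genes count matrix that also reports the fraction of variance carried by PC1 and PC2. It is not the authors' pipeline: the log2(count + 1) transform, the use of NumPy's SVD, the function name `pca_2d` and the toy counts are all assumptions chosen for illustration.

```python
import numpy as np


def pca_2d(counts):
    """Project samples (rows) onto PC1 and PC2.

    Returns the 2D coordinates and the fraction of total variance
    explained by each of the two components.
    """
    # Log-transform to dampen the influence of highly expressed genes.
    logged = np.log2(np.asarray(counts, dtype=float) + 1.0)
    # Center each gene (column) so that PCA captures variance around the mean.
    centered = logged - logged.mean(axis=0)
    # SVD of the centered matrix; singular values come back in descending order.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    coords = u[:, :2] * s[:2]
    return coords, explained[:2]


# Hypothetical read counts: 4 samples (rows) x 5 genes (columns).
counts = [
    [10, 200, 35, 0, 50],
    [12, 180, 40, 1, 55],
    [100, 20, 300, 8, 5],
    [90, 25, 280, 10, 4],
]
coords, explained = pca_2d(counts)
print(coords)
print(explained)
```

When the first two components already account for nearly all of the variance, a third axis adds little information, which is the argument for a 2D plot.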

      Figure 4: o A: The 3D-PCA inflates the differences between higher order mutants along PC3, even though this axis explains only 0.55% of the variance, maybe a 2D-PCA would more intuitively cluster the samples together?

      Our response: Please see our new PCA plot in Fig4A.

      B: Please explain the scale in the figure legend and which genes were included? Only DEGs between triple mutant and summ2-8 or DEGs that were different in at least one higher order mutant?

      Our response: We now give more details in the figure legends. The genes included in Fig 4B are DEGs that were differentially expressed in at least one of the pat mutants.
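The selection rule described above (a gene is included if it is a DEG in at least one mutant) amounts to a set union. The sketch below illustrates it with placeholder gene identifiers and genotype labels; the names and the helper `degs_in_any` are hypothetical and do not reflect the actual RNA-seq results.

```python
def degs_in_any(deg_sets):
    """Return the sorted union of DEGs across all genotype comparisons."""
    return sorted(set().union(*deg_sets.values()))


# Placeholder DEG lists per mutant (each compared to the control line).
deg_sets = {
    "mutant_a": {"gene1", "gene2", "gene3"},
    "mutant_b": {"gene4"},
    "mutant_c": {"gene1", "gene4", "gene5"},
}

heatmap_genes = degs_in_any(deg_sets)
print(heatmap_genes)
```

Using the union rather than the DEGs of a single comparison means that genes altered in only one mutant still appear in the heatmap.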

      C: several comparisons are missing from the upset-plot. Please show the complete plot, also is there a white box laid over the second bar in the upper graph? It would help the reader, if the results section would explain the plots and the comparisons taken. Which differences are the authors interested in?

      Our response: We covered all the comparisons we wanted to show, but we thank the reviewer for suggesting a more detailed explanation, and we now explain Fig 4C in more detail in Line 146. There is no white box over the second bar; only 1 gene is mis-regulated specifically by PATH1 (mis-regulated in plants carrying the path1 mutation).
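For readers less familiar with upset plots, each bar counts the genes found in exactly one combination of DEG sets. The following minimal sketch shows that counting logic under assumed inputs; the genotype labels, gene identifiers and the helper `exclusive_intersections` are hypothetical and are not taken from the manuscript's data.

```python
from collections import Counter


def exclusive_intersections(deg_sets):
    """Count genes by the exact combination of genotypes in which they are DEGs."""
    membership = {}
    for genotype, genes in deg_sets.items():
        for gene in genes:
            # Record every genotype in which this gene is mis-regulated.
            membership.setdefault(gene, set()).add(genotype)
    # Each distinct combination of genotypes corresponds to one bar.
    return Counter(frozenset(genotypes) for genotypes in membership.values())


# Placeholder DEG sets (not real results).
deg_sets = {
    "A": {"g1", "g2"},
    "B": {"g2", "g3"},
    "C": {"g2"},
}
bars = exclusive_intersections(deg_sets)
for combo, n in bars.items():
    print(sorted(combo), n)
```

A bar of height 1 therefore simply means that a single gene has that exact membership pattern, so a very short bar can look like an empty or covered position in the plot.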

      From Figure 4B, the triple mutant has an almost inverted expression of mis-regulated genes. High expression genes are now lowly expressed and vice-versa. Has this been reported for other RNA decay mutants before?

      Our response: Our RNA-seq data indicate that the pat triple mutant has more than 1000 mis-regulated genes. Based on microarray data from 2-week-old lsm1alsm1b plants (Perea-Resa et al., 2012), more than 600 genes are mis-regulated in the lsm1alsm1b mutant.

      How do you explain that mutants in RNA decay have a large group of repressed transcripts and a large group of enriched transcripts? Wouldn't you suspect a general higher expression in RNA decay mutants or which kind of feedback loop would you propose is happening here? Also, since both kinds of expression changes are recorded in your RNA seq can you speculate on the specificity? Why are some genes up- and others downregulated? Would you suspect that transcription factors are under PATs control?

      Our response: We assume that target genes of the mRNA decapping machinery should accumulate in mRNA decapping mutants, the pat mutants in our case. The down-regulated genes, on the other hand, could be targets of other mRNA degradation pathways such as the exosome pathway (Line 257). We agree with the reviewer that the down-regulated genes in the pat triple mutant could also be negatively regulated by mRNA decapping targets, which could be transcription factor genes. For example, our previous research indicates that the transcription factor gene ASL9/LBD3 is an mRNA decapping target under the control of the PATs.

      Where is the sequencing data deposited? This dataset can be of great value for researchers in the field, but the raw data needs to be made commonly available.

      Our response: We thank the reviewer for acknowledging the significance of our work. The raw data have been submitted to NCBI under accession number PRJNA1006171 (Line 307).

      Minor points:

      1. Check order and nomenclature for protein / gene names in Abstract and Introduction

      Our response: We have carefully double-checked the order and nomenclature of protein / gene names in the Abstract and Introduction (Lines 8, 11, 14, 18, 19, 24).

      L26 / L83 "aggregate" implies non-functionality, I would use "concentrate", "condensate" or "accumulate".

      Our response: We thank the reviewer for pointing this out. We now use “concentrate” (Lines 29 and 80).

      L35, L45 & L54 all state the same. Maybe remove at least one mention to reduce redundancy?

      Our response: We modified these statements hopefully in a satisfactory way. (Line 56)

      L211: Did you use the same imaging settings for all lines?

      Our response: We used the same settings for all lines and treatments (Line 284).

      L217: RNA quality "control" word missing?

      Our response: The word “control” is added in Line 296

      L477: Authors should cite the newest version of their BioRxiv: Zuo, Z., Roux, M. E., Chevalier, J. R., Dagdas, Y. F., Yamashino, T., Højgaard, S. D., ... & Petersen, M. (2022). The mRNA decapping machinery targets LBD3/ASL9 to mediate apical hook and lateral root development in Arabidopsis. bioRxiv, 2022-07.

      Our response: The latest version is cited in our new manuscript (Line 42)

      Figure 3B-F, Figure 4C: check spelling on the axis titles.

      Our response: We carefully checked the spelling on the axis titles in our new manuscript.

      Reviewer #2 (Significance (Required)):

      This manuscript represents a continuation of the author's characterization of the 3 PAT1s in Arabidopsis development after Zuo et al., 2021; Zuo et al., 2022a; Zuo et al., 2022b. The mutants and the corresponding RNA sequencing experiments are of value to the community working on RNA regulation and degradation or plant development. While the initial findings are interesting, the authors do not explore the stimulus-induced condensation differences between the homologues or try to link the extreme differences in expression profiles mechanistically or functionally. I think the manuscript could greatly benefit from contextualizing their work within the frame of their previous studies and what is known about PBs in terms of plant development. While the RNA seq is a comprehensive data set, a closer examination and a better representation of the results would help readers to access the findings.

      __Our response:__ We thank the reviewer for the constructive criticism. We hope the reviewer is satisfied by our modified manuscript.

      Reviewer expertise: RNA granule biology, Arabidopsis, molecular biology

      __Reviewer #3 (Evidence, reproducibility and clarity (Required)):__

      Summary:

      In the study "PAT mRNA decapping factors function specifically and redundantly during development in Arabidopsis" authors investigate potential specific functions of Arabidopsis PAT1 orthologs in plant development. Authors observe differences in rosette phenotypes (leaf size, serration and number) of single and multiple mutants of PAT1 gene family, show variation in translocation of the corresponding PAT1 proteins to processing bodies under a set of stress conditions and perform transcriptomics on the established mutants to elucidate the impact of individual PATs on posttranscriptional regulation of plant gene expression. Authors conclude that PAT1 orthologs have both overlapping and specific roles in plant development.

      __Our response:__ We thank the reviewer for the peer review. Please see our detailed answers to the reviewer’s suggestions below.

      Major comments:

      1. The study contains interesting transcriptomics data that will be of use for the scientific community. However, analysis of the transcriptomics results could be discussed a bit more in depth. Authors could express their opinion about what gene expression changes might be caused by direct degradation via PAT1-dependent decapping mechanism and what changes are more likely to have occurred indirectly via other factors.

      __Our response:__ We followed the reviewer’s suggestion and now analyse and discuss the transcriptomic data in more depth (Lines 145, 220 and 232).

      The interesting phenotypic observations are currently poorly linked to the transcriptomics/qPCR data provided, resulting in a somewhat fragmented story flow.

      __Our response:__ We appreciate that the reviewer finds the phenotypes of the pat mutants interesting. However, we disagree with the statement that they are “poorly linked to the transcriptomics/qPCR data”. For instance, downregulation of developmental and auxin-responsive genes could explain the stunted growth phenotype of the pat triple mutant. Furthermore, the published petiole elongation regulator genes XTR7/XTH15 and PIL2/PIF6 exhibit decreased expression levels only in mutants with shorter petioles. Nevertheless, we hope our new data and analysis will satisfy the reviewer.

      The transcriptomics was performed on the 6-weeks old plants. It would be helpful to learn more about authors reasoning for choosing this developmental stage for sampling. Why did authors decide against sampling at the earlier stages, before the observed leaves phenotypes were established?

      __Our response:__ The growth phenotypes of the pat mutants differed most from one another at the late stage; we therefore performed RNA-seq on these samples. However, we agree with the reviewer (and with reviewer 1, major comment #2) that a transcriptomic shift at an earlier stage could also be responsible for the observed phenotype. We therefore performed qRT-PCR for selected genes on the pat mutants at earlier stages to examine this (Lines 162 and 169).

      Authors obtained intriguing results on specific translocation of PAT1, PATH1 and PATH2 to processing bodies in the root cells upon various stresses. Perhaps root transcriptomics of single PAT1, PATH1 and PATH2 knockouts under control conditions, treatment that translocate all three proteins to PBs (IAA) and selectively translocate only PAT1 (e.g. cytokinin) could shed more light on the redundancy and specificity of these proteins as the mRNA decapping factors.

      __Our response:__ We appreciate that the reviewer found our findings interesting. The specific translocation of PAT1, PATH1 and PATH2 to PBs in root cells upon various stimuli indicates functional specificity and redundancy at the cellular level, which correlates with the growth phenotypes of the mutants. We agree with the reviewer that root transcriptomic data on the pat mutants would be very interesting, and we are more than willing to share these mutants with peers who want to pursue this in more detail.

      Do authors consider PAT1, PATH1 and PATH2 to be localized to different PBs sub-populations? It could be interesting to check co-localization of PAT1, PATH1 and PATH2 under various stress conditions. Could authors elaborate on their view of PBs composition and fate to which different PAT1s are recruited?

      __Our response:__ We agree with the reviewer that it is interesting to check the co-localization of PAT1, PATH1 and PATH2. We observed partial co-localization of CFP-PATH2 (in blue) and Venus-PAT1 (in yellow) when transiently expressed in N. benthamiana. In stable lines, however, we failed to observe a separate CFP-PATH2 (blue) signal because of too much signal leakage from Venus-PAT1 (green). Given that the PATs function redundantly, we would assume that they partially co-localize at the cellular level.

      Could authors speculate what features in the PAT1 protein might cause it being recruited to PBs more efficiently (or better to say, under a broader range of stresses) in comparison to PATH1 and 2?

      __Our response:__ The release of ribosome-free mRNPs induces PB formation (Brengues et al., 2005). We suspect that PAT1 could bind a broader range of mRNAs than PATH1 and PATH2, and that PAT1-mRNPs could therefore form PBs more efficiently. Moreover, Sachdev et al. found that yeast PAT1 enhances the condensation of Dhh1 and RNA and that the PAT1-DHH1 interaction is essential for PB assembly (Sachdev et al., 2019). We assume that PAT1 might interact better with DHH1 than PATH1 and PATH2 do, and thus promote PB formation more efficiently. Please see our Discussion section (Line 252).

      Are all three Arabidopsis PAT paralogs co-expressed in the same tissues /developmental stages?

      __Our response: __Please see our response to reviewer 2 major comment #4.

      Could authors elaborate a bit more why the triple pat1 knockout has a much more severe phenotype in comparison to a single pat1 loss-of-function mutant or any of the double pat1 mutants. Do authors observe complementary changes in the PAT1 genes expression in the mutant lines, e.g. is PATH1 expressed at a higher level in the absence of PAT1 and PATH2?

      __Our response:__ We now elaborate on why the triple pat knockout has the most severe phenotype among the multiple pat mutants (Line 210). We do see a higher transcript level of PAT1 in path1-4path2-1summ2-8 and a higher transcript level of PATH1 in pat1-1path2-1summ2-8, but the same PATH2 transcript level in pat1-1path1-4summ2-8 compared to summ2-8 (Fig S1C, Line 104).

      Please provide the name of the used statistical test in all figure legends.

      __Our response:__ We now provide the statistical test in the “Materials and Methods” section (Line 367).

      Minor comments:

      1. Authors might want to reconsider the title as it is somewhat too vague in its current form.

      __Our response:__ We have changed our title to “PAT mRNA decapping factors are required for proper development in Arabidopsis”.

      Line 9: explanation of PAT1 and PATH1 and 2 abbreviations is best placed at the first mentioning of the name.

      __Our response: __We carefully followed the reviewer’s suggestion (Line 10)

      Line 10: mRNA degradation is rather a posttranscriptional regulation of gene expression.

      __Our response: __We agree and changed our statement in the new ms (Line 9).

      Lines 11 and 12: path1 and path2 abbreviation are not explained. Please note that on the Figure 1A the same proteins are labelled as PAT1H1 and PAT1H2

      __Our response:__ We thank the reviewer for pointing this out. The PATH1 and PATH2 abbreviations are now explained in Line 10, and the labels in Fig 1A have been corrected.

      Lines 22-25: Would you be so kind to rephrase or elaborate on what you mean. LSM1-7/PAT1 complex are known to bind oligoadenylated transcripts indeed and even stabilize their 3' ends, it is not clear what "engage transcripts containing deadenylated tails" means in this context.

      __Our response:__ We hope the rephrased statement is now clear (Line 25).

      Line 29: for the sake of clarity, it might be beneficial to list the known activators of the decapping DCP2 enzyme, including the VCS. Generally the introduction could benefit from a bit more in depth review of the decapping mechanism.

      __Our response: __We hope the more detailed introduction will satisfy the reviewer (Line 27).

      Line 51:"other 2 PATs" => "other two PATs". Generally the text is quite well written, but might need a bit of polishing.

      __Our response: __The text is corrected now (Line 64).

      Authors are absolutely correct in their attempt to provide full information about mutant backgrounds. However, for the sake of comprehension, it would be great to grant the double and triple mutants in the summ2 background shorter and more legible names. For example, the pat1-1path1-4path2-1summ2-8 mutant could be named as pat1/h1/h2/s.

      __Our response:__ We originally used pat1/h1/h2/s for the triple mutant, but a colleague pointed out that “h1” and “h2” are not proper gene names and suggested that we rename them. We agree, however, that the full names of the double and triple pat mutants are cumbersome; as a compromise, we now refer to the triple pat mutant as “pat triple”.

      Figure 1B:

      • it would be intersting to have authors opinion on why PBs are formed in this case under non-stress(?) conditions.

      __Our response: __Forming PBs is a dynamic process, and we assume that even under normal conditions, there is still ongoing mRNA decay and translational repression which should be seen as some background level of PBs (Line 85).

      Please note that expressing only the N-terminal part of CFP is a weak negative control for BiFC. No restoration of CFP can occur in such case and thus it is a given that no fluorescence can be observed in these samples. For example, co-expression of nCFP-PAT1 with cCFP-GUS, would be a more rigorous negative control, better aligned with the coIP experiments.

      __Our response:__ We had nCFP-GUS with cCFP-LSM1 as a proper negative control in old Fig 1B (bottom lane). We also agree with the reviewer that the N-terminal part of CFP alone is a weak negative control for BiFC; we have therefore removed the weak control and kept only the rigorous negative control (new Fig 1B).

      Please note that some arrows point at a structure that does not seem to be a discernible signal.

      __Our response: __It’s due to the poor quality of the picture from the PDF file, arrows in the original high-resolution figure do point at discernible foci.

      Figure 1C: It might be helpful to also include a Col-0 WT plant

      __Our response: __Col-WT plant is now included in Fig S1A.

      It is not clear how qPCR data and complementation lines help to characterize the established PATH1 and PATH2 loss-of-function mutants. There is no immunodetection of the corresponding proteins in the knockouts, qPCR shows no dramatic decrease in the transcript level of PATH1 and H2 and the phenotypes of complemented lines presented in the Fig S1E at a glance look quite similar to the phenotypes of the corresponding knockout mutants. Complementation lines are not used for any other experiments in this study and it is not clear why authors decided to include this material into the article.

      __Our response: __To characterize the path1 and path2 mutants, we first performed qRT-PCR to check transcript levels but, as the reviewer mentioned, there was no dramatic decrease, indicating that the path1-4 and path2-1 mutations do not change PATH1 and PATH2 transcript levels. We also tried to raise antibodies against PATH1 and PATH2; however, the antibodies failed to recognize any PAT proteins. Therefore, we used the complementation lines to characterize the mutations in PATH1 and PATH2. Since the path1 and path2 single mutants have no obvious growth phenotype and the dwarf pat triple mutant is barely possible to transform, we had to complement the pat1path1 and pat1path2 double mutants. On closer inspection, the growth phenotypes of the complementation lines Venus-PATH1/pat1-1path1-4summ2-8 and Venus-PATH2/pat1-1path2-1summ2-8 are similar to that of pat1-1summ2-8 and not to those of the background pat double mutants. The complementation lines were also used to study PATH1 and PATH2 cellular localization.

      Figure S1C misses labels indicating what detection of what gene is shown on what chart.

      __Our response: __We thank the reviewer for pointing it out, the gene names are indicated now in new FigS1C.

      Experiments to visualize PBs under various stress stimuli were conducted on roots for the Figure 2 while coIP was performed on the green tissue. Could authors elaborate on whether PB formation could be expected to be the same in different plant organs? Somewhat related to the same topic, Figure 2 contains micrographs obtained on meristematic, transition and elongation root zones, in which epidermal cells are present at various developmental stages. Since PAT proteins are suggested to impact plant development, it might be prudent to obtain observations for all samples at the same developmental stage. Could authors provide their opinion about how representative the provided micrographs are for all root zones? Furthermore, Venus-PATH2 under ACC treatment shows punctate localization only in a single cell out of the three-ish cells visible on the micrograph, potentially indicating differences in PAT2 recruitment to PBs in trichoblasts and atrichoblasts. This in itself could be an interesting observation helpful for elucidating the specific roles of PAT1 orthologs.

      __Our response: __CoIP results from N. benthamiana leaves indicate that Arabidopsis PATs and LSM1 are in the same complex, and PB visualization in the root suggests that PATs respond to different hormone treatments. flg22 treatment has been reported to induce PB formation in Arabidopsis roots but to disassemble PBs in Arabidopsis protoplasts, indicating that PB formation is tissue-specific. We randomly chose one picture per treatment from nine (3 plants × biological triplicates), all of which showed the same result. However, we thank the reviewer for pointing out that the confocal pictures we chose were not all from the elongation zone; we have now carefully checked all our confocal pictures and made sure they are from the same developmental stage. We also discuss PATH2 localization in response to ACC in more detail (Line 251).

      Figure 4C would greatly benefit from a more detailed description in the main text and figure legend of what authors show/conclude.

      __Our response: __We thank the reviewer for the suggestion; we now describe Fig 4C in more detail in the revised manuscript (Line 146).

      Figure 5, please avoid using the same color for the bars for the triple pat knockout and the control summ2-8 line

      __Our response: __We changed the colour scheme for all the mutants (new Fig 4E).

      Figure 5B legend should include the name of the statistical test.

      __Our response: __We now include the name of the statistical test in “Material and Methods” (Line 367).

      Figure S2: The coIP experiment is a bit difficult to interpret due to the extremely low protein quantities in some of the input samples. Perhaps a repetition with more balanced input quantities would be beneficial. The figure legend does not contain information on how normalized intensity values were obtained.

      __Our response: __We used the same amount of total protein (3 mg) for each IP; PATH1 and PATH2 are not expressed as highly as PAT1. The numbers indicate the ratio between the PAT-HA signal and the LSM1-GFP signal, with the PAT1-HA/LSM1-GFP ratio under the non-treatment condition normalized to 1.
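
      As a minimal sketch of the normalization described in this response, the snippet below divides each lane's PAT-HA/LSM1-GFP ratio by the ratio of the untreated PAT1-HA lane, so that the reference lane equals 1. The function name and all band intensities are hypothetical illustrations, not the authors' quantification script or measured values.

```python
def normalized_coip_ratio(pat_ha, lsm1_gfp, ref_pat_ha, ref_lsm1_gfp):
    """Return (PAT-HA / LSM1-GFP) divided by the reference lane's ratio."""
    if min(pat_ha, lsm1_gfp, ref_pat_ha, ref_lsm1_gfp) <= 0:
        raise ValueError("band intensities must be positive")
    return (pat_ha / lsm1_gfp) / (ref_pat_ha / ref_lsm1_gfp)


# Hypothetical band intensities in arbitrary units (assumed for illustration).
reference = {"pat_ha": 200.0, "lsm1_gfp": 100.0}  # PAT1-HA, no treatment
lanes = {
    "PAT1-HA, no treatment": (200.0, 100.0),
    "PATH1-HA, no treatment": (50.0, 100.0),
}

for name, (pat_ha, lsm1_gfp) in lanes.items():
    ratio = normalized_coip_ratio(
        pat_ha, lsm1_gfp, reference["pat_ha"], reference["lsm1_gfp"]
    )
    print(f"{name}: {ratio:.2f}")
```

      Expressing every lane relative to one reference lane makes the values comparable across lanes even when the absolute input signals differ.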

      Line 130: Fig S2 is referenced but Fig S3 is meant

      __Our response: __We thank the reviewer for pointing out our mistake; the correct figure is now referenced.

      Reviewer #3 (Significance (Required)):

      Strength:

      Regulation of gene expression by mRNA decay is an extremely interesting topic and is highly relevant in plant stress and developmental biology. This study provides a more in-depth view on the potential specific roles of the three PAT1 orthologs in Arabidopsis plants. Authors established loss-of-function mutants of the corresponding genes and performed transcriptomics analysis that will be a valuable source for future studies. Furthermore, microscopy analysis of PATH1 and PATH2 translocation to PBs indicates their potential specific roles in plant stress response.

      Weakness: The current version of this study suffers from vague presentation of the results. Starting from the title and ending with discussion authors provide a "general" view on their results and do not go into detailed interpretations. Thus, no mechanistic insight has been derived or at least suggested from the wealth of the transcriptomics, phenotypic and microscopy data.

      The introduction should provide more detailed information on what is known on the PAT1 role in the mRNA decapping pathway and its relevance for plant stress response and development.

      Please note that the above-mentioned suggestion of different sampling for transcriptomics analysis is not meant as a request for this particular study, but rather as an illustration of an expectation a reader would build while following the current version of the text. A thorough description of the strategy for transcriptomics and a more in-depth analysis might significantly strengthen the study's coherence and impact.

      Advance:

      At this stage, the study looks more like an incremental advance of the work from the same laboratory performed for the single PAT1 protein. However, as mentioned in the comments above, the study might be made significantly stronger by elaborating the results analysis and highlighting potential discoveries.

      Audience:

      The topic of this study is of a significant interest to a broad audience performing research in plant stress biology and also developmental plant biology.

      __Our response: __We thank the reviewer for acknowledging the significance of our work and for the constructive criticism. We hope our detailed answers to the reviewer’s suggestions and the additional data we included in the manuscript will satisfy the reviewer.

      Reviewer's and co-reviewer's fields of expertise:

      Molecular Biology, Plant cell biology, Plants Stress response, Autophagy, Stress granules

      __Reviewer #4 (Evidence, reproducibility and clarity (Required)): __

      PAT1 (Protein Associated with Topoisomerase II) are RNA-binding proteins involved in the control of mRNA decay in the cytoplasm. Plants possess multiple PAT1 family members, three in Arabidopsis, PAT1, PATH1 and PATH2. According to the literature, the pat1 mutant shows dwarfism and de-repressed immunity. In this paper, Zou et al. describe the function of PATH1 and PATH2. Two pieces of evidence are consistent with their role in the control of mRNA decay. First, Co-IP and bimolecular Fluorescence Complementation assays in tobacco indicate physical interaction and co-localization of PAT1, PATH1 or PATH2 with LSM1 (Fig. 1), which is a protein present in decapping complexes that form the cytoplasmic foci involved in mRNA decay. Second, PAT1, PATH1 and PATH2 are present in these cytoplasmic Processing Bodies (Fig. 2). Zou et al. generated path1 and path2 mutants, double mutants with pat1 and the triple mutant using independent alleles and the summ2 background to avoid autoimmunity interference. The mutants show leaf growth (Fig. 3) and gene expression (Fig. 4) phenotypes that are not exactly similar among the different family members, but there is significant redundancy revealed by these phenotypes.

      __Our response: __We thank the reviewer for the peer review. Please see our detailed answers to the reviewer’s suggestions in the following.

      1. The conclusions are straight forward and, apparently, well supported by the data. However, the authors should confirm that when they provide the number of replicates (n) in the legends to the figures, this actually refers to the number of biological replicates. The statements should be based on true biological replicates (not technical replicates). The statistical tests should also be explicitly indicated (including that used to identify DEG in the RNAseq experiment).

      __Our response: __We carefully went through our figures and made sure that the number of replicates (n) is correctly stated in the figure legends and that the statistical tests are indicated (Line 367).

      Reviewer #4 (Significance (Required)):

      The results are useful but mainly descriptive. Personally, I am interested in the mechanisms involved in the control of growth and the manuscript does not mechanistically link the action of PAT1, PATH1 and PATH2 to the transcriptome and the latter to the growth patterns.

      __Our response: __We thank the reviewer for acknowledging the significance of our work in characterizing the PATs, and we hope our new data will satisfy the reviewer regarding the “mechanistic link”.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      1) In general, given that several of the "equivalence groups" were distinguished from each other in Packer et al's annotation, can the authors comment more on why they aren't able to distinguish them? Are the markers listed for those cell states in Packer not expressed appropriately in these data? Or are they expressed but the states are not different enough to form discrete clusters? I suggest the possibility that the analysis choices of 20 "initial dimensions" or 1000 most variable genes filtered out some of these differences, which may be encoded in later principal components, or that the use of t-SNE projection is not sufficient to resolve these distinct states.

      2) I was a bit confused by the spatial gene expression analysis. Several distinct ideas appear to be posed in the text. These ideas aren't really supported by any quantitative analysis, just the visual patterns in Figure 4B/C which I'm not sure I always agree with.

      For example, ceh-43 expression is mentioned as having "physically proximate" expression. But it is well established that different lineages form specific spatial territories (e.g. Schnabel et al 1997). Thus it seems logical that genes with specific lineage patterns will have specific spatial patterns as well. If the claim is that the observed patterns are more clustered along the A-P axis than expected by chance given their lineal complexity then I'm not sure this is shown. Maybe some comparison with control lineage patterns of similar complexity of non-TFs or non-HD TFs could get whether these genes specifically are more spatially patterned? Visually it looks to me like some patterns are more like "blobs" or even lateral or D-V specific patterns than they are like "stripes."

      In addition there is a long history in the literature discussing the origin of position-specific patterns in C. elegans - most I'm aware of support the idea that positional information arises primarily from intrinsic lineage mechanisms (e.g. Cowing and Kenyon 1996). Perhaps the authors are making this same argument here, but if so this isn't clear from the text.

      Or maybe the authors are trying to make the argument that combinations of TFs encode more precise position than individual TFs? This seems more likely to me from the images presented, but it is still not well supported without quantitative or statistical analyses.

      3) The comparison with Drosophila is interesting but also under-developed. I think all I would feel comfortable claiming from the data as shown is that genes that are spatially patterned in early fly development are also usually patterned in the C. elegans lineage. But to even say this is an enrichment over expectation would require more analysis.

      Minor comments:

      Methods: some statement about temperature control during cell isolation would be useful. In other words were embryos continuing to develop or put at low temperature such as in a cold room to prevent temporal differences between the first and last cells collected from a given embryo?

      Current links to data at GEO are incorrect and link to Levin et al 2016 instead. I was not able to access the raw single cell data, just the processed data in Table S6.

      The standardization of expression in embryos isn't well explained - would be good to expand a little on the types of batch effects being addressed and how this approach was chosen or a relevant citation.

      Page 2: Including P0 and cell deaths there are 1,341 branches in the hermaphrodite lineage (2n-1 for 671 terminal cells including deaths).

      -"as their each have" (grammar error)

      -"very large nuclear hormone receptor domain" (add "family")

      Page 3: As noted Packer et al largely missed cells prior to the 50-cell stage as described - but the reason for this is likely that the use of 10 micron filters or centrifugation to remove undissociated embryos also removes early stage cells.

      -"few new expressions occur" (grammar). Also, in both Tintori and Hashimshony datasets there are well over 1000 newly expressed genes detectable (see for example Sivaramakrishnan et al 2021 biorxiv).

      Figure S1 would be easier to interpret with a legend explaining what fates are represented by each color

      Some genes listed as markers in Figure S2 are not included in the marker table such as flh-3, oma-2, sma-9.

      "New markers were required" - this is plural but only F19F10.1 is mentioned. Were other markers examined this way or should it be singular?

      In Figure S2 the lower ("robustness") plots are nice but could be explained more clearly. What is the nature of the "cell similarity score"? How many (if any) cells were excluded due to not being most similar to their own cluster?

      "transcriptomically very similar shortly after division" - can the authors comment on any information they have about how long after division the cells were collected?

      GFP reporter lineaging - the methods are minimally described (what brand of microscope, which strains/transgene/CRISPR configurations etc). And data are not presented. If these embryos are all incorporated into Ma et al 2021, that is fine, but should be clearly cited. Otherwise it is important in my view to include some way to access the quantitative values from the lineaging and understand these details.

      "as illustrated for ceh-43, dmd-4 and unc-30" - were there other examples as suggested from this wording? I'd also note that similar fluorescent reporter imaging data have been published previously for all three genes listed (Walton et al 2015 for UNC-30, Ma et al 2021 for DMD-4 and CEH-43 protein reporters, Murray et al 2012 for dmd-4 and ceh-43 promoter reporters).

      Zacharias and Murray are cited as promoting "continuous symmetry breaking" but actually that review argued for a "non-monophyletic" architecture similar to that supported by the data.

      The text and figure don't always agree. For example mec-3 expression is listed in the text as part of one of the stripes, but mec-3 is not labeled on the figures.

      The stage of each embryo in figure 4B/C should be explicitly labeled (and maybe also given specific figure panel designations to clarify what statements in the text correspond to which figures).

      In the discussion it is unclear what the numbers "97 to 104" refer to

      The scRNA-seq reads were mapped to a relatively old genome build and annotation set (WS230) - thus current users may find discrepancies with current gene names in WormBase. Also, since the CEL-seq data are 3' biased, it is worth noting that Packer et al found that a substantial number of genes (~1000) in a slightly later annotation set (WS260) were undercounted (sometimes dramatically) with the similarly biased 10x data due to incomplete 3'UTR annotations. While I would be reluctant to ask for a requantification for the purposes of the manuscript given the challenges of repeating the various analyses, it is worth explicitly mentioning whether this was dealt with.

      Reviewer #2 (Recommendations For The Authors):

      The writing was otherwise good, at least to my eye, and the data was presented very well and made freely available to other researchers. I am not as well-versed in the statistical methods and will leave comments on these to a better-equipped reviewer(s).

      Fig. 1 legend 'P' should be P4 (subscript 4).

      p. 9 'ceh-51' should be italicized. Only one factor seems to have been confirmed by smFISH, F19F10.1. There are available reporters; did they show a similar pattern? From CGC website: RW12347 F19F10.1(st12347[F19F10.1::TY1::EGFP::3xFLAG]) V endogenous tagged reporter; RW11620 unc-119(tm4063) III; stIs11620 [F19F10.1::H1-wCherry + unc-119(+)] array reporter.

      Reviewer #3 (Recommendations For The Authors):

      Typo: on page 11, where it says nanog it should read nanos.

      Reviewer #4 (Recommendations For The Authors):

      I found some sentences and paragraphs to be a bit unclear. There are no page or line numbers in the manuscript, so I point in the general direction, and hope the authors find what I am referring to.

      • 2nd paragraph of the Introduction - "their" should be "they", but the sentence as a whole is not clear.

      • 3rd para. of the Intro. - The last sentence of this paragraph doesn't make sense. Please rephrase and/or break up into shorter sentences.

      • 1st Para. of Results - "the maternal deposit" is not clear. Perhaps "maternally deposited transcripts" or something similar.

      • 1st Para. after Figure 3. The last sentence "Thus, continuous symmetry breaking..." is unclear. What is "continuous symmetry breaking"? Please define and expand.

      • Fig. 4 - the genes seem to be listed from posterior to anterior. The common way of presenting Hox gene lists and other regionally expressed genes is from anterior to posterior.

      • For the benefit of the non-C. elegans crowd, please give names of Drosophila homologs where relevant (e.g., when comparing to Drosophila expression patterns)

      In a few places there are citations of popular science books or general textbooks (e.g., Carroll et al., 2004; Wolpert et al., 2019). I think it would be better to cite review papers from the scientific literature or relevant primary papers.

      I am very happy to submit the revised manuscript. We were very happy to have received reports from four reviewers!

      We have decided not to prepare a separate response to the public comments of the reviewers, as we did not undertake any further major revisions.

      We did address most of the minor editorial suggestions.

    1. Reviewer #1 (Public Review):

      The overall tone of the rebuttal and lack of responses on several questions was surprising. Clearly, the authors took umbrage at the phrase 'no smoking gun' and provided a lengthy repetition of the fair argument about 'ticking boxes' on the classic list of criteria. They also make repeated historical references that descriptions of neurotransmitters include many papers, typically over decades, e.g. in the case of ACh and its discovery by Sir Henry Dale. While I empathize with the authors' apparent frustration (I quote: '...accept the reality that Rome was not built in a single day and that no transmitter was proven by a one single paper') I am a bit surprised at the complete brushing away of the argument, and in fact the discussion. In the original paper, the notion of a receptor was mentioned only in a single sentence and all three reviewers brought up this rather obvious question. The historical comparisons are difficult: Of course many papers contribute to the identification of a neurotransmitter, but there is a much higher burden of proof in 2023 compared to the work by Otto Loewi and Sir Henry Dale: most, if not all, currently accepted neurotransmitters have a clear biological function at the level of the brain and animal behavior or function - and were in fact first proposed to exist based on a functional biological experiment (e.g. Loewi's heart rate change). This, and the isolation of the chemical that does the job, were clear, unquestionable 'smoking guns' a hundred years ago. Fast forward to 2023: Creatine has been carefully studied by the authors to tick many of the boxes for neurotransmitters, but there is no clear role for its function in an animal. The authors show convincing effects upon K+ stimulation and electrophysiological recordings that show altered neuronal activity using the slc6a8 and agat mutants as well as Cr application - but, as has been pointed out by other reviewers, these effects are not a clear-cut demonstration of a chemical transmitter function, however many boxes are ticked. The identification of a role of a neurotransmitter for brain function and animal behavior has reasonably more advanced possibilities in 2023 than a hundred years ago - and e.g. a discussion of approaches for possible receptor candidates should be possible.

      Again, I reviewed this positively and agree that a lot of cumulative data are great to be put out there and allow the discovery to be more broadly discussed and tested. But I have to note that the authors simply respond with the 'Rome was not built in a single day' statement to my suggestions on at least 'have some lead' how to approach the question of a receptor e.g. through agonists or antagonists (while clearly stating 'I do not think the publication of this manuscript should not be made dependent' on this). Similarly, in response to reviewer 2's concerns about a missing receptor, the authors' only (may I say snarky) response is ' We have deleted this sentence, though what could mediate postsynaptic responses other than receptors?' The bullet point by reviewer 3 ' • No candidate receptor for creatine has been identified postsynaptically.' is the one point by that reviewer that is simply ignored by the authors completely. Finally, I note that the response to my review question on the K stimulation issues (e.g. 35 neurons that simply did not respond at all) was: ' Response: To avoid the disadvantage of K stimulation, we also performed optogenetic experiments recently and obtained encouraging preliminary results.' No details, no data - no response really.

      In sum, I find this all a bit strange and the rebuttal surprising - all three reviewers were supportive and have carefully listed points of discussion that I found all valid and thoughtful. In response, the authors selectively responded scientifically to some experimental questions, but otherwise simply rather non-scientifically dismissed questions with 'Rome was not built in a day'-type answers, or less. In my view, the authors have disregarded the review process and the effort of three supportive reviewers, which should be part of the permanent record of this paper.

    2. Reviewer #3 (Public Review):

      SUMMARY:

      The manuscript by Bian et al. promotes the idea that creatine is a new neurotransmitter. The authors conduct an impressive combination of mass spectrometry (Fig. 1), genetics (Figs. 2, 3, 6), biochemistry (Figs. 2, 3, 8), immunostaining (Fig. 4), electrophysiology (Figs. 5, 6, 7), and EM (Fig. 8) in order to offer support for the hypothesis that creatine is a CNS neurotransmitter.

      STRENGTHS:

      There are many strengths to this study.

      • The combinatorial approach is a strength. There is no shortage of data in this study.

      • The careful consideration of specific criteria that creatine would need to meet in order to be considered a neurotransmitter is a strength.

      • The comparison studies that the authors have done in parallel with classical neurotransmitters are helpful.

      • Demonstration that creatine has inhibitory effects is another strength.

      • The new genetic mutations for Slc6a8 and AGAT are strengths and potentially incredibly helpful for downstream work.

      WEAKNESSES:

      • Some data are indirect. Even though Slc6a8 and AGAT are helpful sentinels for the presence of creatine, they are not creatine themselves. Of note, these molecules themselves are not essential for making the case that creatine is a neurotransmitter.

      • Regarding Slc6a8, it seems to work only as a reuptake transporter - not as a transporter into SVs. Therefore, we do not know what the transporter into the SVs is.

      • Puzzlingly, Slc6a8 and AGAT are in different cells, setting up the complicated model that creatine is created in one cell type and then processed as a neurotransmitter in another. This matter will likely need to be resolved in future studies.

      • No candidate receptor for creatine has been identified postsynaptically. This will likely need to be resolved in future studies.

      • Because no candidate receptor has been identified, it is important to fully consider other possible roles of creatine, besides being a neurotransmitter, that would explain these observations. There is some attention to this in the Discussion.

      There are several criteria that define a neurotransmitter. The authors nicely delineated many criteria in their discussion, but it is worth it for readers to do the same with their own understanding of the data.

      By this reviewer's understanding (and combining some textbook definitions together) a neurotransmitter: 1) must be present within the presynaptic neuron and stored in vesicles; 2) must be released by depolarization of the presynaptic terminal; 3) must require Ca2+ influx upon depolarization prior to release; 4) must bind specific receptors present on the postsynaptic cell; 5) exogenous transmitter can mimic presynaptic release; 6) there exists a mechanism of removal of the neurotransmitter from the synaptic cleft.

      For a paper to claim that the published work has identified a new neurotransmitter, several of these criteria would be met - and the paper would acknowledge in the discussion which ones have not been met. For this particular paper, this reviewer finds that condition 1 is clearly met.

      Conditions 2 and 3 seem to be met by electrophysiology, but there are caveats here. High KCl stimulation is a blunt instrument that will depolarize absolutely everything in the prep all at once and could result in any number of non-specific biological reactions as a result of K+ rushing into all neurons in the prep. Moreover, the results in 0 Ca2+ are puzzling. For creatine (and for the other neurotransmitters), why is there such a massive uptick in release, even when the extracellular saline is devoid of calcium?

      Condition 4 is not discussed in detail at all. In the discussion, the authors elide the criterion of receptors specified by Purves by inferring that the existence of postsynaptic responses implies the existence of receptors. True, but does it specifically imply the existence of creatinergic receptors? This reviewer does not think that is necessarily the case. The authors should be appropriately circumspect and consider other modes of inhibition that are induced by activation or potentiation of other receptors (e.g., GABAergic or glycinergic).

      Condition 5 may be met, because the authors applied exogenous creatine and observed inhibition. However, this is tough to know without understanding the effects of endogenous release of creatine. If they were to test whether the absence of creatine caused excess excitation (at putative creatinergic synapses), then that would be supportive of the same. Nicely, Ghirardini et al., 2023 study cited by the reviewers does provide support for this exact notion in pyramidal neurons.

      For condition 6, the authors made a great effort with Slc6a8. This is a very tough criterion to understand or prove for many synapses and neurotransmitters.

      In terms of fundamental neuroscience, the story should be impactful. There are certainly more neurotransmitters out there than currently identified and by textbook criteria, creatine seems to be one of them taking all of the data in this study and others into account.

    1. Why do you think social media platforms allow bots to operate?

      Bots can be helpful in everyday life because automation is useful. For example, we use bots to block spam and to archive outdated threads. Bots make the platform programmable, which extends what the platform can do; with bots, a platform may offer more functionality than it was designed for. Platforms benefit from the content they host and from user traffic, and bots can both improve the quality of content and attract more traffic, so they benefit the platform. Bots are also hard to block. Introducing more CAPTCHAs could be a bad way to stop bots because it also harms the experience of real people. And, as we discussed before, an attacker may still use more complex technology or even a real human (as discussed in 3.1) to bypass the restriction. So disallowing all bots won’t help much as long as attackers benefit from their actions, and it would block friendly bots too.

    1. The best collaborative practices of the past ten years address this contradictory pull between autonomy and social intervention, and reflect on this antinomy both in the structure of the work and in the conditions of its reception. It is to this art—however uncomfortable, exploitative, or confusing it may first appear—that we must turn for an alternative to the well-intentioned homilies that today pass for critical discourse on social collaboration. These homilies unwittingly push us toward a Platonic regime in which art is valued for its truthfulness and educational efficacy rather than for inviting us—as Dogville did—to confront darker, more painfully complicated considerations of our predicament.

      SP5: The criteria of socially engaged art treat the artist’s self-sacrifice as a mark of success. Through this self-sacrifice, artists are expected to renounce control of the aesthetic and focus solely on the social praxis of the work. However, according to Jacques Rancière, the system of art is based on a confusion between art’s autonomy and heteronomy, and authorial presence is integral to that autonomy. The authorial aesthetic plays a crucial role in thinking through the contradiction between autonomy and social change; it does not need to be sacrificed for social change, as it contains the promise of amelioration. Referring to Lars von Trier’s film Dogville, Claire Bishop points to a terrifying implication of self-sacrifice. An artist’s good intentions are not a reason to avoid critical analysis. A good socially collaborative art project should address the contradiction between autonomy and social intervention and reflect it through authorial aesthetics and the participants; more importantly, it should lead us to think seriously about our issues and predicaments.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The study largely focuses on the use of a 293 cell line that lacks a functional Dicer gene originally identified by the Cullen group. Baldaccini et al. use this cell line, referred to as NoDice cells, to reconstitute various Dicer isoforms that have thus far been described in a variety of settings (e.g., stem cells and oocytes). Collectively, these data demonstrate the capacity of certain N-terminal truncations of Dicer to inhibit Sindbis virus and reduce the presence of viral dsRNA, supporting some of the observations made thus far concerning an antiviral role for mammalian Dicer. For other viruses, this impact was significantly more modest (SFV reduction is less than a log) or was not observed at all (VSV and SARS-CoV-2). The authors then go on to characterize the nature of the observed antiviral activity and ultimately implicate PKR and the induction of NF-kB in priming the cell's antiviral defenses. Importantly, the group also found that this antiviral activity neither required the nuclease activity of Dicer nor the kinase activity of PKR - providing evidence against antiviral RNAi in mammals. In all, the data would seem to suggest that Dicer can act as a dsRNA sensor and can mediate the activation of an NF-kB response - akin to what is observed in response to NOD or some TLR engagement. In all, it is the opinion of this reviewer that this work brings additional clarity to a concept that remains controversial in the field and therefore embodies something meaningful for the community.

      With that said, there are a few issues that require additional attention. The first of these is textual. The introduction of the paper accurately describes the evidence in support of mammalian RNAi but does not invest the same time in discussing the data to the contrary. For example, Seo et al demonstrated that virus infection results in poly-adp-ribosylation of RISC preventing RNAi activity (PMID: 24075860), Uhl et al showed that IFN-induced ADAR1 resolves dsRNA in the cell and prevents RNAi (PMID: 37017521), and Tsai et al showed that virus-derived small RNAs are not loaded into the RISC in a manner that would enable antiviral activity (PMID 29903832). None of this work is referenced in this manuscript and it generates an unbalanced introduction as it relates to the controversy surrounding the idea of RNAi in mammals.

      Reply: We thank the reviewer for their positive comments and suggestions. In the revised version of this manuscript, we will rewrite the introduction to take into account the published data that are not in favor of an antiviral role of RNAi in mammals and we will add the suggested references.

      The second issue that would further strengthen this paper relates to the fact that the authors spend a considerable amount of time discussing the data of Figures 6 and 7 as conditions that are defined by a Dicer that cannot be processive in its nuclease activity (WT) vs. one that can (N1). However, there seems to be little consideration of the fact that the introduction of WT Dicer into these cells also restores miRNA biology whereas N1 appears to remain only partially functional (based on the data of Fig 3E). Given this, it seems the authors should verify that the high baseline of NF-kB signaling that is being observed when comparing WT to N1 is not a product of restored miRNA function in WT cells, in contrast to the hypotheses outlined in the manuscript. This could be addressed by silencing Drosha or DGCR8 in the Dicer knockout cells prior to their reconstitution of Dicer. In the opinion of this reviewer, this experimental control would significantly strengthen the conclusions the authors are making here.

      Reply: This would indeed be an ideal experiment to rule out the contribution of miRNAs in the observed phenotype. We believe however that this particular experiment would prove difficult to realize given that we reconstitute Dicer expression by lentiviral transduction and keep the cells under selection for a couple of weeks before using them for further experiments. This time frame is therefore not compatible with the use of siRNA to knock-down Drosha or DGCR8. Alternatively, we could knock them out by CRISPR-Cas9, but this would take too long and is not feasible in the frame of this work.

      We can however address the concern regarding the role played by miRNAs in the observed phenotype of the Dicer N1 cells. Indeed, we can determine the miRNA profile from our small RNA sequencing data and compare them between the Dicer WT and Dicer N1 cells. We have done this comparison and could not find striking differences in miRNA expression between the two cell lines. We will add this additional piece of evidence in our revised manuscript.

      Reviewer #1 (Significance (Required)):

      In the manuscript entitled, "Canonical and non-canonical contributions of human Dicer helicase domain in antiviral defense" Baldaccini et al. describe their findings concerning the ability of certain N-terminal deletion variants of Dicer in contributing to mammalian antiviral activity. The concept of a functional antiviral RNAi system in mammals is a contentious one with many publications including data both in support of its existence and opposing this idea. In this manuscript, Baldaccini et al. perform a wide range of well-controlled experiments to specifically address aspects of those reports to both provide clarity in what has been documented thus far and to expand on those concepts further.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary

      Whether RNAi is used as an antiviral mechanism in mammals has been a hotly debated issue. The research team previously published several papers on the roles of Dicer in siRNA/miRNA biogenesis and in antiviral responses. They have recently reported that the helicase domain of human Dicer specifically interacts with several proteins that are involved in the IFN response, including PKR. In this study, Baldaccini et al. investigated the involvement of Dicer in the antiviral response using various mutants of human Dicer. They showed that deletion mutants of the helicase domain exhibit antiviral activity that requires the presence of PKR. They further demonstrated that one of the mutants, N1-Dicer, showed antiviral activity in an RNAi-independent manner that depends on the presence of either native PKR or kinase-deficient mutants. Transcriptomic analysis revealed that numerous genes involved in the IFN and inflammatory response were upregulated in the cells that express N1-Dicer, which is likely due to an increased activation of the NFκB pathway. Based on these findings, the authors propose that Dicer may act as an antiviral molecule using its helicase domain, which represents a novel non-canonical function of Dicer.

      Major comments:

      1. The results from experiments with SARS-CoV-2 are intriguing (Fig. 2). The authors speculated that NF-kB activation favors the replication of this virus. It would be interesting to see the infection and replication of SARS-CoV-2 in PKR-deficient cells and cells expressing PKR mutants (as described in Fig. 5). The results may prove or disprove the authors' speculation and yield additional findings.

      Reply: We thank the reviewer for this suggestion. We have cells that are double knock-out for Dicer and PKR (NoDice/∆PKR) that were transduced to stably express Dicer WT or Dicer N1 and further transduced to express ACE2. We will infect those cell lines with SARS-CoV-2, which will allow us to see whether the difference in viral accumulation can still be observed in the absence of PKR. However, it might prove more difficult to reconstitute PKR expression (WT or mutants) in these cells since they are already transduced twice with two different constructs (Dicer and ACE2).

      2. Western blot analysis. In the method section, it is stated that proteins were quantified with the Bradford method and equal loading was verified by Ponceau S staining. The membranes were also probed with gamma-tubulin as a loading control (although the method section states that antibodies against alpha-tubulin were used); however, the band intensity of tubulin varies greatly among lanes in several figures while Ponceau S staining is similar (Figs. 4, 5, and 8). These differences compromise the accuracy of the results.

      Reply: We apologize for the difference in Tubulin signal in some of our western blots. There are several possibilities to explain those inconsistencies between Ponceau staining and Tubulin blotting, including an effect of viral infection on Tubulin expression. To remove ambiguities around this issue, we could quantify the signal across several blot replicates and provide the quantification after normalization. In addition, we would like to stress that regarding quantification of the infection, we think that the plaque assay experiments are more reliable than quantification of western blot signals.

      3. RNA-seq analysis revealed that Dicer N1 cells have significantly increased expression levels of signaling molecules in the type I IFN response even in uninfected cells. While this provides a potential explanation for the antiviral phenotype of N1-Dicer cells, I wonder why the expression levels of type I IFNs (probably the most potent antiviral molecules) were not analyzed in WT and Dicer N1 cells. Measuring the levels of IFNα and IFNβ by ELISA in the cells before and after infection could provide important and direct data to support their conclusion.

      Reply: This is an interesting suggestion but unfortunately, we do not believe that it would be possible to quantify IFNα and IFNβ by ELISA in the cell line that we used in our experiments. Indeed, the level of expression might just be too low to measure something meaningful. We could, however, measure the induction of IFNβ expression at the mRNA level by RT-qPCR. That said, we do not believe that the observed increased expression of genes that belong to the type I IFN response is solely the effect of an increased production of IFN. These genes are also under the control of other transcription factors, including NF-kB for some of them, and it might prove difficult to make a direct link with IFNα or IFNβ production.

      4. The data presented in Fig. 5 provide convincing evidence that the antiviral activity mediated by PKR against SINV is independent of its kinase activity in N1-Dicer cells. An interesting question is whether the antiviral activity associated with PKR is N1-Dicer dependent, which could be addressed by comparing the viral infection of NoDice∆PKR cells and NoDice cells expressing PKR mutants.

      Reply: Yes indeed, we have generated NoDice/∆PKR cells expressing PKR WT or mutant and we will infect them with SINV to confirm whether the presence of Dicer N1 is needed for the observed phenotype.

      5. In the concluding paragraph of the discussion, the authors present an oversimplified description of a complex model that involves crosstalk between IFN-I and RNAi and the Dicer-PKR interaction, which makes it difficult for the reader to form a clear picture of the mechanisms involved. It could be helpful to use a schematic illustration to summarize the model of PKR action together with the canonical and non-canonical Dicer functions.

      Reply: We will add a schematic model in the revised version of our manuscript to summarize our main findings.

      Minor comments:

      1. It is stated that NoDice FHA-Dicer WT #4 and NoDice FHA:Dicer N1 110 #6 are referred to as Dicer WT and Dicer N1 cells (p. 6). For simplicity, Dicer WT and Dicer N1 cells should be used throughout the manuscript, including in all figures. The labels in the figures are difficult to read and are confusing in some cases.

      Reply: This will be changed in the revised version to increase the clarity of the figures.

      2. It is of note that p-PKR was only detected in N1-Dicer cells at 24 hpi (Fig. 8A). This is an interesting observation that was not discussed. It appears that this could be due to delayed viral replication, since these cells are already in an elevated antiviral state. This possibility could be tested by examining viral replication and dsRNA accumulation at more time points in the experiments described in Fig. 1.

      Reply: We have performed an infection time course with more time points and we will incorporate these experiments in the revision.

      3. The authors may point out the limitations of the study. For example, all cells used in the study are engineered HEK cell lines and were tested with a limited number of viruses. As such, the observations may reflect the Dicer-PKR interaction under artificially overexpressed conditions, and how the model established in the current study applies to primary cells requires further investigation.

      Reply: This is indeed important; we will add a sentence about this in the discussion.

      Reviewer #2 (Significance (Required)):

      The findings reported in this study shed some new light on a long-debated issue regarding the potential role of RNAi as a physiologically relevant antiviral mechanism in mammals. Identification of a new antiviral function of the Dicer helicase domain via interaction with PKR is a new advance for the field, and it also adds a new dimension to a complex subject at the overlap of innate immunity, RNA biology, and developmental biology associated with Dicer.

      Field of expertise: Innate immunity, cell signaling, cytokine biology

      Areas that I do not have sufficient expertise to evaluate: Small RNA cloning, sequencing, and analysis.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This work by Baldaccini et al. explores the interplay between Dicer and the antiviral protein PKR in the context of viral infection. It builds on a previous publication of the team which demonstrates that the Dicer helicase interacts with multiple RNA binding proteins, including PKR (see Montavon et al.). In this work from 2021, they demonstrate that an artificially-truncated form of Dicer (Dicer-N1) lacking part of the helicase is antiviral against RNA viruses in a PKR-dependent fashion. This was an interesting finding because the field largely assumed that Dicer-N1 performs its antiviral function via canonical dicing of dsRNA, as part of an antiviral RNAi pathway. The present manuscript follows up on this initial discovery and deciphers the specifics of the Dicer-N1 antiviral phenotype, as well as delineates the interplay between Dicer's helicase and PKR. The authors' main claims are as follows:

      1. The Dicer-N1 antiviral effect does not require its catalytic activity and is therefore completely RNAi-independent.
      2. Neither does it require canonical PKR activation; it relies instead on NF-kB-driven inflammation. The origin of this inflammation is not studied.
      3. Truncated Dicers other than Dicer-N1 are antiviral through RNAi, but are also PKR-dependent.

      The authors' claims are mostly supported by the data, although I suggest below some improvements regarding experimental approaches and data presentation. This work details in an interesting manner the interplay between the machinery of RNAi and the classical pathway of innate immunity (PKR). As explained by the authors, there is solid data in the literature demonstrating the mutual exclusivity of IFN and antiviral RNAi in differentiated cells. This mostly goes through the receptor LGP2, which inhibits dsRNA dicing by Dicer. The authors' data suggest that, conversely, Dicer may play a role in preventing the unwanted activation of PKR (a non-canonical activation leading to inflammation). Given that PKR activation does not depend on virus, the authors discuss potential mechanisms of PKR triggering. This is an interesting topic that deserves further investigation (not necessarily within the frame of this work - it can be a follow-up). Another interesting piece of information is that different truncated Dicers behave differently with respect to implementing antiviral RNAi. Whilst Dicer-N1 isn't proficient in doing so, the other forms are. It shows that lab-generated truncations do not fully recapitulate what is observed with existing truncated Dicers (DicerO and aviD).

      Experimental design and data interpretation

      1. The authors should compare infection between different cell lines across a range of time points (i.e., a virus growth curve). In Fig 4E for example, I worry that cells with or without PKR will reach the plateau of viral particle accumulation at different time points. One could imagine that cells lacking PKR do not show any differences in particle production at 24h, but do at earlier time points.

      Reply: This is an interesting suggestion. We can perform a time-course experiment with more time points to address this point. This will allow us to determine the time needed for every cell line to reach the plateau of infection.

      2. Western blots should be accompanied by proper quantifications plotted as bar graphs with biological replicates (p-PKR, p-eIF2a and capsid).

      Reply: We have biological replicates for our western blot experiments, and we will quantify those to better determine the observed changes. However, in the case of p-eIF2a, we do not think it is pertinent to measure it since there are other kinases than PKR that are known to induce eIF2a phosphorylation upon SINV infection. It might therefore not prove very informative to precisely quantify this particular signal.

      3. Microscopy images should be properly quantified across biological replicates (Fig 1&2 for the J2 staining, for example).

      Reply: We could do a proper quantification of the J2 signal across replicates, but we do not think it would bring much to our message. Here, we mostly used J2 staining as a qualitative indication that the infection was impacted or not. We have a proper quantification of the effect with our plaque assay experiments, which are way more robust to determine the levels of infection between conditions.

      4. Confounding factors hinder the interpretation of siRNA accumulation (Suppl Fig 2): i) the efficiency of dsRNA dicing from different Dicers will generate different amounts of siRNAs from a given amount of dsRNA and ii) the higher antiviral response translates into decreased infection, so decreased dsRNA substrate. I suggest that the authors normalise the amount of viral siRNAs over the total amount of viral genomes. This should allow to assess if Dicer-N1 is better at dicing dsRNA than WT in these conditions.

      Reply: This is a valid concern and we agree that it is important to be able to normalize small RNA reads between conditions before reaching a conclusion. The problem is that there is no easy way to do this since we do not get a direct measurement of viral genome accumulation from our small RNA sequencing data. To better compare the two conditions, we could normalize the individual viral siRNAs to the total number of viral reads. Another problem that we face is that we are looking here at the AGO-loaded small RNAs, which makes it more difficult to assess dicing efficiency since not every generated siRNA might be loaded into an Argonaute protein. In fact, this has been proposed by the Cullen laboratory in a paper published in 2018 (Tsai et al. doi: 10.1261/rna.066332.118). They showed that although viral siRNAs were generated during IAV infection, those were inefficiently loaded and thus did not significantly impact the infection.
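
      The normalization to total viral reads mentioned in this reply could look roughly like the following. This is only an illustrative Python sketch, not the analysis pipeline used for the manuscript: the function name `normalize_to_total_viral_reads`, the siRNA labels, and all read counts are hypothetical.

```python
def normalize_to_total_viral_reads(sirna_counts):
    """Express each viral siRNA count as a fraction of all viral reads in the sample."""
    total = sum(sirna_counts.values())
    if total == 0:
        raise ValueError("sample contains no viral reads")
    return {name: count / total for name, count in sirna_counts.items()}


# Hypothetical read counts, for illustration only (not data from the study).
# The second sample has ten times fewer viral reads but the same relative profile.
dicer_wt = {"siRNA_a": 30, "siRNA_b": 70}
dicer_n1 = {"siRNA_a": 3, "siRNA_b": 7}

wt_fractions = normalize_to_total_viral_reads(dicer_wt)
n1_fractions = normalize_to_total_viral_reads(dicer_n1)
print(wt_fractions, n1_fractions)
```

      After this step the two samples can be compared on relative siRNA abundance even when the overall level of infection differs, although, as noted in the reply, AGO loading still limits what can be concluded about dicing efficiency.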

      5. In Fig 8, the authors should verify that phospho-p65 increase depends on PKR by repeating the experiment in PKR KO cells.

      Reply: Yes, good point. We will check what happens to phosphorylation of p65 in PKR KO cells. In addition, we can also measure the effect on a known NF-kB target by RT-qPCR (e.g. PTGS2).

      Data representation

      1. Levels of phospho-PKR and eIF2a need to be normalised to the total amount of PKR and eIF2a, respectively. The authors should quantify the blots and present bar graphs with biological replicates and statistics.

      Reply: As mentioned above in our reply to point 2, we can add the quantification for phospho PKR, but we do not think it is pertinent to do it for eIF2a.
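
      The normalisation requested by the reviewer can be sketched as follows. This is an illustrative Python sketch only, not the quantification used for the manuscript: the function name `normalized_ratios` and all band intensities are hypothetical. Each phospho-PKR band intensity is divided by the total-PKR intensity from the same replicate, and the replicates are then summarised as a mean and sample standard deviation, which is what a bar graph with error bars would display.

```python
from statistics import mean, stdev


def normalized_ratios(phospho, total):
    """Divide each phospho-protein band intensity by the matching total-protein intensity."""
    if len(phospho) != len(total):
        raise ValueError("need one total-protein value per phospho-protein value")
    return [p / t for p, t in zip(phospho, total)]


# Hypothetical band intensities from three biological replicates (arbitrary units)
phospho_pkr = [1.0, 2.0, 3.0]
total_pkr = [2.0, 2.0, 2.0]

ratios = normalized_ratios(phospho_pkr, total_pkr)
summary = {"mean": mean(ratios), "sd": stdev(ratios)}
print(ratios, summary)
```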

      2. Could the authors add the names of representative genes on the volcano plots of Fig 7?

      Reply: Yes, this will be done.

      Points of discussion

      1. In Fig 4C, catalytically-dead mutants of truncated Dicers (other than N1) do not display an antiviral effect. Presumably, such proteins implement canonical antiviral RNAi. Is there a reason why the authors interpret this data as Dicers being "partially" antiviral through RNAi (l. 92)? This data instead suggests that it is totally dependent on RNAi.

      Reply: Indeed, and we do not say the contrary. It seems that some of these helicase-truncated Dicer proteins can act through RNAi. However, they also depend on PKR, so in the end it might be a combination of the two that allows their antiviral effect.

      2. Gurung et al. demonstrate that PKR is activated in Dicer KO mouse ES cells, which results in phosphorylation of eIF2a at steady-state. This is different from the authors' data, in which PKR activation does not affect eIF2a phosphorylation. Could the authors discuss this discrepancy?

      Reply: The problem that we face here is that SINV is known to also activate GCN2 and therefore eIF2a phosphorylation does not strictly rely on PKR in our experimental conditions. In addition, we did not check eIF2a phosphorylation in Dicer KO cells, but we always compare Dicer WT and Dicer N1 expressing cells.

      3. Do the authors expect that truncated Dicers other than N1 trigger an inflammatory response such as the one described for N1? Would it be possible to have this antiviral inflammatory response in conjunction with antiviral RNAi?

      Reply: This goes back to Point 1 mentioned previously. We think indeed that there might be a dual action of Dicer and that it will be important to check whether such a phenomenon can be observed in other cellular systems or animal models as well. This is a point that we did address in the discussion of our manuscript (lines 522-525).

      Reviewer #3 (Significance (Required)):

      This is a study that conceptually advances the field of antiviral RNAi in mammals, including its interplay with the machinery of innate immunity. It is of interest for virologists and immunologists. My expertise is centered on the mechanisms of innate immunity in mammalian cells, including antiviral RNAi.

    1. Author Response

      We would like to express our gratitude to the Editors and Reviewers for their thoughtful and helpful comments. We sincerely appreciate the opportunity to submit our revised manuscript titled “Predicting Ventricular Tachycardia Circuits in Patients with Arrhythmogenic Right Ventricular Cardiomyopathy using Genotype-specific Heart Digital Twins” to eLife. We are delighted that our research in ARVC has garnered the interest of the three reviewers. Below, we provide our point-by-point responses to the reviewers’ comments. We have also incorporated the suggestions provided by the reviewers in our revised manuscript.

      Comments from Reviewer 1

      We thank Reviewer 1 for their positive assessment and thoughtful suggestions. Here are the responses to the comments of reviewer 1:

      Comment 1: One addition that could add more insight is to predict the effect of structural remodeling alone as well, considering only normal electrophysiological models.

      We thank the reviewer for this thoughtful suggestion on our experimental design. We would like to highlight that this suggestion was indeed taken into consideration in our study, as all the patients’ hearts were modeled using the gene-elusive cell model before the structural-EP mismatch was implemented. The gene-elusive cell model is a baseline ten Tusscher (TT2) human ventricular model described in the “Cell-level modeling” section of our Methods. Therefore, we have already examined the impact of structural remodeling alone in the study.

      Comment 2: Another interesting approach would be a sensitivity analysis, to determine how sensitive the VT circuits are to the specific geometry of the patient and remodeling that occurs during the disease, such an approach could also be used to determine how sensitive the outputs are to electrophysiological model inputs.

      We think this suggestion is of great value and could benefit our future ARVC studies. The reviewer pointed out the importance of investigating how sensitive the VT circuits are to the specific geometry/remodeling of the patient during disease progression. To achieve this, for each patient, a sequence of LGE-CMR images at different stages of this disease is required for model reconstruction; unfortunately, our cohort for this study does not incorporate such data.

      Comments from Reviewer 2

      We thank Reviewer 2 for the positive assessment, and here are the responses to the comments:

      Comment 1: I appreciate that the types of computational models detailed in this paper take enormous time to develop. However, to identify bottlenecks in the clinical workflow (and thus targets for future research), it may be nice for the authors to discuss the time taken to generate and run the models for each patient?

      We sincerely appreciate the valuable feedback from the reviewer. We recognize the importance of considering model generation and run time. In the introduction, we have highlighted the clinical challenge in managing ARVC ablation procedures, which is the inability to capture all the VT due to an incomplete understanding of VT mechanisms. We acknowledge the reviewer’s concern regarding the potential time taken by the model to predict VT circuits and whether this could hinder the integration into the current ablation procedure. However, it is important to clarify that our model is primarily based on clinical images obtained in advance of the procedure. As a result, there is sufficient time available to generate the results required for ablation planning.

      Comment 2: In the Materials and Methods section, some references are underlined? Is this a typo or meant to convey some particular information?

      We thank the reviewer for pointing this typo out and we have removed the underlining of references in our revised manuscript.

      Comment 3: The authors state that the cellular models are available from the CellML model repository. This is an excellent practice. However, the URL that is given points to the entire CellML website. It will be more useful for URLs that point to the specific models used in the study so that readers can be sure they are looking at the correct model.

      We thank the reviewer for this suggestion, and we have edited the URL in Data Availability to link to a specific cell model on the CellML website.

      Comment 4: In the abstract, the authors report the sensitivity, specificity, and accuracy of their computer models but fail to comment in the abstract that they are comparing against recordings from the patient during a previous EPS study. To assist further readers who are scanning the abstract, the authors may wish to add a sentence or two to detail what they are comparing their model results to.

      We thank the reviewer for the suggestion. This is a retrospective study. We recognize the importance of wording clarity in the abstract; in response, we have added a sentence in the abstract to clarify that we compared VT locations of Geno-DT with the ones recorded during clinical EPS to obtain sensitivity, specificity, and accuracy.
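
      To make the comparison concrete, the three metrics can be computed from a per-segment match between predicted and clinically recorded VT locations. The following is a minimal Python sketch under stated assumptions, not the code used in the study: the function name `segment_metrics`, the segment labels, and the example sets are all hypothetical.

```python
def segment_metrics(predicted, observed, all_segments):
    """Sensitivity, specificity and accuracy of predicted VT segments against observed ones.

    `predicted` and `observed` are sets of segment labels and are assumed
    to be subsets of `all_segments`.
    """
    tp = len(predicted & observed)                  # predicted and observed
    fp = len(predicted - observed)                  # predicted but not observed
    fn = len(observed - predicted)                  # observed but not predicted
    tn = len(all_segments - predicted - observed)   # neither predicted nor observed
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one segment with and one without an observed VT")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(all_segments),
    }


# Hypothetical example with ten segments (labels are placeholders)
all_segments = {f"S{i}" for i in range(1, 11)}
predicted = {"S1", "S2", "S3"}   # segments flagged by the model
observed = {"S2", "S3", "S4"}    # segments recorded during clinical EPS

metrics = segment_metrics(predicted, observed, all_segments)
print(metrics)
```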

      Comment 5: In Table 1 some of the data is discrete e.g., the number of patients on a beta-blocker. The authors give a p-value for comparing the GE and PKP2 data and state in the caption that a Student's t-test has been used. Strictly speaking, a t-test is not really appropriate for the population proportion with non-parametric data. That said, the size (n) of the data here makes the p-values from any statistic very unreliable. Perhaps the authors might like to reconsider if p-values add anything to such data? If so, then the statistical test should be reconsidered.

      We truly appreciate the reviewer pointing out this error in the caption of Table 1. For the non-parametric discrete data, we used a z-test, a common statistical method for comparing percentages, to obtain the p-values, but we mistakenly mentioned only the t-test in our caption. We acknowledge the limitation of our sample size and we have corrected this typo in our revision.
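
      For readers unfamiliar with the test, a pooled two-proportion z-test can be sketched as below. This is a generic textbook form written in Python for illustration, not the code used in the study; the function name `two_proportion_z_test` and the example counts are hypothetical. The test relies on a normal approximation, which becomes unreliable for small groups, in line with the reviewer's caution about sample size.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for the difference between two proportions.

    x1 of n1 subjects in group 1 and x2 of n2 subjects in group 2 have the trait.
    Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 8 of 10 patients in one group vs 2 of 10 in the other
z, p_value = two_proportion_z_test(8, 10, 2, 10)
print(z, p_value)
```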

      Comment 6: I found Table 1 and its caption a little confusing. The authors put the range in [] brackets and then abbreviated standard deviation with () brackets. On initial reading, I incorrectly assumed that the numbers in the table in () brackets were standard deviations when, in fact, they are percentages. Perhaps the authors could consider changing the caption so that the percentage is in, say, {} brackets and make the caption say that values are given as n {%} etc.

      We thank the reviewer for pointing this out and we recognize that certain expressions in the Table 1 caption are confusing. In our revised manuscript, we used n {%} to replace n (%) and deleted the abbreviation for standard deviation, which was not used.

      Comment 7: In the caption for Figure 2 the authors present action potentials "at steady state". Adding the pacing frequency (or cycle length) for the steady state would be useful.

      We thank the reviewer for pointing this out. We agree that showing pacing frequency is important and we have made the edit in our revision.

      Comment 8: In Table 2 the VT locations are compared between the EPS and the Geno-DT model. The comparison metrics listed in the table should be better described in the table caption. It is unclear if the authors compare VT locations in the AHA segments or if the specific geometric location is used. If it is a geometric location, then I would have expected to see information on the mean error distance or similar information? If it is a comparison of AHA segments, there could be a problem if a VT location was very close to the border between segments. The predicted VT location might be very close to the measured VT location but may end up in a different segment? The authors may like to clarify the methodology and/or discuss these issues.

      We thank the reviewer for this comment. We recognize the need for clarification on the comparison metrics of Table 2. In the text related to Table 2, we used the wording “anatomical location” to avoid excessive repetition of mentioning AHA segments. However, we agree that reverting it back to the “AHA segment” will reduce confusion. Regarding the point of comparing exact locations the reviewer mentioned, in clinical settings, clinicians primarily rely on AHA segments to describe the VT locations during ablation and descriptions in the EP report, rather than using exact coordinates. As such, a match between our predicted AHA segments and clinical AHA segments is a direct comparison. This alignment provides a meaningful comparison and is sufficient for assisting ablation procedures.

      Comment 9: In Figure 7, activation maps are shown, and the row is labelled as Induced VTs/Geno-DT. Are the colour maps from the model or the EPS measurements? The last sentence of the caption indicates they are from the measurements, but such detailed full-wall maps seem to be from a model. The authors may like to clarify what the figure shows.

      We thank the reviewer for this comment. We understand the reviewer’s concern regarding the clarity of Figure 7’s caption. While we believe that the first bold sentence in the caption adequately clarifies that the results in Figure 7 are derived from the Geno-DT model, we agree with the reviewer that the wording could be made clearer. In response, we have made the necessary edits to the caption in our revised manuscript.

      Comments from Reviewer 3

      We thank Reviewer 3 for the positive assessment. Our responses to the comments follow.

      Comment 1: The small sample size is a limitation but has already been acknowledged and documented by the authors.

      We thank the reviewer for this comment. We have acknowledged the small sample size as a limitation in our manuscript.

      Comment 2: Another limitation is the consideration of only two of the possible genotypes in developing the cell membrane kinetics, but again has been acknowledged by the authors.

      We thank the reviewer for this comment. We have acknowledged the consideration of only two genotypes as a limitation in our manuscript. We hope to enlarge the genotype groups in our future ARVC studies.

    1. Others are also made by well-intentioned and conscientious people who fear that harm will come to some segment of the community if a particular text is read or recommended.

      I am curious whether, as generations pass and ideologies change, the banned book list will see a shift in reasoning. I think it undoubtedly has to. For example, 40 years ago a book may have been banned for having tones of homosexuality or featuring transgender people, but now, or maybe in the near future, I could see books being banned for having themes of homophobia or transphobia. This is a very baseline example, and I think if we look at the list of banned books and the culture of the time, we will be able to find out a lot about what was considered right and wrong in those time periods.

    2. first, any text is potentially open to attack by someone, somewhere, sometime, for some reason

      I really enjoy how they argue this point. Someone, somewhere will always be able to find something wrong with a text you have selected. Because the world is imperfect and we as humans are imperfect and often sensitive, we cannot satisfy every single human being. That being said, I think it is important to challenge students to read books that may challenge their beliefs or bring new perspectives to light for them.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      Summary

      In this study, the authors conducted a multi-omics analysis comparing cells from the long-lived bat, Pteropus alecto, and human cells. Their findings revealed that bat cells express higher levels of mitochondrial complex I components and exhibit a lower rate of oxygen consumption. Moreover, computational modeling suggested that the activity of complex II in bat cells might be low or even reversed, similar to the conditions observed during ischemia. The decrease in central metabolites and the increased ratio of succinate to fumarate in bat cells might indicate an ischemia-like metabolic state. Despite having high mitochondrial ROS levels, bat cells exhibit higher levels of total glutathione and a higher ratio of NADPH to NADP. Additionally, bat cells showed resistance to glucose deprivation and induction of ferroptosis.

      Major comments

      1. Regarding Figure 1A, the authors mention 'n = 3' for a single cell line. Does this refer to three different passages or three independent experiments? Please provide a more detailed description to clarify.
      2. In relation to Figures 1C and 1D, the authors state in the figure legend that the 'GSEA analysis identifies Respiratory electron transport and Cellular response to hypoxia as the top metabolic pathways that are differentially regulated between PaLung and WI-38 cells.' (Lines 140-144). However, the criteria for selecting these terms as the top metabolic pathways are not clear. In the lists in Supplementary Tables 2 and 3, the authors' proposed term, 'Respiratory electron transport,' is ranked 126th, and 'Cellular response to hypoxia' is ranked 79th. Conversely, terms related to the TCA cycle are ranked 66th and 82nd, and another term that seems to be related to hypoxia, 'OXYGEN-DEPENDENT PROLINE HYDROXYLATION OF HYPOXIA-INDUCIBLE FACTOR ALPHA,' is ranked 62nd. Could the authors please provide a clarification for their choice of 'Respiratory electron transport' and 'Cellular response to hypoxia' as the top metabolic pathways?
      3. In the Materials and Methods section (lines 419-421), the authors mention, 'GSEA was run against the complete Gene ontology biological process (GO BP) gene set list (containing 18356 gene sets).' However, they narrow down the gene dataset for analysis (lines 136-138, 'we filtered our gene dataset to contain only genes listed under the Gene ontology category Cellular Metabolic Process (GO ID:0044237), resulting in a truncated list of 4794 genes.'). I'm concerned that this selective approach might introduce bias into the resultant pathways. Is this selective approach commonly employed in this type of analysis? And isn't there a need for adjustments to avoid potential bias?
      4. The authors noted that the number of differentially expressed genes (DEGs) is quite high (6,247 out of 14,986) as per lines 134-135, stating that "The number of differentially expressed genes (6,247) was extremely high, suggesting that multiple pathways are differentially regulated between the two species." However, this large number of DEGs could indicate either an improper correction procedure or a need for a more stringent threshold. The authors should address this issue to avoid potential misinterpretation of the results.
      5. In Figure 2B, the samples labeled as W1 and P1 appear to be outliers. This raises questions about the integrity of the sampling or analysis process. Please comment on this.
      6. Regarding the GSEA analysis of Fig. 2, they are using the full set of GSEA. However, this reviewer is wondering if this is appropriate when analyzing mitochondrial fractions, as I believe using the entire GSEA set could introduce a bias. Is this a common approach? Shouldn't the authors be focusing on mitochondrial-related sets within the GSEA, and then determining the upregulated and downregulated pathways from there?
      7. The authors describe in lines 195-197, "GSEA-flagged upregulation in OxPhos was driven mostly by the upregulation of Complex I subunits, for both the proteomic and transcriptomic data (Figure 2G, Supplementary Figure S1D)." However, within this analysis, the number of genes composing each subgroup of the mitochondrial Complexes is 44 for Complex I, 4 for Complex II, 10 for Complex III, and 19 for Complex IV (https://www.genenames.org/data/genegroup/#!/group/639). The authors mention that the genes of Complex I were dominant in the ETC, but might this simply reflect the original difference in the number of genes? As this reviewer believes this could have a significant impact on the authors' current claims, this reviewer suggests that the authors carefully reconsider this point, comparing the actual results with the proportion expected from the difference in gene numbers. (Even in Fig. S1D, it appears to correlate with the number of genes: C1 39.3%, C3 10.7%, C4 10.7%, C2 3.5%)
      8. As pointed out in Major Point 7, if the authors' claim of enrichment in Complex I is indeed due to the large number of genes included in the Complex I subgroup (https://www.genenames.org/data/genegroup/#!/group/639), can the assumption of High Complex I flux truly be considered valid? In that case, this constraints model would become inappropriate, and the validity of the inferred low or reverse activity of Complex II would be diminished. Therefore, a careful re-examination is desirable.
      9. (Optional; takes about 1-2 months.) This reviewer believes that the authors' most important claim, concerning the high activity of Complex I and the low activity of Complex II, lacks strong evidence as no biochemical data of the activities of each mitochondrial complex are presented to substantiate this. Unless additional biochemical experimental data are provided, the assertions should be toned down. While the abstract mentions "complex II activity may be low or reversed," it is stated with certainty in line 108 of the introduction, "associated with the low or reverse activity of Complex II." Based on the present data, this reviewer believes that the claim remains speculative. Therefore, I suggest moderating the overall argument or adding the biochemical data. While the results from metabolomics are supportive, they do not serve as direct evidence.
      10. Regarding Figure 5, the title of the figure states "lower antioxidant response", but it doesn't seem that the data in the figure actually shows a lower antioxidant response.
      11. In lines 109-110 of the Introduction, the authors state, "we confirmed our prediction of ischemic-like basal metabolism in PaLung cells by characterizing the response of bat cells to cellular stresses such as oxidative stress, nutrient deprivation, and a type of cell death related to ischemia, viz. ferroptosis." However, can the assertion that the cells are in an ischemic-like state be confirmed simply because they are resistant to several types of cellular stress?
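
      The gene-number baseline raised in major comments 7 and 8 can be made concrete with a small sketch. It assumes, purely for illustration, that the null expectation is proportional to the subunit gene counts quoted in comment 7 (44, 4, 10, and 19 for Complexes I to IV). The variable names are hypothetical, and this is not an analysis taken from the manuscript.

```python
# Subunit gene counts per complex, as quoted in major comment 7.
gene_counts = {
    "Complex I": 44,
    "Complex II": 4,
    "Complex III": 10,
    "Complex IV": 19,
}

total_genes = sum(gene_counts.values())

# Assumption: with no real enrichment, each complex contributes to the
# leading-edge genes in proportion to its number of subunit genes.
expected_share_pct = {
    name: 100.0 * count / total_genes for name, count in gene_counts.items()
}

for name, share in expected_share_pct.items():
    print(f"{name}: {share:.1f}% of Complex I-IV subunit genes")
```

      Under this assumption, Complex I alone accounts for more than half of the Complex I-IV subunit genes, so an observed Complex I share would need to exceed that baseline to indicate enrichment.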

      Minor points:

      1. The authors mention the use of cufflinks/Tophat for mapping/quantification. However, support for these software programs has ended and the creators of these programs themselves recommend using the successor programs. I recommend re-analysis using a more current pipeline (such as HISAT2/StringTie, STAR/RSEM, etc.). Furthermore, the transcriptomics section of the methods should also include the program used for cleaning and trimming.
      2. As for the Oxygen Consumption Rate (OCR) data presented in Figure 2F, it makes sense that it's low at the basal level. However, it's perplexing that it is also low even under uncoupled conditions, especially considering the high energy demand associated with flight in this species. Could the authors provide their interpretation on this apparent contradiction?
      3. In line 156, the authors mention that 'Profiling detected a total of 1,469 proteins.' Please provide more details in the explanation. Specifically, does this total of 1,469 proteins represent a combined count from both humans and bats, or is this the number of proteins for which orthologs could be identified in both species, just like the authors did with the transcript results?
      4. In Supplementary Table 4, only 127 mitochondrial proteins are listed out of the 405 proteins mentioned in "Of these 405 proteins, we identified 127 to be core mitochondrial proteins (lines 161-163)". As there is no explanation for this within Supplementary Table 4, it would be better to include one.
      5. In line 472, the phrase "GO BB gene set list" is used. Could this potentially be a typographical error, and should it instead be "GO BP gene set list"?
      6. In the volcano plot of Fig. S3B, it appears that the side with lower P/W values generally corresponds with lower p-values. I wonder if there might have been any oversight or mistake in the data analysis process that could explain this observation?
      7. In lines 249-252, it is stated, "The low or negative flux values for Complex II in our PaLung simulations indicate that the electrons obtained from Complex I may accumulate at Complex II or potentially even get consumed by Complex II operating in reverse (bypassing the rest of the ETC) in PaLung cells." However, isn't the basic process of electron transfer done through Complex I-III-IV, independent of Complex II?
      8. Regarding Figure 4F, the authors state, 'PaLung cells displayed higher viability than WI-38 cells after glucose deprivation (Figure 4F).' However, in addition to the cell images, it would be beneficial to perform experimental quantification of cell death to provide more rigorous data. Additionally, the cells appear to be over-confluent, which might influence the results. Also, scale bars should be included in all photos, including Fig. 6.
      9. Regarding Figure 5B, it is stated that 'the expression levels of differentially expressed antioxidant genes' are shown, but it includes those that are not significant. It would be helpful if the authors could clarify how this gene set was selected.
      10. Regarding Figure 6C, the values for total glutathione seem to significantly differ from those in Figure 5C. An explanation for this discrepancy would be appreciated to ensure the consistency and reliability of the data.

      Referees cross-commenting

      I think the comments from the other reviewers are appropriate.

      Significance

      Collectively, these intriguing results from the interspecies comparison provide novel insights into the differences in metabolism and cellular characteristics between bat and human cells. However, the study has some limitations, notably certain weaknesses in the data and potential overstating of certain interpretations. Addressing these issues would enhance the overall quality and robustness of the manuscript. Furthermore, if feasible, conducting a biochemical analysis of each mitochondrial complex activity would solidify the authors' main conclusions.

    1. Author Response

      Reviewer #1 (Public Review):

      The current manuscript by Liu et al entitled "Discovery and biological evaluation of a potent small molecule CRM1 inhibitor for its selective ablation of extranodal NK/T cell lymphoma" reports the identification of a novel CRM1 inhibitor and shows its efficiency against extranodal natural killer/T cell lymphoma cells (ENKTL).

      This is a very timely and very original study with potential impact in a variety of pathologies not only in ENKTL. However, the main conclusions of the work are not supported by experimental evidence.

      Many thanks for your very kind words about our work. We are excited to hear that you think our manuscript is original with considerable translational impact to the field. We are grateful for your valuable time and efforts you have spent to provide your very insightful comments, which are of great help for our revision.

      The study claims that LFS-1107 reversibly inhibits the nuclear export receptor CRM1 but the authors only show that the compound binds to CRM1 and that the CRM1 substrate IκBα accumulates in the cell nucleus upon LFS-1107 treatment. The evidence is indirect and alternative scenarios are certainly possible.

      Many thanks for this critical comment. We have conducted extra experiments to demonstrate that LFS-1107 can reversibly inhibit the nuclear transport machinery mediated by CRM1. Namely, culturing the medium for two hours after LFS-1107 treatment restored the transport of IκBα from the nucleus to the cytoplasm. Please see Figure 2-figure supplement 3 for more details.

      On the other hand, the manuscript is not always well written and is insufficiently referenced.

      Thanks for this critical comment. This has been fixed. We have checked through the manuscript with extensive language editing. Moreover, we have added more references to the manuscript.

      The nuclear translocation in figure 2G is not convincing. The western blot in figure 2G shows that LFS-1107 treatment induces IκBα expression, and both cytoplasmic and nuclear amounts increase in a dose-dependent manner. Together, these data do not support nuclear IκBα accumulation upon LFS-1107 treatment.

      Thanks for this critical comment. This has been fixed. We have repeated the Western blot experiments, and our results revealed that only the nuclear IκBα amount increased upon treatment with LFS-1107. In contrast, the cytoplasmic IκBα amount decreased after treatment with LFS-1107. Please see Figure 2J for more details.

      Reviewer #2 (Public Review):

      Indeed, ENKTL is a rather deadly tumor with unmet medical needs. The work is novel in the sense that they designed and identified a very potent inhibitor homing at CRM1 via a deep-reinforcement learning model to suppress the overactivation of NF-κB signaling, an underlying mechanism of ENKTL pathogenesis. The authors demonstrated that LFS-1107 binds more strongly with CRM1 (approximately 40-fold) as compared to KPT-330, an existing CRM1 inhibitor. Another merit of the small-molecule inhibitor is that LFS-1107 can selectively eliminate ENKTL cells while sparing normal blood cells. Their animal results clearly demonstrated that the small-molecule inhibitor was able to extend mouse survival and eliminate tumor cells considerably. Overall, the manuscript may provide a possible therapeutic strategy to treat ENKTL with a good safety profile. The manuscript is also well-written. The weakness of the manuscript is that some details for the design and evaluation of the small-molecule inhibitor are missing.

      We are truly grateful for your very kind words about our work. It is very encouraging to know that you think our work is relatively novel and of significance for the field. We sincerely appreciate the valuable time and kind efforts that you have spent on the thorough review of our manuscript.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      In this study, the authors made a two-component homing modification gene drive in Anopheles coluzzii with a different strategy than usual. The final drive itself targets and disrupts the saglin gene that is nonessential for mosquitoes, but important for the malaria parasite. The drive uses several gRNAs, and some of these target the Lp gene where an anti-malaria antibody is added, fused to the native gene (this native gene is also essential, removing nonfunctional resistance alleles at this locus). In general, the system is promising, though imperfect. Some of the gRNAs self-eliminate due to recombination of repetitive elements, and the fusion of the antimalaria gene had a modest fitness cost. Additionally, the zpg promoter was unable to operate at high efficiency, requiring use of the vasa promoter, which suffers from maternal deposition and somatic expression (the latter of which increased fitness costs at the Lp target). The manuscript has already undergone some useful revisions since its earliest iteration, so additional recommended revisions are fairly modest.

      Line 43-45: The target doesn't need to be female sterility. It can be almost any haplosufficient but essential target (female sterility works best, so it has gotten the most study, but others have been studied too).

      --- We agree. However, this paragraph focused on previous achievements in malaria mosquitoes, for which suppression gene drives spreading lethality rather than female sterility have not been reported to our knowledge. Even the targeting of doublesex, which is a sex determination rather than female fertility gene, results in female sterility (Kyrou et al. 2018). However, we inserted the possibility of female killing by X-shredder GD (Simoni et al., 2020).

      Line 69: A quick motivation for studying Anopheles coluzzii should be added here (since gambiae is discussed immediately before this).

      ---Thank you for drawing our attention to this point. We modified the sentence to:

      _Here, we present the engineering of the Lipophorin (Lp) essential gene in Anopheles coluzzii, a prominent member of the A. gambiae species complex and a major malaria vector in sub-Saharan Africa._

      Introduction section: It might be helpful to break up the introduction into additional paragraphs, rather than just two.

      --- We followed this suggestion and broke up the introduction into 5 paragraphs to make it more breathable.

      Introduction last part: The last part of the introduction reads more like an abstract or conclusions section. Perhaps a little less detail would fit better here, so the focus can be on introducing the new drive components and targets

      --- We have followed this suggestion and substantially shortened this last part of the introduction.

      Line 207-213: This material could go in the methods section. There are some other examples in the results that could be similarly shortened and rearranged to give a more concise section.

      --- We moved the long description from lines 207-213 to the Methods as suggested, and summarized it simply as:

      Only mosquitoes displaying GFP parasites visible through the cuticle were used to infect mice.

      We emphasize this point because in subsequent experiments using Saglin knockout mosquitoes, this enrichment for infected mosquitoes will probably attenuate the Plasmodium-blocking phenotype caused by Saglin KO, since mosquitoes lacking Saglin tend to be less infected (Klug et al., 2023). Elsewhere in the Results, we still provide detailed descriptions of procedures because we believe they aid understanding and assessing the quality of the experiments.

      Line 283-287: I couldn't find the data for this.

      --- Indeed we only summarized the data about the progeny of the [zpg-Cas9; GFP-RFP] line crossed to WT, as we didn’t judge these results worth detailing. Here is our record from one such cross:

      GFP-RFP females x WT males: 486 (50.7%) GFP+ and 472 (49.3%) GFP- larvae

      GFP-RFP males x WT females: 1836 (48.9%) GFP+ and 1925 (51.1%) GFP- larvae

      This shows no significant gene drive. However in these progenies, a few GFP+ and non-RFP larvae, and a few RFP+ non-GFP larvae were noted by visual examination under the fluorescence microscope, without counting them precisely. Their existence testified to some weak homing activity mediated by zpg-Cas9 in the Lp locus.
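
      As a rough check on the statement that these counts show no significant drive, the sketch below tests each cross against the 50% Mendelian expectation. The response does not say which test, if any, was applied, so the two-sided normal approximation to the binomial used here is an assumption, and the function name is hypothetical.

```python
import math


def two_sided_p_vs_half(positive: int, negative: int) -> float:
    """Two-sided p-value for H0: marker inheritance = 50%.

    Uses the normal approximation to the binomial, which is reasonable
    for the large larval counts reported above.
    """
    n = positive + negative
    z = (positive - n / 2) / math.sqrt(n * 0.25)
    return math.erfc(abs(z) / math.sqrt(2))


# GFP+ and GFP- larval counts copied from the two crosses listed above.
crosses = {
    "GFP-RFP females x WT males": (486, 472),
    "GFP-RFP males x WT females": (1836, 1925),
}

for label, (gfp_pos, gfp_neg) in crosses.items():
    p_value = two_sided_p_vs_half(gfp_pos, gfp_neg)
    share = gfp_pos / (gfp_pos + gfp_neg)
    print(f"{label}: {share:.1%} GFP+, p = {p_value:.2f}")
```

      Neither cross departs from 50% at the conventional 0.05 threshold under this approximation, which is consistent with the authors' reading of the data.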

      We modified the sentence as follows to support our conclusion, and we propose to leave these detailed numbers here in our response, which will be published along with the paper.

      In spite of the presence of the zpg-Cas9 and gRNA-encoding cassettes in the GFP-RFP allele, it was inherited in about 50% of male or female progenies, demonstrating little homing activity of the GFP-RFP locus after crosses to WT, except for the appearance of rare GFP-only or RFP-only progeny larvae, …

      Line 291: Replace "lied" with "was".

      ----done.

      Line 356: Homing in the zygote would be considered very unusual and is thus worthy of more attention. While possible (HDR has been shown for resistance alleles in the zygote/early embryo), this would be quite distinct from the mechanism of every other reliable gene drive that has been reported. Is the flow cytometry result definitely accurate? By this, I mean: could the result be explained by just outliers in the group heterozygous for EGFP, or perhaps some larvae that hatched a little earlier and grew faster? Perhaps larvae get stuck together here on occasion or some other artifact? Was this result confirmed by sequencing individual larvae?

      ---- We agree with your skepticism, especially given that the same is not seen in Suppl Fig 2A with a similar genotype setup, i.e., the vasa gene drive at the Lp locus, or in the G1 of populations 6 or 8 at the Saglin locus (Suppl. File 2). Unfortunately, it would take too much time at this point to re-create this line (which has been discarded) to re-examine this issue. Therefore, we acknowledge that an explanation other than homing in the zygote may account for this result. Based on our empirical experience, COPAS outputs are reliable: such outliers from the heterozygous population are usually not seen, and we always sort neonate larvae a few hours from hatching. Those 6% homozygous-looking larvae may come from a contamination with male pupae when female pupae were manually sorted for the cross to WT males, a human error that we cannot exclude. In this case, the true GFP inheritance would be closer to 79% than to 85%. For these reasons, we must walk back our initial statement as follows:

      The progeny of these triple-transgenic females crossed to WT males showed markedly better homing rates (>79% GFP inheritance)

      And edit the figure legend of Figure 4B to account for the alternative possibility of a contamination with males:

      6% of individuals appeared to be homozygous, revealing either unexpected homing in early embryos due to maternal Cas9 deposition, or accidental contamination of the cross with a few transgenic males.

      Results in general: Why is there no data for crosses with male drive heterozygotes? Even if some targets are X-linked, performance at others is important (or did I miss something and they are all X-linked). I see some description near line 400, but this sort of data is figure-worthy (or at least a table).

      --- For the only example of a functioning split gene drive at the Lipophorin locus on chromosome III, we do show homing results from heterozygous GD males in Suppl. Fig. 2A (91.2% homing in males inferred from ((40.7+53.1+1.8)-50)x2). We added this calculation of the homing rates in the figure legend. For full drive constructs in the Saglin locus on chromosome X (our final functional design), in addition to the data described in the text near line 400, male data showing “teleguided” homing at the Lipophorin locus on chromosome II is shown in Suppl. File 2 (see G2 of population 7, showing close to 100% homing at the GFP locus); the same data (less easy to assess) is converted into the G2 point of the graphs in Figure 5.

      Lines 362-367: What data (figure/table) does this paragraph refer to?

      --- We apologize for the fact that this sentence was misleading. In this population, the genotype frequencies were not tracked at each generation but measured once after 7 generations. We rephrased (now lines 401-403) and now provide the measured values directly in the text:

      We maintained one mosquito population of Lp::Sc2A10 combined with SagGDzpg (initial allele frequencies: 25% and 33%, respectively) and measured genotype frequencies after 7 generations. This showed an increase in the frequency of both alleles (G7: GFP allelic frequency = 59.2%, phenotypic expression of DsRed in >90% of larvae, n=4282 larvae),

      Lines 405-406: There may be a typo or miscalculation for the DsRed inheritance and homing rate here. Should DsRed inheritance be 90.7%?

      --- Thank you for spotting this. You are right, DsRed inheritance would be 90.7% if the homing rate were 81.4% as we mistakenly wrote. Actually DsRed inheritance was really 80.7% so our mistake was in calculating the homing rate: 61.4% is the correct value ((80.7-50)x2), now corrected in the manuscript.
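
      The conversion used in this response, and in the Suppl. Fig. 2A calculation quoted earlier, is homing rate = (inheritance - 50) x 2, with both values in percent. A minimal sketch of that formula and its inverse follows; the function names are hypothetical, and the inputs are the percentages quoted in the responses.

```python
def homing_rate(inheritance_pct: float) -> float:
    """Homing rate (%) from marker inheritance (%) in a cross to wild type.

    Without drive, a heterozygous parent transmits the marker to 50% of
    progeny; homing converts a fraction of the remaining 50%.
    """
    return (inheritance_pct - 50.0) * 2.0


def inheritance_from_homing(homing_pct: float) -> float:
    """Inverse relation: marker inheritance (%) implied by a homing rate (%)."""
    return 50.0 + homing_pct / 2.0


# Corrected DsRed value discussed above: 80.7% inheritance.
print(f"{homing_rate(80.7):.1f}")              # 61.4
# The mistaken 81.4% homing rate would have implied this inheritance.
print(f"{inheritance_from_homing(81.4):.1f}")  # 90.7
# Suppl. Fig. 2A: inheritance is the sum of the three drive-carrying classes.
print(f"{homing_rate(40.7 + 53.1 + 1.8):.1f}") # 91.2
```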

      Figure 5: The horizontal axis font size for population 8 is a little smaller than the others.

      --- True. Corrected.

      Line 454: In addition to drive conversion only occurring in females and the somatic fitness costs, embryo resistance from the vasa promoter would prevent the daughters of drive females from doing drive conversion. This means that drive conversion would mostly just happen with alleles that alternate between males and females.

      --- We agree with this idea, although the impact of this phenomenon will depend on the extent of resistance allele formation in early embryos. We observed (Fig. 6) that failed homing mutagenesis in Saglin is not that intense, the sequenced non-drive alleles that were exposed 1-4 times to mutagenic activity in females either being mostly wild-type, or carrying mutations that often still left one or two gRNA target sites intact and vulnerable to another round of Cas9 activity. Therefore, alleles passed on from female to female may still undergo drive conversion to a large extent, that future experiments may be able to quantify.

      Line 481: Deletions between gRNAs certainly happen, but I wouldn't necessarily expect this to be the "expectation". In our 2018 PNAS paper, it happened in 1/3 of cases. There were fewer, I think, in our Science Advances 2020 and G3 2022 papers. All of these were from embryo resistance from maternal Cas9 (likely also the case with your drive due to the vasa promoter). When looking at "germline" resistance alleles, we have recently noticed more large deletions.

      --- We agree that the early embryo with maternally deposited Cas9 is probably the most prominent source of mutations at gRNA target sites. Perhaps naïvely we imagined that it would be easier for cells to repair two closely spaced DNA breaks by eliminating the intervening sequence, rather than stitching each break individually. Given that we sequenced many alleles carrying a single mutation, the lack of larger deletions may be explained by lower rates of Cas9 activity in Saglin, with mostly a single break at a time, due to limiting Cas9 amounts and their partial saturation with Lp gRNAs, and/or lesser accessibility of the Saglin locus compared to Lipophorin… We deleted the phrase “Contrarily to our expectation”.

      Figure 6C: It may be nice to show the wild-type and functional resistance sequence side-by-side.

      --- done

      Lines 642-644: This isn't necessarily the case. At saglin, the nonfunctional resistance alleles may still be able to outcompete the drive allele in the long run. This wasn't tested, but it's likely that the drive allele has at least some small fitness costs.

      --- We agree. We inserted this comment in a parenthesis in the text (now lines 644-645):

      Unlike the first approach, this design may allow Cas9 and gRNA-coding genes to persist indefinitely within the invaded mosquito population (unless nonfunctional resistance alleles outcompete the drive allele in the long run).

      A few comments on references to some of my studies:

      Champer, Liu, et al. 2018a and 2018b citations are the same paper.

      --- Duplicate in our reference library. Corrected.

      For Champer, Kim, et al. 2021 in Molecular Ecology, there was a recent follow-up study in eLife that shows the problem is even worse in a mosquito-specific model (possibly of interest as an alternate or supporting citation): https://elifesciences.org/articles/79121

      --- Citation added (line 68).

      One of my other previous studies was not cited, but is quite relevant to the manuscript: https://www.science.org/doi/10.1126/sciadv.aaz0525
      This paper demonstrates multiplexed gRNAs and also models them, showing their advantages and disadvantages in terms of drive performance. Additionally, it models and discusses the strategy of targeting vector genes that are essential for disease spread but not the vectors themselves (the "gene disruption drive"), showing that this can be a favorable strategy if gene knockout has the desired effect (nonfunctional resistance alleles contribute to drive success).

      --- Your 2020 study will indeed now be useful to inform the design of multiplex gRNAs for various gene drive designs, in terms of number of gRNAs, distribution of their target sites, necessity to generate loss-of-function rather than functional resistance alleles in the target gene (such as our Lp and Saglin pro-parasitic genes)… The notion of Cas9 saturation with increasing gRNA numbers is also important. When we initiated this project in 2018, we only had intuitive notions that multiplex gRNAs could improve the durability of GD and increase the chances of resistance alleles being loss-of-function. We thus arbitrarily maximized the number of gRNAs for each of the two targets: 3 for each target in one design, 3 and 4 in another, which, according to your modelling, is luckily close to the optimal numbers for each locus. We now cite your paper as a GD design tool in the discussion about pathways to optimizing our system:

      To further optimize GD design, modeling studies can now aid in determining the optimal number of gRNAs in a multiplex, depending on the specific GD design and purpose (Champer et al., 2020).

      In addition to this and to the stabilization of multiplex gRNA arrays, other paths to improvement (…)

      This one is less relevant, but is still a "standard" homing modification rescue type drive that could be mentioned (and owes its success to multiplexing): https://www.pnas.org/doi/abs/10.1073/pnas.2004373117

      The recoded rescue method was also used in mosquitoes (albeit without gRNA multiplexing) by others, so this may be a better one to mention: https://www.nature.com/articles/s41467-020-19426-0

      --- We added the two references on what is now Line 663:

      Lp::Sc2A10 depends on SagGD for its long-term persistence and spread in a population, and SagGD depends on Lp::Sc2A10 as a rescue allele of the essential Lp target for its survival. This design can be seen as a two-locus variation of rescue-type GDs (Adolfi et al., 2020; Champer et al., 2020)

      Sincerely,<br /> Jackson Champer

      Referees cross-commenting

      Other comments look good. One thing that I forgot to mention: for the 7-gRNA construct with tRNAs, the authors mentioned that it was harder to track, but it sounds like they obtained some data for it that showed similar performance. Even if this one is not featured, perhaps they can still report the data in the supplement?

      --- This GD required examination of the mosquitoes at late developmental stages, such as the pupa, to score red fluorescence under control of the OpIE2 promoter, which unfortunately becomes active only late when expressed from the Lp locus. We precisely scored only the first 128 pupae arising from the progeny of the first obtained G1 [SagGD/+ ; Lp-2A10/+] females crossed to WT males. Among these:

      • 115 were GFP+, DsRed+ (89.8%)

      • 12 were GFP+, DsRed- (9.4%)

      • 1 was GFP-, DsRed- (<1%)

      This allowed us to roughly estimate the homing rates at 98.2% at the Lipophorin locus and 79.7% at the Saglin locus, which is similar to the other construct without tRNA spacers.

      These approximate rates were confirmed by visual examination of progenies in two subsequent generations of [SagGD/+; Lp-2A10/+] males and females backcrossed to WT.
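As a rough illustration of how such estimates can be derived from the pupal counts above, the sketch below converts marker inheritance into a homing rate using the common conversion for a heterozygous parent crossed to WT: homing = 2 × inheritance − 1. Both this formula and the pairing of each marker with a locus are assumptions made for illustration; the response does not state the exact calculation that was used. Under these assumptions, the DsRed counts reproduce the 79.7% figure, while the GFP counts give about 98.4%, close to but not identical to the reported 98.2%.

```python
# Phenotype counts among the 128 scored pupae, as listed in the response.
counts = {
    ("GFP+", "DsRed+"): 115,
    ("GFP+", "DsRed-"): 12,
    ("GFP-", "DsRed-"): 1,
}


def homing_rate(marker_positive: int, total: int) -> float:
    """Assumed conversion: fraction of WT alleles converted in the germline
    of a heterozygous parent, given 50% Mendelian inheritance without homing."""
    inheritance = marker_positive / total
    return 2 * inheritance - 1


total = sum(counts.values())
gfp_pos = sum(n for (gfp, _), n in counts.items() if gfp == "GFP+")
dsred_pos = sum(n for (_, dsred), n in counts.items() if dsred == "DsRed+")

for marker, positives in (("GFP", gfp_pos), ("DsRed", dsred_pos)):
    inheritance_pct = 100 * positives / total
    homing_pct = 100 * homing_rate(positives, total)
    print(f"{marker}: inherited by {inheritance_pct:.1f}% of pupae, "
          f"estimated homing {homing_pct:.1f}%")
```

With only 128 pupae scored, a single individual shifts the homing estimate by about 1.6 percentage points, so these values are best read as approximate.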

      Reviewer #1 (Significance):

      Overall, this study represents a useful advance. Aside from being the first report for gene drive in A. coluzii, it also is the first that investigates the gene disruption strategy and is the first report of gRNA multiplexing in Anopheles. The study can thus be considered high impact. There are also other aspects of the study that are of high interest to gene drive researchers in particular (several drives were tested with some variations).

      --- We are grateful for your positive, constructive and in-depth analysis of our study!

      Reviewer #2 (Evidence, reproducibility and clarity):

      The authors initially created a transgenic mosquito colony expressing the Sc2A10 antibody fused to the lipid transporter Lipophorin, and tested the transmission-blocking activity of this transgene. Building off of previous findings that the Sc2A10 antibody inhibits sporozoite infectivity when expressed in mosquito salivary glands, the authors found that it was also efficient at inhibiting sporozoite infectivity when secreted into the hemolymph, expressed under the lipophorin endogenous promoter in An. coluzzii. They then designed and tested two different gene drives utilizing the Sc2A10-Lipophorin fusion protein. In the first, the authors used a recoded allele of Lp-Sc2A10 while simultaneously utilizing gRNAs that targeted endogenous Lp in an effort to select for mosquitoes that expressed transgenic Lp-Sc2A10 due to the essential nature of Lp. However, this drive was unsuccessful because recoded Lp is necessarily heterozygous while the GD is entering the population, and Lp proved to be largely haploinsufficient. Further, the zpg promoter expressing cas9 was not effective in promoting homing of the gRNAs. In the second gene drive that was tested, the authors made use of the endogenous Saglin locus, which expresses a natural agonist for Plasmodium, and is thus desirable to target for disruption in a gene drive that aims to reduce vector competence for Plasmodium. This gene drive also uses recoded Lp-Sc2A10 to replace the wild-type Lp allele, thus selecting for Sc2A10 expression; however, this drive is not dependent on fitness of individuals with only one functional copy of Lp.

      The authors discovered that the efficacy of the zpg promoter to drive homing of cas9 is locus-dependent, limiting the success of their gene drive designs.
They do show, however, that the Saglin gene drive succeeds at reaching high frequencies in mosquito populations using instead the vasa promoter to express cas9, and that these transgenic mosquitoes are able to reduce infectivity of sporozoites in a bite-back mouse model. However, they observe gene drive refractory mutations in the Lp gene, despite its highly conserved nature, showcasing the difficulty of avoiding drive resistance even in small populations of mosquitoes, and also observed deletions of gRNAs targeting both Lp and Saglin, further highlighting possible shortcomings in gene drive approaches. Together, these findings are useful to the field in walking the readers through an interesting and promising approach for a novel gene drive, and illustrating the challenges in engineering an efficacious and long-lasting drive.

      Major comments:

      As the authors are able to observe Plasmodium within mosquitoes, it would be useful to have these data in the manuscript pertaining to the prevalence and intensity of infection in mosquitoes prior to bite-back assays. If there are data or images that the authors could include, it would be helpful to show if there is a possibility that infection intensity is a variable that contributes to whether or not mice develop an infection. It would also be interesting to note whether there is a difference in infection (oocysts or sporozoites) between transgenic mosquitoes and wild type mosquitoes.

      --- This is a valuable suggestion. Please note that, in order to evaluate the transmission-blocking properties of the Lp-2A10 allele (acting at the sporozoite level), we discarded non-infected mosquitoes prior to bite-back experiments, so that infection prevalence was 100% in the mosquitoes retained for the bite-back. We have not systematically compared parasite loads between transgenic and control mosquitoes. In some experiments comparing Lp-2A10 mosquitoes and their control, we dissected a subset of the mosquito midguts after bite-back to visually ascertain that they showed roughly equivalent oocyst numbers between transgenic and controls. However, we have not precisely recorded these data. It is possible that slightly decreased lipid availability in Lp::2A10 mosquitoes (their lipophorin allele producing slightly less Lp than the WT) negatively affects the parasite, as suggested by previous studies highlighting the role of host lipophorin-derived lipids for parasite development in the mosquito (Costa et al, Nat Commun 2018; Werling et al. Cell 2019; Kelsey et al. PLoS Path 2023).

      In the case of Lp-2A10 mosquitoes additionally containing a GD in Saglin, it is expected that they should carry lower parasite numbers than their controls, an effect of the Saglin knockout mutation alone (Klug et al., PLoS Path 2023). Reinforcing the transmission-blocking effect of the 2A10 antibody by reducing parasite loads via the Saglin KO was indeed our intention. Hence, having selected the most infected mosquitoes for our bite-back experiments likely attenuated this desired effect, but we still observed a 90% transmission decrease when the two modifications were combined, compared to a 70% decrease with Lp-2A10 alone. We do not plan to perform additional infection experiments for the current manuscript on Plasmodium berghei expressing Pf-CSP, but we do intend to record parasite counts in a follow-up study with an optimized SagGD transgene and Plasmodium falciparum infections. This will be of high relevance for potential future applications in malaria control.

      The authors also go into significant detail in the discussion exploring ideas of how to optimize or improve this specific gene drive design. The authors should also stress further the applicability of their discoveries in other gene drive designs, and emphasize the lessons they learned in the difficulties encountered in this study and how these findings could guide others in their decision making process when choosing targets or elements to include in a potential gene drive approach.

      --- We feel that we already emphasized these lessons in the manuscript, in the discussion and when justifying the chosen strategies in the Results section. Lessons for future designs include:

      • inserting an antimalarial factor into an essential endogenous gene, preserving its function, can provide many benefits (high expression level, secretion signal that can be hijacked, endogenous introns can be hijacked to host a marker, inactivation by mutagenesis or epigenetic silencing being more difficult…);

      • a distant-locus gene drive (as here in Saglin) could potentially drive several antimalarial cargoes at the same time, inserted in different loci;

      • non-essential mosquito genes agonistic to Plasmodium are attractive host loci for a GD, an already old idea illustrated here by the case of Saglin;

      • multiplex gRNAs are a viable approach to reduce the formation of GD-resistant alleles in essential genes and/or to increase the frequency of loss-of-function alleles, which will either disappear if the gene is essential or decrease vector competence if the gene is pro-parasitic. Hence gRNAs targeting intron sequences should be avoided in order to preserve this benefit, as illustrated by one of our Lp gRNAs, which targets the first intron and contributed to generating the only viable Lp resistance allele identified in this study;

      • To increase long-term stability of the GD construct, repeats should be minimized in gRNA multiplexes through the use of a single promoter and various spacers (tRNAs, ribozymes?) – it remains to be seen if the 76-nucleotide gRNA constant sequence itself, necessarily repeated, will stimulate unit losses in a gRNA multiplex;

      • The best promoter to restrict Cas9 expression to the germ line may be zpg in some but not all loci; the vasa promoter causing maternal Cas9 deposition may still be envisaged if resistance allele formation can be prevented by other means (targeting hyper-conserved essential sequence, multiplexing the gRNAs against an essential gene…).

      Minor comments:

      Line 44 - female sterility but also female killing approaches to crash pop. like X shredder, if authors would like to expand

      --- Female killing citation of Simoni et al, 2020 added (line 45).

      Lines 48-60 - Authors should add some references from the literature surrounding ethics and ecology studies related to gene drive release

      --- We added: (e.g., National Academies of Sciences, Engineering, and Medicine, 2016; Courtier-Orgogozo et al., 2017; de Graeff et al., 2021) on lines 49-51.

      Line 114 - Given the only moderate impacts of Saglin's role in Plasmodium invasion, I am not sure this saglin deletion is a convincing benefit for GD as it is probably not impactful enough alone - can the authors soften this statement?

      --- While it is correct that Saglin KO mosquitoes show a significant decrease only in P. berghei oocyst counts and not in prevalence when mosquitoes are heavily infected, they do show a significant decrease in both counts and prevalence upon infection with P. berghei and, most importantly, P. falciparum when parasite loads are lower, a situation that is more physiological (e.g. prevalence of 65% and 13% in WT and Sag(-)KI mosquitoes, respectively, upon infection with P. falciparum - Klug et al., PLoS Path 2023). Therefore, for human-relevant P. falciparum infections, an impactful decrease in vector competence can be legitimately expected.

      Line 126 -Can the authors provide rationale for expressing Sc2A10 with Lp instead of expressing it from salivary glands?

      --- There are three reasons for this. First, we knew from the cited Isaacs et al. papers that the 2A10 antibody was efficient against transmission when expressed in the fat body, and from unpublished work (Maria Pissarev, Elena Levashina and Eric Marois) that anti-CSP ScFvs expressed in the fat body of transgenic mosquitoes blocked sporozoite transmission as efficiently as when expressed from salivary glands. This is certainly favored by the easy accessibility of sporozoites to the antibody when both are in mosquito hemolymph. Of note, the transmission-blocking results suggest that the binding of ScFv to CSP withstands the crossing of the salivary gland epithelium by sporozoites. Second, we were looking for a host gene expressed as highly as possible to produce high levels of Sc2A10 antibody. Third, the host gene must be essential so that resistance alleles would not be viable.

      We agree that it would also be possible to use a salivary gene instead of Lp as a host for this antimalarial factor. In this case, a same-locus gene drive may have functioned, but the advantages of the host locus being an essential gene would be lost, at least partially, as genetic ablation of the salivary gland, albeit slowing blood uptake, does not prevent mosquito viability and reproduction (Yamamoto et al., PLoS Path 2016).

      Line 140 - Can authors give any comment on why these regions of Lp were chosen to be recoded / targeted with gRNAs?

      --- inserting Sc2A10 just after the cleaved Lp secretion signal, and N-terminally to the rest of the Lp protein, was the goal, so that 2A10 would be secreted together with Lp and separated from both signal peptide and Lp by naturally occurring proteolysis. This constrained the choice of the target site to be at the junction between signal peptide and the remainder of Lp protein. An alternative design could have been to insert it between the two subunits ApoLpI and ApoLpII, with duplication of the protease cleavage site, or on the C-terminal extremity of the protein, but there would have been no intron in the immediate vicinity to knock-in a selection marker at the same time.

      Line 171 - "stoichiometric"

      --- Corrected.

      Line 186 - Can the authors comment or speculate on why the expression levels of the fusion protein are expected to be lower than endogenous Lp?

      --- We did not expect this. It is hard to predict whether and explain how insertion of exogenous sequences in a gene can alter its expression. Possible explanations include: the existence of harder-to-translate mRNA sequences in the Sc2A10 moiety; the addition of seven exogenous amino acids on the N-terminal side of ApoLpII (mentioned in M&M) possibly modifying the stability of the Lp protein; the modification of the intron sequence perturbing efficient intron excision and/or pre-mRNA expression due to the disruption of regulatory elements or to the new presence of the GFP gene in the antisense orientation (albeit expressed in the nervous system and not in the fat body); the presence of the exogenous Tub56D transcription terminator used to arrest GFP transcription possibly possessing bidirectional termination activity and lowering the mRNA level of the Lp allele…

      Line 211 - Why were 6 mosquitoes used for these assays, and 10 mosquitoes used in later assays (Line 223)?

      --- Mice were always exposed to groups of 10 mosquitoes, but not all 10 mosquitoes were necessarily biting the mice. We retained mice bitten by at least 6 mosquitoes for further analysis (M&M, lines 871-873 of the revised file).

      Line 212 - I would also suggest using letters (Suppl. Table 2A,B,C etc) to refer the specific experiments and sections in the Table.

      --- Implemented.

      Lines 225-228 - The authors should mention in the text that homozygotes and heterozygotes do not differ in infection assays.

      --- Added: Therefore, heterozygous mosquitoes showed a transmission blocking activity comparable to that seen in homozygotes.

      Line 249 - Can the author comment on the impacts of population influx / exchange on the idea that the GD cassette need only be transiently in the population?

      --- If Lp::Sc2A10 is fixed in the population and the GD gone, indeed an influx of WT alleles through mosquito immigration will begin to replace the antimalarial factor and drive it to extinction due to its fitness cost. As mentioned in the final paragraph of the discussion, this could be seen as an advantage to restore the original natural state—hopefully after malaria eradication! However, we regard a situation where Lp::2A10 never reaches fixation as more likely, with its spread being re-ignitable by updated GDs (line 741 of the revised file).

      Line 273 - Can the authors comment on why this may have occurred more frequently than the expected integration of the GD cassette?

      --- When a chromosome break is repaired, each side of the cut must recombine with the repair template. A possible explanation for our observation is that one side of the break recombined with the injected repair plasmid, while the other recombined with the intact sister chromosome (physiologically probably the preferred option). Since this situation still leaves truncated chromosomes, another repair event can join the plasmid-bearing chromosome end to the sister chromosome. The observation that complex rearrangements occurred frequently suggests that such events can be very common, but will usually go undetected due to the absence of genetic markers. Here, GFP on the intact sister chromosome served as a genetic marker to betray its unexpected involvement in the repair process.

      Line 314 - Not all fitness costs are apparent through standard laboratory rearing as was performed in Klug et al. Authors could consider "no known fitness cost" instead.

      --- We agree. This is what we meant by “no fitness cost in laboratory mosquitoes”. We changed this to “no fitness cost at least in laboratory conditions (Klug et al., 2023)” to make clear that this was tested.

      Line 407 - don't start new paragraph (same with 409)

      --- we removed these two lines, as we realized they contained an error, and made a correction on line 420 of the revised manuscript.

      Line 408 - I'm not sure it's clear why all these populations were kept for a different number of generations - can the authors clarify?

      --- Populations 1 and 2 were the oldest founder populations, therefore maintained for the longest time. As described in the text, all other populations were derived from populations 1 and 2 later in time by outcrossing a subset of individuals to WT mosquitoes. For these derived populations, we reset the clock of generation counting to 0 as we monitored the homing phenomenon “from scratch” in transgenic males crossed to WT, and in transgenic females crossed to WT. Resetting the clock resulted in an apparent lower number of generations for these derived populations. In addition, some of them were discarded early, usually after reaching a stable state, as it was difficult to maintain so many populations in parallel over a long period of time.

      Line 558 - "10/12 mice" not immediately clear - the authors could be more specific about how data was combined here

      --- Thank you for pointing out this ambiguity. We replaced it with: the absence of infection in a total of 10 out of 12 mice showed… (line 561)

      Line 586 - Since there do appear to be some fitness costs associated with the Sc2A10 version of Lp, might it be expected that fitness costs imposed by the transgene itself could lead to selection pressures leading to its loss? Or do the authors think that these fitness costs are prevented from causing selection against Sc2A10 due to the design of the transgene such that its translation is a prerequisite for Lp's translation? Is the fact that its removal occurs more rapidly than Lp's any indication that selection against the persistence of Sc2A10 may occur?

      --- Yes, we believe that Lp::Sc2A10 will progressively disappear, replaced by the WT allele, as shown in Figure 1C, in the absence of a GD stimulating its maintenance and spread. In the Lp::Sc2A10 transgene, translation of Sc2A10 is indeed a prerequisite for Lp translation, imposing a degree of genetic stability of this transgene in terms of sequence integrity, but this does not mean that the locus cannot be outcompeted by the WT under natural selection, so that long-term persistence of Lp::Sc2A10 depends on the presence of the GD, as outlined in lines 669-672. As the GD itself can disappear due to the accumulation of resistance alleles, we expect a progressive lift of its pressure to maintain Lp::Sc2A10 and both loci to be progressively lost, a form of reversibility that may be regarded as desirable (lines 773-776 in v2, 741-743 in v3). Alternatively, both transmission blocking alleles could be maintained by releasing an updated version of the dual GD.

      Line 659 - add some further detail to this - how do you envision this to occur?

      --- We have deleted this paragraph, as it hypothesized that SagGD could frequently be transmitted to the next generation in the absence of Lp::2A10, which is not the case (it would be lethal, and Lp::2A10 homing is anyway extremely efficient). After a putative field release of [SagGD / Y; Lp::2A10/ Lp::2A10] males, both transgenes should rapidly be introgressed in the field’s genetic background.

      Line 635 - Long paragraph, should be broken up or removal of text. Some of these ideas could possibly be made more concise to improve readability. There are many different hypotheticals that are expanded upon in the discussion.

      --- We admit that this paragraph in the discussion was long and dense. We have split it into 4 smaller paragraphs to better separate the concepts that we want to discuss, and have deleted the part mentioned in the above point.

      Line 677 - This scenario seems potentially unrealistic considering the only subtle impacts of Saglin deletion on vector competence, and the potential for population exchange in mosquito populations to dilute out these alleles if the drive begins to fail. Can the author comment or potentially decrease emphasis on such scenarios?

      --- while Saglin KO mosquitoes show a moderate decrease of infection prevalence in the context of high infections, the Saglin KO decreases parasite loads in all cases, and most importantly, also prevalence upon physiological infections with P. falciparum (Klug et al., PLoS Path 2023 and see our response to your comment to line 114 above). This yields a higher proportion of non-infected mosquitoes. Therefore, the impact of Saglin mutations should be stronger for the epidemiology of human infections with P. falciparum than in laboratory models of infections where parasite loads are very high.

      We agree that mosquito migration in natural populations would progressively dilute out the beneficial alleles once the GD effect ceases. The epidemiological impact is difficult to predict and will strongly depend on the durability of the GD and on the intensity of genetic influx from adjacent mosquito populations.

      Line 708 - Can the authors speculate on why zpg is sensitive to local chromatin and elaborate on possible solutions or consequences for other drive ideas? This seems broadly important.

      --- We do not precisely know why the zpg promoter is more sensitive to local influences than the vasa promoter, but this phenomenon seems common for other promoters as well (e.g., the sds3 promoter as opposed to the shu promoter in Aedes aegypti (Anderson et al., Nat Comm 2023)). It is possible that the vasa promoter is better insulated from local repressive influences, perhaps by insulating elements akin to gypsy insulators in Drosophila. Knowledge of genetic insulators active for mosquito genes is lacking as far as we know. Characterization of efficient mosquito insulators, for example if one could be identified within vasa, and their combination with zpg or sds3 promoter elements, could potentially improve the locus-independent activity of such promoters. Alternatively, a natural and ideal promoter may still be found showing both an optimal window of expression of Cas9 in the germline, and little susceptibility to local repression.

      Line 737 - The suggestion of releasing laboratory-selected resistance alleles in the absence of further context may be provocative and unnecessary here.

      --- We didn’t intend to sound provocative, but are interested in the idea of simple resistance alleles with limited sequence alteration that could be selected in the lab, and released to block a gene drive that turned undesirable, so we wanted to share it with the reader. Mutations in the Lp and Saglin loci, preserving their functions, can be limited to one or a few nucleotide changes in the gRNA target sites, as illustrated by the mutants we sequenced. Lab populations of GD mosquitoes can, therefore, be a source of GD-refractory mutants that could be leveraged in recall strategies.

      Line 850 - unnecessary comma

      --- Corrected.

      Line 854 - change to "after infection, mosquitoes were"

      --- Changed.

      Figure 1 - Not clear what is intended to be communicated by shapes portraying proteins / subunits - consider more detailed illustration of mosquito fat body cells synthesizing and secreting proteins rather than words in text box with arrow to clearly demonstrate the point of this figure.

      --- We propose a new version of figure 1 to better illustrate the fat body origin of Lp and 2A10. We have also re-worked the graphic design to improve several figures.

      Figure 3 - I recommend rearranging this figure so that B comes before C, visually. The proportions for the design in B should also match those used for A.

      --- We have followed these recommendations in the new Figure 3, and also used more logical color codes for the gRNAs and their target genes.

      Figure 5 - It is unclear to me why some Populations were maintained for such different lengths of time.

      --- Same point as above for line #408: Populations 1 and 2 are the oldest founder populations, therefore maintained for the longest time. As described in the text, all other populations were derived from populations 1 and 2 later in time by outcrossing to WT mosquitoes, resulting in a lower number of generations for these derived populations. In addition, some of them were discarded earlier, usually after reaching a stable state, as it was not possible to maintain so many populations in parallel for a long period of time.

      Figure 7 - Ladder should be labeled on the gel. It may also be helpful for the author to indicate clearly exactly which mosquitoes were shown by sequencing to have these different deletions, as it is occasionally unclear based on band sizing.

      --- We have added the ladder sizes as well as a numbering of individual mosquitoes on Figure 7. We sequenced 4 gel-purified small (type B) amplicons of Population 1 individually (#1, 2, 4, 6), and a pool of 4 type B amplicons from Population 7 (pooled #2, 4, 5, 6), as well as two samples of several pooled gel-purified large (type A) amplicons from Population 2 (pool of samples #2, 3, 4, 5, 6, 8, 9, 11, 12) and from Population 7 (pool of #1, 3, 7, 11, 12). This information now also appears in the Materials and Methods section (PCR genotyping of the SagGDvasa gRNA array).

      Line 996 - given that there is a size band on the right line of this gel also, can authors crop the gel image to eliminate unnecessary lanes a and b from this figure without losing information needed to interpret this blot?

      --- we agree that this would make the message easier to understand, but cropping lanes a and b would place WT control and Lp::Sc2A10 homozygotes on two separate images, even if a size marker is present on each. We prefer keeping the raw image to facilitate direct comparison of the band sizes, making clear that this was a single protein gel.

      Line 1070 - 12 out of how many sequenced mosquitoes?

      --- 12 mosquitoes from each of these four populations served as PCR templates to generate figure 7. A subset of amplicons were sequenced individually or pooled, as described above and now in Methods. All sequencing reactions of type A and type B amplicons showed consistent results.

      Line 1078 - Can remove some detail like % of agarose, and replication of results with different polymerase as these are already in methods.

      --- Done.

      Line 1098 - "Unbless"

      --- Corrected

      Reviewer #2 (Significance):

      This study illustrates a wide range of issues pertinent for gene drive implementation for malaria control, and as such is of value to the field of entomologists, genetic engineers, parasitologists and public health professionals. The gene drive designs explored for this study are interesting largely from a basic biology perspective pertinent mostly to specialists in the field of genetic engineering and vector biology, but highlight challenges associated with this technology that could also be of interest to a broader audience. A transmission blocking gene drive has not yet been achieved in malaria mosquitoes, and is thus a novel space for exploration. As a medical entomologist that works predominantly outside of the genetic engineering space, I have appreciated the detail the authors have provided with regard to their rationale and findings, even when these findings were inconsistent with the authors' primary objectives or expectations.

      --- Thank you for your positive assessment and for this in-depth evaluation of our data.

      Reviewer #3 (Evidence, reproducibility and clarity):

      The study by Green et al. generated a gene drive targeting both Saglin and Lipophorin in the Anopheles mosquito, with a view to blocking Plasmodium parasite transmission. This is a highly complex but elegant study, which could significantly contribute to the design of novel strategies to spread antimalarial transgenes in mosquitoes.

      Overall, this is a complex study which, for a non-specialist reader gets quite technical and heavy in most parts. Despite this, there are key points showing that suppression gene drive may not be the way forward in this instance. However, I would advise explaining certain elements in more detail for the benefit of the general readers. I only have minor points for the authors to address:

      1) Please point out for the general reader that Anopheles coluzzii belongs to the gambiae complex, since you explain that gambiae are the major malaria spreaders in sub-Saharan Africa.

      --- done in the introduction (lines 71-73) also in response to Rev. 1

      2) The authors pretty much give all results in the last part of the introduction, could the intro be shortened by removing these parts, or just highlighting in a single paragraph the main take home message?

      --- We have condensed this part to highlight the take home messages in the last paragraph, also in response to Rev. 1.

      3) Why is Vg mentioned? It is only mentioned once and doesn't have any other mention through the manuscript.

      --- This introduces the two proteins that are by far the most abundant, and present at similar levels, in the hemolymph of blood-fed females, Vg being also prominent on the Coomassie-stained gel of Fig. 1. We mention Vg also because it represents another excellent candidate locus to host anti-Plasmodium factors, as discussed later on lines 600-610 of the Discussion section.

      4) Please make it clearer for non-specialists why Cecropin wasn't used.

      --- On lines 630-636 we explain that we decided to leave out Cecropin to avoid potential additional fitness costs due to expression at all life stages in the fat body, as opposed to solely in the midgut after blood meal (Isaacs et al. PNAS 2012); and to avoid complicating the anti-Plasmodium Lipophorin locus in a way that could further reduce the functionality of the Lp gene. We also had prior knowledge from unpublished work that Sc2A10 alone was sufficient to block sporozoite infectivity.

      5) Why were homozygous and not heterozygous transgenics transfected if there is such a fitness cost to homozygous mosquitoes?

      --- the fitness cost of homozygous mosquitoes is actually mild, unnoticeable if homozygotes are bred in the absence of competing heterozygotes and wild-types (lines 151-156). Microinjection experiments to obtain the different versions of SagGD were, therefore, performed on either the heterozygous or homozygous line. As for infection assays, the anticipated effect of gene drive is to promote homozygosity at the Lp::Sc2A10 locus. For this reason, it made sense to test the vector competence of homozygotes, in addition to the fact that the Plasmodium-blocking phenotype was expected to be stronger (and thus, easier to document) with two copies of the transgene. Only after obtaining a large dataset from infection assays with homozygotes did we test heterozygotes and found that they actually had a similar phenotype.

      6) Line 211 - what was the average number of infected mosquitoes used per infection for each mosquito strain?

      --- As described in the text (lines 204-206 of v2; 208-212 of the revision) and in the Methods (lines 868-873), non-infected mosquitoes were discarded prior to performing the experiment using 10 infected mosquitoes per mouse, and we discarded mice bitten by fewer than 6 mosquitoes. So at least 6 infected mosquitoes bit each mouse (often 8-9).

      7) Line 219 - please be clearer regarding this being infection detected in the blood.

      --- We replaced « infection » with « detectable parasitemia in the blood »

      8) Line 320 - please clarify why the zpg promoter was used.

      --- The advantages of zpg are mentioned in lines 257-258 and 320-322 (revised file).

      9) Line 375 - what was the rationale for using so many gRNAs?

      --- 3 or 4 gRNAs against Lipophorin and 3 gRNAs against Saglin, amounting to a total of 6 or 7 gRNAs against the two loci. The rationale is explained on lines 249-253 : the goal was to maximize the chance of causing loss-of-function mutations in the essential Lp gene and to favor elimination of GD resistant alleles by natural selection, in case of failed homing. For Saglin which is a non-essential gene, we wanted to ensure loss-of-function of failed homing alleles to achieve a reduction in vector competence, even if GD-resistant alleles accumulate. We sought to make this rationale clearer by adding a sentence on lines 328-332:

      Multiplexing the gRNAs was intended to promote the formation of loss-of-function alleles in case of failed homing at the Lp and Saglin loci: non-functional alleles of the essential Lp gene would be eliminated by natural selection while non-functional Saglin alleles would reduce vector competence.

      Line 555 - please state how long after the bite-back parasites appear in infected mice.

      --- We changed this sentence to: …two of these six mice developed parasitemia six days after infection (line 556).

      Reviewer #3 (Significance):

      This is potentially a highly significant study that could provide a vital mechanism for generating efficient gene drives. Although highly technical and complex in most parts, with a little clarification in certain areas this manuscript could be of great value to a general readership.

      --- Thank you for your appreciation and thoughtful evaluation of our manuscript.

      Reviewer #4 (Evidence, reproducibility and clarity):

      The authors hijacked the Anopheles coluzzii Lipophorin gene to express the antibody 2A10, which binds sporozoites of the malaria parasite Plasmodium falciparum. The resulting transgenic mosquitoes showed a reduced ability to transmit Plasmodium.

      The authors also designed and tested several CRISPR-based gene drives. One targets the Saglin gene and simultaneously cleaves the wild-type Lipophorin gene, aiming to replace the wild-type version with the Sc2A10 allele while bringing together the Saglin gene drive.

      Although drive-resistant alleles were present in the caged-population experiments, the Saglin-based gene drive reached high levels in caged mosquito populations and simultaneously promoted the spread of the antimalarial Lp::Sc2A10 allele.

      This work contributes to the design of novel strategies to spread antimalarial transgenes in mosquitoes. It also displays issues related to using multiplexing gene-drive designs due to DNA rearrangements that could prevent the efficient spread of the gene drive in the long term.

      This is tremendous work considering how many transgenic lines and genetic crosses are performed using mosquitoes. The conclusions are supported by the data presented, and some modifications regarding the experimental design description through text/figure improvements would facilitate the reading and flow of the paper.

      Here some questions/comments:

      • Line 124-125: Reference?

      --- added

      • Line 133-134: Reference?

      --- added

      • Table 1: It seems the authors have some issues recovering a good amount of Sc2A10 from hemolymph samples. Is this a problem of the antibody per se? Is the Lp endogenous promoter weak? Could this be improved by placing the antibody in a different genomic region? Alternatives could be discussed.

      --- The 2A10 antibody must be initially produced in the same, very high, amounts as the Lp endogenous protein with which it is co-translated. Therefore, its low relative abundance must result from faster turnover or stickiness to tissue, as hypothesised on lines 176-177. We believe that virtually any other endogenous promoter would be weaker than Lp and produce lower Sc2A10 levels.

      • Fig.1B: It would be nice to have a representation of the genome after integration. You could add a B' panel or just another schematic under the current one.

      --- In agreement with this suggestion and that of rev. 3, we added a new panel in 1B.

      • Supplementary Fig.1b: Could the authors explain the origin of the (first) zpg promoter used? Is it from An. coluzzii? It seems they use a different one in the gene drive designs later (see comments below too).

      --- We initially cloned a PCR-amplified zpg promoter region of the same size as the version published by Kyrou et al., from genomic DNA from our colony of A. coluzzii. The resulting promoter fragment harbored several single nucleotide polymorphisms (SNPs) compared to published sequences, as typically observed when cloning genomic fragments due to high genetic diversity in Anopheles species. Such SNPs are not usually expected to affect promoter activity, but are difficult to distinguish from PCR mutations which, in turn, could decrease or abolish promoter activity if mutating an essential transcription factor binding site. For this reason, our next constructs were based on the validated zpg sequences from Kyrou et al. The first cloning strategy was described in the Results section but was missing from the Materials and Methods section. This is now corrected (lines 773-779).

      • Fig.3: Please, correct to A, B, C order. Current one is A, C, B.

      --- Done.

      Could the authors include a schematic of the final mosquito genome after integration? I can see they are targeting two different locations (Saglin and Lp). It is unclear though from the figure where the Sc2A10-GFP is coming from. I understand this represents the mosquito genome as you injected heterozygous animals already containing the Sc2A10-GFP. Maybe label the Sc2A10-GFP as mosquito genome or similar? A schematic showing mosquito embryos already carrying this and then the plasmid being injected could help.

      --- Figure 3 does not represent the injection of new transgenic constructs. Instead, it shows the conversion process of chromosomes X and II in a germ cell carrying both transgenes in the heterozygous state, to illustrate how the dual gene drive can spread in a population after WT mosquitoes mated with transgenics carrying both the SagGD and Lp-2A10 alleles. We have re-worked the graphic design of this figure and modified its title to make this more clear.

      • Line 330-331: Do you know the transgenesis efficiency? Did the authors make single or pools for crossing and posterior screening? It would be interesting to know about transgenesis rates to inform the community.

      --- we no longer perform single crosses for transgenesis, as batch crosses ensure higher recovery of transgenics due to the collective reproductive behavior (swarming) in Anopheles. Therefore, we cannot precisely calculate the transgenesis efficiency. However, >60 positive G1s from a pool of 36 G0 males crossed to WT females is indicative of a rather high integration efficiency. We consistently observe high efficiency of transgene integration when using the CRISPR/Cas9 system, that we estimate to be about 5-fold more efficient than docking site transgenesis, and much more efficient than piggyBac mediated transgenesis.

      • Line 357/Fig.4B: Could the authors explain in the text GFP+ vs. GFP++?

      --- GFP++ was meant to indicate higher intensity of GFP fluorescence than GFP+, due to two copies of the transgene versus one, but see our response to reviewer 1’s comment to line 356 about the questionability of homing in the zygote.

      • Line 357: Where is the vasa promoter that made the "rescue" coming from? Is it amplified from An. coluzzii? Please include this explanation for clarification. Why do the authors think the zpg from Kyrou et al. 2018 works for the cassette integration but not for homing? They discuss positional effects; are there any references showing that?

      --- We amplified the vasa promoter from A. coluzzii using primers CggtctcaATCCcgatgtagaacgcgagcaaa and CggtctcaCATAttgtttcctttctttattcaccgg (annealing sequence underlined) to have a fragment equivalent to that (vas2) characterized in Papathanos et al, 2009. We have now added this information in the Methods under Plasmid construction. This is the only source of vasa promoter used in this work.

      About zpg promoter activity : we have past experience suggesting that promoters, such as the hsp70 promoter from Drosophila, can be sufficient to express enzymatic activities in embryos injected with helper plasmids, even though the same promoters appear to become inactive once integrated in the genome. This may be due to injected “naked” plasmids being readily accessible to the transcription machinery, unlike organized chromatin. A recent reference showing genomic positional influences on promoter efficiency is Anderson et al., 2023, which we have added on line 710 of the Discussion.

      • Line 362: No reference to figure nor table.

      --- These data (numbers from a COPAS analysis) are provided directly in the text in this sentence (which has been clarified in response to Reviewer 1). See lines 364-369 of the revision.

      • Line 417: The text brings the reader back to Fig.3C. Could the authors move this panel for easier flow of the paper?

      --- We agree that positioning of this panel in Figure 3 is a bit awkward, but this western blot pertains to the characterization of the insertion shown in Fig. 3. Placing it after COPAS analyses would be equally awkward.

      • Line 472-474: How many WT alleles were recovered? It is not stated unless I missed anything, which is possible.

      --- We refrained from providing a quantification of this, and focussed on qualitative results, as we didn't trust the quantitative representativity of our high-throughput amplicon sequencing results in terms of allele frequency in the sampled mosquito population. A large fraction of sequenced reads corresponded to PCR artefacts such as primer dimers and unspecific short amplicons, potentially affecting the relative frequencies of gene-specific amplicons. However, among the sequenced gene-specific amplicons, WT alleles were the majority (lines 474-475).

      • Fig.5. Could the authors discuss the observed drop of the DsRed gene drive in population 1 at around generation 18? The population gets to the point where only 50% of the population carries the Cas9-DsRed cassette. Considering that the Saglin gene drive only converts through females (inserted into the X chr.), and some indels could be generated by generation 20, how do you explain the great recovery until fully spreading into the population?

      --- We agree that this is somewhat puzzling. We don’t have a satisfactory explanation beyond stochastic effects, possibly promoted by population bottlenecks: although we strived to maintain these populations at a high number of individuals at each generation, we cannot exclude that at a given generation only a relatively small fraction of individuals contributed to the next generation, leading to fluctuations in allelic frequencies. This would be possible particularly for populations 1 and 2, which were not monitored frequently between generations 10 and 18, at which point additional populations 5-8 were established and it was decided that close monitoring of all populations was important.

      It seems to me populations 3-8 are new cage experiments by randomly picking mosquitoes from populations 1 and 2 (at a specific generation) and mixing them with WT individuals. Could the authors explain the reasoning for these experiments? I believe populations 3-8 deserve a different figure (main or supplementary) describing how they were seeded. It is confusing having everything together as these experiments were performed in a different way and for a different reason compared to populations 1 and 2. Some cage schematics and drawings would help in understanding the protocol strategy for populations 3-8.

      --- This is correct for populations 3 and 4 that indeed originated from randomly picking mosquitoes from populations 1 and 2 at generation 10 and mixing them with WT individuals. Populations 5, 6, 7 and 8 are crosses between generation 16 transgenic partners of one sex to WT of the other sex, as indicated above the COPAS diagrams provided in Suppl. File 2. We apologize for having insufficiently described how each population was assembled and now provide more details (lines 422-429, in the figure 5 legend, and G0 crosses spelled out on top of each population diagram). In setting up these populations, we wanted to test the effects of various routes by which the transgenes may be introduced into a wild mosquito population: release of unsorted transgenic males + females, or release of one sex only (probably males in the field, but the crosses with transgenic females as with transgenic males also served to re-quantify homing in the second generation of each cross).

      The modified text reads as follows:

      Populations 3 and 4 were established by mixing randomly selected transgenic mosquitoes (both males and females of generation 10) from populations 1 and 2, respectively, with wild-types, to mimic what may occur in a mixed-sex field release. Populations 5-8 were established by crossing single-sex transgenic mosquitoes to WT of the opposite sex, both to mimic a single-sex field release and to re-assess homing efficiency after 16 generations.

      Also, could you add homozygous and heterozygous labels in the figure legend to help understanding the different lines.

      --- As indicated on the side of the figure and in the figure legend, lines don’t represent homozygous vs. heterozygous frequency, but allele frequency (continuous lines), and frequency of mosquitoes carrying the transgene (dotted lines). In the figure legend we now provide the calculation formulas for gene frequency: [2 x (number of homozygotes) + (number of heterozygotes)] / [2 x (total number of larvae)] for the autosomal Lp::2A10 transgene, and [2 x (number of homozygotes) + (number of heterozygotes)] / [1.5 x (total number of larvae)] for the X-linked SagGD transgene.
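
      The two formulas above can be sketched in a few lines of Python. This is only an illustration, not the analysis code used for Figure 5: the function names and the example counts are hypothetical. The sketch assumes that the factor 1.5 for the X-linked SagGD transgene reflects an equal sex ratio (two X copies per female, one per male), and that hemizygous males are counted in the one-copy class.

```python
def allele_frequency(n_homozygotes, n_heterozygotes, n_total, x_linked=False):
    """Transgene allele frequency from larval counts (continuous lines).

    Autosomal locus: every larva carries 2 gene copies.
    X-linked locus: assumes a 1:1 sex ratio, i.e. 1.5 copies per larva
    on average (an assumption of this sketch).
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    copies_per_larva = 1.5 if x_linked else 2.0
    transgene_copies = 2 * n_homozygotes + n_heterozygotes
    return transgene_copies / (copies_per_larva * n_total)


def carrier_frequency(n_homozygotes, n_heterozygotes, n_total):
    """Fraction of larvae carrying at least one transgene copy (dotted lines)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (n_homozygotes + n_heterozygotes) / n_total


# Hypothetical counts, for illustration only.
lp_freq = allele_frequency(25, 50, 100)                  # autosomal Lp::2A10
sag_freq = allele_frequency(50, 50, 100, x_linked=True)  # X-linked SagGD
carriers = carrier_frequency(25, 50, 100)
```

      With these made-up counts, the X-linked example corresponds to a transgene at fixation under the stated assumptions: 50 females with two copies and 50 males with one copy.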

      • Fig.6: The authors sequenced non-DsRed individuals from generations 3-4. The authors also mentioned they sequenced mosquitoes from generation 32 (Fig.7). Interestingly, they observed that these mosquitoes were missing a piece of the cassette (they contained 2 gRNAs instead of 7). Since the amplicons only cover the gRNA portion, a PCR covering the Zpg-Cas9 portion would be ideal to confirm that only the gRNAs are missing. Sampling DsRed+ mosquitoes from generations 3, 18 and 31 (populations 1 and 2) and carrying out these experiments is recommended. Although unlikely, I would be worried about the Cas9 being deleted due to unexpected DNA rearrangements; in that case, the cassette would contain the DsRed marker alone.

      --- Thank you for this suggestion. We no longer have DNA samples from the earlier generations. Thus, we genotyped 7 DsRed positive male mosquitoes from each of current populations 1, 2 and 7 (generation 41 since transgenesis) for the presence of Cas9. We detected a Cas9-specific amplicon of 1.6 kb in 21/21 sampled DsRed positive mosquitoes, in parallel to the same shortened gRNA arrays detected in earlier generations. This suggests that the Cas9 part of the transgene was not affected by the loss of gRNA units. We made a panel C in Figure 7 showing these results and mentioned them on lines 537-538. Of note, the Cas9 moiety of the gene drive construct shows no repetitive sequence and should therefore not be as unstable as the gRNA multiplex array. The observed excisions of gRNA expression units were strictly due to recombinations between repeated U6 promoter sequences (Fig. 7).

      The authors employ 3 different gRNAs that are 43 and 310 nts apart. It has been shown that a lack of homology of only 20 nt produces an important reduction in gene drive performance (Lopez del Amo et al 2020, Nat Comms). Also, it has been shown that gRNA multiplexing approaches should be kept to a low number of gRNAs, 2 being maybe the best one depending on the design (Samuel Champer 2020, Science Advances). This could be discussed more.

      --- Thank you for this suggestion. These results were not published when this study was initiated, so that our gene drive constructs could only be designed on empirical bases. For gRNA numbers, see the new discussion point and inclusion of a reference to the study by S. Champer et al., on lines 700-702. The reduction of drive performance with longer non-homologous stretches is indeed also a very important point, that we now discuss on lines 713-717, citing your study:

      Finally, tighter clustering of gRNA target sites at target homing loci, especially Saglin, should improve gene drive performance by reducing the length of DNA sequences flanking the cut site that bear no homology to the repair template on the sister chromosome and need to be resected by the repair machinery to allow homing (López Del Amo et al., 2020).

      Reviewer #4 (Significance):

      There are different novelty aspects from my point of view in this work. While most scientists focus on developing CRISPR-based gene drives in An. stephensi and An. gambiae, this work employs An. coluzzii. Some limitations regarding fitness cost associated with the Lp gene were also noted and discussed by the authors.

      --- To be fair, earlier gene drive studies were performed on the G3 laboratory strain, traditionally named A. gambiae, although it is probably itself a hybrid strain from gambiae and coluzzii. Still, the Ngousso strain from Cameroon that was used in this study is thought to be a bona fide A. coluzzii. We have also added a reference to a recent paper (Carballar-Lejarazu et al., 2023) that also describes a population modification GD in A. coluzzii.

      First, they show that An. coluzzii mosquitoes infect less when containing the antimalarial effector cassette inserted in their genomes. Second, a gene drive is showing super-Mendelian inheritance in An. coluzzii, which would be the second example of a gene drive in these mosquitoes so far to my knowledge.

      I believe this is the first manuscript experimentally using multiplexing approaches (multiple gRNAs) in mosquitoes (all previous works I saw were performed in flies). While previous gene-drive works employ only one gRNA in mosquitoes, this work explores the use of different gRNAs targeting nearby locations to potentially improve HDR rates and gene drive spread. Although they observe gene drive activity, they also show DNA rearrangements due to the intrinsic nature of multiplexing gene drives that can generate multiple DNA double-strand breaks, impeding proper HDR and clean replacement of the wild-type alleles. This is important from a technical point of view as it shows this approach requires optimization. They included 3 gRNAs targeting the Saglin gene, and trying 2 gRNAs instead could be interesting for future investigations.

      --- We now discuss optimization with the help of modeling, in response to Reviewer 1, on lines 701-702.

      This work will be very useful for the CRISPR-based gene drive field, which seeks to develop genome editing tools to control mosquito populations and reduce the impact of vector-borne diseases such as malaria.

      This reviewer intended to understand the work and provide constructive feedback to the best of my abilities. I apologize in advance if I misunderstood anything.

      --- Thank you for your appreciation, insight, and constructive evaluation of our manuscript.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      In this study, the authors made a two-component homing modification gene drive in Anopheles coluzzii with a different strategy than usual. The final drive itself targets and disrupts the saglin gene that is nonessential for mosquitoes, but important for the malaria parasite. The drive uses several gRNAs, and some of these target the Lp gene where an anti-malaria antibody is added, fused to the native gene (this native gene is also essential, removing nonfunctional resistance alleles at this locus). In general, the system is promising, though imperfect. Some of the gRNAs self-eliminate due to recombination of repetitive elements, and the fusion of the antimalaria gene had a modest fitness cost. Additionally, the zpg promoter was unable to operate at high efficiency, requiring use of the vasa promoter, which suffers from maternal deposition and somatic expression (the latter of which increased fitness costs at the Lp target). The manuscript has already undergone some useful revisions since its earliest iteration, so additional recommended revisions are fairly modest.

      Line 43-45: The target doesn't need to be female sterility. It can be almost any haplosufficient but essential target (female sterility works best, so it has gotten the most study, but others have been studied too).

      Line 69: A quick motivation for studying Anopheles coluzzii should be added here (since gambiae is discussed immediately before this).

      Introduction section: It might be helpful to break up the introduction into additional paragraphs, rather than just two.

      Introduction last part: The last part of the introduction reads more like an abstract or conclusions section. Perhaps a little less detail would fit better here, so the focus can be on introducing the new drive components and targets

      Line 207-213: This material could go in the methods section. There are some other examples in the results that could be similarly shortened and rearranged to give a more concise section.

      Line 283-287: I couldn't find the data for this.

      Line 291: Replace "lied" with "was".

      Line 356: Homing in the zygote would be considered very unusual and is thus worthy of more attention. While possible (HDR has been shown for resistance alleles in the zygote/early embryo), this would be quite distinct from the mechanism of every other reliable gene drive that has been reported. Is the flow cytometry result definitely accurate? By this, I mean: could the result be explained by just outliers in the group heterozygous for EGFP, or perhaps some larvae that hatched a little earlier and grew faster? Perhaps larvae get stuck together here on occasion or some other artifact? Was this result confirmed by sequencing individual larvae?

      Results in general: Why is there no data for crosses with male drive heterozygotes? Even if some targets are X-linked, performance at others is important (or did I miss something and they are all X-linked). I see some description near line 400, but this sort of data is figure-worthy (or at least a table).

      Lines 362-367: What data (figure/table) does this paragraph refer to?

      Lines 405-406: There may be a typo or miscalculation for the DsRed inheritance and homing rate here. Should DsRed inheritance be 90.7%?

      Figure 5: The horizontal axis font size for population 8 is a little smaller than the others.

      Line 454: In addition to drive conversion only occurring in females and the somatic fitness costs, embryo resistance from the vasa promoter would prevent the daughters of drive females from doing drive conversion. This means that drive conversion would mostly just happen with alleles that alternate between males and females.

      Line 481: Deletions between gRNAs certainly happen, but I wouldn't necessarily expect this to be the "expectation". In our 2018 PNAS paper, it happened in 1/3 of cases. There were fewer, I think, in our Science Advances 2020 and G3 2022 papers. All of these were from embryo resistance from maternal Cas9 (likely also the case with your drive due to the vasa promoter). When looking at "germline" resistance alleles, we have recently noticed more large deletions.

      Figure 6C: It may be nice to show the wild-type and functional resistance sequence side-by-side.

      Lines 642-644: This isn't necessarily the case. At saglin, the nonfunctional resistance alleles may still be able to outcompete the drive allele in the long run. This wasn't tested, but it's likely that the drive allele has at least some small fitness costs.

      A few comments on references to some of my studies:

      Champer, Liu, et al. 2018a and 2018b citations are the same paper.

      For Champer, Kim, et al. 2021 in Molecular Ecology, there was a recent follow-up study in eLife that shows the problem is even worse in a mosquito-specific model (possibly of interest as an alternate or supporting citation): https://elifesciences.org/articles/79121

      One of my other previous studies was not cited, but is quite relevant to the manuscript: https://www.science.org/doi/10.1126/sciadv.aaz0525

      This paper demonstrates multiplexed gRNAs and also models them, showing their advantages and disadvantages in terms of drive performance. Additionally, it models and discusses the strategy of targeting vector genes that are essential for disease spread but not the vectors themselves (the "gene disruption drive"), showing that this can be a favorable strategy if gene knockout has the desired effect (nonfunctional resistance alleles contribute to drive success).

      This one is less relevant, but is still a "standard" homing modification rescue type drive that could be mentioned (and owes its success to multiplexing): https://www.pnas.org/doi/abs/10.1073/pnas.2004373117

      The recoded rescue method was also used in mosquitoes (albeit without gRNA multiplexing) by others, so this may be a better one to mention: https://www.nature.com/articles/s41467-020-19426-0

      Sincerely,
      Jackson Champer

      Referees cross-commenting

      Other comments look good. One thing that I forgot to mention: for the 7-gRNA construct with tRNAs, the authors mentioned that it was harder to track, but it sounds like they obtained some data for it that showed similar performance. Even if this one is not featured, perhaps they can still report the data in the supplement?

      Significance

      Overall, this study represents a useful advance. Aside from being the first report for gene drive in A. coluzzii, it also is the first that investigates the gene disruption strategy and is the first report of gRNA multiplexing in Anopheles. The study can thus be considered high impact. There are also other aspects of the study that are of high interest to gene drive researchers in particular (several drives were tested with some variations).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Summary of changes

      I thank the reviewers for their thorough feedback on this paper and providing me with such a detailed list of recommendations. I have been able to incorporate many of their suggestions, which I believe has greatly improved this paper.

      The most important changes:

      • I added comparisons to the lexicon- and rule-based sentiment algorithms TextBlob and VADER to Supplementary Fig. 4. This shows the superiority of ChatGPT in scoring the sentiment of scientific texts compared to existing and already-validated tools for sentiment analysis based on natural language processing. [Suggestion Reviewer 2]

      • I added the measure intra-class correlation to Fig. 3b, emphasizing the inconsistency in sentiment scores across different reviews of the same paper. [Suggestion Reviewer 3]

      • I added Supplementary Fig. 6, in which I directly propose different experiments to test the causes of the observed gender effects on peer review. [Suggestion Reviewer 3]

      • I further studied the issue of variability in responses by ChatGPT and learned that this has greatly improved in the latest version of ChatGPT: for the Aug 3, 2023 version, R² values of 0.99 (sentiment) and 0.86 (politeness) were reached. I show these findings in Supplementary Fig. 2. [Suggestions Reviewers 1 and 3]

      • Throughout the manuscript (most notably in the Abstract and Discussion), I emphasize that this is a proof-of-concept study, and make suggestions on how to scale this up across journals and fields. I also toned down certain claims given the relatively small sample size of this study, including in the abstract. I also more prominently and elaborately discuss the limitations of the study in the Discussion section. [Suggestions Reviewers 1, 2 and 3]

      • I made many smaller changes to text, figures and references on the basis of the reviewers’ comments. [Suggestions Reviewers 1, 2 and 3]

      Notably, Reviewer 3 has provided me with a very detailed list of recommendations for follow-up experiments. I appreciate their ideas, and I am currently considering different options for future work. Specifically I am looking to team up with a journal to perform the experiments laid out in Supplementary Fig. 6 of the new paper, to study whether I can find evidence of bias across rejected and accepted papers. As suggested by this reviewer, I am also looking into ways to automate data collection using APIs, and by utilizing the rapidly expanding databases for transparent peer review.

      Based on this preprint, I have received messages from academics that are interested in using generative AI to study scientific texts. By revising this manuscript, I hope to provide them with the tools to concurrently expand the analysis of peer review into different scientific disciplines and journals.

      Reviewer #1 (Public review)

      Strengths:

      The innovative method is the biggest strength of this article. Moreover, the method can be implemented across fields and disciplines. I myself would like to see this method implemented in a grander scale. The author invested a lot of effort in data collection and I especially commend that ChatGPT assessed the reviews twice, to ensure greater objectivity.

      I want to thank this reviewer for commending the innovative methodology of this study. I appreciate that this reviewer would like to see this methodology implemented at a grander scale, which is a view that I share. I initially only included Neuroscience papers, because I was uncertain whether I would be able to properly assess the reviews from different scientific disciplines (and thus judge whether ChatGPT was able to provide plausible scores).

      The reviewers have provided me with a list of potential follow-up experiments, and I am currently considering different options for future work. Specifically, I am looking to team up with a journal to perform the experiments laid out in the new Supplementary Fig. 6, to study whether I can find evidence of bias across rejected and accepted manuscripts of a journal. In addition, as suggested by Reviewer #3, I am looking into ways to automate data collection using APIs, and by utilizing the rapidly expanding databases for transparent peer review. Importantly, based on this preprint, I have received messages from academics who are interested in using generative AI to study scientific texts. By revising this manuscript now, I hope to provide them with the tools to concurrently expand the analysis of peer review into different scientific disciplines and journals.

      The comments I received from the different reviewers made me realize that I did not describe the intent of this paper well enough in the original submission. I rewrote much of the Abstract, to emphasize the proof-of-concept nature of this study, and rewrote the Discussion to focus more on the limitations of the study.

      Weaknesses:

      I have several concerns regarding the methodology of the article. The first relates to the fact that the sample is not random. The selection of journal and inclusion and exclusion criteria do not contribute well to the strength of the evidence.

      Indeed, the inclusion of only accepted manuscripts from a single journal is the biggest caveat of this paper. I have rewritten much of the Abstract to emphasize that this is a proof-of-concept paper, in the hope that other researchers will concurrently expand this method to larger and more diverse datasets.

      An important methodological fact is that the correlation between the two assessments of peer reviews was actually lower than we would expect (around 0.72 and 0.3 for the different linguistic characteristics). If the ChatGPT gave such different scores based on two assessments, should it not be sound to do even more assessments and then take the average?

      This was a great recommendation by this reviewer, and a point also raised by Reviewer #3. Based on their suggestion, I looked into how each additional iteration of scoring would reduce the variability of scoring for a subset of papers (thus being able to advise users on an optimal number of iterations).

      Interestingly, I observed that ChatGPT has become significantly more reliable in providing sentiment and politeness scores in recent versions. For the latest version (ChatGPT Aug 3, 2023), R² = 0.992 for sentiment and R² = 0.859 for politeness were reached for two subsequent iterations of scoring. Unfortunately, OpenAI does not allow access to previous versions of ChatGPT, so the current dataset could not be re-scored. Yet, based on these data, there may no longer be a need for people to perform repeated scoring. I show these data in Supplementary Fig. 2, as I believe this is very useful information for people who are interested in using this tool.
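
      As an illustration of the agreement measure quoted above, the following minimal sketch computes R² between two scoring iterations. It assumes that R² here is the squared Pearson correlation of the two score lists (as in a simple linear regression); the function name and the example scores are hypothetical and are not taken from the study's data.

```python
from statistics import mean

def r_squared(first, second):
    """Squared Pearson correlation between two equally long score lists."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two score lists of equal length (>= 2)")
    mx, my = mean(first), mean(second)
    sxy = sum((x - mx) * (y - my) for x, y in zip(first, second))
    sxx = sum((x - mx) ** 2 for x in first)
    syy = sum((y - my) ** 2 for y in second)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary within each iteration")
    return sxy ** 2 / (sxx * syy)

# Hypothetical sentiment scores (-100 to +100) for six reviews, scored twice.
iteration_1 = [60, -20, 35, 80, 10, -45]
iteration_2 = [55, -25, 40, 75, 15, -40]
print(round(r_squared(iteration_1, iteration_2), 3))
```

      A value close to 1 means that a second scoring pass adds little information, which is the situation in which repeated scoring may no longer be needed.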

      Reviewer #1 (Recommendations to author)

      I had some difficulties reading the article, so it would maybe help to structure the article more (e.g. In the introduction there are three aims stated, so the Statistical Analysis section could be divided in three sections, and instead of the link to figures, the author could state which variables were analysed in a specific manner) to be easier to comprehend the details. Also, I found on one place that the sample consisted of 572 reviews, and on other that it was 558.

      These are very good points. I rewrote the statistical analysis section for clarity (Page 7 of the manuscript). The figure of 558 reviews was a mistake on my part, as I forgot to include the fourth review for the 14 papers that received four reviews in the histograms of Fig. 2b and the accompanying text. This has been updated.

      For figures 1a and 1b it could be considered to enter the table instead of several figures.

      I thank the reviewer for pointing this out. I tried this suggestion, but I found it to reduce the readability of the paper. As an alternative, I now provide an Excel spreadsheet with all the raw data, so people can find all the characteristics of the included papers.

      99.8% of the reviews analysed were assessed as polite. This is, in my opinion, extremely important finding, which shows that reviewers are still holding to certain degree of standards in communication, and it can be mentioned in the abstract.

      I very much agree with this reviewer; this has now been added to the Abstract.

      In results you state that QS World Ranking is "imperfect" measure. When stating that in the results section, it poses the question why it is used in the study, so maybe it is more suitable for the discussion.

      This point is well taken. Even though the QS World Ranking score is imperfect, I still think it can be useful, as a rough proxy of perceived prestige of an institution. I now removed this “imperfect measure” statement from the Results section, and moved it to the Discussion (Page 5).

      In the Results section, instead of using only p values, please add measures of effect (correlations, mean differences), to make it easier to place in the context.

      For the significant effects of Fig. 4, I have added these to the figure legends. Please note that the statistical tests used are non-parametric, so I reported the Hodges-Lehmann difference (the median of all possible pairwise differences between observations from the two groups).
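
      The Hodges-Lehmann difference described above can be sketched in a few lines. This is only an illustration of the definition (median of all pairwise differences between the two groups); the function name and the example values are hypothetical, and this is not the code used for Fig. 4.

```python
from itertools import product
from statistics import median

def hodges_lehmann_difference(group_a, group_b):
    """Median of all pairwise differences a - b between two groups."""
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one observation")
    return median(a - b for a, b in product(group_a, group_b))

# Hypothetical politeness scores for two groups of reviews.
group_a = [62, 70, 75, 81]
group_b = [58, 66, 69]
print(hodges_lehmann_difference(group_a, group_b))
```

      Unlike a difference in means, this estimate is not dominated by a few extreme reviews, which is why it pairs naturally with rank-based tests.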

      I think the results interpretation should be softened a bit, or the limitations of the study should be placed as the second paragraph in the discussion, since this was only specific journal with specific subfield.

      I agree with this reviewer that the relatively small sample size of this paper demands more careful wording. Throughout the manuscript, I have toned down claims, and emphasized the “proof of concept” nature of this study (for example in the Abstract). I also moved the limitations section to the second paragraph of the Discussion, and elaborate more on the study’s caveats.

      Methods:

      The measure Review time was assessed from submission to acceptance, but this does not need to be review time since it takes a lot of time sometimes to find reviewers. that needs to be stated as the limitation.

      This point is well taken. I changed this to “Paper acceptance time” in Fig. 3 and the accompanying text.

      Gender name determination methods differed between the assessment of the first authors and the last authors, and that needs stronger explanation.

      I appreciate this reviewer raising this point, which has also been raised by Reviewer #3. For this paper, I have carefully weighed the pros and cons of automated versus manual gender determination. Initially, my intention was to rely only on a programmatic method to identify authors' names. However, I came to realize that there were inaccuracies in senior author gender predictions made by ChatGPT/Genderize. This was evident to me due to my personal familiarity with some of these authors, either because they are famous or through personal interactions. It seemed problematic to me to proceed with this analysis knowing that these misclassifications would introduce unnecessary variability to the dataset.

      The advantage of the relatively small sample size in this study was the opportunity to manually perform this task, rather than being fully dependent on algorithms. While I attempted manual gender identification for the first author as well, this was way more challenging due to their limited online presence. The discrepancy in gender identification accuracy between first and senior authors did not go unnoticed, and I acknowledge the issue it presents. I also recognize that, unlike senior authors, reviewers may not necessarily be familiar with the first authors of the papers they evaluate, as indicated in the original submission of this paper. In light of this, I sought input from several PIs who often serve as reviewers. Their feedback confirmed that they typically possess knowledge of senior authors' identities, for example through conferences, whereas the same is not true for first authors. Yet, this may be different for other scientific disciplines, where the pool of reviewers might be bigger.

      Notably, for future studies I may make a different decision, especially when I use larger datasets that require me to automate the process.

      I also realize that my rationale for the different methods of gender determination was not explained well enough in the original submission; I now explain my reasoning in more detail on Page 7 of the manuscript.

      For sentiment analysis: Please state based on what the GPT made a decision? Which program? (e.g. for gender it used genderize.io)

      This has been added to Page 7.

      Finally, your entire analysis can be made reproducible (since everything is publicly available). You can share ChatGPT chats as online materials with variables entered with the dataset analysed and the code. This would increase the credibility of the findings.

      I will make the entire raw dataset available through the eLife website, including all reviews and their scores.

      Reviewer #2 (Public review)

      Strengths include:

      1) Given the variability in responses from ChatGPT, the author pooled two scores for each review and demonstrated significant correlation between these two iterations. He confirmed also reasonable scoring by manipulating reviews. Finally, he compared a small subset (7 papers) to human scorers and again demonstrated correlation with sentiment and politeness.

      2) The figures are consistently well presented and informative. Figure 2C nicely plots the scores with example reviews. The supplementary data are also thoughtful and include combination of first/last author genders. It is interesting that first author female last author male has the lowest score.

      3) A series of detailed analysis including breaking down reviews by subfield (interesting to see the wide range of reviewer sentiment/politeness scores in computational papers), institution, and author's name and inferred gender using Genderize. The author suggests that peer review to blind the reviewers to authors' gender may be helpful to mitigating the impoliteness seen.

      Thank you.

      Weaknesses include:

      1) This study does not utilize any of the wide range of Natural Language Processing (NLP) sentiment analysis tools. While the author did have a small subset reviewed by human scorers, the paper would be strengthened by examining all the reviews systematically using some of the freely available tools (for example, many resources are available through Hugging Face [https://huggingface.co/blog/sentiment-analysis-python]). These methods have been used in previous examinations of review text analysis (Luo et al. 2022. Quantitative Science Studies 2:1271-1295). Why use ChatGPT rather than these older validated methods? How does ChatGPT compare to these established methods? See also: colab.research.google.com/drive/1ZzEe1lqsZIwhiSv1IkMZdOtjPTSTlKwB?usp=sharing

      This was a great recommendation by this reviewer, and I have tested ChatGPT against TextBlob and VADER, the two algorithms also used by the Luo et al. study (see Supplementary Fig. 4). Perhaps unsurprisingly, these algorithms performed very poorly at scoring the sentiment of the reviews. Please note that I also tested these two algorithms at scoring individual sentences, Tweets and Amazon reviews, which they did very well (i.e., the software packages were working correctly). Thus, ChatGPT is better at scoring scientific texts than TextBlob and VADER, likely because these algorithms struggle with finding where in the review the sentiment is conveyed. I now discuss this on Pages 1, 3 and 4 of the manuscript.

      2) The author's claim in the last paragraph that his study is proof of concept for NLP to analyze peer review fails to take into account the array of literature already done in this domain. The statement in the introduction that past reports (only three citations) have been limited to small dataset sizes is untrue (Ghosal et al. 2022. PLoS One 17:e0259238 contains over 1000 peer review documents, including sentiment analysis) and reflects a lack of review on the topic before examining this question.

      I thank this reviewer for pointing me to this very useful study. I regret missing this one in my initial submission; I now discuss this paper on Pages 1 and 5 of the manuscript.

      3) The author acknowledges the limitation that only papers under neuroscience were evaluated. Why not scale this method up to other fields within Nature Communications? Cross-field analysis of the features of interest would examine if these biases are present in other domains.

      I share this reviewer’s opinion that it would be very interesting to expand this analysis to different subfields. I initially only included Neuroscience papers, because I was uncertain whether I would be able to properly assess the reviews from different scientific disciplines (and thus judge whether ChatGPT was able to provide plausible scores). The different reviewers have provided me with a list of potential follow-up experiments, and I am currently considering different options for future work, including expanding into different fields within Nature Communications. Additionally, I am looking to team up with a journal to perform the experiments laid out in the new Supplementary Fig. 6, to study whether I can find evidence of bias across rejected and accepted manuscripts of a journal. I am also looking into ways to automate data collection using APIs, and by utilizing the rapidly expanding databases for transparent peer review. Importantly, based on this preprint, I have received messages from academics who are interested in using generative AI to study scientific texts. By revising this manuscript now, I hope to provide them with the tools to concurrently expand the analysis of peer review into different scientific disciplines and journals.

      The comments I received from the different reviewers made me realize that I did not describe the intent of this paper well enough in the original submission. I rewrote much of the Abstract, to emphasize the proof-of-concept nature of this study, and rewrote the Discussion to focus more on the limitations of the study.

      Reviewer #3 (Public review)

      Strengths:

      On the positive side, I thought the use of ChatGPT to score the sentiment of text was novel and interesting, and I was largely convinced by the parts of the methods which illustrate that the AI provides broadly similar sentiment and politeness scores to humans who were asked to rank a sub-set of the reviews. The paper is mostly clear and well-written, and tackles a question of importance and broad interest (i.e. the potential for bias in the peer review process, and the objectivity of peer review).

      Thank you.

      Weaknesses:

      The sample size and scope of the paper are a bit limited, and I have written a long list of recommendations/critiques covering diverse aspects including statistical/inferential issues, missing references, and suggestions for other material that could be included that would greatly increase the usefulness of the paper. A major limitation is that the paper focuses on published papers, and thus is a biased sample of all the reviews that were written, which prevents the paper properly answering the questions that it sets out to answer (e.g. is peer review repeatable, fair and objective).

      I very much appreciate this reviewer taking the time to provide me with such a detailed list of recommendations. Below, I will respond to this list in a point-by-point manner.

      Reviewer #3 (Recommendations to author)

      My main issues with the paper are that it is not very ambitious, and gave me the impression the aim was to write the first paper using ChatGPT to address this question, rather than to conduct the most thorough and informative investigation that would have been feasible (many obvious questions that could be addressed are not tackled, since the sample size is small and restricted). There are also issues with selection bias, and the statistical analysis, that have possibly led to erroneous inferences and greatly limit what conclusions can be drawn from the analysis. I hope my comments of use in further improving the paper.

      The repeatability of ChatGPT when calculating the two linguistic characteristics is low. Taking the average of multiple assessments is one way to deal with this. To verify that taking the average of, say, 5 scores gives a repeatable score, the author could consider calculating 10 scores for a set of 20-30 reviews, calculating two scores for each review using the first 5 and second 5 ChatGPT ratings, and then calculating repeatability across the 20-30 reviews. It is important to demonstrate that ChatGPT is sufficiently repeatable for this new method to be useful.

      Also, it might be possible to automate this process a bit to save time - e.g. the author could change the ChatGPT prompt, like "please rate the politeness of this review from -100 to +100, do it 10 times independently, and print your 10 ratings as well as their average". Hopefully the AI is smart enough to provide 10 independently-computed ratings this way, saving the need to copypaste the prompt into the chat box 10 times per review.

      This was a great recommendation by this reviewer, and a point also raised by Reviewer #1. Based on their suggestion, I looked into how each additional iteration of scoring would reduce the variability of scoring for a subset of papers (thus being able to advise users on an optimal number of iterations). I also tested this reviewer’s suggestion to ask ChatGPT to score many times and give separate scores for each iteration; this worked very well.

      Interestingly, I observed that ChatGPT has become significantly more reliable in providing sentiment and politeness scores in recent versions. For the latest version (ChatGPT Aug 3, 2023), R² = 0.992 for sentiment and R² = 0.859 for politeness were reached for two subsequent iterations of scoring. Unfortunately, OpenAI does not allow access to previous versions of ChatGPT, so the current dataset could not be re-scored. Yet, based on these data, there may no longer be a need for people to perform repeated scoring. I show these data in Supplementary Fig. 2, as I believe this is very useful information for people who are interested in using this tool.
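
      The split-half check proposed by the reviewer (10 ratings per review, averaged in two halves of 5, then correlated across reviews) could look roughly like the sketch below. The ratings are simulated with an arbitrary noise level purely for illustration; the function names and all numbers are assumptions, not results from this study.

```python
import random
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def split_half_repeatability(ratings_per_review):
    """Correlate the mean of the first half of each review's ratings
    with the mean of the second half, across reviews."""
    first_half, second_half = [], []
    for ratings in ratings_per_review:
        half = len(ratings) // 2
        first_half.append(mean(ratings[:half]))
        second_half.append(mean(ratings[half:2 * half]))
    return pearson_r(first_half, second_half)

# Simulated data only: 25 reviews, each with 10 noisy ratings around a
# hypothetical "true" score. The noise level (SD = 30) is an assumption.
rng = random.Random(0)
true_scores = [rng.uniform(-100, 100) for _ in range(25)]
simulated = [[t + rng.gauss(0, 30) for _ in range(10)] for t in true_scores]
print(round(split_half_repeatability(simulated), 2))
```

      Repeating the calculation with 2, 3 or 4 ratings per half would show how quickly the averaged score stabilizes, which is one way to choose a number of iterations.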

      To my mind, the main reason to use an AI instead of one or more human readers to rank the sentiment/politeness of peer reviews is to save time, and thereby allow this study to have a larger sample size than would be feasible using human readers. With this in mind, why did you choose to download only 200 papers, all from the discipline of Neuroscience, and only from Nature Communications? It seems like it would be relatively easy to download papers from many more journals, fields of research, or time periods if using AI-based methods, and in fact it would have been feasible (though fairly laborious) for one person to read and classify the sentiment of the reviews for 200 papers.

      As well as providing more precise estimates of the parameters you are interested in (e.g. the consistency of reviews, and the size of the difference in reviewer sentiment between author genders), expanding the sample beyond this small set of papers would allow you to address other interesting questions. For example, you could ask whether the patterns observed for neuroscience are similar to those in other research disciplines, whether Nature Comms is representative of all journals (given there are other journals with public reviews), and you could test whether the male-female differences have become greater or smaller over time (e.g. by comparing the male-female differences observed in the past to the effect size observed in 2022-23). Additionally, the main analyses in this paper would have higher statistical power - for example, you only include 53 papers with a female senior author, giving you quite low power/ precision to estimate the gender difference in the average sentiment of reviews (given the high variance in sentiment between papers).

      I want to thank this reviewer for taking the time to think about possible ways to increase the impact of this work. I agree that these are all great suggestions, and there are many possibilities to apply ChatGPT-based natural language processing to scientific peer review. Respectfully, I chose to continue with publishing this work in the form of a proof-of-concept paper, because I currently do not have the resources to perform this (quite labor-intensive) study. Below I explain my reasoning, which I also shared with Reviewers #1 and #2.

      I initially only included Neuroscience papers, because I was uncertain whether I would be able to properly assess the reviews from different scientific disciplines (and thus judge whether ChatGPT was able to provide plausible scores). The different reviewers have provided me with a list of potential follow-up experiments, and I am currently considering different options for future work, including expanding into different fields within Nature Communications. Additionally, I am looking to team up with a journal to perform the experiments laid out in the new Supplementary Fig. 6, to study whether I can find evidence of bias across rejected and accepted manuscripts of a journal. I am also looking into ways to automate data collection using APIs, and by utilizing the rapidly expanding databases for transparent peer review. Importantly, based on this preprint, I have received messages from academics who are interested in using generative AI to study scientific texts. By revising this manuscript now, I hope to provide them with the tools to concurrently expand the analysis of peer review into different scientific disciplines and journals. The comments I received from the different reviewers made me realize that I did not describe the intent of this paper well enough in the original submission. I rewrote much of the Abstract, to emphasize the proof-of-concept nature of this study, and rewrote the Discussion to focus more on the limitations of the study.

      Also, if you could include some reviews of papers that were reviewed double-blind, you could test whether the gender-related differences in peer reviews are ameliorated by double-blind reviewing. Nature Comms (and many other journals with open review) do have some double-blinded papers, and there is evidence that double-blinding is preferentially selected by authors who think they will experience discrimination in the peer review process (DOI: 10.1186/s41073-018-0049-z), and also that double-blinding does ameliorate bias (DOI: 10.1111/1365-2435.14259), so this seems very relevant to the ideas under study here.

      I note that the PLOS journals allow open peer review, and there is an API for PLOS which one can use to download the reviews for a given paper (e.g. try this query to get to the XML file of a paper which has open peer review: http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0239518&type=manuscript). Using an API could allow this project to be scaled up, because you can programmatically search for the papers with open reviews, download those reviews using the API and some code, and then score them using the same ChatGPT-based methods used for Nature Comms. Also, Publons recently merged with Web of Science (Clarivate), and you can now read all the open peer reviews on Web of Science for papers which had open review (e.g. for this paper: https://www-webofscience-com.napier.idm.oclc.org/wos/woscc/fullrecord/WOS:000615934800001). It would be possible to write to Web of Science, request access to their data or search engine, and programmatically download many thousands of papers and their associated reviews, and then use ChatGPT or a similar AI to score them all (especially if you can pass the reviews to ChatGPT for scoring programmatically, instead of manually copy-pasting the reviews into the chat box one at a time as it appears was done in the present study).

      These are great suggestions, and I have different plans for follow-up studies, including the use of APIs to download large batches of peer reviews. The analyses in this paper were performed in February of this year, before the ChatGPT API had been released, so I could not automate the process at that time. As a result, these analyses were performed manually. I realize that the field is moving rapidly, and that there are now different options to scale this up quickly.

      I plan on using the suggestions from this reviewer for follow-up experiments in a future paper, and to publish this revision as a proof-of-concept paper. In this way, different researchers can optimally use ChatGPT-based sentiment analyses for similar studies without delay.
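
      To make the idea of programmatic scoring concrete, here is a minimal sketch of a scoring loop. Everything in it is hypothetical: the prompt wording is adapted from the reviewer's example rather than taken from this study, and `ask_model` stands for any user-supplied function that sends a prompt to a chat model and returns its text reply (no specific API is assumed).

```python
import re
from statistics import mean

# Hypothetical prompt template, loosely based on the reviewer's example.
PROMPT = ("Please rate the {trait} of this review from -100 to +100. "
          "Reply with the number only.\n\n{review}")

def parse_score(reply):
    """Extract the first integer from a model reply and check its range."""
    match = re.search(r"[-+]?\d+", reply)
    if match is None:
        raise ValueError(f"no score found in reply: {reply!r}")
    score = int(match.group())
    if not -100 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return score

def score_review(review, trait, ask_model, n_iterations=2):
    """Average several independent scores for one review.

    `ask_model` is a placeholder callable: prompt text in, reply text out.
    """
    prompt = PROMPT.format(trait=trait, review=review)
    return mean(parse_score(ask_model(prompt)) for _ in range(n_iterations))

# Stand-in for a real model call, used here only to show the flow.
def fake_model(prompt):
    return "Score: 70"

print(score_review("The manuscript is well written.", "politeness", fake_model))
```

      Keeping the model call behind a plain function makes it easy to swap in a different provider or model version, and to log every raw reply so that scores remain reproducible.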

      As you acknowledge, there is a selection bias in this study, since you only include papers that were ultimately published in Nature Comms (missing reviews of papers that were rejected). This is a really big limitation on the usefulness of some of your analyses. For example, you found no relationship between author institutional prestige and reviewer sentiment. This could be evidence of a fair and impartial review process (which seems unlikely!), or it could be a direct result of selection bias (specifically a "collider bias", like the famous example involving height and skill among professional basketball players). The likelihood that a paper is published is positively related both to its quality and the prestige held by the authors, we might expect a flatter (or even negative) correlation between prestige and reviewer sentiment among papers that were published than among the whole set of papers (like how the correlation between height and speed/skill is less positive among NBA players than among the general population, since both height and speed/skill provide advantages in basketball).

      I agree with this reviewer that the selection bias is a major limitation of this study. I rewrote much of the Abstract and Discussion to tone down claims, and more prominently discuss the limitations of this study. I also made several suggestions for follow-up experiments.
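
      The collider-bias argument raised by the reviewer can be illustrated with a small simulation. The sketch below is built entirely on assumed numbers: prestige and quality are independent in the full pool of submissions, reviewer sentiment tracks quality with noise, and a paper is accepted when prestige plus quality exceeds a threshold. None of these parameters are estimated from the data in this paper.

```python
import random
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate_selection(n_papers=20000, threshold=1.0, seed=1):
    """Return (r among all submissions, r among accepted papers only)."""
    rng = random.Random(seed)
    papers = []
    for _ in range(n_papers):
        prestige = rng.gauss(0, 1)               # independent of quality by construction
        quality = rng.gauss(0, 1)
        sentiment = quality + rng.gauss(0, 0.5)  # reviews track quality, with noise
        accepted = prestige + quality > threshold  # both raise the odds of acceptance
        papers.append((prestige, sentiment, accepted))
    r_all = pearson_r([p for p, _, _ in papers], [s for _, s, _ in papers])
    kept = [(p, s) for p, s, acc in papers if acc]
    r_accepted = pearson_r([p for p, _ in kept], [s for _, s in kept])
    return r_all, r_accepted

r_all, r_accepted = simulate_selection()
print(f"all submissions: r = {r_all:.2f}; accepted only: r = {r_accepted:.2f}")
```

      In this toy setting the correlation is near zero across all submissions but clearly negative among accepted papers, so a flat relationship in published papers does not by itself show that prestige plays no role.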

      In the section "Consistency across reviewers", you write that there was little similarity between review sentiment scores from different reviewers from the same paper, and then write "This surprising result indicates high levels of disagreement between the reviewers' favorability of a paper, suggesting that the peer review process is subjective." However I disagree with this conclusion for three reasons:

      • Firstly, your dataset only includes papers that were published, and thus there is a selection bias against manuscripts where both/all reviewers disliked the paper - the removal of this (probably large) set of reviews will add a (potentially very strong) downward bias to your estimate of how consistent the review process is (since you are missing all those papers where the reviewers agreed). I think that one cannot properly answer the question "are reviewers consistent in their appraisals" without having access to papers that were rejected as well as those that were accepted.

      I agree with this reviewer that there is a selection bias in this study, which I acknowledged throughout the initial submission of this manuscript. Indeed, having access to reviews of rejected papers would greatly increase my confidence in this finding. However, if there is consistency across reviewers in the entire pool of (post-review rejected+accepted) manuscripts, some of that has to trickle down into the pool of accepted papers. The correlation between sentiment scores of the different reviewers is so strikingly low (or even absent) that I simply cannot envision a way in which there is consistency across reviewers in the pre-editorial decision stage. Yet, I realize that this point is debatable. Therefore, I changed the phrasing of the Discussion section, including the following sentence:

      That being said, the extremely low (or even absent) relation between how different reviewers scored the same paper was striking, at least to this author.

      • Secondly, the method used to assess whether the reviews for each paper tend to be similar (shown in Figure 3b) does not fully utilize the information contained in the data and could be replaced with another method. (In the paper 3 univariate regressions compare the sentiment scores for R1 vs R2, R1 vs R3, and R2 vs R3, which needlessly splits up the data in the case of papers with more than 2 reviewers, reducing power.) You could instead calculate the intraclass correlation coefficient (aka 'repeatability'), to determine what proportion of the variance in sentiment scores is between vs within papers (I suggest using the excellent R package rptR for this). Note that the sentiment scores are not normally distributed, and so regular regression (as you used) or one-way ANOVA (which you might be tempted to use for the ICC calculation) are not ideal - consider using a GLM or transformation (the rptR package automates the tricky calculation of repeatability for generalized models).

      I thank this reviewer for pointing me towards this option. I added this analysis to Fig. 3b, which confirmed the inconsistency in sentiment scores for reviews of the same paper (ICC = 0.055). As suggested by this reviewer, I decided to perform the ICC on log-transformed data, as ICC calculation is very sensitive to non-normally distributed data.
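
      For readers unfamiliar with the ICC, the sketch below shows the classical one-way ANOVA estimator for papers with unequal numbers of reviews. It is an illustration only: it is not necessarily the estimator used for Fig. 3b, it omits the log transformation mentioned above (the handling of negative scores is not specified here), and the example scores are hypothetical.

```python
from statistics import mean

def icc_oneway(groups):
    """One-way random-effects ICC for groups of possibly unequal size.

    `groups` is a list of lists, one inner list of scores per paper.
    """
    groups = [g for g in groups if g]
    n_groups = len(groups)
    n_total = sum(len(g) for g in groups)
    if n_groups < 2 or n_total <= n_groups:
        raise ValueError("need >= 2 papers and at least one paper with >= 2 reviews")
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    # Effective group size, which corrects for unequal numbers of reviews.
    k0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (n_groups - 1)
    return (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)

# Hypothetical sentiment scores: three papers with 2-3 reviews each.
papers = [[40, 55, 35], [-10, 60], [20, 25, 70]]
print(round(icc_oneway(papers), 3))
```

      An ICC near 1 means that reviews of the same paper resemble each other far more than reviews of different papers; an ICC near 0 means that knowing one review says almost nothing about the others.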

      • Thirdly, an alternative and very plausible hypothesis for this lack of similarity (besides peer review being highly subjective) is that ChatGPT is estimating the "true sentiment" of a review (i.e. what the reviewer intended to say) with some amount of error (e.g. due to limitations/biases in the AI, or reviewers struggling to make themselves understood due to issues such as writing in a second language, typos, or writing under time pressure), which dilutes the similarly in the estimated sentiment of the reviews. In other words, if the true sentiment values are strongly correlated, but there is random error in how those values are estimated by ChatGPT, then the correlation between reviewer scores for each paper will tend to zero as the error tends to infinity. Furthermore a nebulous quality like "sentiment" cannot be fully summarised in a single variable running from -100 to +100, and if you had used a more multi-dimensional classification system for the reviews (or qualitative assessment by human readers) you might have found that there is a bit more correspondence (I'm speculating here, but I think you cannot really exclude this and the paper doesn't mention this limitation).

      This point is well taken. I added caveats to the Discussion section on Page 5. Altogether, after taking these caveats into account, I do believe that this analysis convincingly demonstrates subjectivity in the peer review of this subset of papers. That said, I hope that my re-written discussion and additional analysis have added the necessary nuance to this point.
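
      The reviewer's dilution argument can be made concrete with a standard measurement-error sketch. The model below is an assumption introduced for illustration (it is not taken from the manuscript): each estimated score is a true sentiment plus an error term, the errors are independent of each other and of the true sentiments, and both reviewers share the same variances.

```latex
% Assumed model: estimated score X_i = true sentiment T_i + estimation error E_i
\[
X_i = T_i + E_i,\qquad
\operatorname{Var}(T_i)=\sigma_T^2,\quad
\operatorname{Var}(E_i)=\sigma_E^2,\quad
\operatorname{Corr}(T_1,T_2)=\rho
\]
% Independent errors contribute to the variance but not to the covariance
\[
\operatorname{Corr}(X_1,X_2)
=\frac{\operatorname{Cov}(T_1,T_2)}{\sigma_T^2+\sigma_E^2}
=\rho\,\frac{\sigma_T^2}{\sigma_T^2+\sigma_E^2}
\;\longrightarrow\;0
\quad\text{as }\sigma_E^2\to\infty
\]
```

      Under these assumptions the observed correlation is the true correlation scaled by the reliability of the score, so a low observed agreement alone cannot separate genuine disagreement from noisy estimation.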

      In Figure 3C, you write "Contribution of paper scores to review time". This strongly implies to the reader that the sentiment scores inferred for the reviews have a causal effect on the review time. This is imprecise writing (since the scores were calculated by you after the papers were published, and thus cannot be causal - you mean that the actual reviews affected the review time, not the scores), but more importantly you cannot infer any causality here since your dataset is observational/correlational. You could fix this by re-phrasing to emphasise this, e.g. "Statistical associations between paper scores and review time".

      This is a very good point raised by this reviewer. I have corrected the phrasing so it no longer implies causality.

      For the analysis shown in Figure 4d and Figure 4e, I am not certain what you mean by "data split per lowest/median/highest sentiment score". This is ambiguous, and I am also not sure what the purpose of this analysis is or what it shows - I suggest re-writing for greater clarity (and ideally providing the code used in all your analyses) and perhaps revising the analysis. Additionally, an important missing piece of information from this analysis (and most analyses in the paper) is the effect size. For example, you don't report what is the difference in politeness score and sentiment score between male and female authors, and what is the SE and 95% CIs for this difference. From eyeballing the figure, it looks like the difference in politeness is about 4 points on your 200-point scale - this is small in absolute terms, but might be quite large in relative terms given that "politeness score" usually hovered around a small part of the full 200-point scale. What is this as a standardised effect size (i.e. in terms of standard deviations, as captured by effect sizes like Cohen's d and Hedges' g)? Calculating this (and its 95% CIs) would allow you to say whether the difference between genders is a "big effect", and give an idea of your confidence in your effect size estimate and any inferences drawn from it. You even discuss the effect size in your discussion, so it would help to calculate the standardised effect size. If you're not familiar with effect size and why it's useful, I found this paper very instructive: https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1469-185X.2007.00027.x

      I agree with this reviewer that this phrasing was ambiguous. I now rephrased this on Page 4 of the manuscript:

      To study whether these more impolite reviews for female first authors were due to an overall lower politeness score, or due to one or some of the reviewers being more impolite, I split the reviews for each paper by its lowest/median/highest politeness score. I observed that the lower politeness scores for first authors with a female name were driven by significantly lower low and median scores (Fig. 4d, bottom panel). Thus, the least polite reviews a paper received were even more impolite for papers with a female first author.

      I also added effect sizes of the significant effects from Fig. 4 to its figure legend. Please note that the statistical tests used are non-parametric, so I reported the Hodges-Lehmann differences (the median of all possible pairwise differences between observations from the two groups).
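
      The Hodges-Lehmann difference described above can be sketched in a few lines. This is a minimal illustration of the definition (median of all pairwise differences between the two groups), not the code used for the figure legend; the function name `hodges_lehmann` and the example scores are made up for this sketch.

```python
from statistics import median


def hodges_lehmann(group_a, group_b):
    """Two-sample Hodges-Lehmann estimate of the location shift.

    Forms every pairwise difference a - b with a from group_a and
    b from group_b, then returns the median of those differences.
    """
    diffs = [a - b for a in group_a for b in group_b]
    return median(diffs)


if __name__ == "__main__":
    # Hypothetical politeness scores for two groups of papers.
    group_one = [62, 70, 68, 75]
    group_two = [60, 64, 66, 71]
    print(hodges_lehmann(group_one, group_two))
```

      Because it builds every pairwise difference, this brute-force version scales with the product of the two group sizes, which is fine for samples of a few hundred reviews.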

      "Double-blind peer review has been debated before, but has come under scrutiny for various reasons" - this is vague and unhelpful. I think it's worthwhile to properly engage with the debate and the substantial body of evidence in your paper, given your main focus is on potential bias in the review process based on authors' identities (e.g. gender, institutional prestige).

      I thank the reviewer for pointing this out. I rephrased this sentence to indicate that there is evidence that it helps to remove certain forms of bias (Page 5):

      To address this issue, double-blind peer review, where the authors' names are anonymized, could be implemented. Evidence suggests that this is useful in removing certain forms of bias from reviewing8,9, but has thus far not been widely implemented, perhaps because some studies have cast doubt on its merits21,22.

      I have also added a Supplementary Fig. 6 to this paper, in which I lay out how my tool can be used to study bias by applying it to single- and double-blinded reviews (see also my answer to the other question about this topic below).

      On a related note, in the first paragraph, when discussing the potential of single-blind review to allow reviewers to essentially discriminate against papers by women, there is a key missing citation. This year, the first truly experimental test of this hypothesis was published (DOI: 10.1111/1365-2435.14259); a journal conducted a randomised controlled trial in which submitted manuscripts were reviewed either single- or double-blind. They found no effect of author gender on reviewer ratings or editorial decisions (though there was an effect of review type on success rate of authors from different countries). It would be better to cite this instead of reference 6, which as you acknowledge is methodologically flawed. This paper is also worth a read given your focus on Nature journals: DOI: 10.1186/s41073-018-0049-z.

      This point is well taken. I now cite this paper (citation #8) and rephrased this part of the Introduction (Page 1).

      "Another - arguably more simple - solution [compared to double-blind peer review] could be for reviewers to be more mindful of their language use." Here, you seem to be saying that we don't need to blind author names during peer review, because it would be simpler if all reviewers were simply nicer! I object to this because A) double-blind review is easy to implement, and greatly reduces the opportunity to tune the review to the author's identity (and there is some experimental evidence that it works in this regard), and B) it seems like wishful thinking to say that we don't need to implement measures that reduce the scope for bias, because all reviewers could instead stop using impolite language.

      This is a very valuable comment. I rephrased this to emphasize that this is an additional measure.

      "reviewers may want to use ChatGPT to extract a politeness score for their review before submitting" Yes, that's an interesting idea, and I can imagine that some (probably small) proportion of reviewers will be interested in doing this. But I think you should think bigger about wholesale changes to the review system that are possible because of AI like ChatGPT. For example, the submission platforms where reviewers submit their reviews (e.g. ScholarOne, Manuscript Central) could be updated to use AI to pre-screen draft reviews, and issue a warning to reviewers, like "Our AI assistant has indicated that the writing in this review might be impolite (example phrases here) - would you like to edit your review before you submit it?" Also, review-credit platforms like Publons could display not only the number of reviews that someone wrote, but an AI-generated assessment of how constructive, detailed, and polite their reviews are (this would help nudge people into writing better reviews, and also give credit where it's due to careful reviewers, which is part of the aim of Publons and similar platforms). This is just off the top of my head - there are many other good ideas about how AI could transform the peer review process. Indeed, AI is already good enough to generate quite useful peer reviews and constructive criticism of draft papers, and will surely get better at this... this surely has lots of implications for science publishing over the coming decades.

      These are great suggestions for implementation of this tool. I now end the first paragraph of the Discussion (Page 4) with the following sentence:

      Such an automated language analysis of peer reviews can be used in different ways, such as after-the-fact analyses (as has been done here), providing writing support for reviewers (for example by implementation in the journal submission portal), or by helping editors pick the best papers or most constructive reviewers.

      "Further research is required to investigate the reasons behind this effect and to identify in what level of the academic system these differences emerge." Here you could mention what this research would be - I think you'd need the full sample of reviewed papers, not just those that were accepted. Spell out what analyses would be required to test and falsify the various (very plausible and interesting) competing hypotheses that you mention for the male-female difference in sentiment scores.

      Great point. I added a Supplementary Fig. 6, in which I show a visual depiction of the experiments that can be performed to answer these questions.

      "areas of concern were discovered within the academic publishing system that require immediate attention. One such area is the inconsistency between the reviews of the same paper, highlighting the need for greater standardization in the peer review process." I disagree here. I think it is natural for there to sometimes be differences in how two or more reviewers rate the quality of a paper, even if the peer review process were carefully standardised (e.g. via the use of a detailed "peer review form", which helps guide reviewers to comment on all important aspects of the paper - some journals use these). This is because reviewers differ in their experience, expertise, or interests, and so some reviewers will catch mistakes that others miss, or request stylistic changes that others would not. More broadly, it's often not possible to write a version of the paper that satisfies all possible reviewers.

      I re-phrased part of the Discussion on Page 5 to indicate other sources of inter-reviewer variability. Specifically, I mention that some variability in sentiment can be expected based on the different backgrounds of the reviewers:

      Notably, some level of variability may be expected, for example due to different backgrounds, experiences, and biases of the reviewers. In addition, ChatGPT may not always reliably assess a review's sentiment, adding some spurious inter-reviewer variability.

      Yet, as also mentioned in my response to one of the previous questions, I still find the extremely low levels of consistency striking, even after taking these possible sources of inter-reviewer variability into account.

      "the maximum score an institution could receive was 100 (in 2023 this was Massachusetts Institute of Technology)" - this seems unnecessary information (just mention the score runs from 0-100).

      I agree with this reviewer that this was unnecessary information. This has been removed.

      "reviewers are generally familiar with the senior author of papers they review and thus are likely aware of their gender identity." This seems like a strong assumption, and you don't provide any evidence for it. Speaking personally, as a reviewer and journal editor I am often not familiar with the senior author, or I am familiar with the first author - I am not sure how often I know the senior author but not the first author or vice versa. It's also not always the case that the first author is a junior scientist and the last author a senior, famous one, as you imply. I suggest that you use the same approach to score the gender of both author positions, namely inferring their gender programmatically from their name (I agree that generally the important thing for the purposes of this study is the gender that reviewers will infer from the name, not the author's actual gender, and so gender estimation from first names is the correct approach).

      I appreciate this reviewer raising this point, and I have carefully weighed the pros and cons of both approaches. Initially, my intention was to rely only on a programmatic method to identify authors' names. However, I came to realize that there were inaccuracies in senior author gender predictions made by ChatGPT/Genderize. This was evident to me due to my personal familiarity with some of these authors, either because they are famous or through personal interactions. It seemed problematic to me to proceed with this analysis knowing that these misclassifications would introduce unnecessary variability to the dataset.

      The advantage of the relatively small sample size in this study was the opportunity to manually perform this task, rather than being fully dependent on algorithms. While I attempted manual gender identification for the first author as well, this was way more challenging due to their limited online presence. The discrepancy in gender identification accuracy between first and senior authors did not go unnoticed, and I acknowledge the issue it presents. I also recognize that, unlike senior authors, reviewers may not necessarily be familiar with the first authors of the papers they evaluate, as indicated in the original submission of this paper. In light of this, I sought input from several PIs who often serve as reviewers. Their feedback confirmed that they typically possess knowledge of senior authors' identities, for example through conferences, whereas the same is not true for first authors. Yet, this may be different for other scientific disciplines, where the pool of reviewers might be bigger.

      Notably, for future studies I may make a different decision, especially when I use larger datasets that require me to automate the process. I now more elaborately explain why I made this decision on Page 7 of the manuscript.

      In the Abstract, you write "suggesting a gender disparity in academic publishing". This part of the sentence contains no information about what you think is the cause of the male/female difference, and no further interpretation of its ramifications, so I think you can just remove it (because "disparity" just means a difference, so you are effectively saying something redundant like "there was a difference between papers with male and female senior authors, suggesting there is a difference")

      I thank the reviewer for pointing this out. I replaced the latter part of this sentence with “(…) for which I discuss potential causes.”, which I think is better than a short summary of potential causes which may lack the nuance that such a topic deserves.

    1. Like nihilism, existentialism starts with a claim that there is no fundamental meaning or morality. But in existentialism, people must create their own meaning and morality.

      I see existentialism as a branch of nihilism, and I think in modern times the two have become somewhat intertwined, many using them somewhat interchangeably. Existentialism seems to be an extension of nihilism, as stated in the text, where it begins with the fundamental idea that morality is not a set definition.

      I personally find Existentialism to be the most "scientific" or "realistic" perspective (though, again, it is perspective) as we know morality is a human construct, and a social construct, which varies based on where you grew up.

      Writing this, I now wonder if Existentialism should be a prefix (or suffix), because you may believe in Existentialism but end up practicing a specific moral principle (e.g. natural rights or virtue ethics).

    1. To Carthage then I came

      By this point, I have developed a key interest in the structuring of these kinds of phrases. Every time that a geographical region/location is mentioned, the parts of speech rearrange: the sentence starts with a preposition, and the subject "I" comes after the name of the place. "By Richmond I raised my knees..." "On Margate Sands. I can connect..." Of course, there are exceptions to this, but the structure is nevertheless eye-catching. It reminded me of Paradise Lost, which I read last year, where Milton engages with a similar diversion from traditional sentence structure. I am not sure what to make of this, except for the fact that, just as Milton's unconventional language occurred during the Enlightenment, a time of great "political upheaval" (Wikipedia), so might Eliot's language have been written in the context of WW1 and its own societal upheavals.

      According to Wikipedia:

      Carthage, a seaside suburb of Tunisia’s capital, Tunis, is known for its ancient archaeological sites. Founded by the Phoenicians in the first millennium B.C., it was once the seat of the powerful Carthaginian (Punic) Empire, which fell to Rome in the 2nd century B.C.

      The first detail I noticed in searching up Carthage was the "Phoenicians"; of course, this holds relevance to the "drowned Phoenician Sailor" mentioned in Section I. The Phoenicians were colonizers, "sailing" across the Mediterranean to grow a vast and powerful empire. Eventually, however, Carthage fell to the Romans, as did the Phoenicians. Perhaps this loss of power is symbolized by the act of "drowning"; on the other hand, it could be the act of "burning" instead.

      We see this line as "To Carthage I came," as the first line in Confessions—except why is the word then added in TWL? It doesn't make sense, unless you think of the "coming to Carthage" as the result, or action following the previous line: "My people humbl[ing] people who expect / Nothing." These "people" may be the ones referenced in Confessions as the ones who, at Carthage, "sang all around me in my ears a cauldron of unholy loves." There are several things to unpack here. First of all, the people are singing, and their music is "unholy." This unholiness is the opposite of what takes place in the "Fire Sermon," where, in escaping the burning of the senses, "he knows... that he has lived the holy life." Secondly, the music is a cauldron. Thinking about what a cauldron itself does, it is a vessel usually where something is cooked in boiling liquid—essentially, being burned and drowned at the same time. Perhaps burning and drowning, in this sense, aren't two disparate means of suffering—but two sides of the same coin. Whereas burning is the suffering derived from desire, drowning is the stifling of power, and of "rest" (going back to Burial of the Dead), as a result of the suffering.

    1. Author Response

      The following is the authors’ response to the original reviews.

      First of all, we would like to again thank the reviewers for their work. We appreciate the constructive review comments and useful suggestions to further improve our article. With those comments in mind, we have now revised our manuscript. Please see below for a point-by-point response (our responses in green) to all comments.

      Reviewer #1 (Recommendations For The Authors):

      Sun and colleagues outline structural and mechanistic studies of the bacterial adhesin PrgB, an atypical microbial cell surface-anchored polypeptide that binds DNA. The manuscript includes a crystal structure of the Ig-like domains of PrgB, cryo-EM structures of the majority of the intact polypeptide in DNA-bound and free forms, and an assessment of the phenotypes of E. faecalis strains expressing various PrgB mutants.

      Generally, the study has been conducted with a good level of rigor, and there is consistency in the findings. However, I do have some specific technical concerns relating to the study that necessitate the undertaking of additional experiments. These are summarized as follows:

      1) Recombinant PrgB188-1233 produced in the study purifies as a mixture of monomeric and dimeric species separatable by SEC. There is very limited discussion in the text re. the significance and/or implications of this. Is it feasible that the dimeric form is biologically relevant in the context of the in vivo situation? Or alternatively, is this simply an artifact of protein production?

      Experimental data that we published in 2018 indeed indicates that the dimer is relevant in the in vivo situation. We did not discuss this here since this was discussed in detail in the previous paper: Schmitt et al, 2018. We have now added a bit more information on this in the results section, highlighting this, so that it is clearer to the reader (lines 114-116).

      2) The authors see no evidence of the adhesive domain of PrgB in their PX structure highlighting that this must have been cleaved during crystallisation. Is this claim supported by an inspection of the crystal packing? It could be that this region of the protein is dynamic within the context of the crystal and is thus not observed. This should be clarified in the text either way.

      The crystal packing does not provide any space for the PAD. We have added a sentence describing this to the results section (lines 122-124).

      3) The Cryo-EM structures reported are both at ~10-angstrom resolution. Are the authors truly confident in the placement of their crystal structures on these maps? Visual inspection indicates that their positioning of the PrgB domains into the EM envelopes is somewhat questionable. The authors need to provide some quantitative measures of the quality of their domain fitting. The narrative of the manuscript very much hinges on this being correct.

      This is something that the other reviewer also commented on. The fitting of the crystal structures in the maps is indeed not optimal, but was the best we could do with the available data. In line with point #6, we have now constructed new protein variants of the stalk domain (the four Ig-like domains) alone, and have assayed its interaction with the PAD in vitro using native gels and size exclusion chromatography. The outcome of these experiments is that the two domains do not interact in any substantial way on their own. Thus, the added experiments do not support the hypothesis that the PAD interacts with the Ig-like domains, at least not without the local high concentration provided by the linker region in the in vivo situation.

      To account for these new experiments, we have moved the cryo-EM structure to the supplement, and rewritten this part of the manuscript to say that the cryo-EM data indicated that there might be an interaction, but that we have not been able to verify this in vitro, indicating that if the interaction exists at all, it must have a low affinity and is likely not physiologically relevant. In line with this, we have also further modified the text throughout the manuscript.

      4) The manuscript would be significantly strengthened if the authors could include confirmatory hydrodynamic data in support of the observed conformational reorganization of PrgB in the presence of DNA. SAXS analysis of the DNA-free and bound complexes would be ideal for this and would also help address the issues raised above in pt 3.

      To analyze the PrgB radius with and without DNA, we tried both SEC-MALS and DLS experiments. It proved difficult to obtain precise and reproducible values, but the initial data indicated that no large changes were observed upon DNA binding. As we could also not measure a specific interaction between the PAD and the stalk in vitro, we did not perform SAXS experiments. As mentioned in the response to point #3, we have modified the results and discussion regarding the potential interaction of the PAD and stalk domains.

      5) The authors present binding studies of various PrgB mutant-expressing strains. A number of the mutations generated delete significant portions of the polypeptide. Can the authors confirm that these mutant proteins are correctly folded despite the introduced mutations? It could be that loss of function is simply a consequence of mutation-induced misfolding. I would like to see some confirmatory data (CD, SEC, etc.) in support of the foldedness of the mutant proteins.

      We cannot completely rule out that the folding of some of the variants is affected in E. faecalis. However, CD or SEC experiments would only give indications of the contrary if the overall fold had been majorly affected in an in vitro situation where the protein is not anchored to the E. faecalis cell wall.

      To alleviate this valid concern, we probed whether all variants are correctly exported and linked to the cell wall. To this end, we have now extracted the cell wall of E. faecalis producing wild-type or variant PrgB and performed a Western blot. The results of the Western blot with cell wall extract largely match the whole cell experiments that were in the initial manuscript. If a protein variant was largely misfolded, it would likely not be targeted and linked to the cell wall, nor would it be stable in vivo. We have added this new data as a new fig 3 – figure supplement 1 and on lines 201-214.

      6) The authors suggest a direct interaction between the PAD and the stalk domains in PrgB. The discussion of this is very generic and no evidence to support this is provided other than the 10-angstrom resolution EM map. If they believe this to be the case, then additional evidence should be provided.

      Answer: As mentioned previously, we have now performed additional in vitro experiments to probe this potential interaction, but conclude that this indication from the EM data is likely not a real high-affinity interaction. In line with this, we have modified the results and discussion regarding this point; see also the responses to points #3 and #4.


      Reviewer #2 (Recommendations For The Authors):

      As currently presented, I don't feel that the cryoEM data support the authors' proposed model, largely because the fit of the crystal structures to the EM volumes does not seem entirely reasonable for the apo- dataset and because the EM volume for the ssDNA bound dataset is not even contiguous. For me to believe the model as it is currently built, I would want to see a dataset with the PAD deleted, showing that its proposed density disappears, or a dataset with a PAD-specific antibody as a fiducial marker. It would be nice to see some goodness of fit metric with a comparison to other crystal structures fit such low-resolution data as well. At the very least, the authors must include the standard cryoEM workflow supplementary figure showing representative micrographs, 2Ds, and 3Ds along with particle numbers.

      In line with the comments raised by reviewer #1, we have now added more experiments where we have analyzed the potential interaction between the PAD and the stalk domain. From this new data, it looks like they do not interact with any substantial affinity, at least not on their own without any linker region holding them together, and this interaction, if it exists at all, is likely not physiologically relevant. The cryo-EM data has been moved to the supplement as we agree with both reviewers that the resolution, and the fitted model, is not good enough to draw any hard conclusions. The standard table for the cryoEM workflow was present as supplementary table 2, where e.g. particle numbers are described, but we have now also added a new supplementary fig 2 – figure supplement 2 that shows the EM processing workflow, including representative micrographs, 2D and 3D classes. We debated whether we should remove the EM data, but decided against it in the interest of transparency and to explain why the interaction studies with the PAD and stalk domains were performed.

      The X-ray crystallographic structure is very nice, but I was a bit surprised by the R factors in Table 1. After downloading the structure factors and coordinates from the PDB (thank you for depositing before submission!) I was able to see quite a few positive peaks in the difference map that could probably use some cleaning up. I realize I may just be a bit of a masochist when it comes to adding/deleting waters and moving around side chains to get things just right, but for such lovely data, I would have liked to see the model polished up a bit more. I was going to say that the isopeptide bond should be modelled, but I can see from a cursory Google that the authors did in fact try to find a way to model this and that it is indeed a bit of a pain.

      The model refinement proved surprisingly recalcitrant with regard to the remaining difference density, so we took the decision to only model what was solidly there (which leads to slightly higher R factors). We did indeed try to model the isopeptide bond, but we did not find a good way to do so (despite trying quite extensively), and ended up determining them as a linker in the PDB file, so that the bond shows up when one opens the structure in e.g. PyMOL.

      For protein production/purification in general I would have liked to see actual traces for the gel filtration and pure protein on a gel in a supplementary figure. I strongly believe that this type of information is so critical for future researchers looking to replicate or build upon published work so that they have some sense that what they are doing is working in the way it should be.

      We have now added a supplementary figure (as new Fig. 1 – figure supplement 1) that shows SEC and SDS-PAGE for the purification of PrgB188-1233.

      Finally, I think for the in vivo data it only makes sense to show the reader whether any or all the differences measured across your different mutants are statistically significant. Having done the graphing and analysis in GraphPad this should be a simple thing to achieve.

      We have now added a statistical test (one-way ANOVA) that shows the statistical significance of the differences between the mutants, and show this in Fig 3 and Fig 4.

      Overall, I think it's a very nice paper and while I feel that the cryoEM data in its current form doesn't support the model of occlusion from PrgA, I also don't think that removing the cryoEM data and that specific mechanistic idea from the paper detracts from its overall message and impact.

      Thank you for those comments.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      p. 5, l. 87-90: The control of flgM by OmrA/B (PMID 32133913) and the antisense RNA to flhD (PMID 36000733) are other examples of known regulatory RNAs that impact the flagellar regulon.

      We thank the reviewer for pointing out these references and have added citations to them (page 5, lines 87-91).

      p.11/Fig. 3: it is intriguing that ArcZ and RprA, two of the rpoS-activating sRNAs, repress lrhA. I realize that it is outside of the scope of this study, but have the authors considered the possibility that ArcZ or McaS could have a role in the previously reported repression of rpoS by LrhA (PMID 16621809)?

      We agree that it is intriguing that ArcZ and RprA, two of the rpoS-activating sRNAs, repress lrhA, and added mention of this regulatory connection (page 12, lines 247-250).

      p. 13/l. 272: I do not understand why the authors say that "r-proteins were almost exclusively found in chimeras with MotR and FliX and no other sRNAs...", given that several other chimeras between r-prot and other sRNAs are found

      While some r-protein-encoding genes were found with other sRNAs in RIL-seq datasets, MotR and FliX generally had the highest numbers. The text was revised to better describe the RIL-seq data for r-protein interaction partners (page 14, lines 291-295), and a new panel showing the S10 operon with all the interacting sRNAs was added to Figure 3—figure supplement 1B.

      Fig. 4 and 5: One possible improvement would be to more systematically assess the effect of base-pairing mutants of the sRNAs, such as MotRM1 or FliXM1 on fliC and rps/rpl genes in vivo. This is especially important for the mutants that affected the sRNA effects in the in vitro probing assays, such as UhpU-M2, MotR-M1 and FliX-S-M1 on fliC (Fig. S7)

      As suggested, we examined fliC mRNA levels across growth in motR-M1 and fliX-M1 chromosomal mutants. The results of these northern assays, now shown in Figure 8—figure supplement 1, are consistent with our model as we observed delayed expression of fliC mRNA in motR-M1 background and premature expression in fliX-M1 background (page 21, lines 444-446, 449-453).

      Fig. 5: it may be worth including a schematic of the whole S10 operon to highlight its length and its organization?

      As suggested, a schematic representation of the S10 operon was added to Figure 3—figure supplement 1 with a summary of the RIL-seq data for this operon.

      Probing data (Fig. 5, S7 and S9): in general, it is difficult to differentiate the thin and thick brackets, and what is indicated by the dashed brackets is not always clear. Maybe using a color-code instead could help? Highlighting the predicted pairing regions on the different gels could be useful as well.

      We thank the reviewer for this suggestion and color-coded the brackets (Figure 5, Figure 4—figure supplement 2, and Figure 5—figure supplement 2). The correspondences to regions of predicted pairing are described in the figure legends.

      Fig. S10: The experimental evidence used to support FliX-dependent degradation of the rpsS mRNA is indirect (primer extension to observe higher levels of cleavage intermediates). It would be nice to be able to observe a decrease in the mRNA levels as well, either by Northern, or primer extension from a region more distant to the FliX pairing site.

      The S10 operon is long (~5 kb). We have tried multiple probes for this mRNA and detect many bands with each, likely due to extensive regulation of this operon. We think teasing out the origin of the different bands to appropriately interpret changes in patterns will require a significant amount of work.

      legend of Fig. S10: from the gel, it seems that only the plasmids differ in the samples, and it is not clear where the data corresponding to the WT strain mentioned in the legend is shown

      The samples shown in this figure are all for the indicated plasmids in the WT strain. We corrected the figure legend.

      Table S1: please define the NOR (normalized odds ratio?)

      The definition of Normalized Odds Ratio was added to the legend of Supplementary file 1.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      Figure 1B. Please add a negative control (which could be in the supplementary section) from a large section showing transcripts that are not directly influenced by Hfq.

      We think the flgKLO browser in this figure serves as a negative control; flgK and flgL clearly are not enriched on Hfq in contrast to FlgO. Figure 1B was generated using published datasets that are easily accessible to the readers at a genome browser and show many other examples of transcripts that are not influenced by Hfq: https://genome.ucsc.edu/cgi-bin/hgTracks?hubUrl=https://hpc.nih.gov/~NICHD-core0/storz/trackhubs/ecoli_rilseq/hub.hub.txt&hgS_loadUrlName=https://hpc.nih.gov/~NICHD-core0/storz/trackhubs/ecoli_rilseq/session.txt&hgS_doLoadUrl=submit

      Line 158. MotR* is a more abundant version of [the constitutively overexpressed] MotR. Is there a Northern or qPCR to confirm this? While I understand the relevance of these mutated constructs, their high expression can lead to artefactual effects.

      This is a valuable point and therefore we provided a northern blot to document the relative levels of MotR and MotR* (Figure 2—figure supplement 1A).

      Figure 2. The overexpression of MotR/MotR* from a plasmid is increasing the number of flagella. However, when the MotR gene is deleted, is there a reduction of the number of flagella? Same question with FliX: what happens when the fliX gene is deleted? According to the model described in the manuscript, we should expect fewer flagella in ΔmotR background and an increased number of flagella in ΔfliX background. Both Figure 2 and Figure 8 would benefit from additional experiments with deleted motR and fliX genes.

      We agree that experiments regarding the effects of endogenous sRNAs are important. We provided such data in Figure 8 and Figure 8—figure supplement 1 for MotR and FliX in a variety of assays: flagella numbers by electron microscopy, motility and competition assays, expression of flagellar genes by RT-qPCR and western analysis. The chromosomally expressed MotR-M1 and FliX-M1 base pairing mutants did show the expected phenotypes of reduced and increased numbers of flagella, respectively (Figure 8A-B). As suggested by reviewer 1, we added northern analysis that examined fliC mRNA levels across growth in motR-M1 and fliX-M1 chromosomal mutants. The results of these northern assays are consistent with our model as we observed delayed expression of fliC mRNA in motR-M1 background and premature expression in fliX-M1 background. We went to the trouble of constructing strains carrying point mutations in the chromosomal copies of these genes rather than deletions to avoid interfering with the expression of motA and fliC given that MotR and FliX encompass the 5’ and 3’ UTRs, respectively.

      Figure 3 is key to demonstrating the sRNAs pairing with their specific targets and potential effect on bacterial swimming. However, these results would be more relevant with endogenous expression of the sRNAs and demonstration of their effects on the same targets. A Northern blot showing the overproduced sRNA level compared to endogenous sRNA level could help us appreciate the expression ratio.

      The levels of the UhpU, MotR and FliX expressed from the overexpression plasmids are at least 100-fold higher than the endogenous levels. Thus, we agree that assays of chromosomal deletion/point mutants are important experiments. We did construct chromosomal uhpU-M1 and uhpU∆seed sequence mutants. However, under the conditions assayed, the uhpU chromosomal mutations did not result in observable effects on motility or FlhD-SPA protein levels. It is possible we would be able to detect differences between the wild type and uhpU chromosomal mutant strains under different growth conditions or in different assays, but this would require a significant amount of work. For many other sRNAs, chromosomal mutations have no or only subtle effects, suggesting redundancy between sRNAs or sRNA roles in fine tuning gene expression.

      Figure 4. In panel B, the empty plasmid pZE alone seems to positively affect the flagellin expression when compared to the WT background. This can also be seen in Figure 4C. There is no fliC signal with empty plasmid pBR* but a strong fliC signal with empty plasmid pZE. Maybe the authors can explain this in the manuscript.

      With respect to panel B and Figure 4—figure supplement 1A, we agree that there is some variation between the levels of flagellin in the WT and pZE control samples, possibly due to the addition of antibiotic to the pZE culture. We added quantification of the bands in Figure 4—figure supplement 1 to better document the changes in flagellin levels.

      With respect to panel C, the pBR samples were collected in crl+ background while the pZE samples were collected in crl- background, which explains the lack of fliC signal in the pBR control sample. This is now noted in the figure legend.

      In lines 154-157, the justification for using two plasmids is described. An IPTG-inducible Plac promoter, the pBR*, is used because the constitutive overexpression of UhpU is resulting in mutated UhpU clones. These observations suggest a toxic expression level of UhpU that the cell can only tolerate when the UhpU RNA is somewhat deactivated by mutations. This does not seem like a detail and could be discussed further.

      We agree with the reviewer that this observation is important and now mention that it suggests a critical role for UhpU (page 8, lines 160-163).

      Figure 5E and I. While the binding of MotR on rpsJ and FliX-S on rpsS is clear, the resolution of both gels in the areas of binding (upper part of both gels) could be improved.

      We found it tricky to choose the mRNA fragments for the in vitro structure probing for the regions of predicted pairing internal to CDSs. Given that we hoped to retain native RNA folding, we chose long fragments; for rpsJ, we started with the +1 of S10 leader and for rpsS, we started 147 nt into the CDS, a region that overlaps the region that was cloned to the rpsS-rplV-gfp fusion. Consequently, the region of base pairing is in the upper part of both gels. The gels were already run for an unusually long time. Thus, we do not think the resolution could be improved further. Nevertheless, we think the region of protection is evident for both mRNAs.

      Minor comments:

      Fig 1B. The promoter symbols are extremely small, please increase the size.

      As suggested, we have enlarged the promoter symbols in Figure 1B as well as in Figure 3A.

      Line 211. "the lrhA mRNA has an unusually long 5´ UTR". How long exactly?

      The 5’ UTR of the lrhA mRNA is 371 nt long. This is now mentioned in the text (page 11, line 224).

      Line 320. Should "Fig 9C" be "Fig S9C" instead?

      We thank the reviewer for noticing this typo. Callouts to supplementary figures have now been renumbered per eLife format.

      Line 384. Something seems to be missing in the sentence "a representative combined class 2 and 3 promoter".

      The sentence has been modified to clarify the designation (page 19, lines 409-411).

      Reviewer #3 (Recommendations For The Authors):

      Recommendation to clarify/strengthen the presentation of science in the paper:

      Lines 102-103: Can the authors provide some more information on how the sRNAs were initially discovered to be potentially sigma-28 dependent and selected?

      As suggested, we expanded the section discussing the discovery and the selection of these sRNAs (page 6, lines 104-109).

      Lines 192-193: It would be helpful to provide a bit more information in the main text about what are the different RIL-seq data sets (18 in total).

      As suggested, we now provide more details about the different RIL-seq datasets we used in the analysis (page 10, lines 202-205).

      It would be helpful to specify the criteria for "top" interactions in targets retrieved from RIL-seq data (Table S1 and text, e.g., line 273): e.g. number of conditions, number of chimeras, etc.

      As suggested, we now more explicitly specify the criteria for selecting targets to characterize (page 10, lines 205-206).

      Fig. 4B/ S6 and line 242: The flagellin amount in the empty vector control (pZE) looks higher than in WT, and the stated effect of MotR/MotR* OE on flagellin is not very clear from the blot. The "cross-reacting band" above flagellin also seems to vary among strains. Could the authors include a quantification of flagellin protein amount and normalize relative to a housekeeping protein (e.g., GroEL), instead of Ponceau S as loading control?

      We agree that there is some variation between the levels of flagellin in the WT and pZE control sample, possibly due to the addition of antibiotic to the pZE culture. We added quantification of the bands in Figure 4—figure supplement 1 to better document the changes in flagellin levels.

      Figure legends: It would be helpful to have a bit more information about the method used/displayed image rather than stating results in the legends.

      As suggested, we now provide a bit more information about the methods used/displayed image in the figure legends to allow for easier comprehension of the data presented in the figures (while trying to balance this with the length of the legends).

      Fig. 2: Please include a scale for all electron microscopy images or, if it is the same for all panels, state it in the figure legend. Moreover, the same image is used for the pZE control in panel C, E and Figure S4A/C. It would be better to show different fields of bacteria for the pZE sample.

      As is now mentioned in the legends to Figure 2, Figure 2—figure supplement 2, and Figure 8, the same scale was used for all panels. We thought it was better to show the same image for the pZE control in the different panels to emphasize that these samples were all analyzed on the same day.

      Fig. 2: The sRNA OE strains seem to show some heterogeneity in cell length (pZE-MotR) or width (pZE-FliX). The authors could, e.g., check whether this is a phenotype correlated to sRNA OE by quantifying these parameters for different fields and comparing to WT or comment on this in the text if this is not consistently seen.

      We also were intrigued by the slightly different sizes and widths of cells in the EM images. However, our statistical analysis did not reveal significant differences between the different samples. We now comment on this (page 53, lines 1178-1179).

      As a follow-up to this study, it would be interesting to assess the impact of MotR and FliX regulation of ribosomal protein synthesis on overall ribosome activity (e.g., via Ribo-seq), also considering that antitermination regulates rRNA transcription. In the case of MotR, the authors suggest that MotR upregulation of S10 protein might not only impact antitermination, but also lead to the formation of more active ribosomes that would increase flagellar protein synthesis (lines 359-362). However, in the RNA-seq performed in OE MotR* several transcripts encoding rRNA and ribosomal proteins are significantly downregulated compared to EVC (Supplementary Table S2). Could the authors comment on this?

      We share the reviewer’s enthusiasm for follow-up work and thank the reviewer for the suggested experiments. We hope we will be able to decipher the full mechanism of MotR and FliX action on ribosomal protein synthesis in future experiments. The observation that some ribosomal protein-coding gene levels are reduced in the RNA-seq experiment with overexpression of MotR* is interesting but we do not have an explanation other than the fact that the samples were collected early in exponential growth. We now mention the observation in the text (page 19, lines 404-407).

      Considering that OE of the WT MotR appears to increase fliC mRNA abundance but has no strong impact on flagellin protein levels, can the authors speculate what is the physiological relevance of MotR* for flagellin production?

      We agree that while we do see significant increases in the flagella number and fliC mRNA abundance with MotR and MotR* overexpression, the western analysis did not reveal a striking increase in flagellin levels. We also wonder how MotR strongly increases the flagella number, which requires flagellin subunits, but only has a weak effect on the intracellular levels of flagellin. One possible explanation is that it is more difficult to see significant increases for a protein whose levels are high to begin with. These points are now discussed (page 13, lines 264-269).

      Fig. 4C: The pZE samples seem to show variable expression of fliC mRNA although the samples are collected at the same timepoints. Try to clarify in the text.

      The northern membrane on the bottom was exposed for a longer time due to the lower fliC mRNA levels in the samples with FliX overexpression. We now note these differences in the legends to Figure 4 and Figure 4—figure supplement 1.

      Fig. 7/S13: While a volcano plot for MotR* is shown in Fig. 7A, quantification of GFP reporter fusion regulation is shown for MotR. Quantifications of MotR* are shown in Fig. S13. Maybe swap the figures.

      Given that the data for MotR* are in the supplement figures for all other figures, we would also like to retain this distribution for Figure 7 (aside from the volcano plot since this experiment was only carried out for MotR*).

      Lines 135-136 (Fig. S1B): on the northern blots, only sRNA levels of MotR are comparable between rich and minimal media (excluding M63 G6P and M63 gal). Most other sRNA seem to be more abundantly expressed in minimal media conditions compared to LB. Maybe rephrase.

      As suggested, the text was revised to point out the differences in the sRNA levels for cells grown in different growth media (page 7, lines 140-144).

      Lines 229-234: this paragraph seems not directly connected to the aims of the study (i.e., no effect on motility tested of these other sRNAs) and could be removed (or moved to discussion).

      We appreciate the reviewer’s suggestion but, considering Reviewer 1’s comments, think that showing the regulation of lrhA by other sRNAs has value in highlighting the complexity of the regulatory circuit. We have revised the text to incorporate Reviewer 1’s suggestions and better explain why these results are intriguing (page 12, lines 247-250).

      Line 200 and Fig. S5: For FlgO sRNA only one target was identified in RIL-seq. This gene could be specified and labeled in Fig. S5 and the text. Does FlgO also bind ProQ?

      We now mention the single FlgO target (gatC) detected in four datasets (page 10, lines 213-215). In Figure 3—figure supplement 1, we labeled only targets that we followed up with in the current study. Therefore, to be consistent, we prefer not to label gatC in the FlgO plot. FlgO was found to co-immunoprecipitate with ProQ but at much lower levels than with Hfq, and to have very few RNA partners (Melamed et al., 2020).

      Lines 493-498: It is mentioned that the four sRNAs were also detected in recent RIL-seq experiments of Salmonella and EPEC. Are any of the here identified targets also found in other species or was none detected as analyses were carried out under conditions that do not favor flagella expression?

      The targets identified in this study were not detected in the Salmonella and EPEC RIL-seq datasets. However, the Salmonella and EPEC experiments were carried out under different growth conditions. Based on the sequence conservation of the Sigma 28-dependent sRNAs across several bacterial species (Figure 8—figure supplement 2), we do think overlapping targets will be found in other bacterial species under the appropriate growth conditions.

      The strongest evidence of MotR-dependent target regulation is the one on rpsJ, which does not necessarily require the additional experiments with MotR*. Since the authors were able to show upregulation of the rpsJ-gfp reporter upon OE of MotR WT, it would have strengthened the results if they performed the experiments in Fig. S8C with MotR WT. Similarly, as an increase of flagella number was seen with OE of MotR WT in Fig. 2A, the effect of the OE S10∆loop could be compared to OE MotR instead of OE MotR* (Fig. 6A). At least it would be helpful to briefly comment on why MotR* was used instead of MotR WT for these experiments.

      As suggested, we state MotR* was used in some assays given the stronger effects for some phenotypes (page 10, lines 196-197). We think, given that we established MotR and MotR* cause the same effects, with increased intensity for the latter, it is reasonable to use MotR* in some of the experiments.

      p. lines 482-491 and 508-511: The authors discuss that both UhpU sRNAs and RsaG sRNA from S. aureus are derived from the 3'UTR of uhpT, but conclude there is no overlap regarding flagella regulation, suggesting independent evolution of these sRNAs. However, the authors also mention that UhpU sRNA has many additional targets beyond LhrA involved in carbon and nutrient metabolism. Thus, maybe regulation of metabolic traits could be a conserved theme and function for UhpU and RsaG? Maybe try to comment on or better connect these two parts in the discussion.

      As suggested, we now comment on the possibility of the regulation of metabolic traits being a conserved theme and function for UhpU and RsaG (page 24, lines 520-527).

      Check the text for consistency regarding the use of italics for gene names (e.g., legend of Figs. 7 and 8)

      The text was corrected.

      Please introduce abbreviations, e.g., G6P (line 139), REP (line 150), ARN (line 258), NOR/U (Table S1 legend)

      As suggested, we now introduce the abbreviations for G6P (page 7, line 142), REP (page 8, lines 155-156), and NOR (Supplementary file 1 legend). Regarding ARN, these sequences are already written in parentheses in the same sentence. However, we revised this to “ARN motif sequences” (page 13, line 278).

      Fig. S1A: Highlight REP sequence mentioned in text (line 150).

      REP sequences are now highlighted in gray in Figure 1—figure supplement 1A.

      Fig. S1C: It would be helpful to list number nt positions on the sRNAs based on full-length transcripts.

      The corresponding positions based on the full-length transcripts have also been added to this figure.

      Fig. S2: Adjust the position of UhpU-S label.

      UhpU-S label position was adjusted.

      Fig. S6: Include UhpU in the figure title.

      UhpU was added to the title.

      Fig. S10: It would be helpful to indicate on the figure (or state more clearly in the legend) which RNA was extracted from WT or ΔfliCX background.

      The samples shown in the Figure are all in a WT strain. We corrected the figure legend accordingly.

      Line 290: the effect is on flagella number, not motility.

      This typo is now corrected (page 15, line 312).

      Fig. S8: One-way ANOVA (panel A legend)

      This typo is now corrected (page 64, line 1433).

      Line 320: Fig. S9C instead of 9C

      We thank the reviewer for noticing the typo. The numbering of the supplementary figures has now been changed to the eLife format.

      It would be helpful to add reference for statement in line 57.

      A reference to (Fitzgerald et al., 2014) was added as suggested.

      Add PMID:32133913 as reference for post-transcriptional regulation of the flagellar regulon in the introduction (lines 87-91)

      The indicated reference was added as suggested (page 5, lines 87-91).

      Legend Fig. S6: expand view -> expanded view

      This typo is now corrected (page 63, line 1406).

      line 513: sRNA -> sRNAs

      This typo is now corrected (page 25, line 549).

      Fig. 8G: Maybe include lrhA as target of UhpU sRNA at top of the cascade.

      As suggested, lrhA has been added as a target of UhpU at the top of the cascade.

  2. Sep 2023
    1. Author Response

      Reviewer #1 (Public Review):

      Like the "preceding" co-submitted paper, this is again a very strong and interesting paper in which the authors address a question that is raised by the finding in their co-submitted paper - how does one factor induce two different fates. The authors provide an extremely satisfying answer - only one subset of the cells neighbors a source of signaling cells that trigger that subset to adopt a specific fate. The signal here is Delta and the read-out is Notch, whose intracellular domain, in conjunction with, presumably, SuH cooperates with Bsh to distinguish L4 from L5 fate (L5 is not neighbored by signal-providing cells). Like the back-to-back paper, the data is rigorous, well-presented and presents important conclusions. There's a wealth of data on the different functions of Notch (with and without Bsh). All very satisfying.

      Thanks!

      I have again one suggestion that the authors may want to consider discussing. I'm wondering whether the open chromatin that the authors convincingly measure is the CAUSE or the CONSEQUENCE of Bsh being able to activate L4 target genes. What I mean by this is that currently the authors seem to be focused on a somewhat sequential model where Notch signaling opens chromatin and this then enables Bsh to activate a specific set of target genes. But isn't it equally possible that the combined activity of Bsh/Notch(intra)/SuH opens chromatin? That's not a semantic/minor difference, it's a fundamentally different mechanism, I would think. This mechanism also solves the conundrum of specificity - how does Notch know which genes to "open" up? It would seem more intuitive to me to think that it's working together with Bsh to open up chromatin, with chromatin accessibility then being a "mere" secondary consequence. If I'm not overlooking something fundamental here, there is actually also a way to distinguish between these models - test chromatin accessibility in a Bsh mutant. If the authors' model is true, chromatin accessibility should be unchanged.

      I again finish by commending the authors for this terrific piece of work.

      Thanks! It is a crucial question whether Notch signaling regulates chromatin landscape independently of a primary HDTF. We will include this discussion in the text and pursue it in our next project. We think Notch signaling may regulate chromatin accessibility independently of a primary HDTF based on our observation: in larval ventral nerve cord, all motor neurons are NotchON neurons while all sensory neurons are NotchOFF neurons; NotchON neurons share similar functional properties, despite expressing distinct HDTFs, possibly due to the common chromatin landscape regulated by Notch signaling.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors explore how Notch activity acts together with Bsh homeodomain transcription factors to establish L4 and L5 fates in the lamina of the visual system of Drosophila. They propose a model in which differential Notch activity generates different chromatin landscapes in presumptive L4 and L5, allowing the differential binding of the primary homeodomain TF Bsh (as described in the co-submitted paper), which in turn activates downstream genes specific to either neuronal type. The requirement of Notch for L4 vs. L5 fate is well supported, and complete transformation from one cell type into the other is observed when altering Notch activity. However, the role of Notch in creating differential chromatin landscapes is not directly demonstrated. It is only based on correlation, but it remains a plausible and intriguing hypothesis.

      Thanks for the positive feedback!

      Strengths:

      The authors are successful in characterizing the role of Notch to distinguish between L4 and L5 cell fates. They show that the Notch pathway is active in L4 but not in L5. They identify L1, the neuron adjacent to L4 as expressing the Delta ligand, therefore being the potential source for Notch activation in L4. Moreover, the manuscript shows molecular and morphological/connectivity transformations from one cell type into the other when Notch activity is manipulated.

      Thanks!

      Using DamID, the authors characterize the chromatin landscape of L4 and L5 neurons. They show that Bsh occupies distinct loci in each cell type. This supports their model that Bsh acts as a primary selector gene in L4/L5 that activates different target genes in L4 vs L5 based on the differential availability of open chromatin loci.

      Thanks!

      Overall, the manuscript presents an interesting example of how Notch activity cooperates with TF expression to generate diverging cell fates. Together with the accompanying paper, it helps thoroughly describe how lamina cell types L4 and L5 are specified and provides an interesting hypothesis for the role of Notch and Bsh in increasing neuronal diversity in the lamina during evolution.

      Thanks for the positive feedback on both manuscripts.

      Weaknesses:

      Differential Notch activity in L4 and L5:

      ● The manuscript focuses its attention on describing Notch activity in L4 vs L5 neurons. However, from the data presented, it is very likely that the pool of progenitors (LPCs) is already subdivided into at least two types of progenitors that will give rise to L4 and L5, respectively. Evidence to support this is the activity of E(spl)-mɣ-GFP and the Dl puncta observed in the LPC region. Discussion should naturally follow that Notch-induced differences in L4/L5 might preexist L1-expressed Dl that affect newborn L4/L5. Therefore, the differences between L4 and L5 fates might be established earlier than discussed in the paper. The authors should acknowledge this possibility and discuss it in their model.

      We agree. Historically, LPCs are thought to be homogenous; our data suggest otherwise. We now emphasize this in the Discussion as requested. We are also investigating this question using single cell RNAseq on LPCs to look for molecular heterogeneities. Thanks for the great comment!

      ● The authors claim that Notch activation is caused by L1-expressed Delta. However, they use an LPC driver to knock down Dl. Dl-KD should be performed exclusively in L1, and the fate of L4 should be assessed.

      Dl is transiently expressed in newborn L1 neurons. To knock down Dl in L1, we need to express Dl-RNAi before Dl protein is expressed in newborn L1; the only known Gal4 line expressed that early is the LPC-Gal4 that we used. There is no L1-gal4 line expressed early enough to eliminate L1 expression of Dl.

      ● To test whether L4 neurons are derived from NotchON LPCs, I suggest performing MARCM clones in early pupa with an E(spl)-mɣ-GFP reporter.

      We agree! Whether L4 neurons are derived from NotchON LPCs is a great question. However, MARCM clones in early pupa with an E(spl)-mɣ-GFP reporter will not work because E(spl)-mɣ-GFP reporter is only expressed in LPCs but not lamina neurons. We now mention this in the Discussion.

      ● The expression of different Notch targets in LPCs and L4 neurons may be further explored. I suggest using different Notch-activity reporters (i.e., E(spl)-GFP reporters) to further characterize these differences. What causes the switch in Notch target expression from LPCs to L4 neurons should be a topic of discussion.

      Thanks! It is a great question why Notch induces E(spl)-mɣ in LPCs but Hey in newborn neurons. However, it is not the question we are tackling in this paper and it will be a great direction to pursue in the future. We will add this to our Discussion.

      Notch role in establishing L4 vs L5 fates:

      ● The authors describe that 27G05-Gal4 causes a partial Notch Gain of Function caused by its genomic location between Notch target genes. However, this is not further elaborated. The use of this driver is especially problematic when performing Notch KD, as many of the resulting neurons express Ap, and therefore have some features of L4 neurons. Therefore, Pdm3+/Ap+ cells should always be counted as intermediate L4/L5 fate (i.e., Fig3 E-J, Fig3-Sup2), irrespective of what the mechanistic explanation for Ap activation might be. It's not accurate to assume their L5 identity. In Fig4 intermediate-fate cells are correctly counted as such.

      Thanks for the comment! We will annotate Pdm3+/Ap+ cells as intermediate L4/L5 fate in the corresponding figures.

      ● Lines 170-173: The temporal requirement for Notch activity in L5-to-L4 transformation is not clearly delineated. In Fig4-figure supplement 1D-E, it is not stated if the shift to 29°C is performed as in Fig4-figure supplement 1A-C.

      Thank you for catching this. We will correct it in the text.

      ● Additionally, using the same approach, it would be interesting to explore the window of competence for Notch-induced L5-to-L4 transformation: at which point in L5 maturation can fate no longer be changed by Notch GoF?

      Our data show that Bsh with Notch signaling in newborn neurons specifies L4 fate while Bsh without Notch signaling in newborn neurons specifies L5 fate. Therefore, we think the window of fate competence is during newborn neurons. We will include the data to support this.

      L4-to-L3 conversion in the absence of Bsh

      ● Although interesting, the L4-to-L3 conversion in the absence of Bsh is never shown to be dependent on Notch activity. Importantly, L3 NotchON status is assumed based on their position next to Dl-expressing L1, but it is not empirically tested. Perhaps screening Notch target reporter expression in the lamina, as suggested above, could inform this issue.

      Our data show that the L4-to-L3 conversion occurs in the absence of Bsh and in the presence of Notch activity, while the L5-to-L1 conversion occurs in the absence of Bsh and in the absence of Notch activity. Therefore, Notch activity is necessary for the L4-to-L3 conversion. Unfortunately, we currently have only Hey as an available Notch target reporter in newborn neurons. To tackle this challenge in the future, we will profile the genome-binding targets of endogenous Notch in newborn neurons. This will identify novel genes as Notch signaling reporters in neurons for the field.

      ● Otherwise, the analysis of Bsh Loss of Function in L4 might be better suited to be included in the accompanying manuscript that specifically deals with the role of Bsh as a selector gene for L4 and L5.

      That is an interesting suggestion, but without knowing that Bsh + Notch = L4 identity the experiment would be hard to interpret. Note that we took advantage of Notch signaling to trace the cell fate in the absence of Bsh and found the L4-to-L3 conversion (see Figure 5G-K).

      Different chromatin landscape in L4 and L5 neurons

      ● A major concern is that, although L4 and L5 neurons are shown to present different chromatin landscapes (as expected for different neuronal types), it is not demonstrated that this is caused by Notch activity. The paper proves unambiguously that Notch activity, in concert with Bsh, causes the fate choice between L4 and L5. However, the claim that this is caused by Notch creating a differential chromatin landscape is based only on correlation (NotchON cells having a different profile than NotchOFF cells). Although the authors are careful not to claim that differential chromatin opening is caused directly by Notch, this is heavily suggested throughout the text and must be toned down, e.g., Line 294: "With Notch signaling, L4 neurons generate distinct open chromatin landscape" and Line 298: "Our findings propose a model that the unique combination of HDTF and open chromatin landscape (e.g. by Notch signaling)". These claims are not supported well enough, and alternative hypotheses should be provided in the discussion. An alternative hypothesis could be that LPCs are already specified towards L4 and L5 fates. In this context, different early Bsh targets in each cell type could play a pioneer role generating a differential chromatin landscape.

      We agree and appreciate the comment; it is well justified. We have toned down our statements and now clearly state that this is a correlation that needs to be tested for a causal relationship. Thank you for requesting it!

      ● The correlation between open chromatin and Bsh loci with Differentially Expressed genes is much higher for L4 than L5. It is not clear why this is the case, and should be discussed further by the authors.

      We agree, and think that in L5 neurons the secondary HDTF Pdm3, in addition to Bsh, also contributes to L5-specific gene transcription during the synaptogenesis window. We will include this in the text.

    1. Author Response

      Reviewer #1 (Public Review):

      In this very strong and interesting paper the authors present a convincing series of experiments that reveal the molecular mechanism of neuronal cell type diversification in the nervous system of Drosophila. The authors show that a homeodomain transcription factor, Bsh, fulfills several critical functions - repressing an alternative fate and inducing downstream homeodomain transcription factors with whom Bsh may collaborate to induce L4 and L5 fates (the authors' accompanying paper reveals how Bsh can induce two distinct fates). The authors make elegant use of powerful genetic tools and an arsenal of satisfying cell identity markers.

      Thanks!

      I believe that this is an important study because it provides some fundamental insights into the conservation of neuronal diversification programs. It is very satisfying to see that similar organizational principles apply in different organisms to generate cell type diversity. The authors should also be commended for contextualizing their work very well, giving a broad, scholarly background to the problem of neuronal cell type diversification.

      Thanks!

      My one suggestion for the authors is to perhaps address in the Discussion (or experimentally address if they wish) how they reconcile that Bsh is on the one hand: (a) continuously expressed in L4/L4, (b) binding directly to a cohort of terminal effectors that are also continuously expressed but then, on the other hand, is not required for their maintaining L4 fate? A few questions: Is Bsh only NOT required for maintaining Ap expression or is it also NOT required for maintaining other terminal markers of L4? The former could be easily explained - Bsh simply kicks off Ap, Ap then autoregulates, but Bsh and Ap then continuously activate terminal effector genes. The second scenario would require a little more complex mechanism: Bsh binding of targets (with Notch) may open chromatin, but then once that's done, Bsh is no longer needed and Ap alone can continue to express genes. I feel that the authors should be at least discussing this. The postmitotic Bsh removal experiment in which they only checked Ap and derepression of other markers is a little unsatisfying without further discussion (or experiments, such as testing terminal L4 markers). I hasten to add that this comment does not take away from my overall appreciation for the depth and quality of the data and the importance of their conclusions.

      Great suggestions, we will discuss these two hypotheses as requested.

      Bsh initiates Ap expression in L4 neurons, which then maintain Ap expression independently of Bsh expression, likely through Ap autoregulation. During the synaptogenesis window, Ap expression becomes independent of Bsh expression, but Bsh and Ap are both still required to activate the synapse recognition molecule DIP-beta. Additionally, Bsh shows putative binding to other L4 identity genes, e.g., those required for neurotransmitter choice and electrophysiological properties, suggesting that Bsh may initiate L4 identity genes as a suite. The mechanism of maintaining identity features (e.g., morphology, synaptic connectivity and functional properties) in the adult remains poorly understood. It is a great question whether the primary HDTF Bsh maintains the expression of L4 identity genes in the adult. To test this, in our next project, we will specifically knock out Bsh in L4 neurons of the adult fly and examine the effect on L4 morphology, connectivity and functional properties.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, the authors explore the role of the Homeodomain Transcription Factor Bsh in the specification of Lamina neuronal types in the optic lobe of Drosophila. Using the framework of terminal selector genes and compelling data, they investigate whether the same factor that establishes early cell identity is responsible for the acquisition of terminal features of the neuron (i.e., cell connectivity and synaptogenesis).

      Thanks for the positive words!

      The authors convincingly describe the sequential expression and activity of Bsh, termed here as 'primary HDTF', and of Ap in L4 or Pdm3 in L5 as 'secondary HDTFs' during the specification of these two neurons. The study demonstrates the requirement of Bsh to activate either Ap or Pdm3, and therefore to generate the L4 and L5 fates. Moreover, the authors show that in the absence of Bsh, L4 and L5 fates are transformed into L1- or L3-like fates.

      Thanks!

      Finally, the authors used DamID and Bsh:DamID to profile the open chromatin signature and the Bsh binding sites in L4 neurons at the synaptogenesis stage. This allows the identification of putative Bsh target genes in L4, many of which were also found to be upregulated in L4 in a previous single-cell transcriptomic analysis. Among these genes, the paper focuses on Dip-β, a known regulator of L4 connectivity. They demonstrate that both Bsh and Ap are required for Dip-β, forming a feed-forward loop. Indeed, the loss of Bsh causes abnormal L4 synaptogenesis and therefore defects in several visual behaviors. The authors also propose the intriguing hypothesis that the expression of Bsh expanded the diversity of Lamina neurons from a 3 cell-type state to the current 5 cell-type state in the optic lobe.

      Thanks for the excellent summary of our findings!

      Strengths:

      Overall, this work presents a beautiful practical example of the framework of terminal selectors: Bsh acts hierarchically with Ap or Pdm3 to establish the L4 or L5 cell fates and, at least in L4, participates in the expression of terminal features of the neuron (i.e., synaptogenesis through Dip-β regulation).

      Thanks!

      The hierarchical interactions among Bsh and the activation of Ap and Pdm3 expression in L4 and L5, respectively, are well established experimentally. Using different genetic drivers, the authors show a window of competence during L4 neuron specification during which Bsh activates Ap expression. Later, as the neuron matures, Ap becomes independent of Bsh. This allows the authors to propose a coherent and well-supported model in which Bsh acts as a 'primary' selector that activates the expression of L4-specific (Ap) and L5-specific (Pdm3) 'secondary' selector genes, that together establish neuronal fate.

      Thanks again!

      Importantly, the authors describe a striking cell fate change when Bsh is knocked down from L4/L5 progenitor cells. In such cases, L1 and L3 neurons are generated at the expense of L4 and L5. The paper demonstrates that Bsh in L4/L5 represses Zfh1, which in turn acts as the primary selector for L1/L3 fates. These results point to a model where the acquisition of Bsh during evolution might have provided the grounds for the generation of new cell types, L4 and L5, expanding lamina neuronal diversity for more refined visual behaviors in flies. This is an intriguing and novel hypothesis that should be tested from an evo-devo standpoint, for instance by identifying a species where L4 and L5 do not exist and/or Bsh is not expressed in L neurons.

      Thanks for the appreciation of our findings!

      To gain insight into how Bsh regulates neuronal fate and terminal features, the authors have profiled the open chromatin landscape and Bsh binding sites in L4 neurons at mid-pupation using the DamID technique. The paper describes a number of genes that have Bsh binding peaks in their regulatory regions and that are differentially expressed in L4 neurons, based on available scRNAseq data. Although the manuscript does not explore this candidate list in depth, many of these genes belong to classes that might explain terminal features of L4 neurons, such as neurotransmitter identity, neuropeptides or cytoskeletal regulators. Interestingly, one of these upregulated genes with a Bsh peak is Dip-β, an immunoglobulin superfamily protein that has been described by previous work from the authors' lab to be relevant to establish proper L4 connectivity. This work proves that Bsh and Ap work in a feed-forward loop to regulate Dip-β expression, and therefore to establish normal L4 synapses. Furthermore, Bsh loss of function in L4 impairs visual behaviors.

      Thanks for the excellent summary of our findings.

      Weaknesses:

      ● The last paragraph of the introduction is written using rhetorical questions and does not read well. I suggest rewriting it in a more conventional direct style to improve readability.

      We agree, and will update the text as suggested.

      ● A significant concern is the way in which information is conveyed in the Figures. Throughout the paper, understanding of the experimental results is hindered by the lack of information in the Figure headers. Specifically, the genetic driver used for each panel should be adequately noted, together with the age of the brain and the experimental condition. For example, R27G05-Gal4 drives early expression in LPCs and L4/L5, while the 31C06-AD, 34G07-DBD Split-Gal4 combination drives expression in older L4 neurons, and the use of one or the other to drive Bsh-KD has dramatic differences in Ap expression. The indication of the driver used in each panel will facilitate the reader's grasp of the experimental results.

      We agree, and will update the figure annotation.

      ● Bsh role in L4/L5 cell fate:

      o It is not clear whether Tll+/Bsh+ LPCs are the precursors of L4/L5. Morphologically, these cells sit very close to L5, but are much more distant from L4.

      Our current data show L4 and L5 neurons are generated by different LPCs. However, currently we don’t have tools to demonstrate which subset of LPCs generate which lamina neuron type. We are currently working on a followup manuscript on LPC heterogeneity, but those experiments have just barely been started.

      o Somatic CRISPR knockout of Bsh seems to have a weaker phenotype than the knockdown using RNAi. However, in several experiments down the line, the authors use CRISPR-KO rather than RNAi to knock down Bsh activity: it should be explained why the authors made this decision. Alternatively, a null mutant could be used to consolidate the loss of function phenotype, although this is not strictly necessary given that the RNAi is highly efficient and almost completely abolishes Bsh protein.

      We chose CRISPR-KO (L4-specific Gal4, uas-Cas9, and uas-Bsh-sgRNAs) because it effectively removed Bsh expression from the majority of L4 neurons. In contrast, combining L4-split Gal4 with Bsh-RNAi failed to knock down Bsh in L4 neurons because L4-split Gal4 expression depends on Bsh. We will include this explanation in the text.

      o Line 102: Rephrase "R27G05-Gal4 is expressed in all LPCs and turned off in lamina neurons" to "is turned off as lamina neurons mature", as it is kept on for a significant amount of time after the neurons have already been specified.

      Thanks; we will make that change.

      o Line 121: "(a) that all known lamina neuron markers become independent of Bsh regulation in neurons" is not an accurate statement, as the markers tested were not shown to be dependent on Bsh in the first place.

      Good point. We will rephrase it as “that all known lamina neuron markers are independent of Bsh regulation in neurons”.

      o Lines 129-134: Make explicit that the LPC-Gal4 was used in this experiment. This is especially important here, as these results are opposite to the Bsh Loss of Function in L4 neurons described in the previous section. This will help clarify the window of competence in which Bsh establishes L4/L5 neuronal identities through ap/pdm3 expression.

      Thanks! We will include Gal4 information in the text for every manipulation.

      ● DamID and Bsh binding profile:

      ○ Figure 5 - figure supplement 1C-E: The genotype of the Control in (C) has to be described within the panel. As it is, it can be confused with a wild type brain, when it is in fact a Bsh-KO mutant.

      Great point! Thank you for catching this and we will update it.

      ○ It is not clear how L4-specific Differentially Expressed Genes were found. Are these genes DEG between lamina neuron types, or are they upregulated genes with respect to all neuronal clusters? If the latter is the case, it could explain the discrepancy between scRNAseq DEGs and Bsh peaks in L4 neurons.

      We did not use “L4-specific Differentially Expressed Genes”. Instead, we used all genes that are significantly transcribed in L4 neurons (lines 209-210).

      ● Dip-β regulation:

      ○ Line 234: It is not clear why CRISPR KO is used in this case, when Bsh-RNAi presents a stronger phenotype.

      As explained above, we chose CRISPR-KO (L4-specific Gal4, uas-Cas9, and uas-Bsh-sgRNAs) because it effectively removed Bsh expression from the majority of L4 neurons. In contrast, combining L4-split Gal4 with Bsh-RNAi failed to knock down Bsh in L4 neurons because L4-split Gal4 expression depends on Bsh. We’ll include this explanation in the text.

      ○ Figure 6N-R shows results using LPC-Gal4. It is not clear why this driver was used, as it makes a less accurate comparison with the other panels in the figure, which use L4-Split-Gal4. This discrepancy should be acknowledged and explained, or the experiment repeated with L4-Split-Gal4>Ap-RNAi.

      I think you mean 6J-M shows results using LPC-Gal4. We first tried L4-Split-Gal4>Ap-RNAi but it failed to knock down Ap because L4-Split-Gal4 expression depends on Ap. We will add this to the text.

      ○ Line 271: It is also possible that L4 activity is dispensable for motion detection and only L5 is required.

      Thanks! Work from Tuthill et al, 2013 showed that L5 is not required for any motion detection. We will include this citation in the text.

      ● Discussion: It is necessary to de-emphasize the relevance of HDTFs, or at least acknowledge that other, non-homeodomain TFs, can act as selector genes to determine neuronal identity. By restricting the discussion to HDTFs, it is not mentioned that other classes of TFs could follow the same Primary-Secondary selector activation logic.

      That is a great point, thank you! We will include this in the discussion.

    1. When we become more aware of the messages we are sending, we can monitor for nonverbal signals that are incongruent with other messages or may be perceived as such.

      My sister is so shy and she tends to aim her head to the ground to avoid eye contact. I think this makes her feel more comfortable but other people probably think she doesn't want to talk to them.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Point-by-point response to reviewers, including our plans for the revision:

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      *Summary: In this manuscript by the Sanson group, Lye and colleagues try to definitively answer the question of whether pulling forces from the ventral mesoderm have significant effects on convergent extension in the Drosophila germband (germband extension). While germband extension does occur in mutant embryos lacking mesoderm invagination, it has long been an open question in the field as to whether ventral pulling forces from the mesoderm have significant effects (positive or negative) on cell intercalation during germband extension. To definitively address this question, Lye and colleagues generated high-quality, directly comparable datasets from wild-type and twist mutant embryos, and then systematically assessed nearly all aspects of cell intercalation, myosin recruitment, and tissue elongation over time. They demonstrate that pulling forces from the ventral mesoderm have negligible impacts on the course of germband extension. While there are indeed some interesting differences between wild-type and twist embryos with respect to cell intercalation and myosin recruitment, such differences are relatively minor. They conclude that the events of germband extension neither require nor are strongly affected by external forces from the mesoderm. While this is largely a negative results paper, I believe that it should be published and that it will be an impactful paper within the field. Namely, it will settle once and for all the question of whether mesoderm invagination is required for optimal germband extension in the early Drosophila embryo, and it suggests that tissues are largely autonomous developmental units that are buffered from outside mechanical inputs.*

      *Major comments:*

      *It seems to me that the one obvious omission from this paper is a general measure of convergent extension over time. I think it would be useful to the reader to include some measure of change in tissue aspect ratio over time between wild-type and twist embryos. This could be included in Figure 5 or 6.*

      We are happy to include a graph with what we call “tissue strain rate”, which measures the deformation of the germ-band in the direction of extension (along AP) over time, and propose to add it as a panel in Supplementary Figure 6. Note that in our measures, the “tissue” strain rate is decomposed into contributions from two cell behaviors, the “cell intercalation” strain rate and the “cell shape” strain rate (Blanchard et al., 2009). “Tissue” and “cell shape” strain rate are directly measured, and “cell intercalation” strain rate is what remains when “cell shape” strain rate is removed from “tissue” strain rate. The “cell intercalation” strain rate calculated in that way is a “continuous” measure of cell intercalation, measuring the progressive shearing of cells during convergent extension. We also use a “discrete” measure of cell intercalation, which measures the number of cell neighbor exchanges, also called T1 swaps. We found that both “continuous” and “discrete” measures of cell intercalation are unchanged in twist mutant compared to wild-type embryos (Fig. 6F and 6E, respectively). In contrast, we find that the “cell shape” strain rate is increased in twist mutants (Fig. 5B and Fig. 5S1A). Consistent with this finding, the “tissue” strain rate is also increased in twist mutants (see graph below).
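      The decomposition described above (the “cell intercalation” strain rate is what remains once the “cell shape” strain rate is removed from the “tissue” strain rate) can be illustrated with a minimal sketch. The function name, variable names, and numerical values below are hypothetical and chosen only for illustration; they are not data or code from the manuscript.

```python
def intercalation_strain_rate(tissue, cell_shape):
    """Return the residual strain rate at each time point.

    Assumes both inputs are strain rates projected onto the same axis
    (e.g. AP) and sampled at the same time points.
    """
    if len(tissue) != len(cell_shape):
        raise ValueError("time series must have the same length")
    # Intercalation = tissue strain rate minus cell shape strain rate.
    return [t - c for t, c in zip(tissue, cell_shape)]


# Hypothetical AP strain rates (arbitrary units), one value per time point.
tissue_ap = [0.010, 0.030, 0.040]
cell_shape_ap = [0.004, 0.010, 0.005]

intercalation_ap = intercalation_strain_rate(tissue_ap, cell_shape_ap)
```

      In this sketch, a larger “cell shape” term with an unchanged residual yields a larger “tissue” term, which is the pattern described for twist mutants in the response above.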

      Otherwise, I have no major comments on the experimental approach or the findings of this manuscript. It seems to me a straightforward and systematic approach for determining whether mesoderm invagination affects germband extension. I do have several minor comments that should be addressed prior to publication (below).

      *Minor comments:*

      *I understand why cells would initially stretch more along the DV axis in wild-type embryos compared with twist embryos, but why do cells become so much more stretched along the AP axis (and become smaller apically) after 10 minutes of GBE in wild type compared with twist (Figure 2C and E)?*

      *I think this is an interesting and non-intuitive result that would warrant a bit of explanation/conjecture.*

      This is not what Fig. 2C and E show, and we realize now that our schematics on the graphs might have been confusing. We will work on those to improve their clarity (or remove them), and also review our text.

      Figure 2C shows how cells deform along DV (cell shape strain rate projected onto the DV axis). So the graph does not show that the cells are elongating in AP, as only the DV component of the strain rate is shown in this figure. In the wild type, the DV strain rate is positive (the cells are elongating in DV) at developmental times when the mesoderm invaginates (from about -10 minutes until 7.5 minutes). The DV strain rate accelerates until about 5 mins, then decelerates, crossing the x-axis to become negative at 7.5 minutes. From this timepoint until the end of GBE, the DV strain rate is negative (the cells are contracting along DV). Mirroring the positive section of the curve, the DV contraction of the cells accelerates until about 12 mins and then slows down. The strong rate of DV contraction between 7.5 and 20 mins could in part be due to the endoderm invagination pulling in the orthogonal direction (AP) and helping the cells regain a more isotropic shape. We could mention this in the discussion.

      In Figure 2E, the rate of change in cell area follows a similar time course in the wild type, showing that the cells are increasing their areas until about 10 mins (positive values) and then reduce their areas again until the end of GBE (negative values). Note that the graph does not show raw (instantaneous) cell areas as suggested by the comment, but rather a rate of change.

      So in wild type, the cells get stretched by the invaginating mesoderm, and once the mesoderm is not pulling anymore, the cells appear to relax back. As there is no stretching in twist mutants, there is no equivalent relaxation of the cells along DV. Note that in twist, there is a milder increase in cell area in the first 15 mins of GBE (Fig. 2E). This could again be caused by the pull from endoderm invagination stretching the cells along AP, which, as we have shown before, increases both cell shape strain rates along AP and cell areas (Butler et al., 2009). So the pull from endoderm invagination (along AP) will have an impact on cell area rates of change and possibly also, indirectly, on DV cell shape strain rates, in both twist and wild-type embryos, during most of GBE. Therefore cell area and DV cell shape strain rates are affected by more than one process during GBE. In this paper, we are focusing on the impact of mesoderm invagination, which happens around the start of GBE, so in the results section we have focused our analysis of the graphs on this period and on the differences between wild type and twist.

      *I don't understand how you are defining cell orientation in Figure 2G. How are you choosing the cell axis that you are then comparing with the body axis? Is it the long axis, or something more complicated than that? I think you should briefly provide this information in the results section. If it is included in the methods, I wasn't able to locate it. *

      Yes, it is the orientation of the long axis of the cell relative to the antero-posterior embryonic axis. We will clarify this in the text, in particular in the Methods, and also try to improve our schematics.

      Figure 2: Since you have the space, it might help the reader if you simply wrote out "strain rate" for panels B, D, and F, rather than used the abbreviation "SR."

      Thank you for this suggestion; we will reduce the use of abbreviations where space permits.

      *Please ensure that all axis labels are fully visible in the final figures. In several figures, the Y-axis labels were cut off (e.g., Fig 2I, 4A, 4D, 6B, 6C). *

      These were visible to us in our submitted version, but of course we will ensure everything is visible on the final version.

      *Where space permits, I would suggest using fewer abbreviations in axis labels to increase readability of the figures (e.g., in Figures 3H or 4D). *

      Thank you for this suggestion, will do.

      * In Figure 7, I would move the wild-type panels to the left and the twist panels to the right. I think it is more conventional to describe the normal wild-type scenarios first, and then contrast the mutant state.*

      Will do.

      To be consistent with the literature, "wildtype" should be hyphenated (wild-type) when used as an adjective, or two separate words (wild type) when used as a noun.

      Thank you, we will change this.

      Reviewer #1 (Significance (Required)):

      *Advance: The advances in this manuscript are largely methodological, but the experiments and analyses are quite rigorous and allow the authors to make strong conclusions concerning their hypotheses. Their findings are based on a high-quality collection of movies from control and twist mutant embryos expressing a cell membrane marker and knock-in GFP-tagged myosin. Importantly, I think the researchers were correct in choosing to analyze twist single-mutant embryos (as opposed to snail or twist, snail double-mutant embryos), as the overall embryo geometry of these mutants is fairly similar to wild-type embryos, allowing the researchers to directly compare cell behaviors and myosin dynamics during germband extension. This approach also allows them to avoid indirect effects on the germband due to a completely non-internalized mesoderm.*

      *Audience: The primary audience for this article will be basic science researchers working in the early Drosophila embryo who are interested in the interplay between the germband and neighboring tissues. Secondary audiences will include developmental biologists more broadly who are interested in biomechanical coupling (or in this case decoupling) of neighboring tissues.*

      *Describe your expertise: I have been a Drosophila developmental geneticist for over twenty years, and I have been working directly on Drosophila germband extension for over a decade. I have published numerous papers and reviews in this field, and I am very familiar with the genetic backgrounds and types of experimental analyses used in this manuscript. Therefore, I believe I am highly qualified to serve as a reviewer for this manuscript.*

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      *In the present manuscript, Lye et al. describe a highly detailed quantification of cell shape changes during germband extension in Drosophila melanogaster early embryo. During this process, ectodermal tissue contracts along the dorso-ventral axis, simultaneously expanding along the perpendicular antero-posterior direction, migrating from the ventral to the dorsal surface of the embryo as it extends. This important morphogenetic event is preceded by ventral furrow formation when mesodermal tissue (located in the ventral part of the embryo) contracts along the dorso-ventral axis and invaginates into the embryonic interior. The study compares cell shape dynamics in the wildtype Drosophila with that in the twist mutant, which largely lacks mesoderm and does not form a ventral furrow. The major motivation of the study is to examine whether cellular behaviors and myosin recruitment in the ectoderm are cell autonomous, or if those cellular behaviors depend on mechanical interactions between mesoderm and ectoderm.*

      *The authors first examine whether transcriptional patterning of key genes involved in germband extension is different between the wildtype and the twist mutant and find no significant difference. Next, the authors thoroughly quantify cellular behaviors and patterns of myosin recruitment in the two genetic backgrounds. A number of different measures are investigated, notably the rate of change in the degree of cellular asymmetry, rate of cell area change, rate of change of cell orientation, differences in myosin recruitment to cell edges of various orientation, as well as the rates of growth, shrinkage, and re-orientation of the various cellular interfaces. It is thoroughly documented how these quantities change as a function of developmental timing and spatial position within the embryo. These data serve as a basis for quantitative comparison between cellular dynamics in the two genetic backgrounds considered.*

      *Overall, the study shows that cellular behaviors observed in the ectoderm are largely the same during the period of time following ventral furrow formation, as would be expected if those cellular behaviors were predominantly cell autonomous and not dependent on stresses generated in the mesoderm.*

      *The data presented in the manuscript are of excellent quality and presentation is very clear.*

      *Minor comments: none*

      Reviewer #2 (Significance (Required)):

      *I find that the study provides a thorough quantification of cell behaviors in a widely studied important model of morphogenesis. The work may be of particular interest for future model-to-data comparison, perhaps providing a basis for future modeling work. I therefore certainly think that this work warrants publication.*

      *However, the results of the study largely parallel previous findings and do not appear novel or surprising. It is well established that in snail mutants that lack mesoderm entirely, germband extension proceeds largely normally. This well-established fact suggests that since tissue dynamics in complete absence of mesoderm are largely unaffected, behaviors of individual cells are likely to not be affected either.*

      *The work is pretty much entirely observational, and for most part provides a more detailed documentation/quantification of previous findings. I do not think it is appropriate for high profile publication.*

      We are not sure which evidence the reviewer is referring to here specifically. We agree that the single mutants twist or snail, or the double twist snail mutants, do extend their germband. However, the question we are asking here is how well they extend their germband, and answering this question requires quantitation. The first quantitation of GBE was performed by (Irvine and Wieschaus, 1994). While they quantified GBE in various mutant contexts, they did not perform quantitation for snail, twist, or twist snail mutants. Instead, they refer to these mutants once, on p839, with the following sentence: “Additionally, twist and snail mutant embryos, which lack mesoderm, extend their germbands almost normally (Leptin and Grunewald, 1990; Simpson, 1983).”

      Following these earlier qualitative observations, various studies have quantified different aspects of GBE in mesoderm invagination mutants, with contradictory results. For example, some studies, including from our own lab, report a reduction in cell intercalation in the absence of mesoderm invagination (Butler et al., 2009; Wang et al., 2020), but there have also been reports that tissue extension and T1 transitions occur normally (Farrell et al., 2017) (see also the introduction of our manuscript). These contradictory results have motivated our present study, and we have implemented a rigorous comparison between wild type and mesoderm invagination mutants, being careful i) to check that the regions analyzed were comparable in terms of cell fate, and ii) to control for any confounding effects between experiments (see also response to reviewer 4, main question 2). We have also considered which mesoderm invagination mutants to use. We rejected snail or twist snail mutants because the absence of snail means that the mesodermal cells do not contract and thus stay at the surface of the embryo, which changes the spatial configuration of the embryo considerably and would make a fair quantitative comparison very difficult. Instead, we decided to use twist mutants: in those, cell contractions still happen, so the cells do not take up as much space at the surface of the embryo, but the contractions are uncoordinated, which means that there is no invagination (and, as we demonstrate here, no significant pulling on the ectoderm). We note that reviewer 1 highlights the merit of settling the question of the impact of mesoderm invagination on GBE and the pertinence of choosing twist mutants versus the alternatives (see also response to reviewer 4, suggestion 1).

      __Reviewer #3 (Evidence, reproducibility and clarity (Required)): __

      During morphogenesis, the final shape of the tissue is not only dictated by mechanical forces generated within the tissue but can also be impacted by mechanical contributions from surrounding tissues. The way and extent to which tissue deformation is influenced by tissue-extrinsic forces are not well understood. In this work, Lye et al. investigated the potential influence of Drosophila mesoderm invagination on germband extension (GBE), an epithelial convergent extension process occurring during gastrulation. Drosophila GBE is genetically controlled by the AP patterning system, which determines planar polarized enrichment of non-muscle myosin II along the DV-oriented adherens junctions. Myosin contractions drive shrinking of DV-oriented junctions into 4-way vertices, followed by formation of new, AP-oriented junctions. This process results in cell intercalation, which causes tissue convergence along the DV-axis and extension along the AP-axis. In addition, GBE is facilitated by tissue-extrinsic pulling forces produced by invagination of the posterior endoderm. Interestingly, some recent studies suggest that the invagination of the mesoderm, which occurs immediately prior to GBE, also facilitates GBE. In the proposed mechanism, invaginating mesoderm pulls on the germband tissue along the DV-axis; the resulting strain of the germband cells generates a mechanotransduction effect that promotes myosin II recruitment to the DV-oriented junctions, thereby facilitating cell intercalation. Here, the authors revisited this proposed mechanotransduction effect using quantitative live imaging approaches. 
By comparing the wildtype embryos with twist mutants that fail to undergo mesoderm invagination, the authors show that although the DV-oriented strain of the germband cells was greatly reduced in the absence of mesoderm pulling, this defect had a negligible impact on junctional myosin density, myosin planar polarity, the rate of junction shrinkage or the rate of cell intercalation during GBE. A mild increase in the rate of new junction extension and a slight defect in cell orientation were observed in twist mutants, but these differences did not cause obvious defects in cell intercalation. The authors conclude that myosin II-mediated cell intercalation during GBE is robust to the extrinsic mechanical forces generated by mesoderm pulling.

      • *Overall, I found that the results described here are very interesting and of high quality. The data acquisition and analyses were elegantly performed, statistics were appropriately used, and the manuscript was clearly written. However, there are a few points where some further explanation or clarification is necessary, as detailed below: *

      • The main conclusion of the manuscript relies on appropriate quantification of myosin intensity at cell junctions. It is therefore important that the methods of quantification are well justified. Below are a few questions regarding the methods used in the analyses:*
      • -For myosin quantification, the authors state that "Background signal was subtracted by setting pixels of intensities up to 5 percentile set to zero for each timepoint" [Line826]. The rationale for selecting 5 percentile as the threshold for background should be explained. Also, how does this background value change over time? *

      For our normalization method, we stretched the intensity histogram of images to use the full dynamic range for quantification and enable meaningful comparison of intensities between different movies. We chose the 5th percentile as the threshold below which intensities are set to zero because this removed background signal without removing any structured Myosin signal (i.e., non-uniform, low-level fluorescence; this was assessed by eye). We will provide some before and after normalization images at different timepoints to illustrate this (see reviewer 3, minor point 4 below). Since the cytoplasmic signal is uniform, it is difficult to distinguish from true ‘background’; therefore, some cytoplasmic signal might be set to zero with this method, but all medial and junctional Myosin structures will still be visible and have non-zero intensity values. However, since cytoplasm takes up a large majority of pixels in the image, and we only set 5% of pixels to zero, the majority of the cytoplasm will have non-zero pixel values. The ‘background’ increases slightly over time as Myosin II levels increase in general, as expected from embryos accumulating Myosin II as they develop.

      -The authors mention that "Intensities varied slightly between experiments due to differences in laser intensity and therefore histograms of pixel intensities were stretched" [Line828]. The method of intensity normalization should be justified. For example, does this normalization result in similar cytoplasmic myosin intensity between control and twist mutant embryos?

      As stated above, we stretched the intensity histogram of images to enable meaningful comparison of intensities between different movies, as stretching the histograms would bring Myosin II structures of similar intensities into the same pixel value range. We chose to stretch histograms using a reference timepoint (30 minutes, the latest timepoint analyzed), rather than on a per timepoint basis, because we saw a general increase in Myosin II over time, and we wanted to ensure that this increase was preserved in our analysis.
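
      The two steps described in this response (zeroing pixels up to the 5th percentile at each timepoint, then stretching every timepoint with a scale taken from one reference timepoint) can be illustrated with a minimal sketch. This is not the code used in the study. The function names, the nearest-rank percentile, the use of the reference maximum as the upper end of the stretch, and the 8-bit output range are all assumptions made for illustration.

```python
def percentile(values, q):
    """Nearest-rank percentile (q in 0..100) of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, -(-q * len(ordered) // 100))  # ceil(q * n / 100), at least 1
    return ordered[rank - 1]


def normalize_movie(frames, reference_index=-1, low_q=5, max_value=255.0):
    """Hypothetical normalization: frames is a list of flat pixel lists, one per timepoint.

    Background is removed per timepoint, but the stretch factor comes from a
    single reference timepoint (assumed here to be the last one), so a global
    rise in intensity over time is preserved rather than normalized away.
    """
    ref = frames[reference_index]
    ref_floor = percentile(ref, low_q)
    ref_span = max(ref) - ref_floor  # assumption: stretch to the reference maximum
    if ref_span <= 0:
        raise ValueError("reference timepoint has no signal above background")
    scale = max_value / ref_span
    normalized = []
    for pixels in frames:
        floor = percentile(pixels, low_q)  # per-timepoint background level
        normalized.append(
            [min(max_value, max(0.0, (p - floor) * scale)) for p in pixels]
        )
    return normalized


# Toy data: the later timepoint is brighter overall than the earlier one.
early = [10 + v for v in range(20)]
late = [10 + 2 * v for v in range(20)]
out = normalize_movie([early, late])
```

      In this toy example the brightest pixel of the early timepoint ends up at roughly half the output range while the late timepoint fills it, which is the behavior a per-timepoint stretch would erase.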

      Note that we quantify Myosin from 2 µm above to 2 µm below the level of the adherens junctions (see Methods), not throughout the entire cell, and therefore we have no true measure of cytoplasmic Myosin. However, we can plot non-membrane Myosin from this same apicobasal position in the cell. Non-membrane Myosin will include both the cytoplasmic signal and the Myosin II medial web (see above). When plotting these, we find that Myosin II intensities in this pool are similar in wildtype and twist (see graph below, dotted lines show standard deviations), confirming that we are not inappropriately brightening one set of images compared to the other (e.g., twist versus wildtype).

      Finally, our observations of rate of junction shrinkage and intercalation are consistent with our Myosin II quantification results (see Figures 4A, 4D and 6F). This further validates our methods.

      - A previous study demonstrates that the accumulation of junctional myosin is substantially reduced in twist mutant embryos compared to the wild type (Gustafson et al., 2022). In that work, junctional myosin was quantified as (I_junction - I_cytoplasm)/I_cytoplasm. In contrast, the cytoplasmic myosin intensity does not appear to be subtracted from the quantification in this study. How much of the difference in the conclusions of the two studies can be explained by this difference in myosin quantification?

      As explained above, we chose to normalize our data by stretching histograms, rather than subtracting and dividing intensities between different pools of Myosin. Setting pixels with intensities up to the 5th percentile to zero for each timepoint will have a similar effect to subtracting a small fraction of the cytoplasmic pool. We note that the intensity measurements in (Gustafson et al., 2022) are in the apical-most 5 µm of the cell, and therefore their ‘cytoplasmic’ signal is likely to also include the apical medial web of Myosin. Also, after subtraction they use division by the cytoplasmic intensity in an attempt to bring pixel intensities between different movies into a comparable range, whereas we do this by stretching the histograms themselves (see above). We carefully designed our method to preserve the increase in Myosin levels that we see over time in our post-normalization data. This is something that their method of normalization would not be predicted to capture if their ‘cytoplasmic’ signal increases over time as well as their junctional signal. Indeed, in FigS6D of their paper, Myosin II levels do not appear to increase over time in these (presumably normalized) images.
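
      The point about the ratio-based measure can be checked with a short worked example. The formula is the one quoted by the reviewer, (I_junction - I_cytoplasm)/I_cytoplasm; the intensity values are hypothetical and chosen only so that both pools double between an early and a late timepoint.

```python
def relative_junctional_enrichment(i_junction, i_cytoplasm):
    """Ratio-based measure quoted by the reviewer: (I_junction - I_cytoplasm) / I_cytoplasm."""
    return (i_junction - i_cytoplasm) / i_cytoplasm


# Hypothetical intensities: junctional and 'cytoplasmic' pools both double over time.
early = relative_junctional_enrichment(30.0, 10.0)
late = relative_junctional_enrichment(60.0, 20.0)

# The absolute junctional signal has doubled, but the ratio is unchanged.
junction_fold_change = 60.0 / 30.0
```

      Under this assumption the ratio stays at 2.0 at both timepoints, so a proportional rise in both pools is invisible to the ratio even though the junctional intensity itself has doubled.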
      

      Additionally, we note that in (Gustafson et al., 2022), not all Myosin II is fluorescently tagged since they use a sqhGFP transgene located on the balancer chromosome. This means that the line they use will have a pool of exogenous Myosin tagged with GFP (expressed from the CyO balancer) and a pool of endogenous Myosin (expressed from the sqh gene on the X chromosome). It is not known whether endogenous and exogenous GFP-tagged Myosin II will be recruited equally to cell junctions when in competition with each other. Therefore, in their genetic background, the ratio of junctional/cytoplasmic sqhGFP might not reflect the true ratio. To avoid this potential caveat, in our study we have used a new knock-in of Myosin, which tags the sqh gene at the endogenous locus (Proag et al., 2019). The line is homozygous viable and thus all the molecules of Myosin II Regulatory Light Chain (encoded by sqh), and thus the Myosin II mini-filaments, are labelled with GFP.

      Additionally, we note that when comparing their images of Myosin II in wildtype and twist (Figure 5D and D’), the overall Myosin signal appears reduced in twist mutants (including in the head and posterior midgut, which is outside the area that they are claiming Myosin II is recruited in response to mesoderm invagination). This suggests that Myosin II is generally reduced in their twist mutants (or images thereof), which is not expected and might indicate issues with their methods.

      Therefore, differences in the methods may explain the discrepancies between studies. Importantly, we have quantified junctional shrinkage rates and intercalation, and our analysis of these rates is consistent with our Myosin II quantification results (see above).

      -The authors used the tissue flow data to register the myosin channel and the membrane channel, which were acquired at slightly different times. The accuracy of this channel registration should be demonstrated.

      As stated in our methods: “the channel registration was corrected post-acquisition in order that information on the position of interfaces in the Gap43 channel could be used to locate them in the Myosin channel. Therefore the local flow of cell centroids between successive pairs of time frames in the Gap43 channel is used to give each interface/vertex pixel a predicted flow between frames. A fraction of this flow is applied, equal to the Myosin II to Gap43 channel time offset, divided by the frame interval. Because cells deform as well as flow, the focal cell’s cell shape strain rate is also applied, in the same fractional manner as above.”
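
      The correction quoted above can be sketched for a single interface pixel. This is an illustrative reading of the Methods text rather than the authors' implementation: the function name, the 2x2 strain representation, the sign convention (moving a Gap43 position to the Myosin acquisition time), and the numbers in the example are all assumptions.

```python
def register_pixel(pixel_xy, flow_xy, centroid_xy, strain,
                   channel_offset_s, frame_interval_s):
    """Hypothetical per-pixel channel registration.

    flow_xy: predicted displacement of this pixel between two successive frames.
    strain:  2x2 per-frame cell shape strain of the focal cell (list of lists).
    Only a fraction of each is applied, equal to the channel time offset
    divided by the frame interval.
    """
    fraction = channel_offset_s / frame_interval_s
    # Position of the pixel relative to the focal cell's centroid.
    dx = pixel_xy[0] - centroid_xy[0]
    dy = pixel_xy[1] - centroid_xy[1]
    # Fraction of the local tissue flow (translation).
    x = pixel_xy[0] + fraction * flow_xy[0]
    y = pixel_xy[1] + fraction * flow_xy[1]
    # Fraction of the cell shape change (deformation about the centroid).
    x += fraction * (strain[0][0] * dx + strain[0][1] * dy)
    y += fraction * (strain[1][0] * dx + strain[1][1] * dy)
    return (x, y)


# Made-up example: 5 s offset between channels, 20 s between frames.
corrected = register_pixel(
    pixel_xy=(10.0, 5.0),
    flow_xy=(2.0, -1.0),
    centroid_xy=(8.0, 5.0),
    strain=[[0.1, 0.0], [0.0, -0.1]],
    channel_offset_s=5.0,
    frame_interval_s=20.0,
)
```

      With these numbers a quarter of the inter-frame flow and a quarter of the cell's deformation are applied, moving the pixel from (10, 5) to approximately (10.55, 4.75).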

      The images in Figure 3C and C’ show the Myosin II, with quantified membrane Myosin superimposed on the image as a color-code. Images in Figure 3B and B’ show the (normalized) Myosin II. Comparison of these images demonstrates that the channel registration is accurate. We will add a reference to these images in the methods.

      • The authors show that cell intercalation is not influenced in twist mutant embryos. However, a previous study demonstrates that the speed of GBE is substantially reduced in twist mutants (Gustafson et al., 2022). It would be interesting to see whether a similar reduction in the speed of GBE was observed in this study. *

      We do not see a reduction in the speed of GBE as reported by (Gustafson et al., 2022); we will add “tissue strain rate” graphs to demonstrate this. On the contrary, we find a slight increase in the “tissue strain rate”, because there is a slight increase in the “cell shape strain rate” contributing to extension (while “cell intercalation strain rate” is unchanged). See also response to Reviewer 1 (major comment).

      • It has been previously shown that contractions of medioapical myosin in germband cells also contribute to cell intercalation. The authors should explain why medioapical myosin was not included in the comparison between wildtype and twist mutant embryos. *

      Indeed, it has been shown that there is a flow of medial Myosin towards the junctions (Rauzi et al., 2010). However, and as described in that paper, this flow ‘feeds’ the enrichment of Myosin II at shrinking junctions, and thus the junctional Myosin II can be taken as a readout of polarized Myosin II behavior. Additionally, medial flows are more technically challenging to quantify, especially when quantification is required in a large number of cells as is the case for our study.

      Importantly, our junctional Myosin II and junctional shrinkage rate results are consistent with each other, therefore it is very unlikely that analyzing medial Myosin II would lead us to form a different conclusion. We will add a sentence to explain why we chose to quantify junctional, and not medial, Myosin II.

      *Minor points: *

      1. * Fig. 1-S1 panel C: the number of cyan cells changes non-monotonically. It first decreases from -10 min to 10 min, then increases from 10 min to 20 min. This is confusing since in theory the number of tracked cells should not increase over time if the cells are tracked from the beginning of the movie. *

      The cyan cells highlight tracked mesodermal and mesectodermal cells, which are not included in the analysis. The low number of mesodermal cells highlighted at 10 min of germband extension is because mesodermal and mesectodermal cells are not always tracked successfully at this time. Note that the legend includes a note that “Unmarked cells are poorly tracked and excluded from the analysis”. Also see Methods: “Note on number of cells in movies, for notes on changes to the number of tracked ectodermal cells throughout the timecourse of the movies.”

      • Fig. 1-S2: the vnd band in panel A appears to be much narrower than in panel B. *

      These are fixed embryos, therefore this could be (at least partially) due to slight differences in exact developmental age of the embryo. Note that we wanted to check that vnd and ind are expressed in the correct places in the ectoderm. We were motivated to check this because the width of mesoderm is reduced in twist, so we thought it was important to verify that there is not a population of ‘ectodermal’ cells with a strange fate (i.e., negative for both vnd and ind). Our experiments show that vnd abuts the mesoderm/mesectoderm in twist as in wildtype, and that the cells immediately lateral to the vnd cell population express ind as expected.

      It is possible that there is a slight difference in the number of vnd cells in twist mutants compared to wildtype, but we see no differences in Myosin II bipolarity that would coincide with the vnd/ind boundary (Fig3-S1). Therefore, this would not change the interpretation of our results. Counting the number of rows of vnd cells prior to any cell intercalation (the number of rows will reduce as cells intercalate) would be technically challenging as the lateral border of vnd expression is hard to discern at this time due to lower levels of vnd expression laterally within the vnd expression domain.

      • The schematic in Fig. 2J suggests that at the onset of mesoderm pulling the germband cells have a uniform angle of rotation (towards bottom right). Is this the case?*

      No, this schematic is purely supposed to show that as cells stretch, they also reorient. Note that we will review our schematics in Fig. 2 to increase clarity (see response to reviewer 1, first minor comment).

      • The description of myosin intensity normalization in the Methods section is somewhat difficult to follow [Line 829 - 832]. It would be helpful if the authors can show one or two images before and after intensity normalization as examples. *

      We will add some examples of before and after normalization images to this section. We will also review the Methods to improve the text’s clarity.

      • Line 704: "Z-stacks for each channel were collected sequentially" - the step size in Z-axis should be reported. *

      Thank you for this; the step size was 1 µm. We will add this information.

      • Fig. 4C: what are the thin, black lines in the image? *

      This image is a 2D representation of the Gap43Cherry signal at the level of the adherens junctions extracted for tracking, not a simple confocal z-slice. When viewing these representations, you can see lines showing borders between where information from different z-stacks was used for the tracking layer. Unfortunately, our software does not allow us to remove these lines, but they do not affect tracking, quantification etc.

      Reviewer #3 (Significance (Required)):

      While most previous work on tissue mechanics and morphogenesis focuses on tissue-intrinsic mechanical input, recent studies have started to emphasize the contribution of tissue-extrinsic forces. An important challenge in understanding the function of tissue-extrinsic forces lies in the difficulties in properly comparing the wild type and the mutant samples that disrupt extrinsic forces, in particular when cell fate specification is altered in the mutants. In this work, the authors addressed this challenge by employing a number of approaches to warrant a parallel comparison between genotypes, including examining the AP- and DV-patterning of the tissue, selecting sample regions with comparable cell fate for analysis, and carefully aligning the stage of the movies. With these approaches, the authors provide compelling evidence to support their main conclusions. By teasing apart the role of the intrinsic genetic program and the extrinsic tissue forces, the work provides important clarifications on the function of mesoderm pulling in GBE and adds new insights into this well-studied tissue morphogenetic process. This work should be of interest to the broad audience of epithelial morphogenesis, tissue mechanics and myosin mechanobiology.

      __Reviewer #4 (Evidence, reproducibility and clarity (Required)): __

      *Lye and colleagues investigate the impact of tissue-tissue interactions on morphogenesis. Specifically, they ask how disrupting mesoderm internalization affects convergence and extension of the ectoderm (germband) in Drosophila embryos. Using twi mutants in which mesoderm invagination fails, the authors find that the invagination of the mesoderm deforms germband cells, but does not significantly contribute to patterning, cell alignment, myosin polarization and cell-cell contact disassembly (which drive germband convergence). The authors find modest effects of mesoderm invagination on new junction formation and orientation (which drive extension), but these changes do not have a significant effect on germband elongation. The authors conclude that germband extension is robust to external forces from the invagination of the mesoderm. *

      *MAIN 1. The authors clearly show that myosin density is not different in wild-type and twi mutant embryos, and subsequently argue that the pulling force from the mesoderm does not elicit a mechanosensitive response in early germband extension. But if the cell density is constant, doesn't that mean that the longer, DV-oriented interfaces in the wild type accumulate more total myosin than their shorter counterparts in twi mutants? Assuming that the total number of myosin molecules per cell is not greater in the wild type, wouldn't increased total myosin at the membrane suggest a response to the increased deformation? Certainly the cells are able to maintain the same cell density despite the pulling force from the mesoderm, so can the authors rule out a mechanosensing mechanism? *

      We do not rule out a mechanosensing mechanism. We agree that the total Myosin at stretched interfaces is higher than at unstretched interfaces, and we proposed a homeostatic mechanism to maintain Myosin II density on the cortex upon rapid stretching (summarized in Fig. 7). Indeed, it is possible that this mechanism could itself be due to mechanosensitive recruitment of Myosin II (though there are also other possibilities). We have tried to address this in our discussion (under “Mechanisms regulating Myosin II density at the cortex and consequences for cell intercalation” and “Restoration of DV cell length after being stretched by mesoderm invagination”), but we will amend the wording to make the possibility of mechanosensitive recruitment of Myosin II to maintain cortical density more explicit.

      *What happens to the Gap43mCherry signal? From Figure 2A, it seems to be diluted ventrally in the wild type as compared to twi mutants? Comparing myosin and Gap43 dynamics may shed light on whether myosin accumulates more or less than one would expect simply on the basis of having longer contacts. *

      We quantify the density of Myosin, rather than the total amount. Therefore, the length of the contact should not matter. The suggestion of comparing Myosin density to Gap43Cherry density is in principle a good one, as it would allow us to compare a protein which is not diluted as cell contact length increases (Myosin) to one which appears to be (Gap43). However, it is not essential for the conclusions that we make, and in practice quantifying the Gap43Cherry signal would not be straightforward on our existing movies due to the imaging parameters used. We capture the Gap43Cherry channel (but not the Myosin channel) with a ‘spot noise reducer’ turned on in the camera software, due to very occasional bright spot noise, which confuses the tracking software. Our Gap43Cherry signal is therefore manipulated during acquisition, and quantifying from these images would not be appropriate. We would have to acquire, track and quantify new movies, which is not possible within the timeframe of a revision.

      In summary, we think that we have sufficient evidence from our analysis that Myosin II is not diluted upon junctional stretching without comparing to quantification of Gap43Cherry, and the time investment required to quantify the Gap43Cherry would not be worthwhile as it would require more data to be acquired and processed.

      • The authors previously argued that mesoderm invagination was required for the fast phase of cell intercalation [Butler et al., 2009]. However, here the authors interpret that loss of twi does not significantly slow down interface contraction, but accelerates the elongation of junctions and cells along the AP axis, which overall would mean that mesoderm invagination is (slightly) detrimental for axis elongation. The discrepancy between their previous and current results should be discussed. *

      We are happy to add more information about these discrepancies in the discussion. In a nutshell, we think that these discrepancies arise from the challenges of comparing wildtype and twist mutant embryos relative to each other, and as a consequence we have made various improvements to our methods since (Butler et al., 2009). These improvements included using markers that would be expressed at the same levels in wildtype and twist embryos. Additionally, we did not use overexpressed cadherin-FPs (namely, the ubi-CadGFP transgene), which may have confounding effects, and we used a knock-in sqhGFP to ensure that all Myosin II molecules were labelled by GFP. We also carefully controlled the temperature at which we acquired the movies, standardized the level at which to track cells and quantify Myosin between movies, and improved the accuracy of our image segmentation and cell type identification since our previous study (Butler et al., 2009). See also response to reviewer 2.

      • Related to the previous point, it is surprising that the differences shown in Figure 4A-B are not significant. This is particularly troubling when in Figure 5B the authors claim a significant difference in cell elongation rate, which is higher in twi mutants (but only in very short time intervals and actually switches sign at the end of germband extension). These are just two examples, but I think the analysis of significance on a per-time point basis is problematic. *

      *Have the authors considered analyzing their results as time series rather than comparing individual time points? Or perhaps integrating the different metrics over the duration of germband extension (e.g. using areas under the curve)? That way they would not have to arbitrarily decide if significant differences in a few time points should or not be interpreted as significant overall differences. *

      For graphs plotted against time of germband extension, we do not think it is appropriate to analyze as a time series rather than comparing individual time points, since different developmental events (such as mesoderm invagination) occur at different times. For graphs plotted against time to/from cell neighbor swap, these can also change over time (e.g., ctrd-ctrd orientation, Fig6D). Therefore, we do not feel that it is appropriate to run statistical analyses as a time series for these comparisons either. Statistical cut-offs are by their nature arbitrary. We have tried to highlight non-significant trends throughout the text (including for Fig4A&B), in addition to stating where we see significant differences, to indicate where there may be minor (but not significant) differences.


      • While the number of cells analyzed is impressive, the number of embryos is relatively low, particularly for the wild type (only four embryos analyzed). If I understood correctly (if not, please clarify) the authors ran their statistics using cells and not embryos as their measurement unit. But I could not find any evidence that cells from the same embryo can be considered as independent measurements. This could be easily done by demonstrating that the variance of any of the measurements (e.g. elongation, area change rate, etc.) for cells in an embryo is comparable to that calculated when mixing cells from different embryos. *

      We do not simply use the number of cells as an n for our experiments. We use a mixed effects model for our statistics as previously (Butler et al., 2009; Finegan et al., 2019; Lye et al., 2015; Sharrock et al., 2022; Tetley et al., 2016). This estimates the P value associated with a fixed effect of differences between genotypes, allowing for random effects contributed by differences between embryos within a given genotype. We will make sure that this is clear in the Methods.
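
      The reason for modelling embryos as a random effect can be illustrated by splitting the variance of a measurement into a between-embryo and a within-embryo component. The sketch below is not the mixed-effects model used in the study. It is a simpler one-way variance decomposition for a balanced design (the same number of cells per embryo), with hypothetical names and made-up values, shown only to make the between-embryo term concrete.

```python
from statistics import mean


def variance_components(groups):
    """One-way ANOVA estimates of between-group and within-group variance.

    groups: list of equal-length lists, one list of cell measurements per embryo.
    Returns (between_embryo_variance, within_embryo_variance).
    Assumes a balanced design; this is an illustration, not a mixed-effects fit.
    """
    k = len(groups)      # number of embryos
    n = len(groups[0])   # cells per embryo (balanced)
    grand_mean = mean(v for g in groups for v in g)
    group_means = [mean(g) for g in groups]
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    ms_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    ) / (k * (n - 1))
    between = max(0.0, (ms_between - ms_within) / n)
    return between, ms_within


# Made-up measurements for two embryos with three cells each.
between, within = variance_components([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
intraclass_correlation = between / (between + within)
```

      When the between-embryo component is large relative to the within-embryo component, as in this toy example, cells from one embryo are not independent samples, which is the situation a random effect for embryo is meant to handle.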

      MINOR 1. Figure 4D: the authors show no difference in the proportion of neighbor swaps per minute between wild-type and twi- mutant embryos. But how about the absolute number of neighbour swaps per minute? Does that change in twi mutants (and if so, why?).

      The number of interfaces involved in a T1 swap are expressed as a proportion of the total number of DV-oriented interfaces for all tracked ectodermal germband cells, to take account of differences in the number of tracked cells between different timepoints and different movies. Presenting the absolute number of swaps per minute could lead to misleading interpretations.
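
      A small numerical example shows why the absolute count can mislead. The function name and the counts are hypothetical; the calculation is simply the proportion described above.

```python
def swap_proportion(n_swapping_interfaces, n_dv_interfaces):
    """Interfaces in a T1 swap as a proportion of all tracked DV-oriented interfaces."""
    if n_dv_interfaces == 0:
        raise ValueError("no DV-oriented interfaces were tracked")
    return n_swapping_interfaces / n_dv_interfaces


# Hypothetical movies: movie A tracks twice as many interfaces as movie B.
movie_a = swap_proportion(12, 400)
movie_b = swap_proportion(6, 200)
```

      Movie A has twice as many swaps in absolute terms, yet both movies have the same proportion of swapping interfaces, so the difference in raw counts reflects only how many cells were tracked.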

      • I was a bit confused about the reason why in Figure 4A the authors measure the rate of interface contraction in units of “proportion/min”, but in Figure 5A they measure interface elongation in units of “um/min”. Unless there is a good reason not to, these two metrics should be reported using the same units. Is there a difference in the rate of interface contraction when measured in absolute units (um/min)? *

      Thank you, we will amend so that both measures are expressed in the same units.

      • The discussion of previous work on cell deformation within the mesoderm (page 16, first paragraph) should probably include recent work from Adam Martin's lab (e.g. [Heer et al., 2017]; or [Denk-Lobnig et al., 2021]). *

      Thank you, and apologies for this oversight; we will add these references.

      SUGGESTIONS 1. While I appreciate the arguments that the authors provide to use twi mutants rather than sna mutants or twi sna double mutants, as the authors indicate, in twi mutants there is still contractility in the mesoderm (albeit not ratcheted). Therefore, it is possible that contractile pulses from the mesoderm in twi mutants could still facilitate cell alignment and polarization of myosin in the germband. Given the previous results from the Zallen lab using twi sna double mutants (see above) this is unlikely to be the case, but the findings in this manuscript would be significantly stronger if they included similar analysis in the double mutants.

      We had concerns about using sna or twi sna double mutants due to the large amount of space the un-internalized mesoderm takes up on the exterior of the embryo. This concern is also shared by reviewer 1: “Importantly, I think the researchers were correct in choosing to analyze twist single-mutant embryos (as opposed to snail or twist, snail double-mutant embryos), as the overall embryo geometry of these mutants is fairly similar to wild-type embryos, allowing the researchers to directly compare cell behaviors and myosin dynamics during germband extension. This approach also allows them to avoid indirect effects on the germband due to a completely non-internalized mesoderm.” In addition to this concern, confocal imaging of snail or twist snail embryos to include the ventral midline (which is required to define embryonic axes) is problematic as the un-constricted mesodermal cells occupy virtually all the field of view, leaving very few ectodermal cells to analyze.

      Whilst we acknowledge that there are some (un-ratcheted) contractions of mesodermal cells in twist mutants, we have clearly shown that there is no DV stretch and very little reorientation of cells. Therefore, any residual contractile activity in the mesodermal cells of twist mutants does not appear to have a mechanical impact on the ectoderm. We cannot exclude the possibility that there is some transmission of forces between contracting cells of the mesoderm and the ectoderm in twist mutants. However, our evidence suggests that the large tissue scale force that transmits to the ectoderm from the invaginating mesoderm is missing in twist mutants, and it was the effects of that force that we wished to investigate (See also response to reviewer 2).

      Reviewer #4 (Significance (Required)):

      This is an interesting study, with careful quantitative analysis of cellular and subcellular dynamics. The results follow previous findings from Jennifer Zallen and the authors themselves. The Zallen lab showed that cell alignment, myosin polarization and germband extension are normal in sna twi mutants [Fernandez-Gonzalez et al., 2009], a result that the authors fail to cite. The results in the present manuscript are similar, but the analysis is much more in depth here, so the findings by Lye and colleagues certainly warrant publication.

      We did not specifically cite this result from (Fernandez-Gonzalez et al., 2009), because the subject of their study is the formation of multicellular rosettes, not whether a pull from mesoderm affects Myosin II polarity and cell intercalation. The formation of multicellular rosettes occurs later in germband extension, and therefore these results are not directly relevant to our study. Additionally, their measures of alignment are defined as linkage to other approximately DV oriented interfaces, rather than directly measuring orientation compared to the embryonic axes as we do here, as a different question is being addressed. Specifically, the quoted sna twi experiment is interpreted as extrinsic forces from the mesoderm not being required for linkage of Myosin enriched DV-oriented interfaces together. Myosin II quantification is more rudimentary with edges being assigned as Myosin positive or Myosin negative, as opposed to quantifying the density of Myosin on each interface and we cannot see any comparison of Myosin II quantification between wildtype and twist embryos.

      So, although the results are consistent with each other, they are not directly comparable due to methods used and we are happy that the reviewer acknowledges that our analysis is more in depth, which was necessary to address the specific questions that we investigate in our study.

      In general, there have been inconsistencies in results between previous studies, leading reviewer one to recognize that “…it should be published and that it will be an impactful paper within the field. Namely, it will settle once and for all the question of whether mesoderm invagination is required for optimal germband extension in the early Drosophila embryo.” The high amount of conflicting information in the literature led us not to describe individual findings exhaustively, but we will ensure the results from the Zallen lab are appropriately cited.
      

      However, there are a number of experimental points that I think need to be addressed to solidify the manuscript, particularly in terms of statistical analysis.

      Please see more details above (main points 3 and 4) regarding specific concerns about experimental points and statistics. Additionally, we note that reviewer 3 states “statistics were appropriately used”, and our statistical methods are the same as we have used in previous studies comparing live imaging data (Butler et al., 2009; Finegan et al., 2019; Lye et al., 2015; Sharrock et al., 2022; Tetley et al., 2016).


      REFERENCES

      Blanchard, G. B., Kabla, A. J., Schultz, N. L., Butler, L. C., Sanson, B., Gorfinkiel, N., Mahadevan, L. and Adams, R. J. (2009). Tissue tectonics: morphogenetic strain rates, cell shape change and intercalation. Nat Methods 6, 458-464.

      Butler, L. C., Blanchard, G. B., Kabla, A. J., Lawrence, N. J., Welchman, D. P., Mahadevan, L., Adams, R. J. and Sanson, B. (2009). Cell shape changes indicate a role for extrinsic tensile forces in Drosophila germ-band extension. Nat Cell Biol 11, 859-864.

      Farrell, D. L., Weitz, O., Magnasco, M. O. and Zallen, J. A. (2017). SEGGA: a toolset for rapid automated analysis of epithelial cell polarity and dynamics. Development 144, 1725-1734.

      Fernandez-Gonzalez, R., Simoes Sde, M., Roper, J. C., Eaton, S. and Zallen, J. A. (2009). Myosin II dynamics are regulated by tension in intercalating cells. Dev Cell 17, 736-743.

      Finegan, T. M., Hervieux, N., Nestor-Bergmann, A., Fletcher, A. G., Blanchard, G. B. and Sanson, B. (2019). The tricellular vertex-specific adhesion molecule Sidekick facilitates polarised cell intercalation during Drosophila axis extension. PLoS Biol 17, e3000522.

      Gustafson, H. J., Claussen, N., De Renzis, S. and Streichan, S. J. (2022). Patterned mechanical feedback establishes a global myosin gradient. Nat Commun 13, 7050.

      Irvine, K. D. and Wieschaus, E. (1994). Cell intercalation during Drosophila germband extension and its regulation by pair-rule segmentation genes. Development 120, 827-841.

      Leptin, M. and Grunewald, B. (1990). Cell shape changes during gastrulation in Drosophila. Development 110, 73-84.

      Lye, C. M., Blanchard, G. B., Naylor, H. W., Muresan, L., Huisken, J., Adams, R. J. and Sanson, B. (2015). Mechanical Coupling between Endoderm Invagination and Axis Extension in Drosophila. PLoS Biol 13, e1002292.

      Proag, A., Monier, B. and Suzanne, M. (2019). Physical and functional cell-matrix uncoupling in a developing tissue under tension. Development 146.

      Rauzi, M., Lenne, P. F. and Lecuit, T. (2010). Planar polarized actomyosin contractile flows control epithelial junction remodelling. Nature 468, 1110-1114.

      Sharrock, T. E., Evans, J., Blanchard, G. B. and Sanson, B. (2022). Different temporal requirements for tartan and wingless in the formation of contractile interfaces at compartmental boundaries. Development 149.

      Simpson, P. (1983). Maternal-Zygotic Gene Interactions during Formation of the Dorsoventral Pattern in Drosophila Embryos. Genetics 105, 615-632.

      Tetley, R. J., Blanchard, G. B., Fletcher, A. G., Adams, R. J. and Sanson, B. (2016). Unipolar distributions of junctional Myosin II identify cell stripe boundaries that drive cell intercalation throughout Drosophila axis extension. Elife 5.

      Wang, X., Merkel, M., Sutter, L. B., Erdemci-Tandogan, G., Manning, M. L. and Kasza, K. E. (2020). Anisotropy links cell shapes to tissue flow during convergent extension. Proc Natl Acad Sci U S A 117, 13541-13551.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you very much for the kind comments about our manuscript. We have improved the text to address all reviewers’ comments and suggestions. Additionally, we corrected and improved the supplementary tables.

      Reviewer #1 (Public Review):

      This paper provides new evidence on the relationship between genetic/chromosome divergence and capacity for asexual reproduction (via unreduced, clonal gametes) in hybrid males or females. Whereas previous studies have focussed just on the hybrid combinations that have yielded asexual lineages in nature, the authors take an experimental approach, analysing meiotic processes in F1 hybrids for combinations of species spanning different levels of divergence, whether or not they form asexual lineages in nature. As such, the findings here are a substantial advance towards understanding how new asexual lineages form.

      The quality of the work is high, the analyses are sound, and the authors sensibly link their observations to the speciation continuum. I should also add that the cytogenetic work here is just beautiful!

      A key finding is that the precondition for asexual reproduction - the formation of unreduced gametes - is not unusual among hybrid females, so that we have to consider other factors to explain the rarity of asexual species - a major unresolved issue in evolutionary biology. This work also highlights a previously overlooked effect of chromosome organisation on speciation.

      Thank you for the nice comments about our work as well as for appreciating our cytogenetics work and figures.

      Reviewer #2 (Public Review):

      The authors investigate the origin of asexual reproduction through hybridization between species. In loaches, diploid, polyploid, and asexual forms have been described in natural populations. The authors experimentally cross multiple species of loaches and conduct an impressively detailed characterization of gametogenesis using molecular cytogenetics to show that although meiosis arrests early in male hybrids, a subset of cells in females undergo endoreplication before meiosis, producing diploid eggs. This only occurred in hybrids of parental species that were of intermediate divergence. This work supports an expanding view of speciation where asexuality could emerge during a narrow evolutionary window where genomic divergence between species is not too high to cause hybrid inviability, but high enough to disrupt normal meiotic processes.

      Thank you.

      I enjoyed reading this study and I appreciate the amount of work it takes to conduct these types of cytogenetic experiments. But, my main concern with this study is I was left wondering if the sample sizes are large enough to get a sense how variable endoreplication is in these loach species. Most of the hybrids between species are the result of crosses between 1-2 families. Within males and females, meiocyte observations are limited to a handful of pachytene and diplotene stages. I think it would be helpful to be more transparent about the sample sizes in the main text.

      Thank you for raising this point. We have improved Supplementary Tables S2 and S3 to clarify how many individuals we analyzed from each genetic family and added this information to the main text. In total, we obtained 12 hybrid combinations comprising 19 F1 hybrid families. For C. elongatoides x C. taenia hybrids we obtained three families; for C. elongatoides x C. ohridana, C. elongatoides x C. tanaitica, C. elongatoides x C. bilineata and C. ohridana x C. bilineata we obtained two families each; for the remaining combinations we obtained a single family. From these families, 79 individuals were used for the analysis of meiocytes. Additionally, 24 parental individuals, males and females, were analysed. For the parental species, we analysed 852 cells; for hybrid males we investigated 244 cells, and for hybrid females 665 cells.

      Along these lines, the authors argue against the possibility that endoreplication may be predisposed to occur at a higher rate in some species (line 291). Instead, they suggest that endoreplication is a result of perturbing the cell cycle by combining the genomes of two different species. Their main argument is based on gonocyte counts from parental females in a previous reference. It is essential to include counts from the parents used in this study to make a clear comparison with the F1s.

      Thank you, we agree with your comment and included the observations of meiocytes from several parental species, i.e. C. elongatoides, C. taenia, C. pontica, C. tanaitica, and C. ohridana. Among 852 cells analyzed, we did not observe cells with duplicated genomes and abnormalities in chromosomal pairing. By contrast, among 665 pachytene cells of F1 hybrid females, we revealed altogether ~1% of endoreplicated ones. We tested these data by binomial GLM and found these differences to be significant, suggesting that sexuals, even if they may have some unnoticed duplication events, clearly have a significantly lower incidence of abnormal pachytene cells. We have now included this information in the main text.
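      As a rough illustration of the comparison described above, the following minimal sketch contrasts the incidence of endoreplicated cells between the two groups. It is not the binomial GLM used in the study: it swaps in a one-sided Fisher's exact test, which behaves well when one group has zero events. The cell totals (852 and 665) come from the response above, but the count of 7 endoreplicated hybrid cells is a placeholder chosen only to be consistent with "~1%", and the function name `fisher_one_sided` is hypothetical.

```python
from math import comb


def fisher_one_sided(events_a, total_a, events_b, total_b):
    """One-sided Fisher's exact test.

    Returns the probability, under the hypergeometric null of equal
    incidence, that group B contains at least `events_b` of the pooled events.
    """
    n_events = events_a + events_b
    n_total = total_a + total_b
    denom = comb(n_total, n_events)
    upper = min(n_events, total_b)
    numer = 0
    for k in range(events_b, upper + 1):
        # k events fall in group B, the remaining ones in group A
        numer += comb(total_b, k) * comb(total_a, n_events - k)
    return numer / denom


# Parental cells: 0 of 852 endoreplicated (from the response above).
# Hybrid female cells: 7 of 665 is an ASSUMED placeholder for "~1%".
p_value = fisher_one_sided(0, 852, 7, 665)
print(f"one-sided p = {p_value:.4f}")
```

      With real per-individual counts, a binomial GLM with family or individual as a covariate would be closer to the analysis described; the sketch only shows why a zero count in 852 parental cells is informative against even a ~1% rate in hybrids.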

      In the discussion (lines 320-333), the authors postulate the sex-specific clonality they observe could be a result of Haldane's rule. Given these fish do not have known sex chromosomes, I do not find this argument strong. Haldane's rule refers to the exposure of recessive incompatibilities with the sex chromosomes in the hybrid heterogametic sex. This effect would therefore be limited to degenerated sex chromosomes where much of the sequence content on the Y or W has been lost. These species may have homomorphic sex chromosomes, but if this is the case, they likely are not very degenerated. Instead, it seems more plausible that the sex-specific effect the authors observe is due to intrinsic differences of spermatogenesis and oogenesis. Is there any information about sex-specific differences in the fidelity of gametogenesis from other species that would support a higher likelihood of endoreplication?

      Thank you for this important question; however, we think there was a misunderstanding. We do not postulate that our observation conforms to Haldane’s rule. In contrast to this rule, which is based on sex chromosomes, our previous publication demonstrated that, whatever the gonadal sex differentiation is in our taxa, the ability to overcome sterility by asexual gametogenesis is always confined to the female gonadal environment (or oogenesis in general), even for transplanted spermatogonial cells (Tichopad et al. 2022). What we meant by our text is that our results do not fully conform to Haldane’s rule. We therefore reworded our text to rule out such a misconception.

      Nonetheless, we note that it has been demonstrated that Haldane’s rule is also applicable to species with poorly differentiated sex chromosomes (e.g. Presgraves and Orr 1998) and that recessive incompatibilities are not the only explanation, as the faster-male or faster-X theories may also apply in such cases (Dufresnes et al. 2016). Therefore, we have kept our remarks about Haldane’s rule here. Moreover, for several parental species, we have preliminarily found evidence of an XY gonadal sex differentiation system, although these findings are unpublished and need further validation.

      The final thing I was left wondering about was this missing link between endoreplication and activating the embryonic development of the diploid egg. In these loach species, a sperm is required to activate egg development, but the sperm genome is discarded (line 100). What is the mechanism of this and how does it evolve concurrently during hybridization?

      Thank you for the comment. There has been much speculation about why gynogens actually need sperm to activate their egg development, but to our knowledge, no explanation has been validated to date. Interestingly, a recent theoretical model by Fyon et al. (bioRxiv, 2023) suggested that the ability to exclude sperm may evolve separately from the ability to produce clonal eggs. Hence, this topic is complex and remains unresolved, and we feel that it is beyond the scope of the present manuscript. We have slightly modified the text and added two references to address your suggestion.

      Reviewer #1 (Recommendations For The Authors):

      The paper is well prepared - though the resolution of Fig 1 on the pdf is rather poor.

      Thank you! We have now provided the high-resolution figures.

      Overall, I have few suggestions for improvements:

      Line 58. How does endoduplication itself "overcome accumulated incompatibilities" other than failure of synapsis? Perhaps by maintaining the F1 state, and so avoiding reduced fitness arising from recombination and disruption of coadapted gene combinations.

      We have added a sentence to the main text “Premeiotic genome endoreplication thus not only ensures clonal reproduction but also allows hybrids to overcome problems in chromosome pairing that would otherwise lead to their sterility 15,17.” that we hope sufficiently addresses this issue.

      Line 118 - please explain the AKD index here - as you have some in SI. Also please be clearer on how you measure genetic divergence as proportion of heterozygous SNPs - presumably this is via exon sequences from F1 females?

      Please note that we have already explained the AKD index in the relevant part of the Methods section. However, we have now also added a brief explanation to the Results section, as suggested. We apologize for the imprecise description of the genetic divergence measurements. As described in the Methods section, divergence is not measured by heterozygosity (as we wrongly stated here), but as the p-distance among sequences of coding regions between parental species.

      Lines 126 ff. It is unfortunate that the design of the crosses was not more balanced or extensive. Nonetheless, I do appreciate the effort involved here and think the results are solid as is.

      Thank you.

      Line 142. Please define PS and TB (and other acronyms) at first use.

      We have added the definition for all acronyms at the first use.

      Lines 192-193. What about EP and EN - as shown to have unreduced gametes in Fig. 2?

      Thank you for this question. Based on analyses of the diplotene stage, we showed that EP and EN hybrids produced diploid eggs. However, in pachytene, we did not find duplicated oocytes due to the rarity of endoreplication. Similarly, the low incidence of duplicated pachytene cells was observed in natural as well as F1-hybrids in loaches and reptiles (Newton et al., 2016, Dedukh et al., 2021, 2022).

      Lines 217-219. The observed correlation of chromosome divergence (AKD index) and numbers of bivalents in pachytene makes sense and is an important observation. Did this GLM simultaneously consider the effect of genetic divergence (as implied in methods)?

      Thank you for this comment. We originally tested separately the fit of two models, one with AKD and the other with SNP divergence. Since the AKD model significantly outperformed the SNP-based one, we focused our interpretation on the former. However, as you suggested, we now re-calculated the model taking into account the joint effects of both predictors in a single model and indeed, this model outperformed both single predictors. In conclusion, while AKD is still the strongest single predictor for the observed amounts of bivalents, the additional effect of genetic distance still significantly improves the model fit. We have now included this result into the main text.

      This finding does not alter our conclusions, it just suggests that the effect of chromosomal morphology is probably more complex, involving the role of more subtle sequence divergence or structural variants.

      Line 242. The Discussion is a great read - careful interpretation and a really interesting interpretation in context of the broader literature.

      Thank you for the appreciation. Your positive feedback and evaluation strongly motivate us to expand our work.

      Line 396. Some references from book chapters (18, 52) are incomplete. Please fix.

      We have now corrected these references accordingly.

      Reviewer #2 (Recommendations For The Authors):

      Transparency about meiocyte sample sizes: These counts are all in supplemental table 3. From this table, it is unclear if a majority of these meiocytes are from a single individual or from multiple males or females. Or, in the crosses where there are multiple families, are the meiocytes sampled from all families? I am trying to get a sense whether endoreplication and the fidelity of oogenesis could be influenced by genetic variants segregating within species. If the meiotcytes are only sampled from a single individual from a single cross, you may not see this variation. If this is the case, perhaps the correlation between genetic divergence and the formation of asexual clones may not be as strong. Additional replicates may not be feasible, but at a minimum I think it would be helpful to address whether endoreplication could or could not be variable and if the sample sizes are sufficient.

      Thank you for raising this point. We have improved the Supplementary table to clarify how many individuals we analyzed from each family and added this information to the main text. Unfortunately, additional replicates are not feasible due to the long generation time of the fish. We otherwise agree with your comment and included this point in the Discussion.

      Gonocyte counts from parental females: The authors say they "analysed hundreds of gonocytes of sexual females without a single incidence of genome endoreplication." I could not find a clear count in the references given. They note that the incidence of endoreplication was very low in pachytene cells in this study (0.7%).

      Thank you, we agree with your comment and included the observations of meiocytes from several parental species, i.e. C. elongatoides, C. taenia, C. pontica, C. tanaitica, and C. ohridana. Among 852 cells analyzed, we did not observe cells with duplicated genomes and abnormalities in chromosomal pairing. By contrast, among 665 pachytene cells of F1 hybrid females, we revealed altogether ~1% of endoreplicated ones. We tested these data by binomial GLM and found these differences to be significant, suggesting that sexuals, even if they may have some unnoticed duplication events, clearly have a significantly lower incidence of abnormal pachytene cells. We have now included this information in the main text.

      They refer to supplemental table 4 (line 196), which does not exist in the supplement. The authors should report these numbers in the revised manuscript.

      Thank you for pointing this out. We have corrected the name of the supplementary table, it actually is supplementary table S3.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      1) Utilization of known AhR ligands as controls will strengthen the interpretation of the conclusions.

      We agree with the reviewer that AhR ligands could be used as controls for delineating structure-activity relationships and cell context-specific effects. However, such studies are beyond the scope of the current manuscript. The AhR has many endogenous ligands, including several tryptophan-derived metabolites, that have been shown to elicit different responses depending on the dose and cell type. Our unpublished data show that the expression of AhR target genes such as Cyp1a1, Cyp2e1, and Tiparp was not modulated by I3A in RAW cells, which suggests that the observed effects may occur independently of the AhR.

      Reviewer #2:

      Specific comments:

      1) The title is misleading "Microbially-derived indole-3-actate" suggests that this article is about the production of I3A by the gut microbiota, in fact this is a dietary supplementation article. The title needs to reflect this fact.

      Our title reflects the natural source of I3A in mice. We used oral supplementation to study the effects of this metabolite. Per the reviewer's suggestion, we changed the title as follows: “Oral supplementation of gut microbial metabolite indole-3-acetate alleviates diet-induced steatosis and inflammation in mice”

      2). The description of the amount of I3A in the drinking water is not properly described. The actual concentration in the drinking water should be given.

      The concentration of I3A in drinking water was as follows: WD50 = 0.5 mg/ml and WD100 = 1 mg/ml. We added this information to the revised manuscript.

      3) The serum concentration data of I3A is critical data and should be moved in Figure 1.

      We have now included serum levels of I3A as part of Figure 1.

      4) The authors should have determined the actual concentration of indole-3-actetate in serum by running a standard curve of I3A during the LC-MS analysis. Also, recovery and matrix effects should be determined. Without this information their data will be difficult to compare to other studies.

      We agree with the reviewer that quantification of I3A in serum would be useful. However, we are unable to do so due to limited sample available as well as concerns with sample integrity after long-term storage.

      5) In the data in Figure S1C, there appears to be only 2-3 mice out of nine that exhibit a difference in serum indole-3-acetate levels between the WD-50 and WD-100. Do the authors have an explanation for this small difference compared to the other endpoints assessed?

      The serum I3A measurements at week 16 are a snapshot that may not reflect tissue levels due to differences in water intake, I3A metabolism in the body, and/or elimination of I3A. The other phenotypic assays are physiological measurements that reflect the result of sustained administration of I3A.

      6) Since the Ah receptor may play a role in the results obtained CYP1A1 mRNA levels in the liver and intestinal tract should have been measured.

      We measured alterations in Cyp1a1 mRNA in the liver and no significant change was observed in the WD50 and WD100 groups relative to controls. Also, see response to reviewer 1.

      7) The main mechanistic experiment performed is shown in Figure 6 and the figure legend states that they are examining macrophages, but these are cell lines, they are macrophages models, and this should be clearly stated. The first two panels are liver data, so the title of the figure legend needs to reflect that fact.

      We agree and have changed the title of Figure 6 to “I3A modulates AMPK phosphorylation and suppresses RAW 264.7 macrophage cell inflammation in an AMPK dependent manner”.

      8) In Figure 6, 1 mM I3A is added to the cells, how is this very high concentration relevant to the concentrations observed in vivo? Does adding 1 mM acetate to the cell culture media lower the pH of the media and could this influence the results obtained? Would acetic acid yield the same results? Could treatment with an acid even explain in vivo results?

      It is difficult to match the concentration of I3A in the in vitro experiments to liver tissue concentrations. Addition of 1 mM I3A did not lower the pH of cell culture media or reduce the viability of cultured RAW 264.7 macrophages. As I3A is not known to degrade into acetic acid and indole, we do not expect acetic acid to recapitulate the effects elicited by I3A.

      Reviewer #3:

      My primary concern with the manuscript is the organization and interpretation of the data. It appears that little effort was given by the authors on interpreting the data and digesting it for the reader into a coherent package. Rather, the authors have collected a vast amount of data and organized it without much thought about what the reader would take away from it. Furthermore, it seems the authors have taken this as an opportunity to overload this manuscript with data that are superfluous to the conclusions the authors draw at the end. Based on this, I think the authors need to invest more time into distilled their complex biological data into a unifying scientific interpretation for the readers that advances our understanding of I3A. My suggestions for the authors are described below.

      1) The data lack a rationale behind how they are organized within the manuscript. For example, the authors will combine disparate biological pathways and lump data together without logic as in Figure 2. Why are inflammatory pathways and bile acid synthesis combined in a figure? What was the rationale?

      We respectfully disagree that the data are presented without rationale. Both inflammation and bile acid dysregulation are commonly observed with NAFLD and thus are presented in two separate panels of Figure 2 (A, inflammatory cytokines, and B bile acids).

      2) The authors give very little effort to performing integrative omics analysis even though multi-omics is provided. Example given, the authors provide proteomic data on the fatty acid metabolism pathway, however, no mention of this pathway within the metabolomic dataset. Vice versa, the authors provide in depth investigation in the metabolic changes within the tryptophan pathway, however, no investigation into the proteomic changes that may underlie this phenomenon. It would be recommended that the authors invest more energy into performing more in-depth analysis of their multi-omics data presented.

      We attempted to co-analyze the proteomic and metabolomic data, but this analysis was not informative. Protein and metabolite abundances do not necessarily correlate, and the two types of omics data carry different observation biases. For example, label-free, untargeted proteomics data favor abundant proteins, whereas untargeted metabolomics data are influenced by concentration and ionization efficiency, among other factors. Therefore, we opted to analyze the two datasets independently, and then linked the findings from the two analyses using biological pathways as guides. For example, we describe changes in acyl-carnitine and discuss how this observation is consistent with changes in abundance of fatty acid metabolism enzymes.

      3) Figures 1&2 shows that low dose treatment reduces inflammation but does not alter hepatic TG levels. This is in direct disagreement with the graphical model provided by the authors (Supp. Fig 9). In the author's model, I3A is directing hepatic lipid metabolism through modulation of macrophage inflammation. This interpretation is erroneous and needs to be reevaluated by the authors. Furthermore, the tryptophan pathway and bile acid pathways are not even represented in the model, which begs the question of why that data are included in the manuscript to begin with.

      We would like to respectfully point out that Figure 1D does show a statistically significant (p < 0.05) difference in liver TG between the WD and WD100 groups. Supp. Figure S9 is meant to be a summary of the main biochemical changes elicited by I3A that we have shown in the current study (e.g., the involvement of AMPK) rather than an atlas of all the changes detected in the metabolomic and proteomic data. Specifically, we have not included the tryptophan or bile acid pathways as we do not have mechanistic information on how these changes are mediated by I3A.

      4) The authors switch from hepatocytes to macrophages without giving any rationale, The authors need to invest more time into describing a logical flow of thought when assembling the manuscript.

      We mention the rationale for investigating the effect of I3A on macrophages in the introduction (last paragraph of the section): “In vitro, both I3A and TA attenuated the expression of inflammatory cytokines (Tnfα, Il-1β and Mcp-1) in macrophages exposed to palmitate and LPS.” We also explain why we used an in vitro model, RAW cells, at the beginning of the corresponding Results section: “Since our previous study found that the metabolic effects of I3A in hepatocytes depend on the AhR, we tested if this was also the case in macrophages.” Moreover, the strong effects of I3A on liver inflammatory cytokines also motivate the macrophage experiments.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1. General Statements

      We thank the editors for sending our manuscript for peer review and the reviewers for careful reading and their critical comments to improve the manuscript. Below, we describe the experiments that have been carried out in response to the reviewers and incorporated in the preliminary revision. We also describe our plan for the revisions that will address the remaining comments of the reviewers. Most of the comments are addressable with additional experiments (some of which are already ongoing) and these experiments will surely strengthen the study reported in this manuscript without affecting the fundamental findings. We would require up to 4-6 weeks to complete these experiments.

      2. Description of the planned revisions

      Insert here a point-by-point reply that explains what revisions, additional experimentations and analyses are planned to address the points raised by the referees.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: The authors used a conditional transgenic mouse model to demonstrate that deletion of serum response factor (SRF) from adult astrocytes provides neuroprotection in various insult/disease contexts without promoting any obvious phenotypic deficiencies. The work builds on the group’s previous study where SRF was embryonically deleted from astrocytes and their precursor cells. Given the role of SRF in promoting glial cell differentiation, the adult conditional KO used in the current study was designed to circumvent the limitations of the previous approach. The authors used a variety of complementary approaches (including immunohistochemistry, electrophysiology, transcriptomics, and behavior) to demonstrate the therapeutic potential of their approach. However, I have questions regarding the validity of the behavioral analyses as well as some of the imaging results that dampen my overall enthusiasm.

      Major Comment #1

      The synaptogenic factors probed in Figure 3C (e.g. glypicans, thrombospondins, etc.) are not likely to play major roles in the adult brain in a non-injury context, so I do not know that these analyses provide any significant insight into potential functional changes in the mutant mice. Along the same lines, the analysis of synapse count (Figure 3D-E) seems inconsequential given that SRF was knocked out well after the period of developmental synaptogenesis. It would have been much more interesting to have performed these analyses following insult (such as the kainate injury model used by the authors) or in one of the disease models presented later in the manuscript. As it stands, I don't think they add very much to the study.

      Response: We are grateful to the reviewer for the careful reading of the manuscript. Astrocytes are known to regulate the formation, maintenance, and elimination of synapses. It has been previously shown that LPS-induced reactive astrocytes exhibit reduced expression of several synaptogenic factors, are unable to promote synapse formation, and show reduced phagocytic activity (PMID: 28099414). We wanted to determine whether the SRF-deficient reactive-like astrocytes were likely compromised in their ability to produce pro-synaptogenic factors and/or adversely affect synapse maintenance. We agree with the reviewer that analysis of synapses in the adult brain may not address the role of these mutant astrocytes in synaptogenesis. However, our results indicate that the mutant astrocytes are likely neither affecting synapse maintenance nor exhibiting altered phagocytic activity that would result in increased or decreased synapse numbers. We will make this clearer in the revised manuscript.

      Minor Comment #2:

      The authors should note that the use of GluA1 as a postsynaptic marker will not identify silent synapses (i.e. structurally "normal" but functionally inert).

      Response: We agree with the reviewer that GluA1 will not identify silent synapses. To study silent vs functional synapses, we will stain for Piccolo (presynaptic) and NMDA receptor NR1 subunit (post-synaptic) to label all synapses and compare this with Piccolo/GluA1 co-localized synapses to identify the functional synapses.

      Reviewer #2 (Significance (Required)):

      The manuscript addresses the important area of the cellular mechanisms that underlie neuroprotection. The ms adds to our understanding of genetic control of neuroprotection and should be of significant interest to others in the field. The experimental approach is systematic and the data presented are generally of high quality and believable. While the ms presents quite a bit of overall cellular data that underlies various areas of neuronal and brain function that may be affected by loss of SRF, it is still somewhat descriptive. It is unclear what aspect of astrocyte reactivity is determinative, how mechanistically in normal cells SRF suppresses reactivity, and how SRF-negative reactive astrocytes confer such broad neuroprotection. While the latter is well beyond the scope of this study, the authors do propose SRF may be involved in regulating oxidative stress and amyloid plaque clearance as a potential pathway to account for SRF's role; however, a more systematic discussion based on the gene expression data and known pathways would be welcome. Overall, this is a high quality ms that should be of interest to the field and that identifies SRF as a novel player in neuroprotection.

      Response: We thank the reviewer for the careful reading of the manuscript and for the positive comments. We will include a more detailed discussion on the genes and pathways based on our gene expression data that may provide insights into how SRF may regulate astrocyte reactivity and neuroprotection.

      Additional considerations:

      1. Quantification of the extent of SRF loss in astrocytes in conditional tamoxifen knockout would strengthen the quality of the data.

      Response: We will provide this data in the revised manuscript.

      While the authors did use a Sholl analysis to show hypertrophic changes in SRF-negative astrocytes, given that SRF is an important regulator of actin and other cytoskeletal related proteins in other cell types, and that cytoskeletal components can play an important role in cell signaling, it is somewhat surprising that the gene array analysis did not include actin and other cytoskeletal proteins, nor did the authors consider a more careful analysis of intracellular cytoskeletal changes and the potential mechanistic implications of this for observed reactivity and neuroprotection.

      Response: We agree with the reviewer that SRF is a well-established regulator of the actin cytoskeleton. However, we did not observe any significant changes in gene expression for actin or actin-regulatory proteins. We would have expected a decrease in astrocyte morphology similar to the neurite/axon defects exhibited by SRF-deficient neurons. It is unclear whether the hypertrophic morphology is due to transcriptional regulation of actin/actin-binding proteins or due to astrocyte reactivity. This would be a very interesting question and we will investigate these aspects in future studies.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: The study by Thumu et al. suggests that astrocyte-specific deletion of SRF in mice results in morphological changes in these cells that do not affect neuronal survival, synapse number, plasticity or cognition. However, in in vivo mouse models of excitotoxic damage and neurodegenerative disease, deletion of SRF reduced neurotoxicity. The authors provide sufficient evidence to suggest that astrocytic SRF contributes to neurotoxicity in various models; however, some claims are made that are currently not supported by evidence.

      Major comments:

      2) The authors claim that SRF KO astrocytes undergo hypertrophy. However, the quantification of the number of intersections gives information about morphology rather than hypertrophy. Quantification of cell size (area of S100B staining) should be provided.

      Response: We will provide the data suggested by the reviewer.

      6) For the RNAseq of isolated astrocytes did the authors confirm that other cell types (e.g microglia) did not contaminate their samples?

      Response: We will provide the information requested by the reviewer.

      Reviewer #3:

      Minor comments:

      1) The authors say that in Figure 1B many astrocytes did not show any SRF expression. However, overall averages of SRF intensity are plotted in Figure 1C. It would support their claim to instead calculate the percentage of SRF-expressing cells above a certain threshold in each condition, rather than plotting the mean intensity. As a control for their method of quantifying SRF intensity in Figure 1B, demonstrating no change in SRF in neurons would provide confidence in the specificity of the knockout.

      Response: We will provide the quantification of the extent of SRF loss in astrocytes (percent astrocytes that are deleted for SRF) as suggested by Reviewer 2. We will also provide SRF intensity from neurons as suggested by the reviewer.

      2) The authors use the term "reactivation" throughout the manuscript. This could be misconstrued as re-activation and so I would suggest using the terms "reactivity" or "reactive transformation". Furthermore, only one region is quantified in Figure 1C while in later figures multiple regions are quantified. The authors should justify this decision or update the figures with data from missing regions.

      Response: We will use the term “reactivity” as suggested by the reviewer.

      3) In Figure S2 the authors should provide a positive control for their staining.

      Response: We will provide the positive control data for this experiment.

      4) Can the authors explain the large amount of variability in number of synapses in 15 mpi in Figure 3E?

      Response: We will perform more immunostainings and update the data presented in this figure.

      5) Images in Figure 2C are poorly visible and should be improved in terms of either quality or magnification.

      Response: We will provide a better-quality image for Figure 2C.

      8) The authors should provide a list of differentially expressed genes from RNAseq of SRF KO mice. No information is currently given in the text about the number of differentially expressed genes in the conditional knockout.

      Response: We will include this information in the revised manuscript.

      9) In figure 5A data would be better illustrated as a volcano plot (similar to Fig. S7C).

      Response: We will provide this in the revised manuscript.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Please insert a point-by-point reply describing the revisions that were already carried out and included in the transferred manuscript. If no revisions have been carried out yet, please leave this section empty.

      Reviewer #1: Major Comment #2

      There is considerable variability in the behavioral results, particularly the fear conditioning and Barnes maze tasks (Figures 4F-G). Given the extremely low sample size for mouse behavior (n=5 in one group, n=7 in the other), it is highly likely that the behavioral tests were done with a single cohort of animals (which would be far from ideal) and that these experiments are significantly underpowered. Furthermore, it does not appear that the fear conditioning task was properly optimized. For example, in the control mice in context A, there were two animals that were at or very close to 0 percent freezing; these were likely outliers, or even an indication that the foot shock conditioning protocol was not working as it should. The highest percent freezing of either group was ~70%, which would have been an ideal starting place as an average for the control group. In addition, sex of the animals was not reported for these experiments. If the authors combined sexes as they did in other analyses in this paper, it is possible that they missed reaching the appropriate reaction threshold for the foot shock for some of the animals, as sex differences have previously been demonstrated in mice (DOI: 10.1037/bne0000248). Given the age at which the animals are assessed with these tasks, these specific revisions would require greater than 6 months to complete. However, as currently presented, there simply are not enough data points to make any conclusions regarding behavior.

      Response: We have performed the behavioural experiments with an additional cohort of animals for both control and mutant groups and reanalysed the data. We now have n=11 for the control group and n=9 for the mutant group. Only males were used for the behaviour experiments, and we do not see any significant difference in behaviour between the two groups. These results are included in revised Figure 4E-G in the Preliminary Revision of the manuscript. However, we are waiting to perform the remote recall memory test for the fear conditioning experiment and will include this data in the revised manuscript.

      Minor Comment #1:

      The representative GFAP images (Figure 1 E/G) do not appear to have been taken at the same magnification. This was particularly apparent in the comparison between the control and CKO hippocampus at 12mpi. It is difficult to say with certainty, due to the lack of fiducial markers in many of the images. Inclusion of a nuclear stain (DAPI) would be highly beneficial to allow the reader to make a more informed comparison.

      Response: These images were taken at the same magnification. We have included the DAPI staining for these images in Suppl. Figure 2 in the Preliminary Revision of the manuscript.

      **Referees cross-commenting**

      After reading the comments of the other reviewer, I think we're in agreement that the cellular and molecular data, while descriptive, is of mostly excellent quality. Moreover, the significance of the study is high, and the potential readership broad. However, I stand by my initial assessment of the behavioral data and find the manuscript quite lacking in this regard. Proper revisions would take at least half a year or more, so the authors may be disinclined to go this route. That being said, if the behavioral data were to be excised, I would be happy to sign off on the rest of the manuscript provided that the other major criticisms are addressed.

      Response: We thank the reviewer for the appreciation of our work. We have increased the number of animals in the behavioural experiments and do not see any significant difference between the two groups. These results are included in revised Figure 4E-G in the Preliminary Revision of the manuscript.

      In response to the cross-comment of Rev 2:

      Agreed that if properly conducted and presented, the behavioral data would indeed provide a nice functional correlate to the cellular work. In its current state, I'm afraid that it is instead a hindrance to the study and I would recommend that they just remove it if they choose not to address my concerns with the quality (particularly the extreme variability and the complete lack of freezing by several of the animals, especially in the controls).

      Response: We hope that the revised behaviour data would provide a strong functional correlate to the other findings in the study.

      Additional cross-comments:

      I agree with the added criticisms raised by Reviewer #3, and I think that the manuscript would be greatly improved by revisions that address those and the original criticisms from myself and Reviewer #2. I still think that the behavioral data should be omitted, provided that the authors are not capable or willing to appropriately address those concerns within a reasonable time frame.

      Response: We will address all the concerns raised by the reviewers with the required experiments to further strengthen the findings in this study.

      Reviewer #3

      Major Comment

      3) In Figure S1 the authors provide evidence showing lack of B-gal in cell types other than astrocytes (neurons/OPCs). However, microglia are missing, which could be important as later they show that microglia undergo changes in the SRF knockout model. This staining should be provided.

      Response: We have performed double immunostaining for b-gal and Iba1 and do not see any overlap between Iba1 and b-gal, suggesting that there is no Cre expression in microglia. We have included this data in revised Figure S1F in the Preliminary Revision of the manuscript.

      5) The authors claim in the text that microglia have thicker processes and an amoeboid shape however no evidence of this is provided in Figure S5.

      Response: We have provided data to show larger microglia area and morphology in revised Figure S5 in the Preliminary Revision of the manuscript.

      7) In the text "Enrichment analysis of Gene Ontology terms for Biological Process (GO BP) revealed that Srf deficient astrocytes showed enrichment of pathways related to cellular response to beta amyloid and beta-amyloid clearance." This is not shown in fig 5. It would be more accurate to say that there is a downregulation of genes involved in B amyloid metabolic process.

      Response: We apologize for the omission in showing that this data was presented in Suppl. Fig. S8E. We have now indicated this in the main text.

      Minor Comments:

      4) Figure 1E is missing body weight data noted in the figure legend.

      Response: We apologize for this oversight. This data was actually included in Suppl. Figure S3E and not in Figure 1. We have made the appropriate correction to Figure legend 1.

      6) In Figure 2B figure labels are missing.

      Response: We thank the reviewer for pointing out this omission. We have added the missing labels.

      7) Details of housekeeping gene normalisation are missing from qPCR data.

      Response: We apologize for not providing this information. We have included this in the revised Methods section.

      4. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision. This can be due to time or resource limitations or in case of disagreement about the necessity of such additional data given the scope of the study. Please leave empty if not applicable.

      Reviewer #3, Major Comment 1:

      1) The title of the manuscript is "SRF-deficient astrocytes provide neuroprotection in mouse models of excitotoxicity and neurodegeneration". It would be more accurate to say that SRF is involved in neurotoxicity in these models. To make a comment on the role of SRF in neuroprotection, experiments should be performed in spinal cord injury or ischaemia, where deficiency of SRF would be hypothesised to worsen recovery.

      Response: We disagree with the reviewer's assessment. There is no evidence to suggest that SRF is involved in neurotoxicity. What our data suggest is that SRF deficiency results in a reactive astrocyte state that is neuroprotective in these models. We hypothesize that in injury/infection/disease conditions that would result in generation of neuroprotective astrocytes, SRF expression or function may be negatively regulated. It would be interesting to see whether the SRF-deficient astrocytes alleviate or exacerbate pathology and recovery following spinal cord injury and ischaemia.

    1. Reviewer #2 (Public Review):

      This study investigates T-cell repertoire responses in a mouse model with a transgenic beta chain, such that all T-cells in all mice share a fixed beta chain, and repertoire diversity is determined solely by alpha chain rearrangements. Each mouse is exposed to one of a few distinct immune challenges, sacrificed, and T-cells are sampled from multiple tissues. FACS is used to sort CD4 and Treg cell populations from each sample, and TCR repertoire sequencing from UMI-tagged cDNA is done.

      Various analyses using repertoire diversity, overlap, and clustering are presented to support several principal findings: 1) TCR repertoires in this fixed beta system have highly distinct clonal compositions for each immune challenge and each cell type, 2) these are highly consistent across mice, so that mice with shared challenges have shared clones, and 3) induction of CD4-to-Treg cell type transitions is challenge-specific.

      The beta chain used for this mouse model was previously isolated based on specificity for Ovalbumin. Because the beta chain is essential for determining TCR antigen specificity, and is highly diverse in wildtype mice, I found it surprising that these mice are reported to have robust and consistently focused clonal responses to very diverse immune challenges, for which a fixed OVA-specific beta chain is unlikely to be useful. The authors don't comment on this aspect of their findings, but I would think it is not expected *a priori* that this would work. If this does work as reported, it is a valuable model system: due to massively reduced diversity, the TCR repertoire response is much more stereotyped across individual samples, and it is much easier to detect challenge-specific TCRs via the statistics of convergent responses.

      While the data and analyses present interesting signals, they are flawed in several ways that undermine the reported findings. I summarize below what I think are the most substantive data and analysis issues.

      1. There may be systematic inconsistencies in repertoire sampling depth that are not described in the manuscript. Looking at the supplementary tables (and making some plots), I found that the control samples (mice with mock challenge) have consistently much shallower sampling (in terms of both read count and UMI count) compared with the other challenge samples. There is also a strong pattern of lower counts for Treg vs CD4 cell samples within each challenge.

      2. FACS data are not reported. Although the graphical abstract shows a schematic FACS plot, there are no such plots in the manuscript. Related to the issue above, it would be important to know the FACS cell counts for each sample.

      3. For diversity estimation, UMI-wise downsampling was performed to normalize samples to 1000 random UMIs, but this procedure is not validated (the optimal normalization would require downsampling cells). What is the influence of possible sampling depth discrepancies mentioned above on diversity estimation? All of the Treg control samples have fewer than 1000 total UMIs; doesn't that pose a problem for sampling 1000 random UMIs? Indeed, I simulated this procedure and found systematic effects on diversity estimates when taking samples of different numbers of cells (each with a simulated UMI count) from the same underlying repertoire, even after normalizing to 1000 random UMIs. I don't think UMI downsampling corrects for cell sampling depth differences in diversity estimation, so it's not clear that the trends in Fig 1A are not artifactual: they would seem to show higher diversity for control samples, but these are the very same samples with an apparent systematic sampling depth bias.

      4. The Figures may be inconsistent with the data. I downloaded the Supplementary Table corresponding to Fig 1 and made my own version of panels A-C. This looked quite different from the diversity estimations depicted in the manuscript. The data does not match the scale or trends shown in the manuscript figure.

      5. For the overlap analysis, a different kind of normalization was performed, but also not validated. Instead of sampling 1000 UMIs, the repertoires were reduced to their top 1000 most frequent clones. It is not made clear why a different normalization would be needed here. There are several samples (including all Treg control samples) with only a couple hundred clones. It's also likely that the noted systematic sampling depth differences may drive the separation seen in MDS1 between Treg and CD4 cell types. I also simulated this alternative downsampling procedure and found strong effects on MDS clustering due to sampling effects alone.

      It is not made clear how the overlap scores were converted to distances for MDS. It's hard to interpret this without seeing the overlap matrix.

      6. The cluster analysis is superficial, and appears to have been cherry-picked. The clusters reported in the main text have illegibly small logo plots, and no information about V/J gene enrichments. More importantly, as the caption states they were chosen from the columns of a large (and messier-looking) cluster matrix in the supplementary figure based on association with each specific challenge. There's no detail about how this association was calculated, or how it controlled for multiple tests. I don't think it is legitimate to simply display a set of clusters that visually correlate; in a sufficiently wide random matrix you will find columns that seem to correlate with any given pattern across rows.

      7. The findings on differential plasticity and CD4 to Treg conversion are not supported. If CD4 cells are converting to Tregs, we expect more nucleotide-level overlap of clones. This intuition makes sense. But it seems that this section affirms the consequent: variation in nucleotide-level clone overlap is a readout of variation in CD4 to Treg conversion. It is claimed, based on elevated nucleotide-level overlap, that the LLC and PYMT challenges induce conversion more readily than the other challenges. It is not noted in the textual interpretations, but Fig 4 also shows that the control samples had a substantially elevated nucleotide-level overlap. There is no mention of a null hypothesis for what we'd expect if there was no induced conversion going on at all. This is a reduced-diversity mouse model, so convergent recombination is more likely than usual, and the challenges could be expected to differ in the parts of TCR sequence space they induce focus on. They use the top 100 clones for normalization in this case, but don't say why (this is the 3rd distinct normalization procedure).
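      The sampling-depth concern raised in points 3 and 5 can be illustrated with a minimal simulation sketch. This is not the simulation referred to in the review; the Zipf-like clone distribution, the per-cell UMI counts, the cell numbers, and every function name below are assumptions chosen only for illustration. Two samples are drawn from the same underlying repertoire at different cell depths, both are downsampled to 1000 UMIs, and Shannon diversity is compared.

```python
import math
import random
from collections import Counter


def simulate_repertoire(n_clones=5000, exponent=1.0):
    """Zipf-like clone frequencies (an assumed shape, not fitted to any data)."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, n_clones + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_umis(freqs, n_cells, rng, max_umis_per_cell=9):
    """Draw n_cells cells; each cell contributes 1..max_umis_per_cell UMIs of its clone."""
    clones = rng.choices(range(len(freqs)), weights=freqs, k=n_cells)
    umis = []
    for clone in clones:
        umis.extend([clone] * rng.randint(1, max_umis_per_cell))
    return umis


def downsample(umis, n, rng):
    """UMI-wise downsampling without replacement; impossible if fewer than n UMIs exist."""
    if len(umis) < n:
        raise ValueError("sample has fewer UMIs than the downsampling target")
    return rng.sample(umis, n)


def shannon(umis):
    """Plug-in Shannon entropy (in nats) of the clone labels."""
    counts = Counter(umis)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def mean_diversity(n_cells, n_umis=1000, reps=20, seed=0):
    """Average downsampled diversity over several simulated samples of n_cells cells."""
    rng = random.Random(seed)
    freqs = simulate_repertoire()
    values = []
    for _ in range(reps):
        umis = sample_umis(freqs, n_cells, rng)
        values.append(shannon(downsample(umis, n_umis, rng)))
    return sum(values) / reps


if __name__ == "__main__":
    # Same repertoire, same 1000-UMI normalization, different numbers of cells.
    print("shallow (300 cells): ", round(mean_diversity(300), 3))
    print("deep    (5000 cells):", round(mean_diversity(5000), 3))
```

      Under these assumed parameters the two depths give different diversity estimates despite identical UMI normalization, because the 1000 UMIs in the shallow sample come from far fewer cells. The direction and size of the shift depend on the assumed distributions, so the sketch only shows that depth alone can move the estimate; it says nothing about the actual data.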

      Although interpretations of the reported findings are limited due to the issues above, this is an interesting model system in which to explore convergent responses. Follow-up experimental work could validate some of the reported signals, and the data set may also be useful for other specific questions.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We appreciate the valuable suggestions and the overall highly positive review of our manuscript. We have now included many suggestions provided by the reviewers, which have made our manuscript much stronger and more rigorous. One reviewer acknowledged, “This study uncovers sex-dependent mechanisms underlying cold sensitivity between male and female mice. The detailed IHC analysis of MHCII expression in DRG neurons is a clear strength of this study and supports flow cytometry results as well as existing literature. The specificity of MHCII expression on small diameter is well characterized and supported by conditional knockout mouse models of MHCII in TRPV1-lineage neurons.”

      R1: It is not, yet, possible to conclude that all experiments are adequately powered as N's for some studies are not provided.

      All experiments include N’s both in the text and in the figure legend.

      R1: It is unclear what is meant by "novel" expression, used throughout the manuscript.

      MHCII is traditionally thought to be constitutively expressed on antigen-presenting cells (APCs) and induced by inflammation on some non-APCs, including endothelial, epithelial, and glial cells (van Velzen et al., 2009). RNA seq data sets (Nguyen et al., 2021, Tavares-Ferreira et al., 2022, Usoskin et al., 2015, Lopes et al., 2017) demonstrate that mouse and human DRG neurons express transcripts for MHCII and MHCII-associated genes. However, there are no reports to date that demonstrate MHCII protein expression in terminally differentiated neurons. To the best of our knowledge, we are the first group to show that MHCII protein is expressed in DRG neurons.

      R1: The statement at the end of the abstract, "and that neuronal MHCII may also contribute to many other neurological disorders" seems premature, beyond the scope of the present study.

      We agree with the reviewer’s comment and have changed the sentence to the following: “Collectively, our results demonstrate expression of MHCII on DRG neurons and a functional role during homeostasis and inflammation” (pg. 1).

      R1: While cold allodynia (hypersensitivity) is a clinically important feature of CIPN, especially in CIPN associated with the platinum-based chemotherapeutic agents, it is less so in taxane CIPN. Do 60% of patients with PTX CIPN express cold allodynia or does that number refer to CIPN in general?

      This statistic is based on a study that conducted a meta-analysis of CIPN incidence and prevalence with paclitaxel, bortezomib, cisplatin, oxaliplatin, vincristine or thalidomide. However, we now include another reference (PMID: 15082135) that demonstrates patients receiving PTX experience cold hypersensitivity (pg.3).

      R1: Again, the future direction of expanding studies of the role of MHCII in other aspects of the CIPN phenotype might bear mention.

      We have included future directions regarding other aspects of CIPN phenotype in the discussion. We state, “Reducing the expression of MHCII in TRPV1-lineage neurons exacerbated PTX-induced cold hypersensitivity in both male and female mice. Future studies will evaluate the role of MHCII in PTX-induced mechanical hypersensitivity, another prominent feature of CIPN” (pg. 29).

      R1: Is there any evidence that IL-4 and/or IL-10 influence cold sensitivity?

      IL-10 and IL-4 have been shown to suppress spontaneous activity from sensitized nociceptors (Krukowski et al., 2016; Laumet et al., 2020; Chen et al., 2020) and to reduce neuronal hyperexcitability (Li et al., 2018), respectively. In addition, IL-10 has been shown to reduce mechanical hypersensitivity (Krukowski et al., 2016); however, cold sensitivity has not been evaluated. IL-4 KO mice do not have an increase in tactile allodynia or cold sensitivity after CCI; however, there is an increase in anti-inflammatory cytokines, specifically IL-10, and opioid receptors, which may be a compensatory mechanism that protects against enhanced pain after injury (Nurcan Üçeyler et al. 2011).

      R1: Are these experiments run blinded?

      Yes, this is discussed in the materials and methods section (pg. 31).

      R1: The term "directly contacts" is unclear. No synaptic structure is identified. It might be more accurate to estimate the actual proximity between the two cells, especially as direct contact would not be necessary for the type of intercellular communication they are studying. This is not an EM study.

      We agree with the reviewer’s comment and have changed the wording to “in close proximity” (pgs. 1,5, 7, 27).

      R1: Two abbreviations are used for immunohistochemistry, ICC and IHC.

      IHC refers to immunohistochemistry, and ICC refers to immunocytochemistry. We accidentally wrote ICC in the immunohistochemistry section of the materials and methods. We have now corrected it to say IHC (pg. 32).

      R1: In some figures, group sizes are not indicated (e.g., Fig. 4D).

      All group sizes are indicated in the text and figure legends.

      R1: "small non-nociceptive neurons" - seems to refer to TRPV1+ neurons. There are, however, TRPV1-nociceptors. "Therefore, the majority of MHCII+ neurons in the DRG of naïve female mice were not TRPV1- lineage neurons but non-nociceptive C-LTMRs." Could use some clarification here. Are the authors suggesting that being TRPV1- defines a neuron as non-nociceptive?

      We never said small non-nociceptive neurons are TRPV1+ neurons. We crossed TRPV1 lineage mice with td-tomato to label TRPV1 lineage neurons, which include TRPV1 neurons, IB4, and a subset of Aδ neurons. We found that TRPV1 lineage neurons comprise about 65% of small diameter neurons, so 35% of small diameter neurons are not TRPV1 lineage cells. These non-TRPV1 lineage small diameter neurons are non-nociceptive LTMRs, most likely TH and MrgB4 neurons.

      R2: The most pressing concern regarding this study is a lack of a vehicle control group. It is not appropriate to be comparing paclitaxel treated mice to naïve mice. Please include a vehicle treatment (cremophor:ethanol 1:1 diluted 1:3 in PBS) group for all experiments involving paclitaxel.

      We believe the most appropriate control to paclitaxel treatment is the naïve control because clinically, paclitaxel is always administered to the patient in a formulation of 50% Cremophor and 50% ethanol. In clinical studies, the controls are healthy no-pain individuals and patients receiving paclitaxel without pain. However, the percentage of patients receiving paclitaxel that do not develop CIPN is low, emphasizing the need for healthy individuals not taking paclitaxel.

      R2: Figure 1A only includes representative images of a small number of T cells in presumable contact with DRG neurons in female Day 14 paclitaxel mice but does not include images from other groups. Similarly, B-D show a single CD4+ T cell in contact with DRG neurons in Day 14 paclitaxel and naïve female mice. Please include quantification of the frequency of CD4+ T cells interacting with DRG neurons in the different experimental groups utilized in this study.

      We have now quantified the number of CD4+ T cells per mm² of DRG tissue, which is reported in the text (pg. 5) and figures (Fig. S1 and Fig. 1A). We plan to add the quantification of CD4+ T cells per mm² of DRG tissue for naïve and day 14 PTX-treated male mice. These data will be included in the text (pg. 5) and in Fig. S1.

      R2: Please include entire blot for Figure 2A (or at least more of the blot). There is plenty of space in the figure and as it currently appears is not free from apparent manipulation.

      We included a larger area of the western blot in Fig 2A (pg. 9).

      R2: The authors conclude that MHCII helps to suppress chemotherapy-induced peripheral neuropathy, resolving cold allodynia following paclitaxel treatment. To support this conclusion, I think it is necessary to include a time-course experiment highlighting whether cKO of MHCII in TRPV1 neurons indeed increases the duration for cold hypersensitivity to resolve following paclitaxel treatment.

      We conclude that neuronal MHCII suppresses cold hypersensitivity in naïve male mice and reduces the severity of PTX-induced cold hypersensitivity at the peak of the response (day 6) (pgs. 1-2). In addition, knocking out one copy of MHCII in male TRPV1-lineage mice reduced total neuronal MHCII in naïve and PTX-treated mice (day 7 and 14) (pgs. 21-22; Fig. 7). Moreover, knocking out one copy of MHCII in female TRPV1-lineage mice reduced surface-MHCII in female mice 7 days post-PTX (pgs. 19-20; Fig. 6). Future studies will investigate the distinct roles of surface and intracellular neuronal MHCII and the contribution of MHCII to the resolution of CIPN.

      R2: The graphical abstract is misleading. The authors suggest paclitaxel is acting exclusively via TLR4 and that signaling is resolved at Day 14 which their data does not support. Please adjust to reflect findings from the experiments included in this study.

      We have removed TLR4 from our graphical abstract as we do not investigate the role of TLR4 in this manuscript. However, we do not suggest paclitaxel is acting exclusively through TLR4. We modified our wording to indicate both pro-inflammatory cytokines and PTX act on neurons to induce hyperexcitability and neurotoxicity: “Pro-inflammatory cytokines and PTX act on DRG neurons inducing hyperexcitability (Li et al., 2018, Boehmerle et al., 2006, Li et al., 2017) and neurotoxicity (Goshima et al., 2010, Flatters and Bennett, 2006), which manifests as pain, tingling, and numbness in a stocking and glove distribution (Rowinsky et al., 1993)” (pg. 9).

      R2: Figure 4 and 6 MHCII labelling is oversaturated in most of the images, creating a blurry hue in the representative images. This should be fixed.

      The signal intensity of immune cell MHCII is >5 times greater than neuronal MHCII; therefore, in order to visualize neuronal MHCII, the immune cell MHCII is oversaturated. We reference this in the discussion (pg. 26).

      R2: The effects of the PTX cHET group are very mild in both the male and female cohorts, and specific to 1 trial. R3: Furthermore, the behavioral effect is seemingly variable, with only one of the three trials being significantly different between groups. This variable response needs to be discussed further.

      This behavioral assay was developed by the UNE COBRE Behavior Core under the guidance of Dr. Tamara King, who has extensive experience in using learning and memory measures to determine changes in pain, such as the development of thermal hypersensitivity (1-3, King et al, Nat Neuro 2009). Methodologically, the process is as follows: in the temperature place preference (TPP) assay, mice are placed on the reference plate (25 °C) to begin each 3-minute trial. For the habituation trial, both the test and reference plates are set to 25 °C, and the mice are allowed to explore for 3 minutes. The following 3 trials are the acquisition trials, in which the reference plate is set to 25 °C and the test plate to 20 °C. If the animals have cold hypersensitivity, modeling cold allodynia, they will demonstrate faster acquisition of a learned avoidance response compared to the WT controls.

      For the results, we will clarify our findings as outlined below: 1) We will change the axis labels to better distinguish the BL/habituation trial from the reference trials in the graphs. 2) We will add graphs comparing naïve versus PTX for male and female WT mice. 3) The changes in the graphs will now reflect 3 key findings.

      First, we note that PTX-treated mice learn to avoid the cold test plate faster than the naïve controls in the WT mice, reflecting PTX-induced cold hypersensitivity. Of interest, both males and females demonstrate learned avoidance by trial 2, and the percent of time on the cold plate continued to decline only in the PTX-treated mice. We had not graphed this in the original figure and plan to add graphs for both male and female WT mice. These graphs are important to include, as they validate that this TPP assay can capture the expected PTX-induced cold hypersensitivity in WT mice.

      Second, in terms of the naïve cHET mice, these data show that both female and male cHET mice demonstrate faster learning to avoid the cold (20 °C) plate compared to the WT mice (Fig. 8A, B). We note that the males demonstrate a more robust effect (faster learned avoidance of the cold plate), with significant avoidance of the cold plate emerging in the cHET mice by trial 3, compared to trial 4 in the females (significant difference compared to the BL trial).

      Third, we observed that cHET mice treated with PTX demonstrate even more accelerated learning to avoid the cold plate compared to WT mice treated with PTX. This observation suggests that PTX-treated cHET mice have heightened cold allodynia compared to the WT mice.

      R2: The statistical analysis (for the behavior) should also have been a mixed-effects repeated measures between groups ANOVA.

      We agree and re-analyzed our behavior data using a repeated measures mixed-effects model (REML) with Dunnett’s multiple comparisons test comparing trials 2-4 to trial 1 within the same group, and Sidak’s multiple comparisons test for significance between groups at the same trial (pgs. 23-25; Fig. 8).

      R3: Presented in Figure 3, the authors present data to show surface expression of MHCII, along with the ability of MHCII to present OVA peptide, on naïve and PTX-treated DRG neurons. These data are probably the most relevant in terms of expression as they look at the surface expression of MHCII along with the potential of MHCII to function; therefore, it is unclear why the authors only conducted this analysis on female neurons, and not both male and female neurons. Given the claims of the paper in terms of sex differences for MHC expression, I strongly suggest this is done in order to put the other observations into context.

      We completely agree and have added male mouse data in Figs. 2 and 3. By western blot, we show that PTX increased the amount of MHCII protein 14 days post-PTX in DRG neurons from female mice, but there was no change in MHCII protein after PTX in male mice (Fig. 2). In agreement with the western blot, surface-MHCII determined by flow cytometry did not increase after PTX on DRG neurons from male mice (Fig. 3B). Moreover, the frequency of DRG neurons from male mice with surface-MHCII (determined by ICC) and OVA peptide did not change after PTX treatment (Fig. 3D, E). However, the percent area with polarized MHCII on DRG neurons from male mice increased 14 days post-PTX, indicating a modest PTX-induced response in males (Fig. 3F). We have now included the frequency of surface-MHCII on DRG neurons from male and female mice after PTX treatment, and again there was no change in surface-MHCII in male mice (Fig. 6). Collectively, our data demonstrate that neuronal MHCII in male mice is not strongly regulated by PTX treatment.

      R3: Given the data presented in Figure 3, it is not clear what the relevance of investigating the subcellular puncta expression of MHCII neurons is, particularly when considering the sex differences observed, and how this has not been performed for surface expression.

      We now include surface and total MHCII quantification for male and female WT and cHET mice (Figs. 6, 7). In the text, we describe the significance of surface versus endosomal MHCII. “While endosomal MHCII can promote TLR signaling events (Liu et al., 2011), expression of MHCII on the cell surface is required to activate CD4+ T cells.” (pg. 10). “Although the major role for surface MHCII is to activate CD4+ T cells, cAMP/PKC signaling occurs in the MHCII-expressing cell (Harton, 2019). In addition, it has recently been shown that endosomal MHCII plays an important role in promoting TLR responses (Liu et al., 2011), and since DRG neurons are known to express TLRs (Lopes et al., 2017, Wang et al., 2020, Cameron et al., 2007, Barajon et al., 2009, Xu et al., 2015, Zhang et al., 2018), this suggests the potential for T-independent responses in MHCII+ neurons. Knocking out one copy of MHCII in TRPV1-lineage neurons (cHET) from female mice did not change total MHCII 7 days post-PTX but reduced surface-MHCII. Accordingly, PTX-treated cHET female mice were more hypersensitive to cold than PTX-treated WT female mice, suggesting a role for neuronal MHCII in CD4+ T cell activation and/or neuronal cAMP/PKC signaling. In contrast, knocking out one copy of MHCII in TRPV1-lineage neurons (cHET) from male mice did not change surface-MHCII in naïve or PTX-treated mice but reduced total MHCII, indicating endosomal MHCII and potentially a role in TLR signaling. Future studies are required to delineate MHCII surface and endosomal signaling mechanisms in naïve and PTX-treated female and male mice.” (pg. 28).

      R3: Furthermore, the authors should provide details of what the abundant non-neuronal structures are within the DRG images that appear positive for MHCII staining.

      We now include an image of the high MHCII+ cells in mouse DRG co-stained with macrophage and dendritic cell markers (CD11b/c), indicating the presence of immune cells (Fig. S6).

      R3: The behavioral data presented in Figure 7 is somewhat confusing. Can the authors confirm how many alleles of MHCII were knocked out from the Trpv1-lineage neurons for these experiments? In Figure 7, it states cKO Het, which suggests that only one allele was deleted within the Trpv1 population. If this is the case, this needs to be clearly outlined within the results section and not simply referred to as "knocking out MHCII in Trpv1-lineage neurons". In addition, an explanation as to why heterozygous cKO were used rather than homozygous cKO needs to be provided. This is particularly relevant when discussing potential sex differences.

      The mouse behavior is performed in wild type and TRPV1lin MHCII+/- heterozygote mice (Fig 8). Instead of saying we knocked out MHCII, we changed the text to “knocking out one copy of MHCII in TRPV1-lineage neurons” (pgs. 23, 29). In the methods section, we state that “cHET×MHCIIfl/fl crosses only yielded 8% cKO mice (4% per sex) instead of the predicted 25% (12.5% per sex) based on normal Mendelian genetics. Thus, cKO mice were only used to validate MHCII protein in small nociceptive neurons” (pg. 30) (Fig 7).
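
      The predicted 25% cKO frequency (12.5% per sex) quoted above can be reproduced with a small enumeration. The sketch below is only illustrative and rests on assumptions that are not stated in the response: the cHET parent is taken to carry a single Cre allele and one floxed MHCII allele, the other parent is taken to be Cre-negative and homozygous floxed, and the two loci are assumed to assort independently. All variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed parental gametes (hypothetical labels, not taken from the manuscript):
# each gamete is (Cre status, MHCII allele).
chet_gametes = list(product(["Cre", "-"], ["fl", "+"]))  # cHET: one Cre, fl/+
flfl_gametes = list(product(["-"], ["fl"]))              # Cre-negative, fl/fl


def is_cko(gamete_a, gamete_b):
    """A pup is a conditional knockout if it inherits Cre and two floxed alleles."""
    has_cre = "Cre" in (gamete_a[0], gamete_b[0])
    both_floxed = gamete_a[1] == "fl" and gamete_b[1] == "fl"
    return has_cre and both_floxed


# Every gamete combination is equally likely under independent assortment.
offspring = list(product(chet_gametes, flfl_gametes))
expected_cko = Fraction(sum(is_cko(a, b) for a, b in offspring), len(offspring))
expected_per_sex = expected_cko / 2  # assumes a 1:1 sex ratio

# Observed frequency as reported in the response (8% cKO overall).
observed_cko = Fraction(8, 100)
observed_over_expected = observed_cko / expected_cko

print(expected_cko, expected_per_sex, observed_over_expected)
```

      Under these assumptions, the reported 8% yield corresponds to roughly one third of the Mendelian expectation, which is consistent with the decision to use cKO mice only for validating MHCII protein.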

      R3: A significant gap in the current manuscript is the functional assessment of MHCII protein expressed on DRG neurons in terms of T cell activity. I would suggest the authors consider performing a co-culture DRG-T cell (i.e. Treg) assay where anti-inflammatory cytokine release can be measured in the presence and absence of MHCII on DRG neurons.

      The functional implication of MHCII protein on DRG neurons in terms of T cell activity is out of the scope of this manuscript. We currently have another manuscript in progress investigating CD4+ T cell signaling and cytokine production when co-cultured with DRG neurons.

      R3: Within the first paragraph of the results section, the authors reference Goode et al, 2022, stating that they have previously shown that CD4+ T cells in the DRG secrete anti-inflammatory cytokines. I have read this paper and could not find any data that showed increased secretion of cytokines, only that there is an increase in T-cell populations that contain anti-inflammatory markers. Please consider rewording to reflect the observations made in the original paper.

      We have changed “secrete” to “produce” (pg. 5) because we detected anti-inflammatory cytokines (IL-10 and IL-4) within CD4+ T cells using intracellular staining and multi-color flow cytometry.

      R3: Figure 1A states that it is "day 14 PTX", however, there is no reference to this in the corresponding text - please state what Figure 1A is showing in the main text and legend regarding PTX treatment.

      We have now added text and a Fig. 1 legend stating that the images in Fig. 1A are of DRG tissue collected from female mice 14 days after PTX treatment (pg. 5).

      R3: Throughout the results section (Figure 3-Figure 6), the authors provide percentage changes in observed difference in expression, however, in addition to this, it would be valuable to have the actual number of neurons analysed for each group and sex.

      We now report in the materials and methods section the number of neurons that were analyzed (pg. 33).

      R3: For Figure 5, can the authors confirm whether this was performed on tissue sections or dissociated cell culture?

      This analysis was performed in DRG tissue sections. The legend now states, “Gaussian distribution of the diameter of MHCII+ DRG neurons in DRG tissue from naïve (pink), day 7 (orange) and day 14 PTX-treated (blue) (A) female and (E) male mice (n=8/sex, pooled neurons).”

      R3: Can the authors comment on why surface expression for MHCII was not performed on these reporter neurons?

      In the future, we plan to delineate which subsets of neurons express MHCII by co-staining for MHCII and specific neuronal markers. However, these studies are beyond the scope of the current manuscript.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1:

      1. If doable, image dynein and dynactin simultaneously in the Halo-DYNC1H1/DCTN4-SNAP iNeurons. Co-movement of dynein and dynactin towards the somatodendritic compartment and their separate movement in the anterograde direction along the axon would provide the most convincing evidence for the key claims of the manuscript.

      Please see the planned revision section for our response.

      Reviewer #2:

      Major comment (requires additional experimentation)

      1. While the data presented do certainly suggest that dynein and Lis1 are transported anterogradely on separate vesicular cargoes from dynactin and Ndel1, the study would be much stronger if supported by dual imaging of dynein and dynactin to prove that these proteins do indeed move in association with separate vesicular populations. I would like to see dual-color kymograph traces showing that the proteins move independently. The authors should be able to accomplish this using their dual Halo-DYNC1H1/DCTN4-SNAP hESC line. To acquire and analyze this data might take several months, but it would greatly strengthen this paper. If the authors do this experiment, they may also be able to address the mechanism of reversal of anterograde cargoes which they speculate about in the Discussion, which would add even more interest and insight.

      Please see the planned revision section for our response.

      Minor comments (addressable without additional experimentation)

      1. The authors deduce that 1-4 Halo fluorochromes corresponds to 1-2 dynein molecules. This implies that the cells are homozygous for the Halo tag, but I do not see this addressed explicitly. The authors should state explicitly whether the lines generated for their study are heterozygous or homozygous for the tag. If the cells are heterozygous, which would seem most likely, then they may be underestimating the number of dyneins per spot and should take this into account.

      We have added whether lines are homozygous or heterozygous to the manuscript. We also include a new Supplementary Figure panel (Fig S6) showing the genotyping data. In summary, all lines are homozygous except for PAFAH1B1-Halo (hESCs) which is heterozygous.

      1. Why are the moving spots lower in intensity than the NEM-treated static spots? It appears to suggest that they may be associated with different structures. This should be clarified and discussed.

      Our data suggest that the fast-moving spots have fewer dyneins than the NEM-treated static spots. We suggest this is because the fast-moving cargos are smaller than the average cargo and therefore carry fewer dyneins. This is also supported by the smaller number of dyneins reported previously on endosomes compared to the large lysosomes. We have clarified this in the discussion (pages 7-8).

      1. The authors state in the Results that most of the dynein spots were diffusing, often along microtubules, but they do not visualize microtubules so how do they know this? They may need to remove the phrase "often along microtubules".

      This has been removed.

      1. At the end of the Introduction the authors state that their data "allow us to understand how the dynein machinery drives long-range transport in the axon". This is an overstatement. The "how" in this sentence is not addressed in this study.

      We have softened the sentence by adding the phrase “better understand”.

      1. The conclusion that dynein binds to cargos stably throughout their transport along the axon is based on measurements of the fastest moving cargoes but the authors do not provide data on the distribution of velocities for the entire population of retrograde cargoes. It is not valid to extrapolate the behavior of a small number of cargoes to the entire population. The average may be much slower than the fastest cargoes. Moreover, even for the fastest organelles the authors cannot say that the dynein is stably bound because they did not track single cargoes and thus do not know that the cargoes moved continuously in one single bout of movement for 500 µm; it is possible that the cargoes moved in multiple consecutive bouts interrupted by brief pauses and dynein motors may have exchanged between bouts.

      We have added a section to the discussion to highlight that other cargos may behave differently from the fastest ones (page 7). We have also clarified the assumptions that lead us to expect a slower arrival time of the first signal (page 5).

      1. The authors say that "it is clear that at least some dyneins remain on cargoes throughout their transport along the axon". As explained above, the data do not prove this so this statement should be removed.

      We have softened this sentence from “it is clear” to “our results suggest” and explained in more detail why we draw this conclusion.

      1. The authors note that most of the dynein spots were not moving processively and state that this is consistent with prior studies showing that only a subset of dynein is actively involved in transport. However, as they note elsewhere, dynein is both motor and cargo and most axonal dynein is transported at slow average velocities so maybe they should be more explicit about what they mean by "involved in transport".

      We have clarified that we mean fast axonal transport and thank the reviewer for highlighting this point.

      1. When the authors note that most of the dynein in axons is transported in the slow component of axonal transport, they should also cite the work of Pfister and colleagues who were the first to show this (PMID 8824315 and 8552592).

      This was an omission on our part. The references have now been added.

      1. The authors propose that dynein and Lis1 are transported together but there were significantly fewer anterogradely transported Lis1 particles than dynein particles. This should be discussed.

      We have added more information to the discussion. Although we cannot rule out this effect being due to the heterozygous tagging of our LIS1 cell line, we do not witness the same decrease in events in the retrograde direction. Therefore, we believe there is a subset of anterogradely moving dynein lacking LIS1. As discussed in the manuscript, this subset may already be bound to dynactin and therefore not require LIS1.

      1. For the statistical analysis, the authors should provide p values in the legends for the comparisons that are judged to be "not significant". The authors should also be consistent in how they label differences that are not significant - they mark them as "ns" in Fig. 1, but in the other figures they do not, leaving some ambiguity about whether particular comparisons were not tested or were found to be not significant. For example, in Fig. 4C the average speed of the dynactin is about 0.5 µm/s greater than for the other proteins and the spread in the data suggest that this could be significant, but no significance is indicated on the plot, implying p>0.05. It is not clear how confident we can be that there is no difference.

      We have now included all p values in the figure legends and have removed the “ns” in Fig 1D. In our revised manuscript, only significant differences are highlighted in the figures.

      Reviewer #3:

      • if I look at the kymographs, trajectories appear rather complex, pausing, standing still, moving and everything mixed. The explanation of how actual trajectories are extracted and on what basis is very short, too short for me. I think the authors should expand this. Furthermore, I think it would be good if the authors would present, in their kymographs examples of the tracked (and also the not included) tracks. Maybe in supplementary info.

      These data were analyzed with the TrackMate Fiji plugin, which tracks spots from frame to frame in a movie and then outputs the track data. No data were extracted from the kymographs; they were used only as a graphical illustration of the moving spots. To better explain our analysis pipeline, we have expanded our methods section and have added an example of a tracked movie (Video 15) as well as highlighted the tracked spots in one kymograph example (Figure 7S).

      • I found 'velocity' ill defined. I get the impression, judging from the number of points (compared to the other parameters) that the authors determine the average velocity of each individual trajectory. That is an important parameter (but should indeed be called 'trajectory averaged' velocity), but might not be the only one useful to learn from the data, where trajectories do not always appear to have constant speeds (pausing, etc.). Why do the authors not determine point-to-point velocities and plot histograms of those for all the trajectories (simply plot histograms of all the displacements between subsequent data points in trajectories)? This might provide great insight into the actual maximum velocity and the fraction of pausing or moving in opposite direction etc., providing much more molecular detail than currently extracted from the data.

      The reviewer is correct. We measured the average velocity of each spot from the beginning of its track to the end. We have clarified this in the text. Furthermore, as stated above in the revision plan, we are currently performing the additional analysis and will include it in the final revision.
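
      To make the distinction in this exchange concrete, the following minimal Python sketch contrasts a trajectory-averaged velocity (net displacement over the whole track) with point-to-point velocities (displacement between subsequent data points), and bins the latter into a histogram. It is not the authors' TrackMate pipeline; the function names, the sign convention, and the example track are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def trajectory_averaged_velocity(track):
    """Net displacement divided by total duration, from first to last point."""
    (t_start, x_start), (t_end, x_end) = track[0], track[-1]
    return (x_end - x_start) / (t_end - t_start)


def point_to_point_velocities(track):
    """Velocity between each pair of subsequent (time, position) points."""
    return [
        (x_next - x_prev) / (t_next - t_prev)
        for (t_prev, x_prev), (t_next, x_next) in zip(track, track[1:])
    ]


def histogram(values, bin_width):
    """Count values per bin; each key is the lower edge of its bin."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    return {round(k * bin_width, 6): counts[k] for k in sorted(counts)}


# Hypothetical track: (time in s, position along the axon in µm).
# It contains a pause (2.0 -> 2.0) and a brief reversal (4.0 -> 3.0).
track = [(0.0, 0.0), (1.0, 2.0), (2.0, 2.0), (3.0, 4.0), (4.0, 3.0)]

average = trajectory_averaged_velocity(track)
steps = point_to_point_velocities(track)
step_histogram = histogram(steps, bin_width=1.0)

print(average)         # single number per trajectory
print(steps)           # one value per frame-to-frame step
print(step_histogram)  # pooled distribution of step velocities
```

      In this toy track the trajectory-averaged velocity is well below the speed during the moving steps, because the pause and the reversal are averaged in; the step histogram keeps those events visible as separate bins, which is the kind of detail the reviewer asks for.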

      • I was a bit surprised to read that the authors have gone to the effort to create a dual-color labeled cell line, but did not do actual correlative two-color measurements (or at least show them). It would be so insightful to see dynein and dynactin move separately in the anterograde direction.

      Please see the planned revision section for our response.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      REVIEW COMMENT

      The article titled "The tRNA thiolation-mediated translational control is essential for plant immunity" by Zheng et al. highlights the critical role of tRNA thiolation in Arabidopsis plant immunity through comprehensive analysis, including genetics, transcriptional, translational, and proteomic approaches. Through their investigation, the authors identified a cgb mutant, resulting in the knockout of ROL5, and discovered that ROL5 and CTU2 form a complex responsible for catalyzing the mcm5s2U modification, which plays a pivotal role in immune regulation. The findings from this study unveil a novel regulatory mechanism for plant defense. Undoubtedly, this discovery is innovative and holds significant potential impact. However, before considering publication, it is necessary for the authors to address the various questions raised in the manuscript concerning the experiments and analysis to ensure the reliability of the study's conclusions.

      Response: Thank you very much for your support and suggestions!

      Here is Comments:

      Line 64-65:

      The author mentioned that 'While NPR1 is a positive regulator of SA signaling, NPR3 and NPR4 are negative regulators.' However, several recent discoveries are suggesting that it may not be a definitive fact that NPR3 and NPR4 are negative regulators. Therefore, I recommend the authors to review this section in light of the findings from recent papers and make necessary edits to reflect the most current understanding.

      Response: Thank you for your feedback. Since we mainly focused on NPR1 in this study, we removed this sentence to avoid confusion. We provided additional information about NPR1 in the Introduction section to emphasize the importance of NPR1 (Line 64-68).

      Line 182- & Figure 4:

      The author conducted RNA-seq, Ribo-seq, and proteome analysis. Describing the analysis as "transcriptional and translational" using RNA-seq and proteome data seems not entirely accurate. Proteome data compared with RNA-seq not only reflects translational changes but may also encompass post-translational regulations that contribute to the observed differences. To maintain precision, the title of this section should either be modified to "transcriptional and protein analysis" or, alternatively, compare RNA-seq and Ribo-seq data to demonstrate the transcriptional and translational changes more explicitly.

      Response: Thank you for your suggestions. We agree and have changed the description accordingly throughout the manuscript.

      Line 229-235 and Figure 5C:

      The interpretation of Figure 5C's polysome profiling results is inconclusive. There does not seem to be a noticeable difference in polysomal fractions between the cgb mutant and COM. The observed differences in the overlay of multiple polysome fractions between cgb and COM could be primarily influenced by baseline variations rather than a significant decrease in the polysomal fractions in cgb. Therefore, it is necessary to carefully review other relevant papers that discuss polysome fraction data and their analysis. By doing so, the authors can make the appropriate corrections to ensure accurate interpretations.

      Response: Thank you for your comments. We agree that the difference between cgb and COM was not visually dramatic. This is a common feature of polysome profiling assays (e.g., Extended Data Fig. 1f in Nature 545: 487–490; Fig. 1c in Nature Plants, 9: 289–301). In our case, the difference between polysome fractions was unlikely to be due to baseline variation, for two reasons. First, baseline variation affects monosome and polysome fractions in the same way. However, our results showed that the monosome fraction of cgb is higher than that of COM, whereas the polysome fraction of cgb is lower than that of COM. Second, this result was repeatedly detected. For better visualization, we adjusted the scale of the Y axis in the revised manuscript (Figure 5D).

      Line 482 Ion Leakage assay:

      I could not find the ion leakage assay in this manuscript, so I wonder why it is mentioned.

      Response: We are sorry for the mistake. The ion leakage data were included in previous versions of the manuscript. We removed the data but forgot to remove the corresponding method in the present version.

      Materials and Methods:

      To enhance the reproducibility of the study, the authors should provide a more detailed description of the materials and methods, especially for critical experiments like the Yeast-two-hybrid assays. Clear documentation of specific reagents, strains, and protocols used, along with information on controls, will bolster the validity of the results and facilitate future research in this area.

      Response: Thank you for your suggestions. We provided more details in the methods. For yeast two-hybrid assays, the vector information was included in “Vector constructions” section.

      Minor Point:

      Line 61: There is a space between ')' and '.', which needs to be edited.

      Response: The space was deleted.

      Reviewer #1 (Significance): This study holds significant importance within the field of plant immunity research. The authors have made valuable contributions through their comprehensive analysis, encompassing genetics, transcriptional, translational, and proteomic approaches, to elucidate the critical role of tRNA thiolation in plant immunity. One of the major strengths of this study lies in its ability to shed light on a previously unknown regulatory mechanism for plant defense. By identifying the cgb mutant and investigating the role of ROL5 and CTU2 in catalyzing the mcm5s2U modification, the authors have unveiled a novel aspect of plant immune regulation. This innovative discovery provides a deeper understanding of the intricate molecular processes governing immunity in plants.

      Moreover, the study's findings are not limited to the immediate field of plant immunity but also have broader implications for the scientific community. By employing diverse methodologies, the authors have demonstrated how tRNA thiolation exerts control over both transcriptional and translational reprogramming, revealing intricate links between these processes. This integrative approach sets a precedent for future research in the field of plant molecular biology and opens up new avenues for investigating other aspects of immune regulation.

      In terms of its relevance, the study's findings have the potential to captivate researchers across various disciplines, such as plant biology, molecular genetics, and translational research. The insights gained from this study may inspire researchers to explore further the role of tRNA in other regulation.

      Response: Thank you very much for your positive comments and support!

      Reviewer #2 (Evidence, reproducibility and clarity): The authors presented an intriguing and previously unknown mechanism that the tRNA mcm5s2U modification regulates plant immunity through the SA signaling pathway, specifically by controlling NPR1 translation. The manuscript was well-written and logically structured, allowing for a clear understanding of the research. The authors provided strong and persuasive data to support their key claims. However, further improvement is required to strengthen the conclusion that mcm5s2U regulates plant immunity by controlling NPR1 translation.

      Response: Thank you very much for your positive comments and support!

      Major comments:

      1. NPR1 translation should be examined to verify the Mass Spec (Figure 5B) and polysome profiling data (Figure 5D) by checking the NPR1 protein and mRNA level using antibodies and qPCR, respectively, in the cgb mutant background to establish a concrete confirmation of CGB regulation in NPR1 translation.

      Response: This is a very constructive suggestion. We performed these experiments and found that the transcription levels of NPR1 were similar between COM and cgb both before and after Psm ES4326 infection (Figure S2), which is consistent with the RNA-Seq data. Consistent with the Mass Spec and polysome profiling data, the NPR1 protein level was much higher in COM than in cgb (Figure 5C) after Psm ES4326 infection. Together, these data further support our conclusion that translation of NPR1 is impaired in cgb.

      1. Analyzing the genetic epistasis of CGB and NPR1 to check if CGB regulates plant immunity through the NPR1-dependent SA signal pathway. If the authors' claim is valid, I would expect no additive effect on bacterial growth in the cgb/npr1 double mutant compared to the single mutants. Due to the broad impact of CGB on plant signaling (Figures 4E and 4F), the SA protection assay, which concentrates on the SA signal pathway, needs to be tested in WT, cgb and npr1 plants as an alternative assay to the genetic epistasis analysis. I expect that the SA-mediated protection is also compromised in cgb mutant background.

      Response: Thank you for your suggestions. We did examine the growth of Psm ES4326 in the cgb npr1 double mutant and found that cgb npr1 was significantly more susceptible than npr1 and cgb (Figure below). Although additive effects were observed, this result does not contradict our conclusion, for the following reasons. First, the translation of NPR1 was reduced rather than completely blocked in cgb. In other words, NPR1 still has some function in cgb. But in the cgb npr1 double mutant, the function of NPR1 is completely abolished, which explains why cgb npr1 was more susceptible than cgb. Second, in addition to NPR1, some other immune regulators (such as PAD4, EDS5, and SAG101) were also compromised in cgb (Figure 5B), which explains why cgb npr1 was more susceptible than npr1. Since the result of the genetic analysis was not intuitive, we decided not to include it in the manuscript.

      SA signaling is known to regulate both basal resistance and systemic acquired resistance (SA-mediated protection). We have shown that cgb is defective in basal resistance, which is sufficient to support our conclusion that tRNA thiolation is essential for plant immunity. We agree that SA-mediated protection is also expected to be compromised in cgb. We will test this in a future study.

      1. Could the authors comment on why using COM instead of WT as a control to perform the majority of the experiments?

      Response: Thank you for your comments. In addition to ROL5, the cgb mutant may have other mutations compared with WT. COM is a complementation line in the cgb background. Therefore, the genetic background of COM is likely to be more similar to that of cgb than the WT background is.

      1. In Figure 5E, why does ACTIN2 have an enhanced translation while NPR1 shows a compromised one in cgb mutant? How does the mcm5s2U distinguish NPR1 and ACTIN2 codons? Does mcm5s2U modification have both positive and negative roles in regulating protein translation?

      Response: Thank you for raising this question. As previously reported, loss of the mcm5s2U modification causes ribosome pausing at AAA and CAA codons. Therefore, the translation of mRNAs with more GAA/CAA/AAA codons (called s2 codons) is likely to be affected more dramatically in cgb. We have analyzed the percentage of s2 codons at the whole-genome level (Figure below). The average percentage is 8.5%, while NPR1 contains 10.1% s2 codons and actin contains only 4.5% s2 codons. When fewer ribosomes are used for translation of mRNAs with a high s2 codon percentage, more ribosomes are available for translation of mRNAs with a low s2 codon percentage, which may account for the enhanced translation efficiency. To focus on NPR1 and to avoid confusion, we removed the ACTIN data in the revised manuscript.
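      The s2 codon percentage described in this response can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `s2_codon_percentage`, the choice to drop a terminal stop codon before counting, and the toy sequence are all assumptions made for illustration only.

```python
# Illustrative sketch only: percentage of GAA/CAA/AAA ("s2") codons in a CDS.
# Assumption: the CDS is in frame, and a terminal stop codon is not counted.

S2_CODONS = {"AAA", "CAA", "GAA"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def s2_codon_percentage(cds: str) -> float:
    """Return the percentage of codons in `cds` that are AAA, CAA, or GAA."""
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA notation
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("CDS length must be a non-zero multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[-1] in STOP_CODONS:
        codons = codons[:-1]  # hypothetical choice: ignore the stop codon
    if not codons:
        raise ValueError("CDS contains no sense codons")
    s2_count = sum(1 for codon in codons if codon in S2_CODONS)
    return 100.0 * s2_count / len(codons)


# Toy sequence (not a real gene): ATG AAA CAA GAA GCT TAA
toy_percentage = s2_codon_percentage("ATGAAACAAGAAGCTTAA")
```

      Applied to every annotated CDS, such a function would give a genome-wide distribution against which individual genes could be compared, under the stated assumptions.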

      1. Specify the protein amount used for the in vitro pull-down assay and agrobacteria concentration used for the tobacco Co-IP assay in the protocol section.

      Response: We added this information in Method section in the revised manuscript.

      4. Delete the SA quantification and Ion leakage assay in the protocol, which are not used in the study.

      Response: We are sorry for the mistake. The SA quantification and ion leakage data were included in previous versions of the manuscript. We removed the data but forgot to remove the corresponding methods in the present version. We deleted them in the revised manuscript.

      1. The strain Pst DC3000 avrRPT2 was not used in this study. Please remove it.

      Response: We are sorry for the mistake. The strain Pst DC3000 avrRPT2 was used for the ion leakage assay in previous versions of the manuscript. We deleted it in the revised manuscript.

      1. In Figure 5F, did the 59 genes tested overlap with the 366 attenuated proteins in the cgb mutant? Were the 59 genes translationally regulated?

      Response: Thank you for your suggestion. Venn diagram analysis revealed that 12 genes (about 20%) are also attenuated proteins, suggesting that the mcm5s2U modification regulates the translation of some SA-responsive genes.

      Reviewer #2 (Significance): The authors' study is significant as it establishes the first connection between tRNA mcm5s2U modification and plant immunity, specifically by regulating NPR1 protein translation. This research expands our understanding of the biological role of tRNA mcm5s2U modification and highlights the importance of translational control in plant immunity. It is likely to captivate scientists working in this field.

      Response: Thank you very much for your positive comments and support!

      Reviewer #3 (Evidence, reproducibility and clarity):

      In this manuscript, the authors identified a cgb mutant that carries a mutation in the ROL5 gene. Both the cgb mutant and the newly created rol5-c mutant are susceptible to the bacterial pathogen Psm. The authors showed that ROL5 interacts with CTU2, the Arabidopsis homologous protein of the yeast tRNA thiolation enzyme NCS2. A ctu2-1 mutant is also susceptible to Psm, suggesting that tRNA thiolation may play a role in plant immunity. Indeed, tRNA mcm5S2U levels are undetectable in rol5-c and ctu2-1 mutants. The authors found that the cgb mutation significantly attenuated basal and Psm-induced transcriptome and proteome changes. Furthermore, it was found that the translation efficiency of a group of SA signaling-related proteins including NPR1 is compromised.

      The manuscript provides solid evidence for the involvement of ROL5 and CTU2 in plant immunity using the rol5 and ctu2 mutants. The authors may consider the following suggestions and comments to improve the manuscript.

      Response: Thank you very much for your support and suggestions!

      1. The function of the Elongator complex in tRNA modification/thiolation has been extensively studied. In Arabidopsis Elongator mutants, mcm5S2U levels are very low, similar to the levels in the rol5 and ctu2 mutants (Mehlgarten et al., 2010, Mol Microbiology, 76, 1082-1094; Leitner et al., 2015 Cell Rep). In elp mutants, the PIN protein levels are reduced without reduced mRNA levels (Leitner et al., 2015), indicating that Elongator-mediated tRNA modification is involved in translation regulation. The Elongator complex plays an important role in plant immunity, though the reduced mcm5S2U levels in elp mutants were not proposed as the exclusive cause of the immune phenotypes. In fact, it would be difficult to establish a cause-effect relationship between tRNA modification and immunity. These results should be discussed in the manuscript.

      Response: Thank you very much for your insightful comment on the role of the ELP complex in tRNA modification and plant immunity. We added a paragraph discussing the ELP complex in the revised manuscript (Line 280-295).

      In addition to tRNA modification, the ELP complex has several other distinct activities including histone acetylation, α-tubulin acetylation, and DNA demethylation. Therefore, it is difficult to dissect which activity of the ELP complex contributes to plant immunity. However, the only known activity of ROL5 and CTU2 is to catalyze tRNA thiolation. Considering that the elp, rol5, and ctu2 mutants are all defective in tRNA thiolation, it is likely the tRNA modification activity of the ELP complex underlies its function in plant immunity.

      1. The interaction between CTU2 and ROL5 in Y2H has previously been reported (Philipp et al., 2014). The same report also showed reduced tRNA thiolation in the ctu2-2 mutant using polyacrylamide gel. These results should be mentioned/discussed in the manuscript.

      Response: Thank you for pointing them out. We added this information in the revised version (Line 146-147).

      1. tRNA modification unlikely plays a unique role in plant immunity. It can be inferred that mutations affecting tRNA modification (rol5, ctu2, elp, etc.) would delay both internal and external stimulus-induced signaling including immune signaling.

      Response: We agree with you that tRNA modification has other roles in addition to plant immunity. In the Discussion section, we have mentioned that “it was found that tRNA thiolation is required for heat stress tolerance (Xu et al., 2020). … It will also be interesting to test whether tRNA thiolation is required for responses to other stresses such as drought, salinity, and cold.” (Line 276-279).

      1. It would be interesting to conduct statistical analyses on the genetic codons used in the CDSs whose translation was attenuated as described in the manuscript. Do these genes, including NPR1, use more than average levels of AAA, CAA, and GAA codons? If not, why is their translation impaired?

      Response: Thank you for your suggestion. We called GAA/CAA/AAA codons s2 codon. We have analyzed the percentage of s2 codon at whole-genome level (Figure below). NPR1 does contain more s2 codon (10.1%) than the average level (8.5%). We are preparing another manuscript, which will report the relationship between s2 codon and translation.

      Referees cross-commenting

      It is important to put current research in the context of available knowledge in the field. The diagram in Figure 3C shows that the Elongator complex functions upstream of ROL5 & CTU2 in modifying tRNA. The function of Elongator in plant immunity has been well established. The similarities and differences should be discussed. Additionally, it may not be a good idea to claim that the results are novel.

      Response: Thank you for your comments. We added a paragraph discussing the ELP complex in the revised manuscript (Line 280-295). The ELP complex catalyzes the cm5U modification, which is the precursor of mcm5s2U catalyzed by ROL5 and CTU2. In addition to tRNA modification, the ELP complex has several other distinct activities including histone acetylation, α-tubulin acetylation, and DNA demethylation. Therefore, it is difficult to dissect which activity of the ELP complex contributes to plant immunity. However, the only known activity of ROL5 and CTU2 is to catalyze tRNA thiolation. Considering that the elp, rol5, and ctu2 mutants are all defective in tRNA thiolation, it is likely the tRNA modification activity of the ELP complex underlies its function in plant immunity. Therefore, our study improved our understanding of the ELP complex in plant immunity. We have deleted the words “new” and “novel” throughout the manuscript.

      Reviewer #3 (Significance): The manuscript provides solid evidence for the involvement of ROL5 and CTU2 in plant immunity. However, the authors did not acknowledge the existing results about the Elongator complex that functions in the same pathway in modifying tRNA. The involvement of Elongator in plant immunity has been well established. The cause-effect relationship between tRNA modification and plant immunity is difficult to demonstrate.

      Response: We think that the cause-effect relationship between the activities of the ELP complex and plant immunity is difficult to demonstrate because the ELP complex has several distinct activities other than tRNA modification. However, since the only known activity of ROL5 and CTU2 is to catalyze tRNA thiolation, the cause-effect relationship between tRNA thiolation and plant immunity is clear, which indicated that the tRNA modification activity of the ELP complex contributes to plant immunity.

    1. Reviewer #3 (Public Review):

      Summary: Previous studies suggest that humans may infer objects' stability through a world model that performs mental simulations with a priori knowledge of gravity acting upon objects. In this study, the authors test two alternative hypotheses about the nature of this a priori knowledge. According to the Natural Gravity assumption, the direction of gravity encoded in this world model is straight downwards as in the physical world. According to the alternative Mental Gravity assumption, that gravity direction is encoded in a Gaussian distribution, with the vertical direction as the maximum likelihood. They present two experiments and computer simulations as evidence in support of the Mental Gravity assumption. Their conclusion is that when the brain is tasked to determine the stability of a given structure it runs a mental simulation, termed Mental Gravity Simulation, which averages the estimated temporal evolutions of that structure arising from different gravity directions sampled from a Gaussian distribution.

      Weaknesses: In spite of the fact that the Mental Gravity Simulation (MGS) seems to predict the data of the two experiments, it is an untenable hypothesis. I give the main reason for this conclusion by illustrating a simple thought experiment. Suppose you ask subjects to determine whether a single block (like those used in the simulations) is about to fall. We can think of blocks of varying heights. No matter how tall a block is, if it is standing on a horizontal surface it will not fall until some external perturbation disturbs its equilibrium. I am confident that most human observers would predict this outcome as well. However, the MGS simulation would not produce this outcome. Instead, it would predict a non-zero probability of the block tipping over. A gravitational field that is not perpendicular to the base has the equivalent effect of a horizontal force applied on the block at the height corresponding to the vertical position of the center of gravity. Depending on the friction determined by the contact between the base of the block and the surface where it stands there is a critical height where any horizontal force being applied would cause the block to fall while pivoting about one of the edges at the base (the one opposite to where the force has been applied). This critical height depends on both the size of the base and the friction coefficient. For short objects this critical height is larger than the height of the object, so that object would not fall. But for taller blocks, this is not the case. Indeed, the taller the block, the smaller the deviation from a vertical gravitational field needed for a fall to be expected. The discrepancy between this prediction and the most likely outcome of the simple experiment I have just outlined makes the MGS model implausible. Note also that a gravitational field that is not perpendicular to the ground surface is equivalent to the force field experienced by the block while standing on an inclined plane. For small friction values, the block is expected to slide down the incline, therefore another prediction of this MGS model is that when we observe an object on a surface exerting negligible friction (think of a puck on ice) we should expect that object to spontaneously move. But of course, we don't, as we do not expect tall objects that are standing to suddenly fall if left unperturbed. In summary, a stochastic world model cannot explain these simple observations.
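      The single-block thought experiment can be made concrete with a short sketch. It is only an illustration under simplifying assumptions not stated in the review: a rigid, uniform rectangular block, gravity tilted within the plane of the block's cross-section, and Coulomb friction. The function names and numeric values are hypothetical.

```python
import math

# Illustrative sketch: a uniform rigid block of width `base_width` and height
# `height` resting on a flat surface while gravity is tilted by `tilt_deg`
# away from the surface normal (equivalent to the block on an inclined plane).


def critical_tilt_deg(base_width: float, height: float) -> float:
    """Tilt angle beyond which the weight vector passes outside the base."""
    return math.degrees(math.atan2(base_width, height))


def first_failure(base_width: float, height: float,
                  friction: float, tilt_deg: float) -> str:
    """Return 'stays', 'tips', or 'slides' for the given gravity tilt."""
    tangent = math.tan(math.radians(abs(tilt_deg)))
    tip_threshold = base_width / height   # tan(tilt) needed to tip
    slide_threshold = friction            # tan(tilt) needed to slide
    if tangent <= min(tip_threshold, slide_threshold):
        return "stays"
    # Whichever threshold is lower determines how the block fails.
    return "tips" if tip_threshold < slide_threshold else "slides"


# With exactly vertical gravity the block stays, however tall it is.
tall_vertical = first_failure(1.0, 10.0, 0.5, 0.0)
# A taller block has a smaller critical tilt than a shorter one.
tall_limit = critical_tilt_deg(1.0, 4.0)
short_limit = critical_tilt_deg(1.0, 1.0)
```

      Under these assumptions, any gravity distribution with non-zero spread assigns some probability to tilts beyond the critical angle, and that probability grows with block height, which is the prediction the reviewer objects to.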

      The question remains as to how we can interpret the empirical data from the two experiments and their agreement with the predictions of the stochastic world model if we assume that the brain has internalized a vertical gravitational field. First, we need to look more closely at the questions posed to the subjects in the two experiments. In the first experiment, subjects are asked about how "normal" a fall of a block construction looks. Subjects seem to accept 50% of the time that a fall is normal when the gravitational field is about 20 deg away from the vertical direction. The authors conclude that according to the brain, such an unusual gravitational field is possible. However, there are alternative explanations for these findings that do not require a perceptual error in the estimation of the direction of gravity. There are several aspects of the scene that may be misjudged by the observer. First, the 3D interpretation of the scene and the 3D motion of the objects can be inaccurate. Indeed, the simulation of a normal fall uploaded by the authors seems to show objects falling in a much weaker gravitational field than the one on Earth since the blocks seem to fall in "slow motion". This is probably because the perceived height of the structure is much smaller than the simulated height. In general, there are even more severe biases affecting the perception of 3D structures that depend on many factors, for instance, the viewpoint. Second, the distribution of weight among the objects and the friction coefficients acting between the surfaces are also unknown parameters. In other words, there are several parameters that depend on the viewing conditions and material composition of the blocks that are unknown and need to be estimated. The authors assume that these parameters are derived accurately and only that assumption allows them to attribute the observed biases to an error in the estimate of the gravitational field. Of course, if the direction of gravity is the only parameter allowed to vary freely then it is no surprise that it explains the results. Instead, a simulation with a tilted angle of gravity may give rise to a display that is interpreted as rendering a vertical gravitational field while other parameters are misperceived. Moreover, there is an additional factor that is intentionally dismissed by the authors that is a possible cause of the fall of a stack of cubes: an external force. Stacks that are initially standing should not fall all of a sudden unless some unwanted force is applied to the construction. For instance, a sudden gust of wind would create a force field on a stack that is equivalent to that produced by a tilted gravitational field. Such an explanation would easily apply to the findings of the second experiment. In that experiment subjects are explicitly asked if a stack of blocks looks "stable". This is an ambiguous question because the stability of a structure is always judged by imagining what would happen to the structure if an external perturbation is applied. The right question should be: "do you think this structure would fall if unperturbed". However, if stability is judged in the face of possible external perturbations then a tall structure would certainly be judged as less stable than a short structure occupying the same ground area. This is what the authors find. What they consider as a bias (tall structures are perceived as less stable than short structures) is instead a wrong interpretation of the mental process that determines stability. If subjects are asked the question "Is it going to fall?" then tall stacks of sound structure would be judged as stable as short stacks, just more precarious.

      The RL model used as a proof of concept for how the brain may build a stochastic prior for the direction of gravity is based on very strong and unverified assumptions. The first assumption is that the brain already knows about the force of gravity, but it lacks knowledge of the direction of this force of gravity. The second assumption is that before learning the brain knows the effect of a gravitational field on a stack of blocks. How can the brain simulate the effect of a non-vertical gravitational field on a structure if it has never observed such an event? The third assumption is that from the visual input, the brain is able to figure out the exact 3D coordinates of the blocks. This has been proven to be untrue in a large number of studies. Given these assumptions and the fact that the only parameters the RL model modifies through learning specify the direction of gravity, I am not surprised that the model produces the desired results.

      Finally, the argument that the MGS is more efficient than the NGS model is based on an incorrect analysis of the results of the simulation. It is true that 80% accuracy is reached faster by the MGS model than the 95% accuracy level is reached by the NGS model. But the question is: how fast does the NGS model reach 80% accuracy (before reaching the plateau)?

    1. We’ve chosen to keep highlights private to avoid pages being cluttered by highlights that have no surrounding discussion. We understand that people may want to share highlights with others, and we think there are effective ways we can address that in the future.

      It's important to strike a balance between individual privacy and the desire for social sharing. Privacy concerns are valid, but so is the desire to share valuable or interesting highlights with others. Consider implementing privacy settings that allow users to control who can see their highlights, such as making them visible to specific friends or groups.

    1. At first, as we searched for relevant studies only in English language, other potential studies written in different languages were not included in our review.

      I think it is important to note a barrier such as this that is not often thought of. There could be plenty of findings accomplished by research, but if barriers are overlooked or disregarded then the most accurate findings may never be found. -CR, CM, KS

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are very grateful to the reviewers for their insightful and detailed analysis of our work, in particular to reviewer 2. We also would like to thank the eLife editorial team for organizing this form of public review and debate, which we believe will be of interest to the science community.

      Reviewer #1 (Public Review):

      Despite durable viral suppression by antiretroviral therapy (ART), HIV-1 persists in cellular reservoirs in vivo. The viral reservoir in circulating memory T cells has been well characterized, in part due to the ability to safely obtain blood via peripheral phlebotomy from people living with HIV-1 infection (PWH). Tissue reservoirs in PWH are more difficult to sample and are less well understood. Sun and colleagues describe isolation and genetic characterization of HIV-1 reservoirs from a variety of tissues including the central nervous system (CNS) obtained from three recently deceased individuals at autopsy. They identified clonally expanded proviruses in the CNS in all three individuals.

      Strengths of the work include the study of human tissues that are under-studied and difficult to access, and the sophisticated near-full length sequencing technique that allows for inferences about genetic intactness and clonality of proviruses. The small sample size (n=3) is a drawback. Furthermore, two individuals were on ART for just one year at the time of autopsy and had T cells compatible with AIDS, and one of these individuals had a low-level detectable viral load (Figure S1). This makes generalizability of these results to PWH who have been on ART for years or decades and have achieved durable viral suppression and immune reconstitution difficult.

      While anatomic tissue compartment and CNS region accompany these PCR results, it is unclear which cell types these viruses persist in. As the authors point out, it is possible that these reservoir cells might have been infiltrating T cells from blood present at the time of autopsy tissue sampling. Cell type identification would greatly enhance the impact of this work. Several other groups have undergone similar studies (with similar results) using autopsy samples (links below). These studies included more individuals, but did not make use of the near-full length sequencing described here. In particular, the Last Gift cohort, based at UCSD and led by Sara Gianella and Davey Smith, has established protocols for tissue sampling during autopsy performed soon after death. https://pubmed.ncbi.nlm.nih.gov/35867351/ https://pubmed.ncbi.nlm.nih.gov/37184401/

      We agree with reviewer 1 that studies to identify specific cell types that harbor intact HIV-1 in individual tissue compartments would be very informative; our group has recently initiated such studies.

      Overall, this small, thoughtful study contributes to our understanding of the tissue distribution of persistent HIV-1, and informs the ongoing search for viral eradication.

      We thank reviewer 1 for these encouraging remarks.

      Reviewer #2 (Public Review):

      The manuscript by Sun et al. applies the powerful technology of profiling viral DNA sequences in numerous anatomical sites in autopsy samples from participants who maintained their antiviral therapy up to the time of death. The sequencing is of high quality in using end-point dilution PCR to generate individual viral genomes. There is a thoughtful discussion, although there are points that we disagree with. This is an important data set that increases the scope of how the field thinks about the latent reservoir with a new look at the potential of a reservoir within the CNS.

      We greatly appreciate the comments by reviewer 2 and would like to thank them for their detailed and very knowledgeable analysis of this paper.

      1) The participants are very different in their exposure to HIV replication and disease progression. Participant 1 appears to have been on ART for most of the time after diagnosis of infection (16 years) and died with a high CD4 T cell count. The other two participants had only one year on ART and died with relatively low CD4 T cell counts (under 200). This could lead to differences in the nature of the reservoir. In this regard, the amount of DNA per million cells appears to be about 10-fold lower across the compartments sampled for participant 1. Also, one might expect fewer intact proviruses surviving after 16 years on ART compared to only 1 year on ART. The depth of sampling may be too limited and the number of participants too few to assess if these differences are features of these participants because of their different exposures to HIV replication. On the positive side, finding similarities across these big differences in participant profiles does reinforce the generalizability of the observations.

      Many thanks for pointing this out. We also noticed that the total number of HIV-1 proviruses is smaller in our study participant 1 (who had been on ART for 16 years), compared to study persons 2 and 3 with more limited treatment durations (1-2 years). However, due to the small number of study persons, we think we cannot use these results for inferring how treatment duration influences viral reservoir size in tissues.

      2) The following analysis will be limited by sampling depth but where possible it would be interesting to compare the ratio of intact to defective DNA. A sanctuary might allow greater persistence of cells with intact viral DNA even without viral replication (i.e. reduced immune surveillance). Detecting one or two intact proviruses in a tissue sample does not lend itself to a level of precision to address this question, but statistical tests could be applied to infer when there is sampling of 5 or more intact proviruses to determine if their frequency as a ratio of total DNA in different anatomical sites is similar or different. This would allow adjustment for the different amount of viral DNA in different compartments while addressing the question of the frequency of intact versus defective proviruses. One complication in this analysis is if there was clonal expansion of a cell with an intact genome which would represent a fortuitous overrepresentation of intact genomes in that compartment.

      We have performed the analysis suggested by reviewer 2 and included a diagram reflecting the ratio of intact/defective proviruses as a new supplemental figure (Figure S2). Unfortunately, we do not feel comfortable drawing any real conclusions from this additional analysis; the sample sizes are simply too limited.

      3) The key point of this work is that the participants were on therapy up to the time of death ("enforcing" viral latency). The predominance of defective genomes is consistent with this assumption. Is there data from untreated infections to compare to as a signature of whether the viral DNA population was under selective pressure from therapy or not? Presumably untreated infections contain more intact DNA relative to total DNA. This would represent independent evidence that therapy was in place.

      We agree that an analysis of autopsy samples from untreated persons living with HIV-1 would be of great interest, and are actively collaborating with neuropathologists from multiple sites to obtain such samples. Yet, we are not convinced that selection pressure on reservoir cells during ART can be appropriately identified through quantitative virological assays. Rather, we feel that the selection of proviruses can be best assessed when qualitative parameters, including proviral integration sites and their position relative to host epigenetic chromatin features, are evaluated.

      4) There are several points in Figure 5 to raise about V3 loop sequences. The analysis includes a large number of "undetermined" sequences that did not have a V3 loop sequence to evaluate. We would argue it is a fair assumption that the deleted proviruses have the same distribution of X4 and R5 sequences as the ones that have a V3 sequence to evaluate. In this view it would be possible to exclude the sequences for which there is no data and just look at the ratio of X4 and R5 in the different compartments, specifically does this ratio change in a statistically significant way in different compartments? The authors use "CCR5 and non-CCR5" as the two entry phenotypes. The evidence is pretty strong that the "other" coreceptor the virus routinely uses is CXCR4, and G2P is providing the FPR for X4 viruses. Perhaps the authors are trying to create some space for other coreceptors on microglia, but we are pretty sure what they are measuring is X4 viruses, especially in this late disease state of participant 2. Finally, we have previously observed that the G2P FPR score of <2 is a strong indicator of being X4, FPR scores between 2 and 10 have a 50% chance of being X4, and FPR scores above 10 are reliably R5 (PMID27226378). In addition, we observed that X4 viruses form distinct phylogenetic lineages. The authors might consider these features of X4 viruses in the evaluation of their sequences. Specifically, it would be helpful to incorporate the FPR scores of the reported X4 viruses.

      Many thanks for these thoughts. We have now included FPR scores for all sequences and considered sequences with FPR score <2 as X4-tropic. Among 497 proviral sequences derived from all three participants, only 14 proviral sequences had FPR scores between 2 and 10 and their tropism was classified as CCR5 in the new Figure 5. We agree that viral tropism analysis of proviral sequences from the CNS would be of particular interest for study subject 2; however, most brain-derived sequences from that person had large deletions in the env region, precluding an analysis of viral tropism.
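
      The classification rule described in this reply can be written as a short sketch. The function names, the cutoff argument, and the example FPR values are illustrative assumptions, not the analysis code used for Figure 5. Sequences without an evaluable V3 loop are kept as a separate category rather than assigned a tropism.

```python
from collections import Counter

def classify_tropism(fpr_percent, x4_cutoff=2.0):
    """Classify one proviral sequence from its geno2pheno FPR score (in percent).

    Rule taken from the reply: FPR < 2 is called X4-tropic; everything else,
    including the ambiguous 2-10 range, is called CCR5-tropic (R5).
    None stands for a sequence without a V3 loop to evaluate.
    """
    if fpr_percent is None:
        return "undetermined"
    return "X4" if fpr_percent < x4_cutoff else "R5"

def tally_tropism(fpr_scores):
    """Count the calls for a list of FPR scores (hypothetical input)."""
    return Counter(classify_tropism(score) for score in fpr_scores)

# Hypothetical scores for illustration only; these are not study data.
example_scores = [0.5, 1.9, 2.0, 7.3, 45.0, None]
example_counts = tally_tropism(example_scores)
```

      Excluding the "undetermined" calls before comparing compartments, as the reviewer suggests, amounts to dropping that key from the tally.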

      5) We have puzzled over the many reports of different cell types in the CNS being infected. When we examined these cell types (both as primary cells and as iPSC-derived cells), all cells could be infected with a version of HIV that had the promiscuous VSV-G protein on the virus surface as a pseudotype. However, only macrophages and microglia could be infected using the HIV Env protein, and then only if it was the M-tropic version and not the T-tropic version (PMID35975998). RNAseq analysis was consistent with this biological readout in that only macrophages and microglia expressed CD4, neurons and astrocytes do not. From the virology point of view, astrocytes are no more infectable than neurons.

      We appreciate these comments. As described in our discussion, we agree that the role of astrocytes as target cells for HIV-1 infection is highly controversial; we look forward to future opportunities to evaluate HIV sequences in sorted astrocytes from autopsy tissues.

      6) The brain gets exposed to virus from the earliest stages of infection but this is not synonymous with viral replication. Most of the time there is virus in the CSF but it is present at 1-10% of the level of viral load in the blood and phylogenetically it looks like the virus in the blood, most consistent with trafficking T cells, some of which are infected (PMID25811757). The fact that the virus in the blood is almost always T cell-tropic in needing a high density of CD4 for entry makes it unlikely that monocytes are infected (with their low density of CD4) and thus are not the source of virus found in the CNS. It seems much more likely that infected T cells are the "Trojan Horse" carrying virus into the CNS.

      We appreciate the reviewer’s referral to Greek mythology and agree that the hypothesis of infected T cells acting as “Trojan horses” is more intuitive and better supported by available data. We have adjusted our discussion accordingly.

      7) While all participants were taking antiretroviral therapy at the time of their death, they were not all suppressed when the tissues were collected. The authors are careful not to mention "suppressive ART" in the text, which is appreciated. However, the title should be changed to also reflect this fact.

      Thanks for pointing this out. From our perspective, ART is never fully suppressive, as low-level viremia (below the detection threshold of commercial PCR assays) is detectable in almost all ART-treated persons. As such, it is not clear to us that “suppressive” necessarily implies suppression below the detection limits of commercial PCRs. We argue that ART can also be suppressive when plasma viral loads are in the range of 100 copies/ml, as they are in our study subject 3. Nevertheless, we have changed the title to avoid confusion.

      Reviewer #1 (Recommendations For The Authors):

      I encourage the authors to compare their autopsy and tissue sampling procedures to those used by The Last Gift researchers and consider including references to this ongoing study. If the authors plan to continue in this line of research, the field would greatly benefit from a collaboration that would bring together their excellent and advanced PCR technique with the larger sample size offered by The Last Gift. Lastly, is there some way to simultaneously determine cell type when NFL sequencing is performed?

      We look forward to collaborating with investigators from the Last Gift Cohort in the future and have integrated additional references in the manuscript to acknowledge their work. At the current stage of technology development, we think that sorting of infected cells based on canonical markers of defined cell populations is the preferred approach for identifying phenotypic properties of infected cells; however, expansion of the PheP-Seq assay (Sun et al., Nature 2023) may facilitate this process in the future.

      Reviewer #2 (Recommendations For The Authors):

      1) The authors have chosen to lump all R5 viruses together in terms of their entry phenotype, giving all viruses an equal chance of infecting all potentially susceptible cell types. This ignores the fact that normal HIV is selected to infect cells, requiring a high density of CD4 as is found on T cells. We use the term R5 T cell-tropic to describe "normal" HIV. The ability to efficiently enter cells that have a low density of CD4, such as macrophages and microglia, involves the evolution of a distinct phenotype, termed macrophage tropism (PMID24307580, and work of others). This happens most often in the CNS where T cells are infrequent thus potentiating evolution to infect an alternative cell type. This change in entry phenotype is dramatic and, like X4 viruses, results in phylogenetically distinct lineages (PMID22007152). There are no sequence signatures for M-tropic viruses as there are for X4 viruses, but the fact that there are sequences shared between the CNS and lymphoid tissue makes it much more likely that there are T cells migrating around the body, including into the CNS, that are carrying R5 T cell-tropic virus with them, with the cells potentially clonally expanding in situ in the CNS. The persistence of a potential CNS T cell reservoir was the point we were trying to make in our recent paper (ref. 38), not only that these CSF rebound viruses were R5 viruses but they were selected for replication in T cells as seen by their dependence on a high density of CD4 for entry. This is the conclusion one would reach if clonally expanded viral sequences were shared between two lymphoid compartments. It is not necessary to ascribe properties of infection and clonal amplification to microglia cells when a more parsimonious explanation is that there are low levels of T cells in the CNS, especially in the absence of entry phenotype data showing these sequences encode an M-tropic entry phenotype. As is, the authors are just adding to the unproven belief that virus in the CNS must be in myeloid cells, which in this case in particular we suspect is the wrong interpretation.

      We are impressed by reviewer 2’s recent work, suggesting the viral reservoir in the CNS may primarily consist of clonally-expanded R5 T-cell tropic viruses. We have adjusted our discussion to emphasize this possibility, and to highlight that viral entry phenotyping data will be informative for better understanding viral persistence in the brain.

      2) The authors noted that the frequency of intact proviruses is highest in the lymph nodes of 2/2 participants for which they had lymph node samples, relative to the other tissues examined. They thus conclude, "Together, these results indicate that intact HIV-1 proviruses are preferentially detected in lymphoid and gastrointestinal (GI) tissues." However, an examination of Figure 2 reveals that the total HIV copy number is highest in the lymph nodes of these two people. Thus, it doesn't seem like HIV is preferentially intact in the lymph nodes as much as they sampled more provirus from that tissue and therefore were able to detect more intact proviruses.

      We have adjusted our manuscript to indicate that the highest numbers of intact HIV-1 proviruses were present in lymph nodes, both in terms of absolute numbers and after normalization to the total numbers of cells analyzed.
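
      The difference between the two readouts discussed here (intact proviruses relative to cells analyzed, and intact proviruses relative to all proviruses detected) can be made explicit with a small sketch. The function names and all numbers are hypothetical and are not taken from Figure 2.

```python
def intact_per_million_cells(n_intact, n_cells_analyzed):
    """Intact proviruses normalized to the number of cells analyzed."""
    if n_cells_analyzed <= 0:
        raise ValueError("n_cells_analyzed must be positive")
    return n_intact * 1_000_000 / n_cells_analyzed

def intact_fraction(n_intact, n_total_proviruses):
    """Share of all detected proviruses that are genome-intact."""
    if n_total_proviruses <= 0:
        raise ValueError("n_total_proviruses must be positive")
    return n_intact / n_total_proviruses

# Hypothetical tissues: (intact proviruses, total proviruses, cells analyzed).
tissue_a = (10, 200, 2_000_000)
tissue_b = (1, 20, 2_000_000)

# Tissue A has ten times more intact proviruses per million cells ...
freq_a = intact_per_million_cells(tissue_a[0], tissue_a[2])
freq_b = intact_per_million_cells(tissue_b[0], tissue_b[2])
# ... yet the same intact fraction, which is the distinction the reviewer raises.
frac_a = intact_fraction(tissue_a[0], tissue_a[1])
frac_b = intact_fraction(tissue_b[0], tissue_b[1])
```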

      3) In Figure 1A, the legend should be changed so that "PMSC" is spelled out as "premature stop codon" for ease of reading. This is done for Figure 1B.

      We have corrected this issue as suggested by the reviewer.

      4) The pie charts in Figure 5 could be better labeled for ease of interpreting. In Figure 5C, instead of just labeling it as "P2" it could be "Distribution of CXCR4-using proviruses, P2", as an example. As it stands, it is hard to know what the figure is describing without reading the text.

      We have changed this accordingly.

      5) While all participants were taking antiretroviral therapy at the time of their death, they were not all suppressed when the tissues were collected. The authors are careful not to mention "suppressive ART" in the text, which is appreciated. However, the title should be changed to also reflect this fact.

      Thanks for pointing this out. From our perspective, ART is never fully suppressive, as low-level viremia (below the detection threshold of commercial PCR assays) is detectable in almost all ART-treated persons. As such, it is not clear to us that “suppressive” necessarily implies suppression below the detection limits of commercial PCRs. We argue that ART can also be suppressive when plasma viral loads are in the range of 100 copies/ml. Nevertheless, we have changed the title to avoid confusion.

      Editorial comments:

      In addition to the reviewers' suggestions, we feel that more information should be added on how you define an intact proviral sequence, e.g. are only disruptions in essential genes considered, or also those in accessory genes? Previous studies have shown that brain-derived HIV-1 strains are usually CCR5-tropic, show high affinity for the CD4 receptor and frequently contain defective vpu genes. Some information and discussion on whether the brain-derived sequences confirm these previous findings seems of significant interest.

      As described in our previous work (e.g. Lee et al, JCI 2017; Jiang et al, Nature 2020), accessory genes are not considered in our definition of “genome intactness”; this is consistent with approaches other investigators have chosen (e.g. Hiener et al, Cell Reports 2017). Within the genome-intact sequences we identified in the CNS in our study persons, we found no evidence for deletions of vpu sequences; this has been emphasized in the revised manuscript.

    1. Author Response

      We thank the reviewers and editors for their deep, thoughtful and constructive assessment of our manuscript. We would nevertheless like to reply to the reviewers' reports.

      Reviewer #1.

      (...) The data can be well described by three components involving a closed state and two open states O1 and O2, in which the second component O2 is the one affected by the mutations and deletions

      This statement is not completely clear to us. What we propose is that O1 is not visible in WT, only in the mutants. What would be affected is the access to O1 and the transition between O1 and O2, but not O2 itself.

      From the beginning, it becomes challenging for non-experts to grasp the structural basis of the perturbations that are introduced (ΔPASCap and E600R), because no structural data or schematic cartoons are provided to illustrate the rationale for those deletions or their potential mechanistic effects. In addition, the lack of additional structural information or illustrations, and a somewhat confusing discussion of the structural data, make it challenging for a reader to reconcile the experimental data and mathematical model with a particular structural mechanism for gating, limiting the impact of the work.

      Thank you very much for pointing this out and our apologies for the missing cartoon. It will be provided in the revised version.

      There are several concerns associated with the analysis and interpretations that are provided. First, the conductance-voltage (G-V) relations for the mutants do not seem to saturate, and the absolute open probability is not quantified for any mutant under any condition. This makes it impossible to quantitatively compare the relative amplitudes of the two components because the amplitude of the second component remains undetermined. […] This reduces confidence in the parameters associated with G-V relations, as the shape and position of both components might change significantly if longer pulses were used.

      We agree that the endpoint of activation is ill-defined in the cases where a steady-state is not reached. This does indeed hamper quantitative statements about the relative amplitude of the two components. However, while the overall shape does change, its position (voltage dependence) would not be affected by this shortcoming. The data therefore supports the claim of the “existence of mutant-specific O1 and its equal voltage dependence across mutants.”

      Further, because the mutant channel currents do not saturate at the most positive potentials and time intervals examined, the kinetic characterization based on reaching 80% of the maximum seems inappropriate, because the 100% mark is arbitrary.

      We agree that the assessment of kinetics by a t80% is not ideal. We originally refrained from exponential fits because they introduce other issues when used for processes that are not truly exponential (as is the case here). To address the concerns, we will add time constants from these fits in the revised version. Please note that in Figure 3, we do provide time constants, and they support the statement made.
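
      The dependence of a t80% readout on the chosen endpoint can be illustrated with a toy calculation. The sketch below assumes a purely mono-exponential activation, which (as stated above) the recorded currents are not, and the time constant and pulse durations are hypothetical; it only shows how the apparent t80% shifts when the "100%" level is taken at the end of a pulse that is too short.

```python
import math

def t80_relative_to_pulse_end(tau_ms, pulse_ms, fraction=0.8):
    """Time to reach `fraction` of the amplitude measured at the end of the pulse.

    Toy model: I(t) = 1 - exp(-t / tau). The reference ("100%") level is
    I(pulse_ms), not the true steady state, mimicking a non-saturated trace.
    """
    if tau_ms <= 0 or pulse_ms <= 0 or not 0 < fraction < 1:
        raise ValueError("tau_ms and pulse_ms must be positive, 0 < fraction < 1")
    end_level = 1.0 - math.exp(-pulse_ms / tau_ms)
    target = fraction * end_level
    # Invert I(t) = target for t.
    return -tau_ms * math.log(1.0 - target)

tau = 100.0  # hypothetical time constant in ms
t80_short_pulse = t80_relative_to_pulse_end(tau, pulse_ms=100.0)   # pulse = 1 tau
t80_long_pulse = t80_relative_to_pulse_end(tau, pulse_ms=1000.0)   # pulse = 10 tau
```

      With the short pulse the apparent t80% is considerably smaller than with the long pulse, although the underlying time constant is identical; a fitted time constant does not share this dependence on the endpoint.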

      Further, the kinetics for some of the other examined mutants (e.g. those in Fig. 2A) are not shown, making it difficult to assess the extent to which the data could be affected by having been measured before full equilibration.

      This seems to be a misunderstanding. ∆2-10 kinetics is shown in Fig. 2c. ∆-eag is shown in Fig. 3. We will make sure to state this explicitly in the revised version.

      For example, I would expect that the enhanced current amplitudes from Figure 5 are only transient, ultimately reaching a smaller steady-state current magnitude that depends only on the stimulation voltage and is independent of the pre-pulse. The entire time course including the rise-time and decay is not examined experimentally. This raises concern on whether occupancy of state O1 might be overestimated under some experimental conditions if a fraction of the occupancy is only transient. The mathematical model is not utilized to examine some of these slower relaxations - this may be because the model does not reproduce these slow processes, which would represent a serious shortcoming given that the slow kinetics appear to be intrinsic to transitions around state O1.

      Thank you for thinking so deeply about the problem. We identified the same questions and did explore them using the model (Figure 8c). Your intuition is confirmed there: the slow kinetics lead to a decrease of O1 occupancy after a transient accumulation. We intend to study this experimentally as well in the revised version.

      The significance of the results with the Δ2-10.L341Split is unclear. First, structural as well as functional data has established that the coupling of the voltage sensor and pore does not entirely rely on the S4-S5 linker, and thus the Split construct could still retain coupling through other mechanisms, which is consistent with the prominent voltage dependence that is observed. If both state O1 and O2 require voltage sensor activation, it is unclear why the Split construct would affect state O1 primarily, as suggested in the manuscript, as opposed to decreasing occupancy of both open states.

      Thank you for pointing out the unclear nature of our arguments. We rephrase in the following and will do so in the revised document: If, in non-split mutants, the upward transition of S4 allows entry to O1, it is reasonable to assume that the movement is not transmitted the same way in the split and the transition into O1 is less probable. The observation that, in the split, entry into O1 requires higher depolarization and appears to be less likely, suggests that downstream of S4 (beyond position 342), there is a mechanism to convey S4 motion to the gate of the mutants.

      The figure legends and text do not describe which solutions exactly were utilized for each experiment, [...] Because no zero-current levels are shown on the current traces, it becomes very hard to determine which voltages correspond to each of the currents (see Fig. 1A).

      Will be corrected.

      … the rationale for choosing some solutions over others is not properly explained. […] The reversal potential for solutions used to measure voltage-activation curves falls right at the spot where occupancy of the first component peaks (e.g. see Figure 1B). […] It is unclear whether any artifacts could have been introduced to the mutant activation curves at voltages close to the reversal potential.

      The high potassium extracellular solution was chosen to obtain tail currents of sufficient size, warranting precise determination of the reversal potential for every individual experiment. In this way, we ensured that there were no artifacts introduced to the activation curves. Tail currents were used when closing was reasonably fast (∆PASCapL322H and E600RL322H), but otherwise, we used the amplitude at the end of the pulse to get the reversal potential.

      One key assumption that is not well-supported by the data pertains to the difference in single-channel conductance between states O1 and O2 - no analysis or discussion is provided on whether the data could also be well described by an alternative model in which O1 and O2 have the same conductance. No additional experimental evidence is provided related to the difference in conductance, which represents a key aspect of the mathematical model utilized to interpret the data.

      We agree that the relative conductance of O1 and O2 is a key point. Our proposal mainly stems from the data presented in Fig. 4 and the amplitudes of the two components of the tail at potentials where both states are visible. We also agree that whole cell currents represent a product of occupancy and conductance and that only single channel recordings can produce unambiguous proof for the higher conductance of O1. We have embarked on a series of experiments directly addressing this in the mutants that will be reported in the revised version. Still, we did explore this issue with the model. Following the path of the least number of assumptions, we initially tested models with equal conductance for both states. None of these models was able to reproduce the shape of the tails and the prepulse-dependent increase.
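
      The ambiguity acknowledged here, that a whole-cell current is a product of occupancy and conductance, can be shown with a minimal steady-state sketch. It is not the Markov model of the manuscript; the function name, channel number, occupancies, and unitary conductances are hypothetical values chosen for illustration.

```python
def whole_cell_current_pA(v_mV, v_rev_mV, n_channels,
                          p_o1, p_o2, g_o1_pS, g_o2_pS):
    """Whole-cell current for a population with two open states.

    I = N * (P_O1 * g_O1 + P_O2 * g_O2) * (V - V_rev)
    pS * mV gives fA, so the result is divided by 1000 to obtain pA.
    """
    total_conductance_pS = n_channels * (p_o1 * g_o1_pS + p_o2 * g_o2_pS)
    return total_conductance_pS * (v_mV - v_rev_mV) / 1000.0

# Scenario A: low O1 occupancy, high O1 conductance (hypothetical numbers).
current_a = whole_cell_current_pA(60, -10, 1000, p_o1=0.2, p_o2=0.1,
                                  g_o1_pS=20.0, g_o2_pS=10.0)
# Scenario B: doubled O1 occupancy, O1 conductance equal to that of O2.
current_b = whole_cell_current_pA(60, -10, 1000, p_o1=0.4, p_o2=0.1,
                                  g_o1_pS=10.0, g_o2_pS=10.0)
# Both scenarios yield the same current at this single voltage.
```

      A single steady-state amplitude therefore cannot separate the two scenarios; constraints come from the shape of the tails and the prepulse dependence across the whole protocol, and ultimately from single-channel recordings.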

      The CaM experiments are potentially very interesting and could have wide physiological relevance. However, the approach utilized to activate CaM is indirect and could result in additional non-specific effects on the oocytes that could affect the results.

      Thank you for the appreciative comments about the relevance of our results. We are aware of the potential side effects of the use of thapsigargin and ionomycin, but we still used this approach as an established method to raise intracellular Ca2+. This said, we would like to point out that the effects of Ca2+ increase on channel behavior do revert with a time course that mirrors the estimated time course of Ca2+ itself (supplement 1 to figure 7), suggesting that we are monitoring a Ca2+-dependent event.

      The description of the mathematical model that is provided is difficult to follow, and some key aspects are left unclear, such as the precise states from which state O1 can be accessed, and whether there is any direct connectivity between states O1 and O2 - different portions of the text appear to give contradictory information regarding these points.

      This seems to be a misunderstanding: supplement 1 to figure 8 graphically details the model’s layout and explicitly shows the connections to the two open states. It also shows that these are not connected. We will make sure that the text states this fact more clearly. We did explore models with one open state connected to more than one other state (loops) and found that none of these models can reproduce the large range of depolarizations for which conductance is reduced as compared to lower and higher depolarizations (Figure 1).

      Several rate constants other than those explicitly mentioned to represent voltage sensor activation are also assigned a voltage dependence - the mechanistic basis of that voltage dependence is unclear.

      Some fundamental properties we observed in the mutants can be explained with constant, voltage-independent rate constants into and out of both open states. Specifically, it was possible to achieve behavior very close to that displayed in Figure 8c with constant η, θ, ε, and ζ. We then attempted to also reproduce the strong prepulse-dependence (Figure 6A and B) and found that we needed additional degrees of freedom to incorporate both behaviors with one parameter set. We could either add more states, and thereby rates, or introduce voltage dependence to η and θ. With already 32 states and 10 rates, we decided to adopt the less complex model variant. We agree that this probably reduced the interpretability of the model. As a rule, a transition with a voltage-dependence of the functional form of Eq.1 corresponds to the kinetic properties of two or three transitions, where one is voltage-independent (setting the maximal rate) and one has the classical exponential shape expected from truly molecular transitions.

      We also agree that, conceptually, the transitions between the two layers – tentatively associated with a transition in the ring structure– should be voltage-independent. Interestingly, their voltage dependence is very similar to the voltage dependence of the early activation, i.e. centered at -100 and -120mV, similar to β. We therefore attempted to replace the voltage dependence of κ and λ with a state-dependence. To this end, we introduced a parameter that modified κ and λ depending on the state’s position along the α-β axis. While it seemed possible to include all desired features in a model with state-dependent κ and λ, it proved extremely difficult to tune the parameters. Eventually, we reverted to purely voltage-dependent and not state-dependent transition rates κ and λ. Nevertheless, we believe that their voltage dependence could be replaced by some form of state-dependence, i.e. by rates κ and λ that change systematically from the left-hand side of the scheme to its right-hand side.

      Finally, a clear mechanistic explanation for the full range of effects that the ΔPASCap and E600R mutants have on channel function is lacking, as well as a detailed description of how those newly uncovered transitions would influence the activity of the WT channel.

      We agree. Ultimate mechanistic explanations will have to await data from protein structures of intermediate states and in particular the mutant-specific open state.

      …as well as a detailed description of how those newly uncovered transitions would influence the activity of the WT channel; this latter point is important when considering whether the findings in the manuscript advance our understanding of the gating mechanism of Kv10 channels in general, or are specific to the particular mutants that are studied.

      We still do not know if the transitions to O1 are identical in the mutants and WT, although our data opens the path to dissecting the interplay of intracellular domains and voltage sensor. We think that the results are relevant for KCNH channels in general because we have made visible otherwise invisible states.

      It is unclear, for example, how both the mutation or the deletion at the cytoplasmic gating ring enable conduction by state O1, especially when considering the hypothesis put forward in this study that transition to O1 exclusively involves transitions by the voltage sensor and not the cytoplasmic gating ring.

      In our model, the transition to O1 is made possible by a displacement of the voltage sensor. In our view, when this occurs with a properly folded and positioned intracellular ring, permeation (access to O1) is precluded. It is precisely the distortion in the intracellular ring induced by mutation or deletion that allows access to O1.

      It is also not clearly described whether a non-conducting state with the equivalent state-connectivity as O1 can be accessed in WT channels, or if a state like O1 can only be accessed in the mutant channels. Importantly, if a non-conducting state with the same connectivity to O1 were to be accessed in WT channels, it would be expected that an alternating pulse protocol as in Fig. 4 would result in progressively decreasing currents as the occupancy of the non-conducting state equivalent to O1 is increased. Because this is not the case, it means that mutation and deletion cause additional perturbations on the gating energetics relative to WT, which are not clearly fleshed out.

      Thank you for highlighting this important question. Following the arguments in the answer to the previous comment, our experiments cannot provide proof for the existence or accessibility of O1 in WT channels. We favor the interpretation that it is not accessible, because, as you point out, this is supported by the outcome of the alternating pulse on WT (figure 4A) and the paradoxical effect of CaM activation. However, this interpretation hinges on the hypothesis that the kinetics of entry into and departure from O1 would be the same in WT channels, as it is in the mutants. Because transitions into a non-conducting O1 would be only indirectly observable in the WT channel, this assumption would be extremely difficult to test.

      Reviewer #2.

      WT EAG currents are far right shifted compared to previously published data. It is not clear whether it is the recording conditions but at 0 mV very few channels are open. Compare this with recordings reported previously of the same channel hEAG1 by Gail Robertson's lab (Zhao et. al. (2017) JGP). In that case, most of the channels are open at 0 mV. There must be at least 25 mV shift in voltage-dependence. These differences are unusually large.

      G-V curves presented in the literature show a large variability. Depending on the conditions, reported V1/2 values in Xenopus oocytes range from -43 mV (Schönherr et al., 2002 DOI: 10.1016/s0014-5793(02)02365-7) to +16 mV (Lörinczi et al, 2015 DOI: 10.1038/ncomms7672) through +4.1 mV (Lörinczi et al., 2016 DOI: 10.1074/jbc.M116.733576), or +10 mV (in the IUPHAR database). The results in the current manuscript are not significantly different from our previously published results on WT channels. In the report the reviewer is referring to, one source of the difference could be that Zhao et al. had no independent information about the reversal potential. In our experiments, we used solutions with high [K]ext. This places the reversal potential in a voltage range within measurable eag currents and thus allows direct determination of the reversal potential, together with the slow kinetics of the tails and the negative shift in the activation. We would argue that this makes the G-V curves less prone to assumptions, albeit for the price of large error bars around the reversal potential. Additionally, the presence of Mg2+ in the extracellular solutions can change the apparent V1/2 depending on the stimulation protocol.
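
      The effect of a high extracellular potassium concentration on the reversal potential follows from the Nernst equation. The sketch below uses the standard equation; the concentrations and the temperature are assumed values for illustration and are not the solutions used in the manuscript.

```python
import math

GAS_CONSTANT = 8.314462618      # J / (mol K)
FARADAY_CONSTANT = 96485.33212  # C / mol

def nernst_potential_mV(conc_out_mM, conc_in_mM, temp_celsius=22.0, valence=1):
    """Equilibrium potential E = (R T / z F) * ln([X]out / [X]in), in mV."""
    if conc_out_mM <= 0 or conc_in_mM <= 0:
        raise ValueError("concentrations must be positive")
    temp_kelvin = temp_celsius + 273.15
    volts = (GAS_CONSTANT * temp_kelvin / (valence * FARADAY_CONSTANT)
             * math.log(conc_out_mM / conc_in_mM))
    return 1000.0 * volts

# Assumed intracellular K+ of 100 mM; both extracellular values are hypothetical.
e_k_low_k = nernst_potential_mV(conc_out_mM=2.5, conc_in_mM=100.0)
e_k_high_k = nernst_potential_mV(conc_out_mM=60.0, conc_in_mM=100.0)
```

      Raising extracellular potassium moves the potassium equilibrium potential toward 0 mV, that is, into a voltage range where the channels conduct measurable current, which is what permits a direct determination of the reversal potential in each experiment.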

      In most of the mutants, the O2 state becomes more prevalent at potentials above +50 mV. At these potentials, endogenous voltage-dependent currents are often observed in Xenopus oocytes. The observed differences between the various mutants might simply be a function of the expression level of the channel versus endogenous currents.

      Because we were aware of the potential issue of endogenous chloride currents in oocytes, we included data recorded in chloride-free solutions. Those show comparable results, and thus we conclude that endogenous currents are not the origin of the differences between mutants. We will clarify which solutions were used in the figure legends of the revised version and also include the argument against sizable endogenous current contributions in the revision. In a separate line of experiments, we expressed some of the mutants in HEK cells. Despite small current amplitudes, we were able to replicate the findings of two components, providing oocyte-independent evidence for the existence of a second open state.

      Voltage-dependence of the kinetics of WT currents appears a bit strange. Why is the voltage-dependence saturated at 0 mV even though very few channels have activated at that point? I cannot imagine any kinetic model that can lead to such unusual voltage-dependence of kinetics.

      The fact that voltage dependence of open probability and voltage dependence of activation time constant do not align reflects the multi-state nature of the underlying gating scheme. More than one of several sequential transitions limit the overall kinetics. In this case, the apparent kinetics can reflect a different “bottleneck” transition at different voltage ranges.

      One of the other concerns I have is that in many cases, it is clear that the pulse is too short to measure steady-state voltage-dependence. For instance, the currents in -160 mV and -100 mV in Figure 6A and 6B are not saturated.

      While we agree that steady-state curves can simplify quantitative evaluation – especially the normalization applied in the I/Imax curves in figure 6 – the conclusion of two components is independent of the absolute amplitude under steady state. The fact that in the raw current traces in Figure 6A, after a -160 mV prepulse, the same current amplitude is reached for two depolarizations (60 and 90 mV) but not for the intermediate depolarization, can only be explained by an I-V curve that has a minimum. Therefore, the raw data directly support the evidence of finding two components, even if the subsequent analysis is affected by insufficient test pulse durations.

      Reviewer #3

      Although very well established, the experimental conditions used in the present manuscript introduce uncertainties, weakening their conclusions and complicating the interpretation of the results. The authors performed most of their functional studies in Cl-based solutions that can become a non-trivial issue when the range of voltages explored extends to very depolarizing potentials such as +120mV. Oocytes endogenously express Ca2+-activated Cl- channels that will rectify Cl- at very depolarizing potentials -due to an increase in the driving force- and contribute dramatically to the current's amplitude observed at the test pulse in the voltage ranges where the authors identify the second open state.

      As stated above, because we were aware of the potential issue of endogenous chloride currents in oocytes, we performed many of the experiments in chloride-free solutions. We conclude that endogenous currents are not the origin of the differences between mutants because the results were comparable regardless of the presence of chloride. We will clarify which solutions were used in the figure legends of the revised version and also include the argument against sizable endogenous current contributions in the revision. In a separate line of experiments, we expressed some of the mutants in HEK cells. Despite small current amplitudes, we were able to replicate the findings of two components, providing oocyte-independent evidence for the existence of a second open state.

      The authors propose a two-layer Markov model with two open states approximating their results. However, the results obtained with the mutants suggest an inactivated state accessible from closed states and a change in the equilibrium between the close/inactivated/open states that could also explain the observed results; therefore, other models could approximate their data.

      In the process of model development, we tested a large number of configurations. Those included models with a single open state which we connected to two closed (or inactivated) states that were not directly connected to each other and populated at different voltage ranges. In doing so, we attempted to allow access to the single open state from different regions of the “state-space”, reflecting the two voltage ranges of high conductance. However, in our hands, such a “loop” in the state-space inadvertently leads to a weak separation of the two states and a weak effect of prepulse potentials. The underlying reason is that, given the short activation and deactivation time constants, a single open state in a loop provides an effective shortcut, linking otherwise separated parts of the state-space. To achieve the clear separation of the two components’ voltage dependence, two open states that are not connected to each other were essential. As we wrote in response to other comments above, the ultimate proof of two different open states cannot come from modeling, but from single channel measurements.

    1. Abstract: While Bacterial Artificial Chromosomes were once a key resource for the genomic community, they have been obviated, for sequencing purposes, by long-read technologies. Such libraries may now serve as a valuable resource for manipulating and assembling large genomic constructs. To enhance accessibility and comparison, we have developed a BAC restriction map database.

      This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.93), and the reviews have been published under the same license. These are as follows.

      **Reviewer 1. Po-Hsiang Hung**

      Are all data available and do they match the descriptions in the paper?

      No. The dataset on the FTP server includes all the BAC sequences and the restriction enzyme recognition sites in csv files. However, I could not find the database of pairs of BACs, which have overlaps generated by restriction enzymes that linearize the BACs. The makePairs function gave me an error when I tried running it locally, so I was not able to verify what is in these datasets. Personally, I find this function to be one of the most useful features described in this manuscript.

      Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples http://gigadb.org/site/guide

      Yes. This manuscript contains the necessary minimal information (Submitting author, Author list, Dataset title, Dataset description, and Funding information)

      Is there sufficient detail in the methods and data-processing steps to allow reproduction?

      No. The authors provide their code on GitHub so that researchers can download the datasets and analyze the sequences locally. However, I felt that the descriptions in the readme.md file are often insufficient to reproduce the data presented in the manuscript, especially for researchers with little to no programming experience. Detailed information should include examples of how to use each function, the input format, and the location of the output folder/files. I also encountered software version issues during the installation of bacmapping. Please re-test the code in a new environment and state the version of each piece of software. For instance, I found that Python version 3.11 is incompatible with this package while Python version 3.7 is compatible.

      Is there sufficient data validation and statistical analyses of data quality?

      No. The authors used the BioRestriction class from Biopython to get the digestion site information. No extra validation is conducted in this manuscript. Given the errors I encountered in re-running the code (see details in Any Additional Overall Comments to the Author), I suggest checking several digestion sites in some BAC clones with an independent method. This could be done by performing enzyme digestion on some BAC clones, or by uploading some BAC sequences to other software and comparing the digestion sites.
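
      As one way to picture the suggested software cross-check, the sketch below locates recognition sites by plain string search, independently of Biopython. It is only an illustration: the function name `find_sites`, the toy sequence, and the choice of the EcoRI recognition sequence GAATTC are assumptions made for this example, it reports 0-based start positions of the recognition sequence rather than cut positions, and it treats the sequence as linear.

```python
def find_sites(sequence, recognition_site):
    """Return the 0-based start index of every occurrence of recognition_site.

    Overlapping occurrences are reported. The sequence is treated as linear,
    so sites spanning the origin of a circular construct are not found.
    """
    sequence = sequence.upper()
    recognition_site = recognition_site.upper()
    positions = []
    start = sequence.find(recognition_site)
    while start != -1:
        positions.append(start)
        # Continue one base further so overlapping sites are not skipped.
        start = sequence.find(recognition_site, start + 1)
    return positions


# Toy sequence (hypothetical, not taken from any BAC clone).
toy_sequence = "AAGAATTCTTGAATTCGG"
ecori_sites = find_sites(toy_sequence, "GAATTC")
print(ecori_sites)
```

      The positions returned for a few clones and enzymes could then be compared by hand against the corresponding entries in the published maps, after accounting for any offset between site start and cut position.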

      In the output files that contain the digestion sites for each enzyme, some of the enzyme digestion sites are either NA or []. What is the difference between the two? If they mean the same thing (no cutting by the enzyme), bugs or other coding errors may cause this inconsistency. Please check the code again and also verify some of them using the independent methods suggested above. Examples of this issue are the files in maps>sequenced>CEPHB. Here I list two enzymes that show different results in each file: 3.csv: Ragl ([]), SchI (NA); 6.csv: EspEI (NA), AccII ([]); 13.csv: EcoT22I ([]), Hsp92II (NA); X.csv: PacI ([]), AcIWI (NA).
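
      A quick way to see how widespread the NA versus [] inconsistency is would be to tally the two markers per enzyme column. The sketch below is a hedged illustration only: it assumes a layout with one column per enzyme and one row per clone, which may not match the real map files, and the helper names and the inline example data are invented for this example.

```python
import csv
import io


def classify_cell(cell):
    """Classify one map cell as 'missing', 'empty_list', or 'has_sites'."""
    text = (cell or "").strip()
    if text in ("", "NA", "NaN", "nan"):
        return "missing"
    if text == "[]":
        return "empty_list"
    return "has_sites"


def summarize(csv_text):
    """Group enzyme column names by how their cells are encoded."""
    summary = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for enzyme, cell in row.items():
            summary.setdefault(classify_cell(cell), []).append(enzyme)
    return summary


# Hypothetical one-clone file: assumed layout, made-up site positions.
example = 'PacI,AcIWI,EcoRI\n[],NA,"[120, 4500]"\n'
report = summarize(example)
print(report)
```

      If both the "missing" and "empty_list" groups are populated for enzymes that simply do not cut, that would point to two code paths writing the same result in different ways.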

      Is the validation suitable for this type of data?

      No. No validation in this manuscript. See the answer above.

      Additional Comments: The authors have built a database of enzyme digestion site information for BAC clones to help people put the clones to further use. I think it is useful to have this information and also to have the code for further local analysis. A very detailed user manual (or readme.md) is therefore very important to help people use this dataset. Below I summarize the issues I encountered in running the code, along with some suggestions.

      Major points:

      (1) I tested some bacmapping functions and discovered that some do not work as intended due to typos/bugs.
      - The versions of the required software should be stated to help people install this package properly.
      - Refining the code and providing a better user manual would be very helpful for people without much coding experience. The detailed information should include examples of how to use each function, the input format, and the location of the output folder/files. Descriptions of some functions in the readme file are not detailed enough and often do not describe what the input needs to be. For example, getCuts() requires ‘row’ as input, but the authors never give a detailed description of what ‘row’ is in the readme file. I had to look in bacmapping.py to understand what ‘row’ is. If a function requires the variable ‘row’, show a few examples of how ‘row’ can be extracted from the proper input file.
      - mapPlacedClones() requires an input file (‘/home/eamon/BACPlay/longboys.csv’, line 335) that is located on the author’s local computer and is not available through GitHub.
      - Typo in line 814 in getMap(). Should be: name = cloneLine['CloneName']
      - Inconsistency in the output variable type in getMap() (lines 830 and 851). When local == 'sequenced', the output variable is a tuple, which causes issues in downstream functions such as getRestrictionMap() (line 869).

      (2) Add pairs of BACs to the dataset.

      (3) In the output files of digestion sites for each enzyme, some of the enzyme digestion sites show NA or []. Please double-check this and explain the difference.

      (4) Validation of the digestion map with an independent method is suggested.

      Minor points: (1) Adding a title to each column of sequencedStats.csv would make the table easier to understand.

      Re-review:

      The authors have addressed the majority of my points. The software installation works great after considering version control. The updated readme provides detailed information for each function and its required input variables, and the examples in the Jupyter notebook are a great help for running the code. I did, however, encounter two minor errors when I tested Ch19_bacmapping_example.ipynb on a Mac system. Please check this and update it.

      (1) The .DS_Store file that is automatically generated on a Mac system in the bacmapping/Examples/Ch19_example/maps/placed folder causes an error when running bmap.mapPlacedClones(cpustouse=cpus, chunk_size=chunksize). The same problem happened when I ran bmap.mapSequencedClones(cpustouse=cpus). After I deleted .DS_Store in the folder, the code worked.

      Here is the error message when I ran bmap.mapSequencedClones(cpustouse=cpus). NotADirectoryError: [Errno 20] Not a directory: '/Users/user_nsame/bacmapping/Examples/Ch19_example/maps/sequenced/.DS_Store'
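
      A common way to make directory walks robust against stray files such as .DS_Store is to skip hidden entries and anything that is not a directory. The sketch below illustrates the idea only; `list_clone_dirs` is a hypothetical helper, not a function in bacmapping, and the temporary folder merely imitates the layout described above.

```python
import os
import tempfile


def list_clone_dirs(maps_dir):
    """Return sorted subdirectory names, ignoring hidden entries and plain files."""
    return sorted(
        entry
        for entry in os.listdir(maps_dir)
        if not entry.startswith(".")
        and os.path.isdir(os.path.join(maps_dir, entry))
    )


# Imitate a maps folder containing one library directory and a .DS_Store file.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "CEPHB"))
    open(os.path.join(tmp, ".DS_Store"), "w").close()
    found = list_clone_dirs(tmp)

print(found)
```

      With a filter like this, the file no longer needs to be deleted by hand before each run.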

      (2) The second error is from running bmap.getRestrictionMap(name,enzyme). I got the error message, 'list' object has no attribute 'item'. I was able to run this function after changing maps[nenzyme].item() to maps[nenzyme] in line 779 of bacmapping.py. I encountered the same error with the drawMap function. I was able to run this function after changing line 847 of bacmapping.py from rmap = maps[nenzyme].item() to rmap = maps[nenzyme].

      Here is the error message

      AttributeError Traceback (most recent call last)
      Cell In[20], line 5
            3 maps = bmap.getMaps(name)
            4 #print(maps) #this is a big dataframe of all the maps, uncomment to check it out
      ----> 5 rmap = bmap.getRestrictionMap(name,enzyme)
            6 print('Sites in ' + name + ' where ' + enzyme + ' cuts: '+ str(rmap))
            7 plt = bmap.drawMap(name, enzyme)

      File ~/miniconda3/envs/bacmapping/lib/python3.11/site-packages/bacmapping/bacmapping.py:779, in getRestrictionMap(name, enzyme)
          777 maps = getMaps(name)
          778 nenzyme, r = getRightIsoschizomer(enzyme)
      --> 779 return(maps[nenzyme].item())

      AttributeError: 'list' object has no attribute 'item'
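
      The traceback suggests that the looked-up value is sometimes a plain list and sometimes an object with an `.item()` method. One defensive pattern, sketched below under that assumption, is to call `.item()` only when it exists. The `unwrap` helper and the `OneElement` stand-in class are hypothetical names used for illustration and are not part of bacmapping.

```python
def unwrap(value):
    """Return value.item() if the object offers it, otherwise the value itself."""
    item = getattr(value, "item", None)
    return item() if callable(item) else value


class OneElement:
    """Stand-in for a one-element container that exposes .item()."""

    def __init__(self, payload):
        self._payload = payload

    def item(self):
        return self._payload


# A plain list passes through unchanged; a wrapped value is unpacked.
from_list = unwrap([120, 4500])
from_wrapper = unwrap(OneElement([120, 4500]))
print(from_list, from_wrapper)
```

      Either branch then yields the same list of sites, so downstream code does not depend on which container type was returned.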

      **Reviewer 2. Wei Dong**

      Is there sufficient data validation and statistical analyses of data quality? Not my area of expertise

      Is the validation suitable for this type of data? I am not sure about this. This is not my specialty.

      Overall comments: This is a great idea, fully exploring, integrating, and utilizing existing data for new research.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Please find enclosed our revised manuscript entitled “An unconventional gatekeeper mutation sensitizes inositol hexakisphosphate kinases to an allosteric inhibitor”. We would like to thank the editorial team and the reviewers for carefully reading the manuscript and for raising a number of valuable points. We have included additional data and discussion to address the questions raised. Please find the point-by-point responses below.

      Reviewer #1:

      1) While I understand that FMP-201300 is a tool (proof-of-concept) compound it would be useful to know if it has activity against IP6K1 (or IP6K2) in cells.

      We were of course curious about this as well. Unfortunately, our attempts to generate cell lines in which IP6K1 or IP6K2 carry the gatekeeper mutation using CRISPR/Cas editing have not been successful so far. Nevertheless, to obtain information on the permeability and cellular activity of FMP-201300, we decided to treat wt cells, since the compound also inhibited IP6K1-wt and IP6K2-wt at higher concentrations.

      In a previous study, we showed that reduced intracellular 5PP-InsP5 levels lead to a decrease in rRNA synthesis (https://doi.org/10.1101/2022.11.11.516170). We have now repeated this experiment with FMP-201300, alongside the known IP6K inhibitors TNP and SC-919, and could show that FMP-201300 is able to reproduce this phenotype, strongly suggesting it is capable of diffusing through the cell membrane and acting on IP6Ks. We have included these data as a new figure (Figure S10) and in the discussion part of the manuscript.

      2) Did the authors try docking studies to gain insight into the binding site of FMP-201300?

      The reviewer raises an important point, and we indeed strongly considered docking studies during the progress of the project. However, given that the HDX-MS data show that the region around the αC-helix becomes much more flexible upon introducing the gatekeeper mutation, we were concerned that docking studies (which would be based on the static wt structure) may not accurately reflect the more dynamic state of the mutated IP6K.

      Upon consulting with our colleagues with expertise in docking and molecular dynamics simulations, we believe that MD simulations would need to be performed to obtain a more realistic picture of this protein ligand interaction, which we would like to pursue in the future.

      3) Regarding the SAR, it would be useful to know if both carboxylic acids are required for allosteric inhibition.

      Given the available data, it appears very likely that both carboxylic acids are required for the inhibitor to unfold its potency. Compound A2, which only contained one carboxylate group, showed drastically reduced potency. We have altered the text in the main manuscript to get this point across more clearly.

      4) It would be helpful if the authors presented a model for how they think the Leu210 to Valine mutation sensitizes IP6K1 to FMP-201300.

      We agree that it is important to better visualize the structural factors that play a role in the sensitization towards the compound. We have generated a new Figure 5 (the old Figure 5 is now Supplementary Figure 9), and added a section to demonstrate how we propose the mutation leads to the sensitization of IP6K1 to FMP-201300. For a better understanding, we have also included a depiction of how the mutation already affects the apo structures. Furthermore, we have added some text in the HDX section to better describe the proposed mechanism.

      Minor:

      1) Figure 4: The authors should use the same units in panels a and b.

      Thank you for pointing this out, the figure was edited accordingly.

      2) In the supplementary Excel file, it would be helpful to include a tab that contains a legend.

      A contents page was added to help describe the layout of the supplementary Excel file.

      Reviewer #2:

      Overall, this is an excellent study of high quality. The identified FMP-201300 has the potential for further compound and probe development. My only minor comment is that the authors could spend more time discussing the proposed allosteric binding mode of FMP-201300 and provide more detailed figures to highlight the proposed interactions with the protein and the conformational changes that must ultimately take place to accommodate the allosteric modulator. I appreciate that the co-crystallization experiments did not yield bound inhibitor structures, but perhaps the authors could consider MD simulations to complete their study. However, that could be a story in itself and should not be a must for the publication of this great work.

      We agree with the reviewer (and also reviewer 1) that it is important to better visualize the structural factors that play a role in the sensitization towards the compound. We have generated a new Figure 5 (the old Figure 5 is now Supplementary Figure 9), and added a section to demonstrate how we propose the mutation leads to the sensitization of IP6K1 to FMP-201300. For a better understanding, we have also included a depiction of how the mutation already affects the apo structures. Furthermore, we have added some text in the HDX section to better describe the proposed mechanism. In brief, we propose that the mutation leads to increased flexibility of the region around the mutation, allowing accommodation of FMP-201300 and ATP. These same regions are also the regions that show large decreases in deuterium exchange upon addition of the inhibitor.

      We also appreciate the comment about using computational methods to predict the binding site (also a remark from reviewer 1). We strongly considered docking studies during the progress of the project. However, given that the HDX-MS data show that the region around the αC-helix becomes much more flexible upon introducing the gatekeeper mutation, we were concerned that docking studies (which would be based on the static wt structure) may not accurately reflect the more dynamic state of the mutated IP6K. As the reviewer points out, MD simulations would likely be needed to obtain a more realistic picture of this protein-ligand interaction, which we would like to pursue in the future.

    1. “Curatorial Activism” is a term I use to designate the practice of organizing art exhibitions with the principle aim of ensuring that certain constituencies of artists are no longer ghettoized or excluded from the master narratives of art. It is a practice that commits itself to counter-hegemonic initiatives that give voice to those who have been historically silenced or omitted altogether—and, as such, focuses almost exclusively on work produced by women, artists of color, non-Euro-Americans, and/or queer artists. The thesis of my forthcoming book, Curatorial Activism: Towards an Ethics of Curating, takes as its operative assumption that the art system—its history, institutions, market, press, and so forth—is an hegemony that privileges white male creativity to the exclusion of all Other artists. It also insists that this white Western male viewpoint, which has been unconsciously accepted as the prevailing viewpoint, “may––and does––prove to be inadequate not merely on moral and ethical grounds, or because it is elitist, but on purely intellectual ones.” THAMES & HUDSON

      Challenge: This article sounds an alarm on issues of representation, specifically in curating and critical writing. While the author intends to raise readers' awareness, to me Maura Reilly's words start to read as white saviour. The article continually gives us facts to back her disapproval, but it lacks context. Below are some examples that jumped out at me.

      1. The title asks a general question but the author takes a personal view throughout the article.

      2. “Curatorial Activism” is a term I use to designate the practice of organizing art exhibitions with the principle aim of ensuring that certain constituencies of artists are no longer ghettoized or excluded from the master narratives of art. The wording "no longer ghettoized or excluded from the master narratives of art" feels performative. Why use the word "ghettoized" at all, and then follow it with the word "master"? "Excluded" would suffice, and "master" could be changed to "great".
      https://www.cbc.ca/news/canada/ottawa/words-and-phrases-commonly-used-offensive-english-language-1.6252274

      3. "Theirs is not Affirmative Action curating, it’s intelligent curating" Is this to say a show that was a result of affirmative action can't be intelligent? https://hyperallergic.com/831773/affirmative-action-and-the-art-worlds-white-elites/
      4. Exhibitions like theirs, and others like them––Magiciens de la terre, Documenta 11, The Decade Show, Century City, Sexual Politics, Hide/Seek, En Todas Partes (Everywhere), Ars Homo Erotica, Global Feminisms, Africa Remix, Women Artists: 1550–1950, Sexual Politics, Extended Sensibilities, Witnesses, In a Different Light, Queer British Art: 1867-1967––have helped to radically change the course of art history, for the better. It’s no wonder that most of these exhibitions were highly controversial; counter-hegemonic projects are rarely understood. Here is an example of where I think there is an opportunity to tell the reader how and why she feels these exhibitions changed the course of art history. This is also an example of how, throughout the article, her narrative is made clear but we aren't given context.

      After reading this article I see it as a learning tool: a reminder that, as we move forward, it is important to be constantly aware of our language, the language we should or could be using, and how words have power regardless of intention.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This study provides insights into the early detection of malignancies with noninvasive methods. The study contained a large sample size with external validation cohort, which raises the credibility and universality of this model. The new model achieved high levels of AUC in discriminating malignancies from healthy controls, as well as the ability to distinguish tumor of origin. Based on these findings, prospective studies are needed to further confirm its predictive capacity.

      However, there are several concerns about the manuscript, which needs to be clarified or modified.

      1) The use of "multimodal model" will definitely increase workload of the testing. From the results of this manuscript, the integration of multimodal data did not significantly outperform the EM-based model. Is this kind of integration necessary? Is that tool really cost-effective? The authors did not convince me of its necessity, advantages, and clinical application.

      To provide further evidence supporting the advantages of using the multimodal model (stack model) over the EM-based model, we performed the DeLong test and provided data in Table S7 and Figure S6. Our data show that the stack model outperformed the EM-based model, with significantly higher AUC (AUC difference = 0.0286, p<0.0001). Moreover, the stack model exhibited significantly higher sensitivity for detecting cancer patients of five cancer types in both the discovery (73.8% versus 59.5%, p<0.0001, Figure S6A) and validation cohorts (72.4% versus 61.5%, p=0.0002, Figure S6B) at comparable specificity of > 95%. The number of misclassified cases was lower when using the stack model as compared to the EM-based model (Figure S6C and S6D). Strikingly, we observed that the stack model significantly improved the sensitivity for detecting lung cancer patients compared to the EM-based model in both the discovery (78.5% versus 44.1%, Figure S6A) and validation cohorts (83.7% versus 55.8%, Figure S6B), indicating that other ctDNA signatures are also important biomarkers for detecting lung cancer. Therefore, we conclude that the combination of multiple signatures of ctDNA, i.e., the multimodal approach, could improve the sensitivity of multi-cancer detection.
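
      The comparison "at comparable specificity of > 95%" amounts to fixing a score cutoff on the healthy controls and then counting how many cancer samples exceed it. The sketch below illustrates that calculation only; it is not the authors' pipeline, the function name is hypothetical, and the integer scores are toy values chosen for the example.

```python
def sensitivity_at_specificity(control_scores, cancer_scores, min_specificity=0.95):
    """Pick the lowest cutoff reaching min_specificity and report sensitivity.

    A sample is called positive when its score is strictly above the cutoff.
    Returns (cutoff, specificity, sensitivity).
    """
    for cutoff in sorted(set(control_scores)):
        # Specificity: fraction of controls correctly called negative.
        specificity = sum(s <= cutoff for s in control_scores) / len(control_scores)
        if specificity >= min_specificity:
            # Sensitivity: fraction of cancer samples called positive.
            sensitivity = sum(s > cutoff for s in cancer_scores) / len(cancer_scores)
            return cutoff, specificity, sensitivity
    raise ValueError("control_scores must not be empty")


# Toy scores: 20 healthy controls and 5 cancer samples.
controls = list(range(20))
cancers = [10, 18, 30, 50, 90]
result = sensitivity_at_specificity(controls, cancers)
print(result)
```

      Running the same function on the scores of two models, each with its own cutoff, gives sensitivities that can be compared at matched specificity.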

      Given the same wet lab protocol, the difference in computational time between a single EM-based model and the stack model is about 10-11 minutes per sample, but the real difference in analysis time can be reduced to ~1 min/sample by parallelization. With regard to the wet lab protocol, an important novelty of SPOT-MAS technology is its all-in-one approach that enables simultaneous analysis of different ctDNA signatures using a single blood draw and a single library reaction, greatly reducing the experimental cost. Thus, we strongly argue that our approach improves the detection sensitivity by increasing the breadth of ctDNA analysis while achieving cost effectiveness for sample preparation and sequencing with a negligible trade-off in analysis time.

      We have also added the following sentences in the discussion to clarify this point. (Line 618-625)

      “Moreover, this study showed that the feature of EM achieved the highest performance among the five examined ctDNA signatures in discriminating cancer from healthy controls (Figure S6). Importantly, we found that combining EM with other ctDNA signatures in a stack model could further improve the sensitivity for detecting cancer samples, with significant improvement for lung cancer patients (Figure S6A and S6B). These findings highlighted that the multimodal analysis of multiple ctDNA signatures by SPOT-MAS could increase the breadth of ctDNA feature analysis, thus enhancing the detection sensitivity while maintaining the low cost of sample preparation and sequencing.”

      2) The baseline characteristics of some of the enrolled patients are not clear. It seems that some of the cancer patients were diagnosed only by imaging examinations. The manuscript described "staging information was not available for 25.7% of cancer patients, who were confirmed by specialized clinicians to have non-metastatic tumors". How was this confirmation made? According to clinicians' experience only?

      Our study only recruited cancer patients with non-systemic-metastatic stages (Stage I-IIIA) in which cancer is localized to the primary sites and has not spread to other organs. We excluded patients who were diagnosed with metastatic stage IIIB and IV cancer. All healthy subjects were confirmed to have no history of cancer at the time of enrollment. They were followed up at six months and one year after enrollment. The majority of cancer patients (74.3%) were confirmed to have cancer by abnormal imaging examination and subsequent tissue biopsy confirmation of tumor staging and metastasis status. Patients with unavailable staging information (25.7%) initially went to the study hospitals for imaging examination. Upon receiving positive imaging results (MRI scan or CT scan), they moved to another hospital for surgery, leading to missing tumor staging information at the original study hospitals. The metastasis status of these patients was later obtained via communications between the clinicians at the study hospitals and the clinicians at the surgery hospitals, subject to an existing data sharing agreement between the two hospitals. Patients with metastatic cancer or unclear metastatic status were excluded from our study.

      We have added the following sentences in the method (Line 127-135) and discussion section (Line 679-688).

      “Cancer patients were confirmed to have cancer by abnormal imaging examination and subsequent tissue biopsy confirmation of malignancy. Cancer stages were determined by the TNM (Tumor, Node, Metastasis) system classification according to the American Joint Committee on Cancer and the International Union for Cancer Control. Our study only recruited cancer patients with non-systemic-metastatic stages (Stage I-IIIA) in which cancer is localized to the primary sites and has not spread to other organs. We excluded patients who were diagnosed with metastatic stage IIIB and IV cancer. All healthy subjects were confirmed to have no history of cancer at the time of enrollment. They were followed up at six months and one year after enrollment to ensure that they did not develop cancer.”

      “For patients with unavailable staging information, their initial imaging examinations were conducted at the study hospitals. However, subsequent tests and surgical procedures were performed at a different hospital, as per the patients' preferences. Consequently, the original study hospitals lacked access to comprehensive tumor staging data. To address this limitation, the metastasis status of these patients was obtained via communication channels between the clinicians at the study hospitals and those at the surgery hospitals. This enabled the retrieval of limited information, adhering to an established data-sharing agreement between the two institutions. To maintain the robustness of our analysis, patients diagnosed with metastatic cancer or those with indeterminate metastatic status were subsequently excluded from the study.”

      3) It seems that one of the important advantages of this new model is the low depth coverage in comparing to previous screening models for cancer. The authors should discuss more on the reason why the new model could achieve comparable predictive accuracy with an obviously lower sequencing depth.

      We thank the reviewer for the suggestion. We have added the following sentences in the discussion to explain why our assay could achieve good performance at low depth sequencing. (Line 571-584)

      “However, the low amount of ctDNA fragments in plasma samples of patients with early-stage cancer as well as the molecular heterogeneity of different cancer types are known as the major challenges for liquid biopsy based multi-cancer detection assays. Thus, sequencing at high depth coverages is required to capture enough informative cancer DNA fragments in the finite plasma sample to achieve early cancer detection. In support of this notion, many groups (1-4) have developed assays that exploited high depth coverage of sequencing to detect ctDNA fragments in plasma of early stage cancer patients. However, this strategy might not be cost effective and feasible for population wide screening in developing countries. Alternatively, we argued that increasing breadth of ctDNA analysis could maximize the ability to detect ctDNA fragments with heterogeneous genetic and epigenetic changes at shallow sequencing depth, thus improving the sensitivity for multicancer detection. To demonstrate the feasibility of this approach, we built a stacking ensemble model to combine nine different ctDNA signatures and demonstrated its superior performance on cancer detection in comparison to single-feature models (Figure 7B and 7C).”

      4) The readability of this manuscript needs to be improved. The focus of the background section is not clear, with too much detail of other studies and few purposeful summaries. You need to explain the goals and clinical significance of your study. In addition, the results section is too long, and needs to be shortened and simplified. Move some of the inessential results and sentences to supplementary materials or methods.

      We thank the reviewer for these constructive suggestions. Accordingly, we have reduced the details of other studies (Line 85-91) as follows:

      “In recent years, there has been considerable interest in exploring the potential of ctDNA alterations for early detection of cancer (5, 6). One such approach is the PanSeer test, which uses 477 differentially methylated regions (DMRs) in ctDNA to detect five different types of cancer up to four years prior to conventional diagnosis (7). The DELFI assay employs a genome-wide analysis of ctDNA fragment profiles to increase sensitivity in early detection (1). Recently, the Galleri test has emerged as a multi-cancer detection assay that analyses more than 100,000 methylation regions in the genome to detect over 50 cancer types and localize the tumor site (8).”

      We have modified the text in the introduction to explain the goals and clinical significance of our study (Line 111-123)

      “In this study, we aimed to expand our multimodal approach, SPOT-MAS, to comprehensively analyze methylomics, fragmentomics, DNA copy number and end motifs of cfDNA and evaluate its utility for simultaneously detecting and locating cancer from a single screening test.” “Our findings demonstrate that the multimodal approach of SPOT-MAS enables profiling of multiple ctDNA signatures across the entire genome at low sequencing depth to detect five different cancer types in their early stages. Beyond detecting the presence of cancer signals, our assay was able to predict the tumor location, which is important for clinicians to fast-track the follow-up diagnostic and guide necessary treatment. Thus, SPOT-MAS has the potential to become a universal, simple, and cost-effective approach for early multi-cancer detection in a large population.”

      Reviewer #2 (Public Review):

      The authors tried to diagnose cancers and pinpoint tissues of origin using cfDNA. To achieve the goal, they developed a framework to assess methylation, CNA, and other genomic features. They established discovery and validation cohorts for systematic assessment and successfully achieved robust prediction power.

      1) Still, there are places for improvement. The diagnostic effect can be maximized if their framework works well in early-stage cancer patients. According to Table 1, about 10% of the participants are stage I. Do these cancers also perform well as compared to late stage cancers?

      We have compared SPOT-MAS performance across stages and provided the data in Supplementary Table S8 and Supplementary Figure S4J and S4L. Our data showed that SPOT-MAS achieved lower sensitivity for detecting stage I and II cancers as compared to stage IIIA cancers in both the discovery (61.54% and 69.82% for stage I and II, respectively, versus 78.67% for stage IIIA, Supplementary Table S8) and validation cohorts (73.91% and 62.32% for stage I and II, respectively, versus 88.31% for stage IIIA, Supplementary Table S8). This suggests that cancer stage can influence the performance of our models.

      2) Can authors show a systematic comparison of their method to other previous methods to summarize what their algorithm can achieve compared to others.

      We have conducted a systematic comparison of our method with others in the Supplementary Table S11.

      Reviewer #1 (Recommendations For The Authors):

      There are still points for the authors to clarify and consider for incorporation into revision.

      • Please first clarify the issues mentioned in "public review". Several complements are needed.

      We have addressed all of the reviewer’s comments in “public review”.

      1) Line 72-73: Different approaches of early cancer screening assays have different features, application scenarios, and of course, limitations. It's too vague to describe in this way. More importantly, diagnosis of malignancies relies on pathological diagnosis, I don't think the results of unsuccessful screening would be overdiagnosis and overtreatment. That's overstatements.

      We have rewritten the statement as follows (Line 72-75)

      “Although currently guided screening tests have each been shown to provide better treatment outcomes and reduce cancer mortality, some of them are invasive, thus having low accessibility. Importantly, most of them are single cancer screening tests, which may result in high false positive rates when used sequentially.”

      2) Line 115-130: The findings in this study shouldn't be introduced here.

      We have removed this section.

      3) Line 496-498: It surprised me that the model performed even better in independent validation cohort, which is quite different from the usual situations. Please explain it.

      We agree with the reviewer that model performance in an independent validation cohort is often lower than in the discovery cohort. In our case, we carefully confirmed our data using cross-validation (CV). Cross-validation is a widely used process in which the training data are separated into folds or partitions and the model is trained and validated for each fold; the per-fold performance estimates are then used to calculate the mean and confidence interval (GraphPad Prism, Wilson/Brown method). To further confirm our findings, we increased the number of cross-validation folds to 50 and consistently detected no significant difference in performance between the discovery and validation cohorts (p=0.1277, DeLong’s test).
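The fold-by-fold procedure described in this response can be sketched in a few lines of Python. This is only a minimal illustration of generic k-fold cross-validation, not the authors' pipeline: the function names, the toy scoring function, and the sample count are all hypothetical, and the confidence-interval step is left out.

```python
import random
import statistics

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(n_samples, k, score_fold):
    """Call score_fold(train_idx, test_idx) once per fold and collect the scores."""
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the held-out one contributes to the training set.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(score_fold(train_idx, test_idx))
    return scores

# Hypothetical stand-in for "train a model, then evaluate it on the held-out fold".
def toy_score(train_idx, test_idx):
    return len(train_idx) / (len(train_idx) + len(test_idx))

scores = cross_validate(n_samples=100, k=10, score_fold=toy_score)
mean_score = statistics.mean(scores)
```

In a real analysis `toy_score` would fit the classifier on the training indices and return a metric such as AUC on the held-out fold; the mean and interval would then be computed over `scores`.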

      We have added the following sentence in the discussion to explain this (Line 633-635)

      “Despite a slightly higher AUC value in the validation cohort compared to the discovery cohort, no significant differences in AUC values were observed between the two cohorts at CV of 10 or 50 (p=0.1277, DeLong’s test).”

      4) Line 499-501: For the cut-off value selection, the authors thought that for cancer screening, specificity is more important than sensitivity? It's controversial. The sensitivity is only approximately 70%, I think that a missed diagnosis is even worse.

      We agree with the reviewer that both specificity and sensitivity are important metrics of a cancer detection test. However, there is a trade-off between sensitivity and specificity, and the preference for either one of them remains a controversial topic. For a screening test, the preference should be determined by considering the prevalence of the disease, in this case cancer. The low prevalence of cancers indicates that even a small percentage of false-positive test results due to low specificity of the assay, spread across a national population, would hugely increase the demand for confirmatory imaging as well as biopsy sampling of imaging-detected benign abnormalities (9). Thus, false positives have obvious implications for health-care resources as well as patient well-being. Conversely, higher sensitivity ensures that more cancer cases are detected and avoids delays in diagnosis. To mitigate the impact of insufficient sensitivity of a cancer screening test, it is important to advise test-takers that current liquid biopsy tests should only be used as a complementary approach to the available diagnostic tests to increase rates of cancer detection. To be used as a stand-alone test, the assay requires further work to improve its performance, with more focus on increasing sensitivity while maintaining high specificity.

      We have added the following sentences in the discussion to explain why we set a high threshold of specificity (Line 660-671)

      “For an effective screening test, careful consideration of disease prevalence, cancer in this context, is imperative. Given the low prevalence of cancers, even a small proportion of false-positive test results arising from reduced assay specificity, if extrapolated to a national population, could significantly escalate the need for confirmatory imaging and biopsy procedures for benign abnormalities detected during screening. Thus, false-positives can have substantial implications for both healthcare resources and patient well-being. Conversely, a screening test with high sensitivity ensures that most cancer cases are detected and minimizes delays in diagnosis. To address potential limitations posed by low sensitivity in cancer screening tests, we suggest that current liquid biopsy tests should be employed as a complementary approach to existing diagnostic methods to enhance cancer detection rates. To be used as a stand-alone test, further work is required to improve its performance, with a particular emphasis on improving sensitivity while preserving high specificity.”
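The prevalence argument above can be made concrete with a short calculation. The sketch below is illustrative only: the population size, the 1% prevalence, and both specificity values are hypothetical inputs chosen for the example, and the 70% sensitivity simply echoes the approximate figure raised by the reviewer. None of these numbers are results from the study.

```python
def screening_outcomes(population, prevalence, sensitivity, specificity):
    """Return expected true positives, false positives, and PPV for one screening round."""
    with_cancer = population * prevalence
    without_cancer = population - with_cancer
    true_pos = with_cancer * sensitivity
    false_pos = without_cancer * (1.0 - specificity)
    ppv = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, ppv

# All inputs below are hypothetical, chosen only to show the effect of specificity.
for spec in (0.99, 0.95):
    tp, fp, ppv = screening_outcomes(1_000_000, 0.01, 0.70, spec)
    print(f"specificity={spec:.2f}: TP={tp:.0f}, FP={fp:.0f}, PPV={ppv:.2f}")
```

With these made-up inputs, lowering specificity from 99% to 95% raises the expected number of false positives from about 9,900 to about 49,500 per million people screened, while the number of true positives stays at about 7,000.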

      5) The methylation profiles have been used broadly in ctDNA, while you also integrated the fragmentomics, copy number aberration and end motif into the new model. In the discussion section, it would be better to further compare your new model with several previous models based on conventional ctDNA methylation markers (10, 11) for early detection of malignancies. What are the advantages of adding the other types of data? Why could the new model achieve comparable predictive accuracy with an obviously lower sequencing depth?

      We thank the reviewer for the suggestion. We have added the following sentences in the discussion to highlight the novelty of our multimodal approach. (Line 587-610)

      “Previous studies have reported that methylation changes at target regions could be exploited for detecting ctDNA in plasma of patients with early-stage cancer (10, 11).”

      “In addition to methylation alterations, recent studies have shown that the DNA copy number, fragmentomics profile (1) and end motif profile (12) at genome-wide scale are useful features for healthy-cancer classification. Therefore, we propose that the combination of these markers might provide added value to increase the performance of liquid biopsy assays. We demonstrated that the same bisulfite sequencing data could be used to identify somatic CNA (Figure 4), cancer-associated fragment length (Figure 5) and end motifs (Figure 6), highlighting the advantage of SPOT-MAS in capturing the broad landscape of ctDNA signatures without high-cost deep sequencing. For cancer-associated fragment length, we pre-processed this data into five different feature tables to better reflect the information embedded within the data. Overall, we integrated multiple features of ctDNA including methylation, fragment length, end motif and copy number changes into a multi-cancer detection model and demonstrated that this approach could distinguish healthy individuals from patients with five common cancer types. This strategy enables increased breadth of ctDNA analysis at shallow sequencing depth to overcome the limitation of low amount of ctDNA fragments in plasma samples as well as molecular heterogeneity of cancers.”

      Moreover, we have conducted a systematic comparison of our method with others in Supplementary Table S11.

      6) Line 667-668: The wording should be modest. "Successfully detect and localize" is not appropriate.

      We have rewritten the sentence. (Line 713-716)

      “Our large-scale case-control study demonstrated that SPOT-MAS, with its unique combination of multimodal analysis of cfDNA signatures and innovative machine-learning algorithms, can detect and localize multiple types of cancer with high accuracy using low-cost sequencing.”

      Reviewer #2 (Recommendations For The Authors):

      1) Are the patients and controls all from Vietnam? If I am not mistaken, it is hard to find demographic information for controls. Also it is not clear if samples from controls were processed simultaneously or at a same institution or using the same protocol etc.

      We thank the reviewer for asking this question. All cancer patients and controls are from Vietnam and were recruited from five hospitals: Medic Medical Center, University Medical Center Ho Chi Minh City, Thu Duc City Hospital, National Cancer Hospital and Hanoi Medical University. At each research site, blood samples from both cancer patients and healthy subjects were collected in Streck Cell-Free DNA BCT tubes and subsequently transported to a central laboratory located in Medical Genetics Institute for cfDNA isolation, library preparation and sequencing. In a recent publication (10), we have investigated the impact of logistic time and hemolysis rates of blood samples collected from different clinical sites on cfDNA concentration and sequencing quality. We did not observe any noticeable impact of such variations on cfDNA concentrations or sequencing library yields. However, future analytical validation studies are required to evaluate the impact of variation in sampling technique across different clinical sites on the robustness or accuracy of assay results.

      We have added the following sentences in the discussion to highlight this important point (Line 696-704)

      “At each research site, blood samples from both cancer patients and healthy subjects were collected in Streck Cell-Free DNA BCT tubes and subsequently transported to a central laboratory located in Medical Genetics Institute for cfDNA isolation, library preparation and sequencing. In a recent publication (10), we have investigated the impact of logistic time and hemolysis rates of blood samples collected from different clinical sites on cfDNA concentration and sequencing quality. We did not observe any noticeable impact of such variations on cfDNA concentrations or sequencing library yields. However, future analytical validation studies using a larger sample size are required to evaluate the impact of variation in sampling technique across different clinical sites on the robustness or accuracy of assay results.”

      References

      1. Cristiano S, Leal A, Phallen J, Fiksel J, Adleff V, Bruhm DC, et al. Genome-wide cell-free DNA fragmentation in patients with cancer. Nature. 2019;570(7761):385-9.

      2. Cohen JD, Li L, Wang Y, Thoburn C, Afsari B, Danilova L, et al. Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science. 2018;359(6378):926-30.

      3. Liu MC, Oxnard GR, Klein EA, Swanton C, Seiden MV. Sensitive and specific multi-cancer detection and localization using methylation signatures in cell-free DNA. Ann Oncol. 2020;31(6):745-59.

      4. Stackpole ML, Zeng W, Li S, Liu C-C, Zhou Y, He S, et al. Cost-effective methylome sequencing of cell-free DNA for accurately detecting and locating cancer. Nature Communications. 2022;13(1):5566.

      5. Constantin N, Sina AA, Korbie D, Trau M. Opportunities for Early Cancer Detection: The Rise of ctDNA Methylation-Based Pan-Cancer Screening Technologies. Epigenomes. 2022;6(1).

      6. Phan TH, Chi Nguyen VT, Thi Pham TT, Nguyen VC, Ho TD, Quynh Pham TM, et al. Circulating DNA methylation profile improves the accuracy of serum biomarkers for the detection of nonmetastatic hepatocellular carcinoma. Future Oncol. 2022;18(39):4399-413.

      7. Chen X, Gole J, Gore A, He Q, Lu M, Min J, et al. Non-invasive early detection of cancer four years before conventional diagnosis using a blood test. Nature Communications. 2020;11(1):3475.

      8. Jamshidi A, Liu MC, Klein EA, Venn O, Hubbell E, Beausang JF, et al. Evaluation of cell-free DNA approaches for multi-cancer early detection. Cancer Cell. 2022;40(12):1537-49.e12.

      9. Ignatiadis M, Sledge GW, Jeffrey SS. Liquid biopsy enters the clinic - implementation issues and future challenges. Nat Rev Clin Oncol. 2021;18(5):297-312.

      10. Xu RH, Wei W, Krawczyk M, Wang W, Luo H, Flagg K, et al. Circulating tumour DNA methylation markers for diagnosis and prognosis of hepatocellular carcinoma. Nat Mater. 2017;16(11):1155-61.

      11. Luo H, Zhao Q, Wei W, Zheng L, Yi S, Li G, et al. Circulating tumor DNA methylation profiles enable early diagnosis, prognosis prediction, and screening for colorectal cancer. Sci Transl Med. 2020;12(524).

      12. Jiang P, Sun K, Peng W, Cheng SH, Ni M, Yeung PC, et al. Plasma DNA End-Motif Profiling as a Fragmentomic Marker in Cancer, Pregnancy, and Transplantation. Cancer Discovery. 2020;10(5):664-73.

    1. Vannevar Bush, "As We May Think," Atlantic Monthly, (July 1945).

      As We May Think

      From The Atlantic Monthly, July 1945: 101-108. Reprinted with permission. (c)1945, V. Bush.

      As Director of the Office of Scientific Research and Development, Dr. Vannevar Bush has coördinated the activities of some six thousand leading American scientists in the application of science to warfare. In this significant article he holds up an incentive for scientists when the fighting has ceased. He urges that men of science should then turn to the massive task of making more accessible our bewildering store of knowledge. For many years inventions have extended man's physical powers rather than the powers of his mind. Trip hammers that multiply the fists, microscopes that sharpen the eye, and engines of destruction and detection are new results, but not the end results, of modern science. Now, says Dr. Bush, instruments are at hand which, if properly developed, will give man access to and command over the inherited knowledge of the ages. The perfection of these pacific instruments should be the first objective of our scientists as they emerge from their war work. Like Emerson's famous address of 1837 on "The American Scholar," this paper by Dr. Bush calls for a new relationship between thinking man and the sum of our knowledge. - The Editor

      This has not been a scientist's war; it has been a war in which all have had a part. The scientists, burying their old professional competition in the demand of a common cause, have shared greatly and learned much. It has been exhilarating to work in effective partnership. Now, for many, this appears to be approaching an end. What are the scientists to do next?

      For the biologists, and particularly for the medical scientists, there can be little indecision, for their war work has hardly required them to leave the old paths. Many indeed have been able to carry on their war research in their familiar peacetime laboratories. Their objectives remain much the same.

      It is the physicists who have been thrown most violently off stride, who have left academic pursuits for the making of strange destructive gadgets, who have had to devise new methods for their unanticipated assignments. They have done their part on the devices that made it possible to turn back the enemy. They have worked in combined effort with the physicists of our allies. They have felt within themselves the stir of achievement. They have been part of a great team. Now, as peace approaches, one asks where they will find objectives worthy of their best.

      I

      Of what lasting benefit has been man's use of science and of the new instruments which his research brought into existence? First, they have increased his control of his material environment. They have improved his food, his clothing, his shelter; they have increased his security and released him partly from the bondage of bare existence. They have given him increased knowledge of his own biological processes so that he has had a progressive freedom from disease and an increased span of life. They are illuminating the interactions of his physiological and psychological functions, giving the promise of an improved mental health.

      Science has provided the swiftest communication between individuals; it has provided a record of ideas and has enabled man to manipulate and to make extracts from that record so that knowledge evolves and endures throughout the life of a race rather than that of an individual.

      There is a growing mountain of research. But there is increased evidence that we are being bogged down today as specialization extends. The investigator is staggered by the findings and conclusions of thousands of other workers--conclusions which he cannot find time to grasp, much less to remember, as they appear. Yet specialization becomes increasingly necessary for progress, and the effort to bridge between disciplines is correspondingly superficial.

      Professionally our methods of transmitting and reviewing the results of research are generations old and by now are totally inadequate for their purpose. If the aggregate time spent in writing scholarly works and in reading them could be evaluated, the ratio between these amounts of time might well be startling. Those who conscientiously attempt to keep abreast of current thought, even in restricted fields, by close and continuous reading might well shy away from an examination calculated to show how much of the previous month's efforts could be produced on call. Mendel's concept of the laws of genetics was lost to the world for a generation because his publication did not reach the few who were capable of grasping and extending it; and this sort of catastrophe is undoubtedly being repeated all about us, as truly significant attainments become lost in the mass of the inconsequential.

      The difficulty seems to be, not so much that we publish unduly in view of the extent and variety of present-day interests, but rather that publication has been extended far beyond our present ability to make real use of the record. The summation of human experience is being expanded at a prodigious rate, and the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships.

      But there are signs of a change as new and powerful instrumentalities come into use. Photocells capable of seeing things in a physical sense, advanced photography which can record what is seen or even what is not, thermionic tubes capable of controlling potent forces under the guidance of less power than a mosquito uses to vibrate his wings, cathode ray tubes rendering visible an occurrence so brief that by comparison a microsecond is a long time, relay combinations which will carry out involved sequences of movements more reliably than any human operator and thousands of times as fast-- there are plenty of mechanical aids with which to effect a transformation in scientific records.

      Two centuries ago Leibnitz invented a calculating machine which embodied most of the essential features of recent keyboard devices, but it could not then come into use. The economics of the situation were against it: the labor involved in constructing it, before the days of mass production, exceeded the labor to be saved by its use, since all it could accomplish could be duplicated by sufficient use of pencil and paper. Moreover, it would have been subject to frequent breakdown, so that it could not have been depended upon; for at that time and long after, complexity and unreliability were synonymous.

      Babbage, even with remarkably generous support for his time, could not produce his great arithmetical machine. His idea was sound enough, but construction and maintenance costs were then too heavy. Had a Pharaoh been given detailed and explicit designs of an automobile, and had he understood them completely, it would have taxed the resources of his kingdom to have fashioned the thousands of parts for a single car, and that car would have broken down on the first trip to Giza.

      Machines with interchangeable parts can now be constructed with great economy of effort. In spite of much complexity, they perform reliably. Witness the humble typewriter, or the movie camera, or the automobile. Electrical contacts have ceased to stick when thoroughly understood. Note the automatic telephone exchange, which has hundreds of thousands of such contacts, and yet is reliable. A spider web of metal, sealed in a thin glass container, a wire heated to brilliant glow, in short, the thermionic tube of radio sets, is made by the hundred million, tossed about in packages, plugged into sockets--and it works! Its gossamer parts, the precise location and alignment involved in its construction, would have occupied a master craftsman of the guild for months; now it is built for thirty cents. The world has arrived at an age of cheap complex devices of great reliability; and something is bound to come of it.

      II

      A record, if it is to be useful to science, must be continuously extended, it must be stored, and above all it must be consulted. Today we make the record conventionally by writing and photography, followed by printing; but we also record on film, on wax disks, and on magnetic wires. Even if utterly new recording procedures do not appear, these present ones are certainly in the process of modification and extension.

      Certainly progress in photography is not going to stop. Faster material and lenses, more automatic cameras, finer-grained sensitive compounds to allow an extension of the minicamera idea, are all imminent. Let us project this trend ahead to a logical, if not inevitable, outcome. The camera hound of the future wears on his forehead a lump a little larger than a walnut. It takes pictures 3 millimeters square, later to be projected or enlarged, which after all involves only a factor of 10 beyond present practice. The lens is of universal focus, down to any distance accommodated by the unaided eye, simply because it is of short focal length. There is a built-in photocell on the walnut such as we now have on at least one camera, which automatically adjusts exposure for a wide range of illumination. There is film in the walnut for a hundred exposures, and the spring for operating its shutter and shifting its film is wound once for all when the film clip is inserted. It produces its result in full color. It may well be stereoscopic, and record with spaced glass eyes, for striking improvements in stereoscopic technique are just around the corner.

      The cord which trips its shutter may reach down a man's sleeve within easy reach of his fingers. A quick squeeze, and the picture is taken. On a pair of ordinary glasses is a square of fine lines near the top of one lens, where it is out of the way of ordinary vision. When an object appears in that square, it is lined up for its picture. As the scientist of the future moves about the laboratory or the field, every time he looks at something worthy of the record, he trips the shutter and in it goes, without even an audible click. Is this all fantastic? The only fantastic thing about it is the idea of making as many pictures as would result from its use.

      Will there be dry photography? It is already here in two forms. When Brady made his Civil War pictures, the plate had to be wet at the time of exposure. Now it has to be wet during development instead. In the future perhaps it need not be wetted at all. There have long been films impregnated with diazo dyes which form a picture without development, so that it is already there as soon as the camera has been operated. An exposure to ammonia gas destroys the unexposed dye, and the picture can then be taken out into the light and examined. The process is now slow, but someone may speed it up, and it has no grain difficulties such as now keep photographic researchers busy. Often it would be advantageous to be able to snap the camera and to look at the picture immediately.

      Another process now in use is also slow, and more or less clumsy. For fifty years impregnated papers have been used which turn dark at every point where an electrical contact touches them, by reason of the chemical change thus produced in an iodine compound included in the paper. They have been used to make records, for a pointer moving across them can leave a trail behind. If the electrical potential on the pointer is varied as it moves, the line becomes light or dark in accordance with the potential.

      This scheme is now used in facsimile transmission. The pointer draws a set of closely spaced lines across the paper one after another. As it moves, its potential is varied in accordance with a varying current received over wires from a distant station, where these variations are produced by a photocell which is similarly scanning a picture. At every instant the darkness of the line being drawn is made equal to the darkness of the point on the picture being observed by the photocell. Thus, when the whole picture has been covered, a replica appears at the receiving end.
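The line-by-line scheme just described can be mimicked in a few lines of Python. This is a toy sketch under stated assumptions, not a model of any real facsimile equipment: the picture is a small grid of integer darkness values, the "wire" is a plain stream of numbers, and the names `scan` and `draw` are invented for the illustration.

```python
def scan(picture):
    """Sender: read the picture line by line, emitting one darkness value per point."""
    for row in picture:
        for darkness in row:
            yield darkness

def draw(signal, width):
    """Receiver: rebuild the picture by drawing lines of `width` points from the signal."""
    replica, line = [], []
    for darkness in signal:
        line.append(darkness)
        if len(line) == width:
            # One full line has been drawn; start the next one.
            replica.append(line)
            line = []
    return replica

# Hypothetical 3x3 picture: 0 is white paper, 9 is fully dark.
picture = [
    [0, 9, 0],
    [9, 9, 9],
    [0, 9, 0],
]
replica = draw(scan(picture), width=3)
```

The only thing the two ends must agree on in advance is the line width, which plays the role of the synchronized sweep of pointer and photocell.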

      A scene itself can be just as well looked over line by line by the photocell in this way as can a photograph of the scene. This whole apparatus constitutes a camera, with the added feature, which can be dispensed with if desired, of making its picture at a distance. It is slow, and the picture is poor in detail. Still, it does give another process of dry photography, in which the picture is finished as soon as it is taken.

      It would be a brave man who would predict that such a process will always remain clumsy, slow, and faulty in detail. Television equipment today transmits sixteen reasonably good pictures a second, and it involves only two essential differences from the process described above. For one, the record is made by a moving beam of electrons rather than a moving pointer, for the reason that an electron beam can sweep across the picture very rapidly indeed. The other difference involves merely the use of a screen which glows momentarily when the electrons hit, rather than a chemically treated paper or film which is permanently altered. This speed is necessary in television, for motion pictures rather than stills are the object.

      Use chemically treated film in place of the glowing screen, allow the apparatus to transmit one picture only rather than a succession, and a rapid camera for dry photography results. The treated film needs to be far faster in action than present examples, but it probably could be. More serious is the objection that this scheme would involve putting the film inside a vacuum chamber, for electron beams behave normally only in such a rarefied environment. This difficulty could be avoided by allowing the electron beam to play on one side of a partition, and by pressing the film against the other side, if this partition were such as to allow the electrons to go through perpendicular to its surface, and to prevent them from spreading out sideways. Such partitions, in crude form, could certainly be constructed, and they will hardly hold up the general development.

      Like dry photography, microphotography still has a long way to go. The basic scheme of reducing the size of the record, and examining it by projection rather than directly, has possibilities too great to be ignored. The combination of optical projection and photographic reduction is already producing some results in microfilm for scholarly purposes, and the potentialities are highly suggestive. Today, with microfilm, reductions by a linear factor of 20 can be employed and still produce full clarity when the material is re-enlarged for examination. The limits are set by the graininess of the film, the excellence of the optical system, and the efficiency of the light sources employed. All of these are rapidly improving.

      Assume a linear ratio of 100 for future use. Consider film of the same thickness as paper, although thinner film will certainly be usable. Even under these conditions there would be a total factor of 10,000 between the bulk of the ordinary record on books, and its microfilm replica. The Encyclopedia Britannica could be reduced to the volume of a matchbox. A library of a million volumes could be compressed into one end of a desk. If the human race has produced since the invention of movable type a total record, in the form of magazines, newspapers, books, tracts, advertising blurbs, correspondence, having a volume corresponding to a billion books, the whole affair, assembled and compressed, could be lugged off in a moving van. Mere compression, of course, is not enough; one needs not only to make and store a record but also be able to consult it, and this aspect of the matter comes later. Even the modern great library is not generally consulted; it is nibbled at by a few.
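The arithmetic behind these figures is easy to check. The short Python sketch below uses only the assumptions stated in the passage (a linear reduction of 100 and film as thick as paper); the function and variable names are invented for the illustration, and "volumes" is used as a rough unit of bulk.

```python
# Assumption taken from the text: a linear reduction of 100, film as thick as paper.
LINEAR_RATIO = 100
# Shrinking both page dimensions by 100 shrinks the area, and so the bulk, by 100 squared.
BULK_FACTOR = LINEAR_RATIO ** 2

def microfilm_equivalent(volumes):
    """Return the number of ordinary book-volumes the microfilm copy would occupy."""
    return volumes / BULK_FACTOR

library = microfilm_equivalent(1_000_000)        # a library of a million volumes
all_print = microfilm_equivalent(1_000_000_000)  # the text's billion-book record

print(f"bulk factor: {BULK_FACTOR:,}")
print(f"million-volume library -> {library:,.0f} volumes of film")
print(f"billion-book record    -> {all_print:,.0f} volumes of film")
```

Under these assumptions the million-volume library shrinks to the bulk of about 100 books and the billion-book record to about 100,000, which is consistent with the desk and the moving van in the text.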

      Compression is important, however, when it comes to costs. The material for the microfilm Britannica would cost a nickel, and it could be mailed anywhere for a cent. What would it cost to print a million copies? To print a sheet of newspaper, in a large edition, costs a small fraction of a cent. The entire material of the Britannica in reduced microfilm form would go on a sheet eight and one-half by eleven inches. Once it is available, with the photographic reproduction methods of the future, duplicates in large quantities could probably be turned out for a cent apiece beyond the cost of materials. The preparation of the original copy? That introduces the next aspect of the subject.

      III

      To make the record, we now push a pencil or tap a typewriter. Then comes the process of digestion and correction, followed by an intricate process of typesetting, printing, and distribution. To consider the first stage of the procedure, will the author of the future cease writing by hand or typewriter and talk directly to the record? He does so indirectly, by talking to a stenographer or a wax cylinder; but the elements are all present if he wishes to have his talk directly produce a typed record. All he needs to do is to take advantage of existing mechanisms and to alter his language.

      At a recent World Fair a machine called a Voder was shown. A girl stroked its keys and it emitted recognizable speech. No human vocal chords entered into the procedure at any point; the keys simply combined some electrically produced vibrations and passed these on to a loudspeaker. In the Bell Laboratories there is the converse of this machine, called a Vocoder. The loud-speaker is replaced by a microphone, which picks up sound. Speak to it, and the corresponding keys move. This may be one element of the postulated system.

      The other element is found in the stenotype, that somewhat disconcerting device encountered usually at public meetings. A girl strokes its keys languidly and looks about the room and sometimes at the speaker with a disquieting gaze. From it emerges a typed strip which records in a phonetically simplified language a record of what the speaker is supposed to have said. Later this strip is retyped into ordinary language, for in its nascent form it is intelligible only to the initiated. Combine these two elements, let the Vocoder run the stenotype, and the result is a machine which types when talked to.

      Our present languages are not especially adapted to this sort of mechanization, it is true. It is strange that the inventors of universal languages have not seized upon the idea of producing one which better fitted the technique for transmitting and recording speech. Mechanization may yet force the issue, especially in the scientific field; whereupon scientific jargon would become still less intelligible to the layman.

      One can now picture a future investigator in his laboratory. His hands are free, and he is not anchored. As he moves about and observes, he photographs and comments. Time is automatically recorded to tie the two records together. If he goes into the field, he may be connected by radio to his recorder. As he ponders over his notes in the evening, he again talks his comments into the record. His typed record, as well as his photographs, may both be in miniature, so that he projects them for examination.

      Much needs to occur, however, between the collection of data and observations, the extraction of parallel material from the existing record, and the final insertion of new material into the general body of the common record. For mature thought there is no mechanical substitute. But creative thought and essentially repetitive thought are very different things. For the latter there are, and may be, powerful mechanical aids.

      Adding a column of figures is a repetitive thought process, and it was long ago properly relegated to the machine. True, the machine is sometimes controlled by a keyboard, and thought of a sort enters in reading the figures and poking the corresponding keys, but even this is avoidable. Machines have been made which will read typed figures by photocells and then depress the corresponding keys; these are combinations of photocells for scanning the type, electric circuits for sorting the consequent variations, and relay circuits for interpreting the result into the action of solenoids to pull the keys down.

      All this complication is needed because of the clumsy way in which we have learned to write figures. If we recorded them positionally, simply by the configuration of a set of dots on a card, the automatic reading mechanism would become comparatively simple. In fact, if the dots are holes, we have the punched-card machine long ago produced by Hollerith for the purposes of the census, and now used throughout business. Some types of complex businesses could hardly operate without these machines.

      Adding is only one operation. To perform arithmetical computation involves also subtraction, multiplication, and division, and in addition some method for temporary storage of results, removal from storage for further manipulation, and recording of final results by printing. Machines for these purposes are now of two types: keyboard machines for accounting and the like, manually controlled for the insertion of data, and usually automatically controlled as far as the sequence of operations is concerned; and punched-card machines in which separate operations are usually delegated to a series of machines, and the cards then transferred bodily from one to another. Both forms are very useful; but as far as complex computations are concerned, both are still in embryo.

      Rapid electrical counting appeared soon after the physicists found it desirable to count cosmic rays. For their own purposes the physicists promptly constructed thermionic-tube equipment capable of counting electrical impulses at the rate of 100,000 a second. The advanced arithmetical machines of the future will be electrical in nature, and they will perform at 100 times present speeds, or more.

      Moreover, they will be far more versatile than present commercial machines, so that they may readily be adapted for a wide variety of operations. They will be controlled by a control card or film, they will select their own data and manipulate it in accordance with the instructions thus inserted, they will perform complex arithmetical computations at exceedingly high speeds, and they will record results in such form as to be readily available for distribution or for later further manipulation. Such machines will have enormous appetites. One of them will take instructions and data from a whole roomful of girls armed with simple keyboard punches, and will deliver sheets of computed results every few minutes. There will always be plenty of things to compute in the detailed affairs of millions of people doing complicated things.

      IV

      The repetitive processes of thought are not confined, however, to matters of arithmetic and statistics. In fact, every time one combines and records facts in accordance with established logical processes, the creative aspect of thinking is concerned only with the selection of the data and the process to be employed, and the manipulation thereafter is repetitive in nature and hence a fit matter to be relegated to the machines. Not so much has been done along these lines, beyond the bounds of arithmetic, as might be done, primarily because of the economics of the situation. The needs of business, and the extensive market obviously waiting, assured the advent of mass-produced arithmetical machines just as soon as production methods were sufficiently advanced.

      With machines for advanced analysis no such situation existed; for there was and is no extensive market; the users of advanced methods of manipulating data are a very small part of the population. There are, however, machines for solving differential equations--and functional and integral equations, for that matter. There are many special machines, such as the harmonic synthesizer which predicts the tides. There will be many more, appearing certainly first in the hands of the scientist and in small numbers.

      If scientific reasoning were limited to the logical processes of arithmetic, we should not get far in our understanding of the physical world. One might as well attempt to grasp the game of poker entirely by the use of the mathematics of probability. The abacus, with its beads strung on parallel wires, led the Arabs to positional numeration and the concept of zero many centuries before the rest of the world; and it was a useful tool--so useful that it still exists.

      It is a far cry from the abacus to the modern keyboard accounting machine. It will be an equal step to the arithmetical machine of the future. But even this new machine will not take the scientist where he needs to go. Relief must be secured from laborious detailed manipulation of higher mathematics as well, if the users of it are to free their brains for something more than repetitive detailed transformations in accordance with established rules. A mathematician is not a man who can readily manipulate figures; often he cannot. He is not even a man who can readily perform the transformations of equations by the use of calculus. He is primarily an individual who is skilled in the use of symbolic logic on a high plane, and especially he is a man of intuitive judgment in the choice of the manipulative processes he employs.

      All else he should be able to turn over to his mechanism, just as confidently as he turns over the propelling of his car to the intricate mechanism under the hood. Only then will mathematics be practically effective in bringing the growing knowledge of atomistics to the useful solution of the advanced problems of chemistry, metallurgy, and biology. For this reason there will come more machines to handle advanced mathematics for the scientist. Some of them will be sufficiently bizarre to suit the most fastidious connoisseur of the present artifacts of civilization.

      V

      The scientist, however, is not the only person who manipulates data and examines the world about him by the use of logical processes, although he sometimes preserves this appearance by adopting into the fold anyone who becomes logical, much in the manner in which a British labor leader is elevated to knighthood. Whenever logical processes of thought are employed--that is, whenever thought for a time runs along an accepted groove--there is an opportunity for the machine. Formal logic used to be a keen instrument in the hands of the teacher in his trying of students' souls. It is readily possible to construct a machine which will manipulate premises in accordance with formal logic, simply by the clever use of relay circuits. Put a set of premises into such a device and turn the crank, and it will readily pass out conclusion after conclusion, all in accordance with logical law, and with no more slips than would be expected of a keyboard adding machine.
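
The idea of putting in premises and turning the crank can be sketched in a few lines. The essay does not specify how such a relay machine would be built, so what follows is only one hedged interpretation: premises are assumed to be simple if-then rules over named statements, and the hypothetical function `turn_crank` applies them repeatedly (forward chaining) until no new conclusion appears.

```python
from typing import Iterable, List, Sequence, Tuple

# A rule is (antecedents, consequent): if every antecedent is known,
# the consequent may be concluded. This rule format is an assumption
# made for illustration, not something the essay defines.
Rule = Tuple[Sequence[str], str]


def turn_crank(facts: Iterable[str], rules: Sequence[Rule]) -> List[str]:
    """Return the new conclusions, in the order they were derived."""
    known = set(facts)
    derived: List[str] = []
    changed = True
    while changed:  # keep cranking until a full pass adds nothing
        changed = False
        for antecedents, consequent in rules:
            if consequent not in known and set(antecedents) <= known:
                known.add(consequent)
                derived.append(consequent)
                changed = True
    return derived


if __name__ == "__main__":
    rules = [
        (("socrates_is_a_man",), "socrates_is_mortal"),
        (("socrates_is_mortal", "socrates_is_greek"), "a_greek_is_mortal"),
    ]
    print(turn_crank(["socrates_is_a_man", "socrates_is_greek"], rules))
```

The loop stops as soon as a full pass over the rules yields nothing new, which corresponds loosely to the crank producing no further conclusions.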

      Logic can become enormously difficult, and it would undoubtedly be well to produce more assurance in its use. The machines for higher analysis have usually been equation solvers. Ideas are beginning to appear for equation transformers, which will rearrange the relationship expressed by an equation in accordance with strict and rather advanced logic. Progress is inhibited by the exceedingly crude way in which mathematicians express their relationships. They employ a symbolism which grew like Topsy and has little consistency; a strange fact in that most logical field.

      A new symbolism, probably positional, must apparently precede the reduction of mathematical transformations to machine processes. Then, on beyond the strict logic of the mathematician, lies the application of logic in everyday affairs. We may some day click off arguments on a machine with the same assurance that we now enter sales on a cash register. But the machine of logic will not look like a cash register, even of the streamlined model.

      So much for the manipulation of ideas and their insertion into the record. Thus far we seem to be worse off than before--for we can enormously extend the record; yet even in its present bulk we can hardly consult it. This is a much larger matter than merely the extraction of data for the purposes of scientific research; it involves the entire process by which man profits by his inheritance of acquired knowledge. The prime action of use is selection, and here we are halting indeed. There may be millions of fine thoughts, and the account of the experience on which they are based, all encased within stone walls of acceptable architectural form; but if the scholar can get at only one a week by diligent search, his syntheses are not likely to keep up with the current scene.

      Selection, in this broad sense, is a stone adze in the hands of a cabinetmaker. Yet, in a narrow sense and in other areas, something has already been done mechanically on selection. The personnel officer of a factory drops a stack of a few thousand employee cards into a selecting machine, sets a code in accordance with an established convention, and produces in a short time a list of all employees who live in Trenton and know Spanish. Even such devices are much too slow when it comes, for example, to matching a set of fingerprints with one of five million on file. Selection devices of this sort will soon be speeded up from their present rate of reviewing data at a few hundred a minute. By the use of photocells and microfilm they will survey items at the rate of a thousand a second, and will print out duplicates of those selected.

      This process, however, is simple selection: it proceeds by examining in turn every one of a large set of items, and by picking out those which have certain specified characteristics. There is another form of selection best illustrated by the automatic telephone exchange. You dial a number and the machine selects and connects just one of a million possible stations. It does not run over them all. It pays attention only to a class given by a first digit, then only to a subclass of this given by the second digit, and so on; and thus proceeds rapidly and almost unerringly to the selected station. It requires a few seconds to make the selection, although the process could be speeded up if increased speed were economically warranted. If necessary, it could be made extremely fast by substituting thermionic-tube switching for mechanical switching, so that the full selection could be made in one one-hundredth of a second. No one would wish to spend the money necessary to make this change in the telephone system, but the general idea is applicable elsewhere.
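
The contrast between the two forms of selection can be illustrated with a short sketch. It assumes fixed-length station numbers and models the exchange as nested dictionaries keyed by digit; the names `scan_all`, `build_exchange`, and `select_by_digits` are hypothetical and chosen only for this illustration.

```python
from typing import Dict, Optional, Tuple


def scan_all(numbers: Dict[str, str], wanted: str) -> Tuple[Optional[str], int]:
    """Simple selection: examine every item in turn until one matches."""
    inspected = 0
    for number, station in numbers.items():
        inspected += 1
        if number == wanted:
            return station, inspected
    return None, inspected


def build_exchange(numbers: Dict[str, str]) -> dict:
    """Arrange stations into classes and subclasses, one level per digit."""
    root: dict = {}
    for number, station in numbers.items():
        node = root
        for digit in number[:-1]:
            node = node.setdefault(digit, {})
        node[number[-1]] = station
    return root


def select_by_digits(exchange: dict, number: str) -> Tuple[Optional[str], int]:
    """Exchange-style selection: narrow the class with each dialed digit."""
    node = exchange
    steps = 0
    for digit in number:
        if not isinstance(node, dict) or digit not in node:
            return None, steps
        node = node[digit]
        steps += 1
    # An incomplete number ends on a class, not on a station.
    return (None, steps) if isinstance(node, dict) else (node, steps)


if __name__ == "__main__":
    numbers = {f"{i:03d}": f"station-{i}" for i in range(1000)}
    exchange = build_exchange(numbers)
    print(scan_all(numbers, "742"))
    print(select_by_digits(exchange, "742"))
```

With a thousand three-digit stations, the scan may inspect hundreds of entries before it finds a match, whereas the digit-by-digit walk takes one step per digit regardless of how many stations exist.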

      Take the prosaic problem of the great department store. Every time a charge sale is made, there are a number of things to be done. The inventory needs to be revised, the salesman needs to be given credit for the sale, the general accounts need an entry, and, most important, the customer needs to be charged. A central records device has been developed in which much of this work is done conveniently. The salesman places on a stand the customer's identification card, his own card, and the card taken from the article sold--all punched cards. When he pulls a lever, contacts are made through the holes, machinery at a central point makes the necessary computations and entries, and the proper receipt is printed for the salesman to pass to the customer.

      But there may be ten thousand charge customers doing business with the store, and before the full operation can be completed someone has to select the right card and insert it at the central office. Now rapid selection can slide just the proper card into position in an instant or two, and return it afterward. Another difficulty occurs, however. Someone must read a total on the card, so that the machine can add its computed item to it. Conceivably the cards might be of the dry photography type I have described. Existing totals could then be read by photocell, and the new total entered by an electron beam.

      The cards may be in miniature, so that they occupy little space. They must move quickly. They need not be transferred far, but merely into position so that the photocell and recorder can operate on them. Positional dots can enter the data. At the end of the month a machine can readily be made to read these and to print an ordinary bill. With tube selection, in which no mechanical parts are involved in the switches, little time need be occupied in bringing the correct card into use--a second should suffice for the entire operation. The whole record on the card may be made by magnetic dots on a steel sheet if desired, instead of dots to be observed optically, following the scheme by which Poulsen long ago put speech on a magnetic wire. This method has the advantage of simplicity and ease of erasure. By using photography, however, one can arrange to project the record in enlarged form, and at a distance by using the process common in television equipment.

      One can consider rapid selection of this form, and distant projection for other purposes. To be able to key one sheet of a million before an operator in a second or two, with the possibility of then adding notes thereto, is suggestive in many ways. It might even be of use in libraries, but that is another story. At any rate, there are now some interesting combinations possible. One might, for example, speak to a microphone, in the manner described in connection with the speech-controlled typewriter, and thus make his selections. It would certainly beat the usual file clerk.

      VI

      The real heart of the matter of selection, however, goes deeper than a lag in the adoption of mechanisms by libraries, or a lack of development of devices for their use. Our ineptitude in getting at the record is largely caused by the artificiality of systems of indexing. When data of any sort are placed in storage, they are filed alphabetically or numerically, and information is found (when it is) by tracing it down from subclass to subclass. It can be in only one place, unless duplicates are used; one has to have rules as to which path will locate it, and the rules are cumbersome. Having found one item, moreover, one has to emerge from the system and re-enter on a new path.

      The human mind does not work that way. It operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain. It has other characteristics, of course; trails that are not frequently followed are prone to fade, items are not fully permanent, memory is transitory. Yet the speed of action, the intricacy of trails, the detail of mental pictures, is awe-inspiring beyond all else in nature.

      Man cannot hope fully to duplicate this mental process artificially, but he certainly ought to be able to learn from it. In minor ways he may even improve, for his records have relative permanency. The first idea, however, to be drawn from the analogy concerns selection. Selection by association, rather than by indexing, may yet be mechanized. One cannot hope thus to equal the speed and flexibility with which the mind follows an associative trail, but it should be possible to beat the mind decisively in regard to the permanence and clarity of the items resurrected from storage.

      Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and, to coin one at random, "memex" will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.

      It consists of a desk, and while it can presumably be operated from a distance, it is primarily the piece of furniture at which he works. On the top are slanting translucent screens, on which material can be projected for convenient reading. There is a keyboard, and sets of buttons and levers. Otherwise it looks like an ordinary desk.

      In one end is the stored material. The matter of bulk is well taken care of by improved microfilm. Only a small part of the interior of the memex is devoted to storage, the rest to mechanism. Yet if the user inserted 5000 pages of material a day it would take him hundreds of years to fill the repository, so he can be profligate and enter material freely.

      Most of the memex contents are purchased on microfilm ready for insertion. Books of all sorts, pictures, current periodicals, newspapers, are thus obtained and dropped into place. Business correspondence takes the same path. And there is provision for direct entry. On the top of the memex is a transparent platen. On this are placed longhand notes, photographs, memoranda, all sorts of things. When one is in place, the depression of a lever causes it to be photographed onto the next blank space in a section of the memex film, dry photography being employed.

      There is, of course, provision for consultation of the record by the usual scheme of indexing. If the user wishes to consult a certain book, he taps its code on the keyboard, and the title page of the book promptly appears before him, projected onto one of his viewing positions. Frequently-used codes are mnemonic, so that he seldom consults his code book; but when he does, a single tap of a key projects it for his use. Moreover, he has supplemental levers. On deflecting one of these levers to the right he runs through the book before him, each page in turn being projected at a speed which just allows a recognizing glance at each. If he deflects it further to the right, he steps through the book 10 pages at a time; still further at 100 pages at a time. Deflection to the left gives him the same control backwards.
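
The lever behaviour described above amounts to a small stepping rule. The sketch below is one possible reading, in which each deflection is treated as a signed notch from 1 to 3 and the slowest setting is modelled as one page per tick; the names `STEP_BY_NOTCH` and `deflect` are hypothetical.

```python
# Pages advanced per tick at each lever notch (assumed discretisation of
# "each page in turn", "10 pages at a time", and "100 pages at a time").
STEP_BY_NOTCH = {1: 1, 2: 10, 3: 100}


def deflect(page: int, notch: int, last_page: int) -> int:
    """Return the page shown after one tick of the lever.

    notch > 0 is a deflection to the right (forward), notch < 0 to the
    left (backward), and 0 leaves the current page in place.
    """
    if notch == 0:
        return page
    step = STEP_BY_NOTCH[abs(notch)]
    target = page + step if notch > 0 else page - step
    # Stay within the book: pages are numbered 1..last_page.
    return max(1, min(last_page, target))


if __name__ == "__main__":
    page = 1
    for notch in (2, 3, -1):
        page = deflect(page, notch, last_page=500)
        print(page)
```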

      A special button transfers him immediately to the first page of the index. Any given book of his library can thus be called up and consulted with far greater facility than if it were taken from a shelf. As he has several projection positions, he can leave one item in position while he calls up another. He can add marginal notes and comments, taking advantage of one possible type of dry photography, and it could even be arranged so that he can do this by a stylus scheme, such as is now employed in the telautograph seen in railroad waiting rooms, just as though he had the physical page before him.

      VII

      All this is conventional, except for the projection forward of present-day mechanisms and gadgetry. It affords an immediate step, however, to associative indexing, the basic idea of which is a provision whereby any item may be caused at will to select immediately and automatically another. This is the essential feature of the memex. The process of tying two items together is the important thing.

      When the user is building a trail, he names it, inserts the name in his code book, and taps it out on his keyboard. Before him are the two items to be joined, projected onto adjacent viewing positions. At the bottom of each there are a number of blank code spaces, and a pointer is set to indicate one of these on each item. The user taps a single key, and the items are permanently joined. In each code space appears the code word. Out of view, but also in the code space, is inserted a set of dots for photocell viewing; and on each item these dots by their positions designate the index number of the other item.

      Thereafter, at any time, when one of these items is in view, the other can be instantly recalled merely by tapping a button below the corresponding code space. Moreover, when numerous items have been thus joined together to form a trail, they can be reviewed in turn, rapidly or slowly, by deflecting a lever like that used for turning the pages of a book. It is exactly as though the physical items had been gathered together from widely separated sources and bound together to form a new book. It is more than this, for any item can be joined into numerous trails.
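
The joining and recall just described map naturally onto a small data structure. The following is a minimal sketch under stated assumptions: each item keeps, per trail name, the index numbers of the items it has been joined to (standing in for the dots in its code spaces), and a trail remembers the order in which items first entered it. The class and method names (`Memex`, `insert`, `join`, `recall`, `review`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Item:
    index: int
    title: str
    # Code spaces: trail name -> index numbers of the items joined to this one.
    links: Dict[str, List[int]] = field(default_factory=dict)


class Memex:
    def __init__(self) -> None:
        self.items: List[Item] = []
        self.trails: Dict[str, List[int]] = {}  # trail name -> item order

    def insert(self, title: str) -> int:
        """Store a new item and return its index number."""
        item = Item(len(self.items), title)
        self.items.append(item)
        return item.index

    def join(self, trail: str, a: int, b: int) -> None:
        """Tie two items together; each records the other's index number."""
        self.items[a].links.setdefault(trail, []).append(b)
        self.items[b].links.setdefault(trail, []).append(a)
        order = self.trails.setdefault(trail, [])
        for idx in (a, b):
            if idx not in order:
                order.append(idx)

    def recall(self, trail: str, idx: int) -> List[str]:
        """With one item in view, call up the items joined to it on a trail."""
        return [self.items[i].title for i in self.items[idx].links.get(trail, [])]

    def review(self, trail: str) -> List[str]:
        """Run through a whole trail in the order it was built."""
        return [self.items[i].title for i in self.trails.get(trail, [])]


if __name__ == "__main__":
    memex = Memex()
    enc = memex.insert("encyclopedia article")
    hist = memex.insert("history chapter")
    elas = memex.insert("elasticity textbook")
    memex.join("bow", enc, hist)
    memex.join("bow", hist, elas)
    memex.join("materials", elas, enc)  # the same items can sit on several trails
    print(memex.review("bow"))
    print(memex.recall("bow", hist))
```

Because links are stored per trail name rather than per item alone, one item can belong to any number of trails without duplication.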

      The owner of the memex, let us say, is interested in the origin and properties of the bow and arrow. Specifically he is studying why the short Turkish bow was apparently superior to the English long bow in the skirmishes of the Crusades. He has dozens of possibly pertinent books and articles in his memex. First he runs through an encyclopedia, finds an interesting but sketchy article, leaves it projected. Next, in a history, he finds another pertinent item, and ties the two together. Thus he goes, building a trail of many items. Occasionally he inserts a comment of his own, either linking it into the main trail or joining it by a side trail to a particular item. When it becomes evident that the elastic properties of available materials had a great deal to do with the bow, he branches off on a side trail which takes him through textbooks on elasticity and tables of physical constants. He inserts a page of longhand analysis of his own. Thus he builds a trail of his interest through the maze of materials available to him.

      And his trails do not fade. Several years later, his talk with a friend turns to the queer ways in which a people resist innovations, even of vital interest. He has an example, in the fact that the outraged Europeans still failed to adopt the Turkish bow. In fact he has a trail on it. A touch brings up the code book. Tapping a few keys projects the head of the trail. A lever runs through it at will, stopping at interesting items, going off on side excursions. It is an interesting trail, pertinent to the discussion. So he sets a reproducer in action, photographs the whole trail out, and passes it to his friend for insertion in his own memex, there to be linked into the more general trail.

      VIII

      Wholly new forms of encyclopedias will appear, ready-made with a mesh of associative trails running through them, ready to be dropped into the memex and there amplified. The lawyer has at his touch the associated opinions and decisions of his whole experience, and of the experience of friends and authorities. The patent attorney has on call the millions of issued patents, with familiar trails to every point of his client's interest. The physician, puzzled by a patient's reactions, strikes the trail established in studying an earlier similar case, and runs rapidly through analogous case histories, with side references to the classics for the pertinent anatomy and histology. The chemist, struggling with the synthesis of an organic compound, has all the chemical literature before him in his laboratory, with trails following the analogies of compounds, and side trails to their physical and chemical behavior.

      The historian, with a vast chronological account of a people, parallels it with a skip trail which stops only on the salient items, and can follow at any time contemporary trails which lead him all over civilization at a particular epoch. There is a new profession of trail blazers, those who find delight in the task of establishing useful trails through the enormous mass of the common record. The inheritance from the master becomes, not only his additions to the world's record, but for his disciples the entire scaffolding by which they were erected.

      Thus science may implement the ways in which man produces, stores, and consults the record of the race. It might be striking to outline the instrumentalities of the future more spectacularly, rather than to stick closely to methods and elements now known and undergoing rapid development, as has been done here. Technical difficulties of all sorts have been ignored, certainly, but also ignored are means as yet unknown which may come any day to accelerate technical progress as violently as did the advent of the thermionic tube. In order that the picture may not be too commonplace, by reason of sticking to present-day patterns, it may be well to mention one such possibility, not to prophesy but merely to suggest, for prophecy based on extension of the known has substance, while prophecy founded on the unknown is only a doubly involved guess.

      All our steps in creating or absorbing material of the record proceed through one of the senses--the tactile when we touch keys, the oral when we speak or listen, the visual when we read. Is it not possible that some day the path may be established more directly?

      We know that when the eye sees, all the consequent information is transmitted to the brain by means of electrical vibrations in the channel of the optic nerve. This is an exact analogy with the electrical vibrations which occur in the cable of a television set: they convey the picture from the photocells which see it to the radio transmitter from which it is broadcast. We know further that if we can approach that cable with the proper instruments, we do not need to touch it; we can pick up those vibrations by electrical induction and thus discover and reproduce the scene which is being transmitted, just as a telephone wire may be tapped for its message.

      The impulses which flow in the arm nerves of a typist convey to her fingers the translated information which reaches her eye or ear, in order that the fingers may be caused to strike the proper keys. Might not these currents be intercepted, either in the original form in which information is conveyed to the brain, or in the marvelously metamorphosed form in which they then proceed to the hand?

      By bone conduction we already introduce sounds into the nerve channels of the deaf in order that they may hear. Is it not possible that we may learn to introduce them without the present cumbersomeness of first transforming electrical vibrations to mechanical ones, which the human mechanism promptly transforms back to the electrical form? With a couple of electrodes on the skull the encephalograph now produces pen-and-ink traces which bear some relation to the electrical phenomena going on in the brain itself. True, the record is unintelligible, except as it points out certain gross misfunctioning of the cerebral mechanism; but who would now place bounds on where such a thing may lead?

      In the outside world, all forms of intelligence, whether of sound or sight, have been reduced to the form of varying currents in an electric circuit in order that they may be transmitted. Inside the human frame exactly the same sort of process occurs.

      Must we always transform to mechanical movements in order to proceed from one electrical phenomenon to another? It is a suggestive thought, but it hardly warrants prediction without losing touch with reality and immediateness.

      Presumably man's spirit should be elevated if he can better review his shady past and analyze more completely and objectively his present problems. He has built a civilization so complex that he needs to mechanize his records more fully if he is to push his experiment to its logical conclusion and not merely become bogged down part way there by overtaxing his limited memory. His excursions may be more enjoyable if he can reacquire the privilege of forgetting the manifold things he does not need to have immediately at hand, with some assurance that he can find them again if they prove important.

      The applications of science have built man a well-supplied house, and are teaching him to live healthily therein. They have enabled him to throw masses of people against one another with cruel weapons. They may yet allow him truly to encompass the great record and to grow in the wisdom of race experience. He may perish in conflict before he learns to wield that record for his true good. Yet, in the application of science to the needs and desires of man, it would seem to be a singularly unfortunate stage at which to terminate the process, or to lose hope as to the outcome.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We would like to thank Review Commons for their innovative approach to scientific peer-review and publishing. We thank all the Reviewers for their positive, highly complimentary assessment of the manuscript and for highlighting the high quality and reproducibility of the work and the novelty and significance of the results: “The experiments are well-designed and perfectly executed and presented”; “I felt that this is a strong manuscript for peer-review as it serves diversified interests in modern cell biology.”; “The manuscript would be of interest to basic researchers working on epithelial development. Also potentially to basic researchers working on cancer, due to the mitotic errors described.”. We are grateful for the Reviewers’ comments and suggestions that have contributed to improving the revised manuscript. We have addressed all the Reviewers’ concerns, as detailed below in the point-by-point response to the Reviewers. Textual changes in the revised manuscript are marked in blue.

      __Reviewer #1 (Evidence, reproducibility and clarity (Required)): __

      *The manuscript "Crosstalk between the plasma membrane and cell-cell adhesion maintains epithelial identity for correct polarised cell divisions" by Dr. Hosawi and colleagues reports the characterisation of the mitotic connection between plasma membrane dynamics and division orientation in polarised mammalian epithelial cells in culture. The authors start from the comparison of mitotic events of human mammary MCF10A cells grown at optimal density or at low density. They observed that only at optimal density MCF10A cells polarise by E-cadherin mediated cell-cell contacts, and display uniform membrane enrichment at the cortex, whereas cells grown at low density do not show cortical E-Cadherin enrichment, and distribute aberrantly the plasma membrane at one side and in cytoplasmic vesicles, generating daughter cells with unequal size. Consistently, further analyses revealed that low-density MCF10A cells undergo misoriented mitosis, with chromosome congression and missegregation defects. Mechanistically, low density MCF10A cells fail to organise a symmetric mitotic spindle and center it in metaphase. This is due to an increased cortical actomyosin thickness coupled to abnormal astral microtubule stability. Building on previous data from the Elias lab, the authors uncover a role of the membrane-associated S100A11 protein in maintaining correct plasma membrane dynamics and E-cadherin localisation in mitosis. Further dissection of the molecular mechanism underlying this mitotic function of S100A11 revealed that it enriches at the cortex only in optimal-density MCF10A cells, and promotes spindle orientation by association with LGN and E-cadherin, upstream of E-cadherin. This evidence depicts the plasma membrane and S100A11 proteins as key mechanical sensors of cell-cell adhesion orchestrating the recruitment of E-cadherin and LGN-dependent force generators to ensure correct division orientation. *

      *Major points: *

      *- Important information is presented in Supplementary Figure S3. I suggest to move these panels in the main figures. Specifically, I would replace figure 4A with S3A showing the distribution of endogenous S100A11 in MCF10A cells, rather than the one of the GFP-tagged version which is over-expressed. *

      __Authors response: __We thank the Reviewer for this suggestion. We have now moved Figure S3A to Figure 4, to replace Figure 4A and show the localisation of endogenous S100A11 during mitosis and included new quantifications in new Figure 4B. We have moved Figure 3A to supplementary figures (new Figure S4A). We have amended the text of the results section and the Source Data file accordingly.

      *- The mechanisms of division orientation governed by S100A11 seem to impinge on the control of cortical F-actin and astral microtubule dynamics. This is illustrated in figure S3C, which in my opinion should be shown in the main figures with some more explanation / experiments. The authors mention the " tight actin F-actin bundles at the cell-cell contacts" that are lost in S100A11-depleted cells, and that interact with astral microtubules. However this is not fully clear in figure S3C. I think the authors should find a way to better present this evidence, which is key in supporting their molecular model. *

      __Authors response: __As requested by the Reviewer, we have now moved Figure S3C to the main manuscript, as new Figure 5. To further clarify the effect of S100A11 depletion on the formation of tight actin bundles at the cell-cell contacts, we have now included a new illustration in new Figure 5C. Additionally, we have further clarified these findings in the results section (page 11). While we agree with the Reviewer that additional experiments, for example using live imaging of MCF-10A cells co-labelled for F-actin and tubulin, would help further assess the crosstalk between cortical actin and astral microtubules, in our experience these live imaging experiments are challenging, can take up to several months to optimise, and may not guarantee a successful outcome.

      *- I think the discussion would benefit from the addition of a graphical cartoon model illustrating the role of S100A11 in controlling plasma membrane dynamics in mitosis and spindle orientation. *

      __Authors response: __We thank the Reviewer for this suggestion. We have now added a graphical cartoon (new Figure 7), summarising the role of S100A11-mediated regulation of plasma membrane dynamics in polarised cell division orientation, progression and outcome. We hope this new illustration clarifies further the mechanisms described in this study.

      *- Finally, to understand the relevance of S100A11 in the context of 3D polarised mammary epithelia, it would be very interesting to analyse the effect of S100A11 knock-down in mouse mammary epithelial acini grown in matrigel. This is not essential for the proposed studies, but would add biological relevance to the mechanisms characterised in 2D culture. *

      __Authors response: __We agree with the Reviewer that validating our findings in 3D cultures of mammary epithelial cells will be important to determine the influence of S100A11-mediated regulation of plasma membrane dynamics during mitosis on lumen formation and tissue morphogenesis. This is exactly the direction where the follow-up of these findings will go. While the first author who led this work has graduated and left our lab, we have recently recruited a new PhD student to address this important question, which will need a few years of investigation to provide important insights, similarly to what we did in our previous work (Fankhaenel et al., 2023 Nat Commun).

      *Minor comments: *

      *- It would be preferable to mention the known functions of S100A11 in the introduction rather than at the beginning of the paragraph at pg. 9. *

      __Authors response: __In response to the Reviewer’s suggestion, we have now moved the paragraph describing known functions of S100A11 to the introduction of the revised manuscript (see page 5).

      *- at pg 10, beginning of paragraph, I find it a weird phrasing that "LGN interacts with F-actin". As reported in the reference cited here, this is through Afadin, which binds simultaneously LGN and cortical F-actin. I would rephrase it. *

      __Authors response: __We thank the Reviewer for clarifying this point, which we have now rectified in the revised manuscript (see page 11).

      __Reviewer #1 (Significance (Required)): __

      *The description of cell adhesion as a key factor instructing correct mitotic progression and execution of oriented division of vertebrate epithelial cells by controlling plasma membrane dynamics is novel and interesting for scientists in the spindle orientation/polarity field. The experiments are well-designed and perfectly executed and presented. I am in favour of publication of the manuscript, providing that a few points are addressed. *

      __Authors response: __We thank the Reviewer for their very positive evaluation of our work.

      __Reviewer #2 (Evidence, reproducibility and clarity (Required)): __

      *Establishment and maintenance of cell polarity are fundamental processes for physiology in multi-cellular organism given the fact that more than 380 million epithelial cell renewal for every second in human adults. However, the precise mechanisms linking plasma membrane polarity and cortical cytoskeleton dynamics of epithelial cells during mitotic exit and interphase remain ill-illustrated. Salah Elias and her colleagues experimentally manipulated the density of mammary epithelial cells in culture, which led to several mitotic defects. Specifically, they found that perturbation of cell-cell adhesion integrity impairs the dynamics of the plasma membrane during mitosis, affecting the shape and size of mitotic cells and resulting in defects in mitosis progression and generating daughter cells with aberrant cytoarchitecture. In these conditions, F-actin-astral microtubule crosstalk is impaired leading to mitotic spindle misassembly and misorientation, which in turn contributes to chromosome mis-segregation. Mechanistically, they identified the S100 Ca2+-binding protein A11 as a key membrane-associated regulator that forms a complex with E-cadherin and LGN to coordinate plasma membrane remodelling with E-cadherin-mediated cell adhesion and LGN-dependent mitotic spindle machinery. I felt that this is a strong manuscript for peer-review as it serves diversified interests in modern cell biology. *

      __Authors response: __We thank the Reviewer for their overall very positive feedback on our manuscript.

      __Reviewer #2 (Significance (Required)): __

      *Several key cellular experiments should be repeated using a second line of epithelial cells such as RPE1. *

      __Authors response: __We agree with the Reviewer it will be interesting to test our findings in other epithelial cells, including RPE1 cells, a widely used epithelial cell model to study the mechanisms controlling cell division. Nonetheless, we would like to emphasise that while our work demonstrates the importance of the interplay between plasma membrane dynamics and cell-cell adhesion for correct execution of polarised cell divisions in mammary epithelial cells, our aim is not to generalise the role of these S100A11-mediated mechanisms. An elegant study has shown that the mechanisms controlling plasma membrane remodelling and elongation during mitosis to ensure correct positioning of the mitotic spindle and symmetric division differ between HeLa cells and RPE1 cells (Kiyomitsu and Cheeseman, 2013 Cell). Additional experiments in a second cell line will require a thorough characterisation of the expression and localisation of S100A11 during the cell cycle, as well as the use of extensive and time-consuming knockdown and imaging experiments over several months and may lead to different observations requiring further mechanistic investigation, which is beyond the initial scope of this study. Additionally, the PhD student who led this study has graduated and left the lab and presently we don’t have capacity or resources to conduct these suggested experiments. Finally, to precisely address the Reviewer’s concern, we have now amended the revised manuscript to make our statements more specific to mammary epithelial cells throughout the text.

      __Reviewer #3 (Evidence, reproducibility and clarity (Required)): __

      *Summary: your understanding of the study and its conclusions. *

      *The scope of the study is to understand the links between cell-cell adhesion integrity, plasma membrane dynamics and mitotic spindle in mammalian epithelial tissues. To test this, the authors cultured mammary epithelial cells at optimal or low density as a way of perturbing cell-cell adhesion. The authors conclude that perturbing cell-cell adhesion alters plasma membrane dynamics, causing mitotic defects and that S100A11 coordinates this link via E-cadherin. Whilst this is an interesting manuscript, illustrating the differences of mitotic success in optimal density vs. low density cell cultures, I do not think that the conclusions are supported by the evidence presented for the reasons stated below. *

      *Major comments: major issues affecting the conclusions. *

      *- The manuscript clearly shows that culturing cells at a lower density results in a higher incidence of asymmetric division (figure 1) and mitosis defects (figure 2). Cells round more and faster and there is more actin at the cortex during rounding (figure 3). However, whilst differences in cell-cell adhesion are likely to play a role in mediating these effects, I don't think that it is possible to claim from the data presented that these defects are specifically due to cell-cell adhesion differences. This is because the morphology of cells at low density is also very different - cells appear more mesenchymal, with migratory front-rear polarity instead of apical-basal polarity. These cells will therefore have many differences between them, cell-adhesion being just one. The data is also not showing a 'loss' of cell-cell adhesion integrity but are rather illustrating the differences between cells that have formed cell-cell adhesions and those that have not. To really test the specific role of cell-cell adhesions, the authors would need to inhibit adhesions directly but without altering the cell density - for example via chemical or genetic perturbation within a confined environment. I suggest that the authors either need to do these experiments or to requalify what their data is telling us. *

      __Authors response: __We thank the Reviewer for their insightful discussion of the proposed mechanisms described in our manuscript. Several of the Reviewer’s comments pinpoint and exactly match the messages that we would like to convey to the scientific community. Therefore, to address the Reviewer’s comments, we have carefully requalified our statements in several places in the revised manuscript, to ensure they are clearer and more precise.

      We agree with the Reviewer’s comment that our experiments using sub-optimal density of mammary epithelial cells prevent the formation of cell-cell adhesions rather than disturbing them. The Reviewer is right: our experiments in low-density cultures suggest that perturbation of cell-cell adhesion formation impairs mammary epithelial identity, where cells lose their polarity and adopt a more mesenchymal phenotype, associated with plasma membrane remodelling defects. This affects the dynamics and progression of cell division. Nonetheless, our observations suggest an interplay between cell-cell adhesion and the plasma membrane to maintain correct cell shape during mitosis. To test this hypothesis, we explored the function of S100A11, which we have identified in the LGN interactome in mitotic mammary epithelial cells (Fankhaenel et al., 2023 Nat Commun), and which has been shown to interact with E-cadherin at adherens junctions in MDCK cells (Guo et al., 2014 Sci Signal). This, together with the fact that S100A11 controls plasma membrane repair (Jaiswal et al., 2014 Nat Commun), suggested S100A11 as an interesting candidate to investigate the interplay between cell-cell adhesion and membrane remodelling during mitosis. The data presented here support this hypothesis: perturbation of our membrane-bound target, S100A11, indeed leads to the same mitotic phenotypes. S100A11 RNAi-mediated knockdown (48h) affects E-cadherin localisation at the plasma membrane and impairs cell-cell adhesion formation, with effects on plasma membrane dynamics that phenocopy the defects observed in our low-density culture experiments. Remarkably, perturbation of cell-cell adhesions persisted in cells treated with si-S100A11 for 72h (see Figure S3). Of note, all our siRNA experiments have been carried out in cells cultured at optimal density to establish cell-cell adhesions.

      Thus, S100A11 knockdown allows genetic perturbation of E-cadherin-mediated cell-cell adhesion and recapitulates the plasma membrane and mitotic defects observed in sub-optimal cultures of mammary epithelial cells. Future experiments will be key to dissect these S100A11-mediated mechanisms to further understand how plasma membrane remodelling and cell-cell adhesions are coordinated during mitosis. Finally, as suggested by the Reviewer, we have now requalified our conclusions as appropriate in the revised manuscript.

      *- The current manuscript also demonstrates that cell adhesion is affected when S100A11 is knocked down (figure 4). It shows binding between and colocalization of S100A11 and E-cadherin, and shows that LGN cortical distribution is affected when S100A11 is knocked down (Figure 5). The results presented are suggestive of S100A11 being upstream of E-cadherin. However, I don't understand how the data shows "crosstalk between the plasma membrane, cell-cell adhesion, and the cell cortex during mitosis". For example, on P9: "We observed unequal distribution of CellMaskTM in a vast majority of S100A11-depleted cells (si-S100A11#1: ~79% versus si-Control: ~26%), indicating defects in plasma membrane remodelling (Figures 4B and 4C)." I don't agree that this demonstrates a defect in PM remodelling. Rather the cells in the representative images are less adherent and have adopted a more migratory cell state similar to that seen in figure 1 when seeded at low density. The fluidity of the much larger cells shown in knock down cells in panel F also appears higher, again suggesting an adhesion defect. *

      __Authors response: __The Reviewer has raised very important points here, which we would like to clarify.

      We agree with the Reviewer that our results in S100A11-depleted cells indicate impaired cell adhesion, which generates cells displaying an invasive/migratory behaviour. However, our analysis of S100A11-depleted mitotic cells labelled with CellMaskTM reveals abnormal plasma membrane elongation, generating two daughter cells with defective geometry compared to control cells. These defects in the plasma membrane and cell shape were not noticeable upon E-cadherin knockdown (see previous Figures 5K and 5L; now new Figures 6K and 6L). Thus, our results strongly suggest that S100A11 acts as an upstream cue that coordinates plasma membrane dynamics with E-cadherin-mediated cell adhesions, and that additional mechanisms may be regulated by S100A11 to coordinate cell-cell adhesion with plasma membrane remodelling. How S100A11 ensures such a dynamic interplay between the plasma membrane and E-cadherin during mitosis remains a key question that we have not fully addressed in this initial study. An attractive mechanism could be mediated by the function of S100A11 in regulating the dynamic interaction between F-actin and the plasma membrane, as previously reported (Jaiswal et al., 2014 Nat Commun). Increasing evidence shows the importance of the crosstalk between the plasma membrane, the cortex and cell shape for correct execution of mitosis (Rizzelli et al., 2020 Open Biol). In our experiments, we show that impaired plasma membrane remodelling and cell shape are associated with defects in F-actin and astral microtubule organisation. Thus, our findings reinforce a model whereby S100A11 is a key membrane-associated protein that coordinates the crosstalk between the plasma membrane, cell-cell adhesion, and the cell cortex during mitosis. It will be key to characterise the interactome of S100A11 during mitosis to provide important mechanistic insights into this new role of S100A11; it is our intention to investigate this in the future.

      To address the points raised by the Reviewer, we have changed and clarified the statements they highlighted above, in the revised manuscript (pages 10 and 11).

      *- An earlier paper from the same lab this year identified Annexin A1 as directing mitotic spindle orientation via localising LGN at lateral cortex. During this earlier paper they also identified S100A11, which is a partner for Annexin A1. The authors could more clearly explain what S100A11 is in the current manuscript and how the current study builds on this earlier study. *

      __Authors response: __We thank the Reviewer for highlighting our previous work characterising the interactome of LGN in mitotic mammary epithelial cells (Fankhaenel et al., 2023 Nat Commun), and identifying Annexin A1 (ANXA1) as a polarity cue regulating the localisation and function of the evolutionarily conserved mitotic spindle orientation LGN complex. We also showed that S100A11, a direct partner of ANXA1, co-purifies with LGN and that perturbation of the ANXA1-S100A11 complex impairs the localisation of the LGN complex at the cell cortex during mitosis. Thus, as rightly pointed out by the Reviewer, this work builds on our previous work discussed above, but also on previous studies establishing S100A11 as a key regulator of plasma membrane repair by regulating the dynamic interplay between F-actin and the plasma membrane (Jaiswal et al., 2014 Nat Commun), and studies showing that S100A11 interacts with E-cadherin at adherens junctions (Guo et al., 2014 Sci Signal). To address the Reviewer’s point (also raised by Reviewer 1), we have now included a paragraph in the introduction (page 5) and results (page 10) of the revised manuscript describing these and other functions of S100A11 to provide a strong rationale for our decision to investigate this protein.


      *- Based on the data presented, I suggest that the authors should requalify their data. I suggest that the conclusions that can be drawn from the data are that cellular state is important for regulating mitosis orientation and fidelity (i.e. adherent epithelia cells vs. less adherent more migratory cells). S100A11 is important for promoting cell-cell adhesions and might be upstream of the known role of E-cadherin in regulating spindle orientation. Whilst I suggest that more quantified experiments would need to be included in order to assess possible effects on plasma membrane remodelling, the manuscript could be generally improved by a clearer explanation of the open question that they are addressing and what specific advance this manuscript has made in relation to the current literature, including their own. I do not currently feel that the title of the manuscript is appropriate since I don't think that a crosstalk between the plasma membrane and cell-cell adhesion has been shown here. *

      __Authors response: __We would like to reiterate our agreement with the Reviewer’s suggestion about the conclusions drawn from our data. In the initial submission we proposed that perturbation of S100A11-mediated regulation of cell adhesion and plasma membrane impairs the identity of mammary epithelial cells, which affects their shape during mitosis leading to aberrant mitotic progression and outcome. While we have not checked the migratory behaviour of cells not forming cell-cell adhesions, we suggested that the cells adopted a mesenchymal phenotype. Furthermore, we discussed the implication of epithelial-to-mesenchymal transition on chromosome segregation fidelity and execution of mitosis, and how precisely they link with our study (see initial submission’s pages 14 and 19). As suggested by the Reviewer, we have now clarified further these observations in the results (pages 7 and 11) and discussion (pages 15 and 19) of the revised manuscript.

      We have quantified several aspects of the changes in plasma membrane dynamics and remodelling throughout, in the initial manuscript (Figure 1D-H; Figure 4C). To address the Reviewer’s point, we have now added quantifications of membrane blebbing (new Figure 1I).

      We would like to emphasise that the introduction of the initial manuscript has included the open questions that led to this study. These questions have been addressed further in the discussion, where we have also formulated new hypotheses and discussed what we think are the important outstanding questions for future investigations, in light of our findings. In this study we demonstrate that maintaining epithelial identity is essential for correct execution of polarised cell divisions. Our findings indicate that mammary epithelial cells grown at sub-optimal density lose their epithelial identity, which results in several mitotic defects. We propose a novel mechanism in which S100A11 acts as a molecular sensor of external cues coordinating the interplay between plasma membrane dynamics and cell-cell adhesion to maintain epithelial identity and integrity, thereby ensuring correct progression, orientation, and outcome of cell division. As suggested by the Reviewer, we have clarified further the advances made in this study, in the revised Results and Discussion sections.

      To address the Reviewer’s final point, we would like to suggest the following revised title “Interplay between the plasma membrane and cell-cell adhesion maintains epithelial identity for correct polarised cell divisions”, which we hope reflects better the results described in our studies.

      *Minor comments: important issues that can confidently be addressed. *

      *- P3: I wouldn't describe the junctional proteins listed as polarity proteins. *

      __Authors response: __We have now made this correction on page 3, as suggested by the Reviewer.

      *- Figure 1 - can the membrane blebbing phenotype be quantified? At the moment this part is observational so can't really be used to determine the role of plasma membrane remodelling. *

      __Authors response: __We have now included quantifications of blebbing in the revised manuscript, as suggested by the Reviewer (new Figure 1I).

      *- Figure 3. I'm not sure what the 'subcortical actin cloud' measurement is. Figure 3G suggests it may be the distance from the cortex to the spindle pole but how does this relate to actin? *

      __Authors response: __The Reviewer is right, the subcortical actin cloud includes a pool of dynamic subcortical actin that extends from the cortex (excluding the stiff cortical actin) to the cytoplasm, interacting with the centrosomes and concentrating near the retraction fibres. The subcortical actin cloud has been shown to mediate cortical forces and to concentrate force-generating proteins at the retraction fibres acting on centrosome dynamics and pulling on astral microtubules to orient the mitotic spindle (for example, please see Kwon et al., 2015 Dev Cell). We have now included this clarification in the revised manuscript (page 10).

      *- Figure 4A. I can't see GFP-S100A11 accumulating at the cell surface. To me these images suggest that it is relatively ubiquitously expressed throughout the cytoplasm and surface, which is different to the later antibody stains, that show localisation at the cell surface. *

      __Authors response: __A similar point has been raised by Reviewer 1. Although our retroviral-mediated transduction avoids excessive expression of GFP-S100A11, the ectopic S100A11 is expressed at higher levels than its endogenous counterpart. Our live images show an accumulation of the protein at the cell surface, but relatively high levels are also visible in the cytoplasm (previous Figure 4A, new Figure S4A). By contrast, immunolabelling for endogenous S100A11 shows an obvious accumulation of the protein at the plasma membrane. This difference could also be due to dynamic translocation of GFP-S100A11 between the cell surface and the cytoplasm, which is captured in our live imaging. Similar slight differences between immunofluorescence and live imaging of cortical proteins involved in mitosis, such as Dynein, NuMA, LGN and CAPZB, have been reported in several studies (to cite a few: di Pietro et al., 2017 Curr Biol; Elias et al., 2014 Stem Cell Rep; Fankhaenel et al., 2023 Nat Commun). To address this point, we have now moved the panel showing S100A11 immunofluorescence in Figure S3A to new Figure 4A (also see response to Reviewer 1 Major Point 1).

      *- Fig 4H doesn't show an active process of translocation of E-Cadherin to the cytoplasm. It shows representative images with slightly higher levels of E-Cadherin in the cytoplasm. This could be due to translocation or it could be to do with lack of E-Cadherin assembly. *

      __Authors response: __We thank the Reviewer for pointing this out. We have rectified this statement accordingly (page 11).

      *- 4I I don't understand where the line profile is derived from - where is apical and where is basal in the images? Could a diagram be included? *

      __Authors response: __We have now included an illustration of this quantification, in the revised manuscript (new Figure 4J).

      *- The discussion could be shortened and more clearly written - perhaps with subheadings of the main findings. *

      __Authors response: __We have clarified several ideas and statements, based on the specific points addressed above. While it is challenging to reduce the size of this section, given that the study addresses several mechanisms of mitosis, we have now shortened the discussion in the revised manuscript.

      *- Methods: Why is cholera toxin used in the cell culture medium? *

      __Authors response: __Cholera toxin is a key component of MCF-10A medium, which has been shown to stimulate cAMP activation, promoting cell proliferation in culture. This culture protocol is a gold standard in the field (Debnath et al., 2003 Methods). Given that cholera toxin is a highly regulated chemical and takes several months to purchase, we have tried culturing MCF-10A without the toxin, but this negatively affected proliferation and passaging of these cells. Therefore, we concluded that adding it to the culture medium is important.

      __Reviewer #3 (Significance (Required)): __

      *In general, this is an interesting paper about the fidelity of mitosis in cells in adherent monolayers vs. in more migratory, non-adherent states. There is existing literature on this topic (some cited in the manuscript, alongside reviews of the topic). *

      *The main conceptual advance, as far as I can see, is that S100A11 is important for promoting cell-cell adhesions and might be upstream of the known role of E-cadherin in regulating spindle orientation via LGN. The main limitation is that plating cells at different densities is not a direct 'perturbation' of cell-cell adhesion. This means that the phenotypes seen could be due to many factors, not just cell adhesion. Assessment of plasma membrane and cytoskeletal dynamics are also often observational and not conclusive. *

      *The manuscript would be of interest to basic researchers working on epithelial development. Also potentially to basic researchers working on cancer, due to the mitotic errors described. *

      *I have expertise in epithelial cell biology. *

      I estimate the authors would need between 3 and 6 months for revisions if they decide to do further experiments and between 1 and 3 months if they decide to re-qualify their claims.

      __Authors response: __We thank the Reviewer for their overall positive feedback on our work and its broader importance for researchers in epithelial development and cancer biology.

      We would like to reiterate our agreement with the Reviewer’s assessment of the conceptual advances of our work. We show that S100A11 complexes with E-cadherin and LGN during mitosis to control cell-cell adhesion assembly and the mitotic spindle machinery, respectively, which in turn ensures faithful chromosome segregation. Our results also suggest that S100A11 lies upstream of E-cadherin in the regulation of the LGN-mediated mitotic spindle machinery. We also agree with the Reviewer that plating epithelial cells at low density does not directly affect cell-cell adhesion, because, in these culture conditions, cells are not dense enough to establish the cell-cell contacts necessary to assemble stable adherens junctions. Rather, and as rightly pointed out by the Reviewer, cells grown at low density fail to maintain their epithelial identity and adopt a more mesenchymal and elongated behaviour, which is accompanied by dramatic changes in plasma membrane remodelling throughout mitosis. Interestingly, our results show that neither S100A11 nor E-cadherin localises at the plasma membrane in these sub-optimal culture conditions. This, along with our results showing that depletion of S100A11 phenocopies the effect of low-density culture conditions on plasma membrane remodelling and E-cadherin-mediated cell-cell adhesion assembly, allows us to propose a mechanism whereby the membrane-associated S100A11 protein acts as a molecular sensor of external cues, bridging plasma membrane remodelling to E-cadherin-dependent cell adhesion to coordinate correct progression and outcome of mammary epithelial cell divisions.

      We are grateful for the Reviewer’s insightful discussion of our findings. As discussed above in our responses to their specific points, we have requalified many of our statements to further clarify our main findings and conclusions throughout the revised manuscript. We have also added new quantifications in response to the Reviewer’s suggestions. We believe that, together, these revisions have further improved the initial manuscript.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary: your understanding of the study and its conclusions.

      The scope of the study is to understand the links between cell-cell adhesion integrity, plasma membrane dynamics and mitotic spindle in mammalian epithelial tissues. To test this, the authors cultured mammary epithelial cells at optimal or low density as a way of perturbing cell-cell adhesion. The authors conclude that perturbing cell-cell adhesion alters plasma membrane dynamics, causing mitotic defects and that S100A11 coordinates this link via E-cadherin. Whilst this is an interesting manuscript, illustrating the differences of mitotic success in optimal density vs. low density cell cultures, I do not think that the conclusions are supported by the evidence presented for the reasons stated below.

      Major comments: major issues affecting the conclusions.

      The manuscript clearly shows that culturing cells at a lower density results in a higher incidence of asymmetric division (figure 1) and mitosis defects (figure 2). Cells round more and faster and there is more actin at the cortex during rounding (figure 3). However, whilst differences in cell-cell adhesion are likely to play a role in mediating these effects, I don't think that it is possible to claim from the data presented that these defects are specifically due to cell-cell adhesion differences. This is because the morphology of cells at low density is also very different - cells appear more mesenchymal, with migratory front-rear polarity instead of apical-basal polarity. These cells will therefore have many differences between them, cell-adhesion being just one. The data is also not showing a 'loss' of cell-cell adhesion integrity but are rather illustrating the differences between cells that have formed cell-cell adhesions and those that have not. To really test the specific role of cell-cell adhesions, the authors would need to inhibit adhesions directly but without altering the cell density - for example via chemical or genetic perturbation within a confined environment. I suggest that the authors either need to do these experiments or to requalify what their data is telling us.

      The current manuscript also demonstrates that cell adhesion is affected when S100A11 is knocked down (figure 4). It shows binding between and colocalization of S100A11 and E-cadherin, and shows that LGN cortical distribution is affected when S100A11 is knocked down (Figure 5). The results presented are suggestive of S100A11 being upstream of E-cadherin. However, I don't understand how the data shows "crosstalk between the plasma membrane, cell-cell adhesion, and the cell cortex during mitosis". For example, on P9: "We observed unequal distribution of CellMaskTM in a vast majority of S100A11-depleted cells (si-S100A11#1: ~79% versus si-Control: ~26%), indicating defects in plasma membrane remodelling (Figures 4B and 4C)." I don't agree that this demonstrates a defect in PM remodelling. Rather the cells in the representative images are less adherent and have adopted a more migratory cell state similar to that seen in figure 1 when seeded at low density. The fluidity of the much larger cells shown in knock down cells in panel F also appears higher, again suggesting an adhesion defect.

      An earlier paper from the same lab this year identified Annexin A1 as directing mitotic spindle orientation via localising LGN at lateral cortex. During this earlier paper they also identified S100A11, which is a partner for Annexin A1. The authors could more clearly explain what S100A11 is in the current manuscript and how the current study builds on this earlier study.

      Based on the data presented, I suggest that the authors should requalify their data. I suggest that the conclusions that can be drawn from the data are that cellular state is important for regulating mitosis orientation and fidelity (i.e. adherent epithelial cells vs. less adherent, more migratory cells). S100A11 is important for promoting cell-cell adhesions and might be upstream of the known role of E-cadherin in regulating spindle orientation. Whilst I suggest that more quantified experiments would need to be included in order to assess possible effects on plasma membrane remodelling, the manuscript could be generally improved by a clearer explanation of the open question that they are addressing and what specific advance this manuscript has made in relation to the current literature, including their own. I do not currently feel that the title of the manuscript is appropriate since I don't think that a crosstalk between the plasma membrane and cell-cell adhesion has been shown here.

      Minor comments: important issues that can confidently be addressed.

      P3: I wouldn't describe the junctional proteins listed as polarity proteins.

      Figure 1 - can the membrane blebbing phenotype be quantified? At the moment this part is observational so can't really be used to determine the role of plasma membrane remodelling.

      Figure 3. I'm not sure what the 'subcortical actin cloud' measurement is. Figure 3G suggests it may be the distance from the cortex to the spindle pole but how does this relate to actin?

      Figure 4A. I can't see GFP-S100A11 accumulating at the cell surface. To me these images suggest that it is relatively ubiquitously expressed throughout the cytoplasm and surface, which is different to the later antibody stains, that show localisation at the cell surface.

      Fig 4H doesn't show an active process of translocation of E-Cadherin to the cytoplasm. It shows representative images with slightly higher levels of E-Cadherin in the cytoplasm. This could be due to translocation or it could be due to a lack of E-Cadherin assembly.

      Figure 4I. I don't understand where the line profile is derived from - where is apical and where is basal in the images? Could a diagram be included?

      The discussion could be shortened and more clearly written - perhaps with subheadings of the main findings.

      Methods: Why is cholera toxin used in the cell culture medium?

      Significance

      In general, this is an interesting paper about the fidelity of mitosis in cells in adherent monolayers vs. in more migratory, non-adherent states. There is existing literature on this topic (some cited in the manuscript, alongside reviews of the topic).

      The main conceptual advance, as far as I can see, is that S100A11 is important for promoting cell-cell adhesions and might be upstream of the known role of E-cadherin in regulating spindle orientation via LGN. The main limitation is that plating cells at different densities is not a direct 'perturbation' of cell-cell adhesion. This means that the phenotypes seen could be due to many factors, not just cell adhesion. Assessment of plasma membrane and cytoskeletal dynamics are also often observational and not conclusive.

      The manuscript would be of interest to basic researchers working on epithelial development. Also potentially to basic researchers working on cancer, due to the mitotic errors described.

      I have expertise in epithelial cell biology.

      I estimate the authors would need between 3 and 6 months for revisions if they decide to do further experiments and between 1 and 3 months if they decide to re-qualify their claims.

    1. hypothesis is kind of easy to agree on after a couple deductive guesses, so you guys want to go through it and see if you're a simulation hypothesis? That's what Elon Musk is. All right, first question to silently answer: do you think it's probable that our descendants will have computational power that is vast compared to ours today? Presume the answer is probably. Okay, next question: will that vast ability to simulate worlds result in any of them doing two or more high-fidelity or hyper-realistic ancestor or origin simulations that include fully realistic physics? Presume the answer is sure, it's probably true that at least two out of countless trillions of our descendants spread across every imaginable region of time and space will use their advanced abilities to do origin simulations. Deducted conclusion, in Elon Musk's words: we're probably living in a simulation. In my words: it is more probable than not that we are in one of the simulated realities versus being so lucky we happen to be in the one real reality.
      • for self-simulation hypothesis
      • comment
        • I agreed with a lot of what he said up to now. In fact, he does a rather good presentation summarizing the contemporary problems we face and emphasizing the acceleration of change in all human spheres, giving rise to our current polycrisis
        • I agree that the mythos of materialism needs to be seriously explored and other perspectives may give us new salient insights, but I don't think it's so obvious that the theory that we are living in a simulation (and quantum gravity theory, a highly abstract cultural artefact being used to prove that) is going to be the panacea to create a compelling new mythos.
        • If technology alone is insufficient, as he earlier claimed, then quantum gravity theory, as part of the entangled STEMS nexus, is part of that techno-complex that is insufficient.
        • This claim will have to be proven true by strong and compelling evidence that receives mass acceptance. Without that, it becomes an unjustified claim and the complexity of it will elude most people.
    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      This MS contains carefully carried out and well controlled experiments describing a new pFFAT in ELYS. There is a similarly convincing demonstration of functionally relevant colocalisation by proximity ligation assay (PLA), particularly that both ELYS and VAP are nuclear envelope proteins in interphase without interacting (neg control in Fig 4D).

      Major Issue: Functional significance

      A key conclusion is that experiments prove that "ELYS serves as the crucial initiation factor for post-mitotic NPC-assembly" (p5). However, evidence for this is lacking as this would require reconstitution of NPC assembly with a mutant form of ELYS carefully changing the FFAT motif (e.g. 1321A 1324E) and exclusion of other probable VAP targets in experiments with mutant VAP. VAPs are among the proteins with the highest number of documented interactors (see Huttlin 2015/7 etc, e.g. PMID 26186194), so knocking down VAP may have pleiotropic effects and quite indirect read-outs in many aspects of cell function. In addition, for this work specifically there are other NE proteins that are known interactors of VAP: Emerin (EMD) and LBR both interact with VAP (high-throughput data, VAPA and VAPB). EMD has a motif similar to the canonical phospho-FFAT: 98 SYFTTRT 104. LBR has no motif. These findings should not be overlooked in this work. For example, was the interaction with emerin (page 4) sensitive to mutating VAP or ELYS? Could the effect seen in Figure 5 result from interactions with proteins other than ELYS?

      Further experiments should be carried out to justify all statements in the current MS of functional significance. Instead of doing more experiments, an alternative for the authors would be to describe the current set of results more cautiously. However, that would require changing much of the impact of the current MS, from the title onwards.

      Moderate Issue: VAPA

      From the start of the Introduction, and in some elements of the Discussion, include VAPA in equal measure with VAPB. When describing interactions of ELYS with VAP, note that Huttlin et al. reported interactions twice for each of VAPA and VAPB. When describing their own results (James et al. 2019) and those of others (Saiz-Ros et al., 2019) that focused on VAPB, the authors should clarify whether their view is that VAPA would (or would not) have the same interaction.

      Is there any evidence that only VAPB is on NE? Note that some refs in the Introduction relate to VAPA: Mesmin (not VAPB); ACBD5: although article titles refer to VAPB, early work (10.1083/jcb.201607055) showed almost identical involvement of VAPA. Also, this redundancy likely explains "function of VAPB in mitosis is not essential," (in Discussion). The lack of effect of VAPA knock-down may indicate that in these cells VAPB is dominant, but does not exclude a role for VAPA when VAPB is reduced. That might be tested by depleting both. Even following that, there is MOSPD2 to consider

      Other aspects of the writing

      "two amino acid residues are crucial for the interaction (VAPB K87 and M89)." This is wrong. Many residues are critical; these are merely 2 of possibly >10 that were chosen by Kaiser et al (2005) to create their non-binder. Others have used different mutations to block FFAT binding.

      "They may exhibit a certain binding preference to specific members of the VAP ... family...". I cannot think of any example. I note no citation is given.

      When listing many or all MSP proteins, the text should state that MOSPD2 is uniquely close to VAPA/B. CFAP65 is typically not mentioned in the VAP-like lists as it does not have any of the conserved sequence that binds FFAT. If, however, the authors wish to include all human MSP domain proteins, they should also include Hydin.

      Slightly wrong to cite De Vos et al., 2012 about PTPIP51's FFAT as that paper makes no mention of the motif. Better pick Di Mattia (again)

      On VAPB (and also A) on INM: there are references to be cited esp. relating to intranuclear Scs2 in yeast (Brickner et al 2004, Ptak et al 2021)

      Citations for VAP at ER-mito contacts "De Vos et al., 2012; Gómez-Suaga et al., 2019; Stoica et al., 2014)". These all refer to the same bridging protein, PTPIP51. Reduce to one citation. Then mention other proteins at the same site VPS13A, mitoguardin(MIGA)-2 ...

      "The domain interacts with characteristic peptide sequences ..." add citation to this sentence

      "Several variants of such motifs have been described: (i)" ... "(ii)": (i) and (ii) are entirely unlinked. Delete these and also "Several variants of such motifs have been described." Which is repeated later

      "FFAT-like motifs come in different flavors and may even lack the two phenylalanine residues (Murphy and Levine, 2016)": while motifs can tolerate variation at both positions, this text is misleading as it implies much more variation than is known. The 1st F can only be conservatively substituted (Y).

      Minor aspects in Results:

      ORP1L peptide as positive control: cite Kaiser 2005

      Was phosphoproteomics done in such a way as to find peptides that have both S1314 and S1326?

      Figure 4D, row 2: Comment on intranuclear staining in Prophase (at approx 4 o'clock) of both ELYS & VAP that is PLA positive

      Referees cross-commenting

      I agree with this point from Reviewer #1. We all agree that the main issue can be resolved experimentally to determine the effect of subtle point mutations in ELYS. Both other reviewers have done a good job in finding issues with the experiments that can also be addressed.

      Significance

      This work documents an interaction between the protein ELYS, which is involved in the reformation of nuclear pore complexes after mitosis, and the ER membrane protein VAPB. The interaction was previously known through high-throughput studies, along with many hundreds of others for VAP, but here it is studied in detail and with care, identifying how the motif is induced by phosphorylation of ELYS. The two proteins are co-localised using convincing proximity ligation assays. This biochemistry and cell biological localisation is well done.

      Functional experiments then show that VAP (in this case VAPB) knock-down affects mitosis and chromosome segregation. While the result is incontrovertible, it has many possible interpretations, mainly because VAP has hundreds of interactions, including with multiple proteins involved in mitosis beyond just ELYS. This means that there are major limitations on how the interaction and co-localisation should be interpreted, reducing the advance associated with the current manuscript to incremental, and limiting the audience to specialists.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary

      The VAP proteins are well established as tail anchored proteins of the ER membrane. VAPs mediate co-operation between the ER and other organelles by creating a transient molecular tether with binding partners on opposing organelles to form a membrane contact site over which lipids and metabolites are exchanged. Proteins which bind VAPs generally contain a short FFAT motif of varying sequence, which binds the MSP domain of VAP. More recently the FFAT motif has been more extensively analysed in multiple different proteins and differential phosphorylation of the FFAT motif has been shown to either enhance or block VAP binding depending on the position of the phosphosite.

      Recent work conducted by the authors demonstrated that a small population of VAPB is not exclusively localised to the ER and can also reach the inner nuclear membrane. They also identified ELYS as a potential interaction partner of VAPB in a screening approach. ELYS is a nucleoporin that can be found at the nuclear side of the nuclear envelope where it forms part of nuclear pore complexes. During mitosis, ELYS serves as an assembly platform that bridges an interaction between decondensing chromosomes and recruited nucleoporin subcomplexes to generate new nuclear pore complexes for post-mitotic daughter cells. In this manuscript, James et al seek to explore this enigmatic potential interaction between ELYS and VAPB to address why VAPB may be found at the inner nuclear membrane.

      Peptide binding assays and some co-immunoprecipitation experiments are used to demonstrate that interactions occur via the MSP-domain of VAPB and FFAT-like motifs within ELYS. In addition, it is demonstrated that, for the ELYS FFAT peptides, the interaction is dependent on the phosphorylation status of serine residues of a particular FFAT-motif that can either promote or reduce its affinity to VAPB. Of most relevance is a serine in the acidic tract (1314) which, when phosphorylated increases VAPB binding. This is completely in line with what is already known about the FFAT motif and so is not surprising, in particular when using a peptide in an in vitro assay.

      The authors then utilise cell synchronisation techniques to provide evidence that both phosphorylation of ELYS and its binding to VAPB are heightened during mitosis. Immunofluorescence and proximity ligation assays are used to demonstrate that the proteins co-localise specifically during anaphase and at the non-core regions of segregating chromosomes.

      The manuscript is concluded by investigating the effect of VAPB depletion on mitosis with some evidence to suggest that transition from meta-anaphase is delayed and defects such as lagging chromosomes are observed.

      Major comments

      Overall, this manuscript is well written and the data presented in Figures 1-3 convincingly show the nature of the interaction between ELYS and VAPB. Clearly the proteins interact via FFAT motifs and this interaction appears to be enhanced during mitosis. However, the work as is relies heavily on peptide binding assays and would benefit from additional experiments to further support the results. The authors need to more clearly show that this specific phosphorylation happens during mitosis; they may have this data but it is not clearly explained. In addition, the data that VAPB-ELYS interaction contributes to temporal progression of mitosis (as per the title) is not sufficiently clear. VAPB silencing appears to have some impact on mitosis but this is not the same thing. So this section needs to be strengthened before this statement can be made.

      The authors claim that the study "suggests an active role of VAPB in recruiting membrane fragments to chromatin and in the biogenesis of a novel nuclear envelope during mitosis". Given the data presented in Figures 4 and 5, this appears to be rather speculative with little evidence to support it, so data should be provided or this statement toned down. Currently, without additional supporting data the authors may wish to revise the overarching conclusions of the study and change the title.

      Specific points.

      Peptide pull down assays clearly show which FFAT-like motifs are important in facilitating binding. The co-immunoprecipitation systems used in Figure 2 also provide useful information on the interaction in a cell context. The authors should combine these findings by introducing full length ELYS mutants with altered FFAT-like motifs into their stably expressing GFP-VAPB HeLa cell line and then performing Co-IPs to help identify which FFAT motif/s drive the mitotic interaction. Other mutants of ELYS harbouring either phosphomimetic or phospho-resistant residues may also be introduced to further investigate mechanisms of the molecular switch in a cellular environment to support the work currently done with peptides alone. This is an obvious gap in the work which, based on the other data the authors have shown, should presumably be straightforward and would also lead directly into the next major point.

      • Whilst silencing VAPB does appear to delay mitosis, no reference is made to ELYS throughout Figure 5 nor as part of its associated discussion. Given that VAPB has more than 250 proposed binding partners, the observed aberration of mitotic progression could result from a huge number of indirect processes. Further work is needed to link the experiment specifically to the VAPB-ELYS interaction and not just loss of VAPB. We would suggest generating a complementation system where ELYS is either knocked out or silenced and then wild-type ELYS and an ELYS FFAT mutant (which cannot interact with VAPB), and/or a phospho mutant (whose interaction cannot be regulated during mitosis), are introduced. Then the observed effects can be better attributed to the VAPB-ELYS interaction and not just loss of VAPB.
      • The immunofluorescence and PLA results in Figure 4 could be strengthened by including other ER markers. This would show that co-localisation of ELYS at the non-core region is specific to VAPB, rather than a property of any ER protein or an artefact of the ER being pushed out of the organelle exclusion zone during mitosis and therefore 'bunching' at the periphery of the nuclear envelope. It would be worthwhile repeating these experiments with candidates such as VAPA, other ER membrane proteins or at least GFP-KDEL, to make this phenomenon more convincing. As part of this the authors should ideally generate a complemented ELYS KO (see point above) to avoid the residual activity attributed to endogenous background in the PLA Figure 4E.
      • Authors should clarify if the phosphorylation events (in particular S1314) only occur or are increased during mitosis. This may be data they have from the MS experiment in Figure 3 or it could also be shown using a phospho-antibody (although this can be challenging if a suitable antibody cannot be made).
      • The authors should clarify why they need to do these semi in-vitro assays with purified GST-VAPB-MSP on beads and then lysates added and not just a standard co-IP. If this is simply signal intensity due to a very small proportion of VAPB binding to ELYS then this is fine but this should be stated and it should be made clear that ELYS is not a major binding partner - most of VAPB is on the ER. Otherwise, this is misleading.

      I estimate that the suggested alterations above would incur approximately 3-6 months of additional experimental work, depending on if KO cell lines were required.

      Minor comments

      • To show that the observed interactions and potential role of VAPB-ELYS interaction is universal it would be useful to have at least a subset of experiments also shown in another cell line or system - this is now also a requirement for some journals.
      • Consider re-wording the title of the manuscript to better reflect the data presented within the study. Alternatively, provide further evidence that VAPB-ELYS interactions directly affect temporal progression of mitosis to validate this claim, as discussed above.
      • Quantification of blots in Figure 2A could allow measurement of relative binding affinities between VAPB-ELYS throughout the cell cycle. The same could be applied to the effect of phosphorylation on binding affinity in Figure 2D.
      • The cells used are never clearly mentioned in the text - I assume this is always in HeLa but this should be added in all cases for clarity
      • Page 8: "As shown in Fig. 2A, a large proportion of GFP-VAPB was precipitated under our experimental conditions." - I don't understand how this is shown in this figure as the non-bound fraction is not shown?
      • Please provide some controls to demonstrate the extent to which the samples used are asyn, G1/M or M.
      • Page 9 - why are Phos-tag gels not shown as this would make this result more convincing?
      • Figure 3A - I find the SDS-PAGE gel confusing. Why not show the whole gel and why is the band size apparently reduced in the mitotic fraction when previously it was increased (by phosphorylation)? It would also be useful to see if there were any other band shifts.
      • "FFAT-2 of ELYS is regulated by phosphorylation" The way you have set up the experiment leads the reader to think you are going to show which sites are differentially phosphorylated in mitosis, but then this is not the case - so there seems no purpose to doing the experiment this way. If you used a TMT MS approach you would potentially be able to quantify the change in phosphorylation at the FFAT motif sites in mitosis. Otherwise what is the purpose of using these 2 samples, mitotic and AS?
      • For all of the antibodies used, in particular for the PLA, please provide evidence of validation of the antibodies.
      • Just a minor point to consider - In the methods for your lysis buffer you use 400mM NaCl - might this slightly reduce the VAPB-FFAT interaction? Worth considering reducing this?
      • "The rather small difference observed between the wild-type and the mutant protein observed in this experiment probably results from the presence of endogenous VAPB in the stable cell lines, which could form dimers with the exogeneous HA-tagged versions." If this is the case then please demonstrate that this is happening, or use the KO approach in the major points above.
      • "we now show that the proteins can indeed interact with each other, without the need for additional bridging factors (Figs. 1 and 3)." You show that the peptides can bind - but this is not the same thing as the peptide in the full context of the protein - so this should be toned down or removed.
      • "Remarkably, this region is highly conserved between species, suggesting that it is important for protein functions (data not shown)". Please show the alignments so the reader can judge for themselves. Is it conserved in ALL species, and are the phosphosites also conserved?
      • "In our experiments, knockdown of VAPA alone did not lead to a delay in mitosis (data not shown)." Why not show this data - as this is a very interesting and potentially important observation? Also add the validation of knockdown of VAPA.
      • I find the end of the discussion rather abrupt. It would be interesting to discuss further how VAPB, but apparently not VAPA, reaches the INM and, if so, why this function is required of an ER adaptor and not another more obvious adaptor protein. In short - why would VAPB be performing this role?

      Referees cross-commenting

      I agree with the comments of the other reviewers, and they are very much in line with my own review. We all seem convinced that VAPB binds ELYS via a pFFAT, and that this interaction is enhanced during mitosis. However, the role of this interaction in mitotic progression remains unclear and, based on this data, should not be claimed in the title or discussion of the paper.

      Significance

      Overall, if the manuscript could be improved with the suggested changes, then this could be a considerable conceptual advance in how we understand the VAP proteins, showing functions beyond those as an ER adaptor. This would be significant for the field.

      In the context of the existing literature the work does not advance our knowledge of FFAT-VAP interactions, this has already been shown, but it would give a nice example of how this can be regulated during mitosis and how VAP can contribute beyond just as an ER adaptor at membrane contact sites.

      There would be a wide audience in the cell biology field and more widely as mutations in VAPB cause a form of ALS, and many people are working in this area.

      My field of expertise is in organelle cell biology and membrane contact sites.

    1. Author Response

      Reviewer #1 (Public Review):

      Medwig-Kinney et al perform the latest in a series of studies unraveling the genetic and physical mechanisms involved in the formation of the C. elegans gonad. They have paid particular attention to how two different cell fates are specified, the ventral uterine (VU) or anchor cell (AC), and the behaviors of these two cell types. This cell fate choice is interesting because the anchor cell performs an invasive migration through a basement membrane, a process that is required for correct C. elegans gonad formation and that can act as a model for other invasive processes, such as malignant cancer progression. The authors have identified a range of genes that are involved in the AC/VU fate choice, and that impart the AC with its ability to arrest the cell cycle and perform an invasive migration. Taking advantage of a range of genetic tools, the authors show that the transcription factor NHR-67 is strongly expressed in the AC. The authors also present evidence that NHR-67 could function as a transcriptional repressor through interactions with a Groucho and also a TCF homolog, and they also suggest that these proteins are forming repressive condensates through phase separation.

      The authors have produced an extensive dataset to support their two primary claims: that NHR-67 expression levels determine whether a cell is invasive or proliferative, and also that NHR-67 forms a repressive complex through interactions with other proteins. The authors should be commended for clearly and honestly conveying what is already known in this area of study with exhaustive references. But absent data unambiguously linking the formation and dissolution of NHR-67 condensates with the activation of downstream genes that NHR-67 is actively repressing, the novelty of these findings is limited.

      Response 1.1: We thank the reviewer for recognizing the extensive dataset we provide in this manuscript in support of our claims that (1) NHR-67 expression levels are important for distinguishing between AC and VU cell fates, and (2) NHR-67 interacts with transcriptional repressors in VU cells. We acknowledge that a complete mechanistic understanding of the functional significance of NHR-67 puncta is not possible without knowing direct targets of NHR-67 in the AC. Unfortunately, tools to identify transcriptional targets in individual cells or lineages in C. elegans do not exist, and generation of such tools would be beyond the scope of this work. This is evidenced by the fact that the first successful attempt to transcriptionally profile the AC was only posted as a preprint one month ago (Costa et al., doi: 10.1101/2022.12.28.522136). It is our hope that the findings we present here can be integrated with future AC- and VU-specific profiling efforts to provide a more complete picture of the functional significance of NHR-67 subnuclear organization.

      Reviewer #2 (Public Review):

      Medwig-Kinney et al. explore the role of the transcription factor NHR-67 in distinguishing between AC and VU cell identity in the C. elegans gonad. NHR-67 is expressed at high levels in AC cells where it induces G1 arrest, a requirement for the AC fate invasion program (Matus et al., 2015). NHR-67 is also present at low levels in the non-invasive VU cells and, in this new study, the authors suggest a role for this residual NHR-67 in maintaining VU cell fate. What this new role entails, however, is not clear. The model in Figure 7E shows NHR-67 switching from a transcriptional activator in ACs to a transcriptional repressor in VUs by virtue of recruiting transcriptional repressors. In this model, NHR-67 actively suppresses AC differentiation in VU cells by binding to its normal targets and acting as a repressor rather than an activator. Elsewhere in the text, however, the authors suggest that NHR-67 is "post-translationally sequestered" (line 450) in nuclear condensates in VU cells. In that model, the low levels of NHR-67 in VU cells are not functional because they are inactivated by sequestration in condensates away from DNA. Neither model is fully supported by the data, which may explain why the authors seem to imply both possibilities. This uncertainty is confusing and prevents the paper from arriving at a compelling conclusion. What is the function, if any, of NHR-67 and so-called "repressive condensates" in VU cells?

      Response 2.1: As the reviewer correctly notes, we present two possible models in this manuscript. The interaction between NHR-67 and the Groucho/TCF complex in the VU cells could (1) switch the role of NHR-67 from a transcriptional activator to a transcriptional repressor, or (2) sequester NHR-67 away from its transcriptional targets. Indeed, we cannot definitively exclude the possibility of either model. In our resubmission, we will attempt to make this more clear in the text and by presenting both possible models in the summary figure (Fig. 7E).

      Below we list problems with data interpretation and key missing experiments:

      1) The authors report that NHR-67 forms "repressive condensates" (aka. puncta) in the nuclei of VU cells and imply that these condensates prevent VU cells from becoming ACs. Fig. 3A, however, shows an example of an AC that also assembles NHR-67 puncta (these are less obvious simply due to the higher levels of NHR-67 in ACs). The presence of NHR-67 puncta in the AC seems to directly contradict the authors' assumption that the puncta repress the AC fate program. Similarly, Figure 5-figure supplement 1A shows that UNC-37 and LSY-22 also form puncta in ACs. The authors need to analyze both AC and VU cells to demonstrate that NHR-67 puncta only form in VUs, as implied by their model.

      Response 2.2: The puncta formed by NHR-67 in the AC differ in appearance from those observed in the VU cells and, furthermore, do not exhibit strong colocalization with UNC-37 or LSY-22. The Manders’ overlap coefficient between NHR-67 and UNC-37 is 0.181 in the AC, whereas it is 0.686 in the VU cells. Likewise, the Manders’ overlap coefficient between NHR-67 and LSY-22 is 0.189 in the AC compared to 0.741 in the VU cells. We speculate that the areas of NHR-67 subnuclear enrichment in the AC may represent concentration around transcriptional targets, but testing this would require knowledge of direct targets of NHR-67.
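
      For readers unfamiliar with the metric, the following is a minimal sketch of Manders' overlap coefficient, assuming two equal-length lists of background-subtracted pixel intensities from the same nuclear region (the function name and inputs are hypothetical; the software and thresholds used for the values quoted above are not stated here). The coefficient is the sum of pixel-wise products divided by the square root of the product of the summed squared intensities, so it ranges from 0 (no overlap) to 1 (perfectly proportional signals).

```python
import math


def manders_overlap(channel_a, channel_b):
    """Manders' overlap coefficient for two equal-length intensity lists.

    Hypothetical helper: inputs are assumed to be non-negative,
    background-subtracted intensities sampled at the same pixels.
    """
    if len(channel_a) != len(channel_b):
        raise ValueError("channels must contain the same number of pixels")
    # Numerator: sum of pixel-wise products of the two channels.
    numerator = sum(a * b for a, b in zip(channel_a, channel_b))
    # Denominator: sqrt of (sum of squares of A) * (sum of squares of B).
    denominator = math.sqrt(
        sum(a * a for a in channel_a) * sum(b * b for b in channel_b)
    )
    # An all-zero channel has no signal to overlap with.
    return numerator / denominator if denominator > 0 else 0.0
```

      Identical or proportional channels give a value of 1, and channels with no shared bright pixels give 0, which is the sense in which 0.181 (AC) and 0.686 (VU) indicate weak and strong colocalization respectively.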

      2) While a pool of NHR-67 localizes to "repressive condensates", it appears that a substantial portion of NHR-67 also exists diffusively in the nucleoplasm. This would appear to contradict a "sequestration model" since, for such a model to work, a majority of NHR-67 should be in puncta. What proportion of NHR-67 is in puncta? Is the concentration of NHR-67 in the nucleoplasm lower in VUs compared to ACs and does this depend on the puncta?

      Response 2.3: The proportion of NHR-67 localizing to puncta versus the nucleoplasm is dynamic, as these puncta form and dissolve over the course of the cell cycle. However, we estimate that approximately 25-40% of NHR-67 protein resides in puncta based on segmentation and quantification of fluorescent intensity of sum Z-projections. We also measured NHR-67 concentration in the nucleoplasm of VU cells and found that it is only 28% of what is observed in ACs (n = 10). We disagree with the notion that the majority of NHR-67 protein should be located in puncta to support the sequestration model. As one example, previously published work examining phase separation of endogenous YAP shows that it is present in the nucleoplasm in addition to puncta (Cai et al., 2019, doi: 10.1038/s41556-019-0433-z). In our system, it is possible that the combination of transcriptional downregulation and partial sequestration away from DNA is sufficient to disrupt the normal activity of NHR-67.
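
      The kind of estimate described above can be sketched as follows, assuming a sum Z-projection stored as a 2D list of intensities, a boolean nuclear mask, and a simple intensity threshold to segment puncta (all three are illustrative assumptions; the actual segmentation procedure is not detailed in this response).

```python
def puncta_fraction(image, nucleus_mask, puncta_threshold):
    """Fraction of nuclear fluorescence that falls inside puncta.

    Hypothetical helper: `image` is a sum Z-projection (2D list),
    `nucleus_mask` marks nuclear pixels, and pixels at or above
    `puncta_threshold` are treated as belonging to puncta.
    """
    total = 0.0
    in_puncta = 0.0
    for image_row, mask_row in zip(image, nucleus_mask):
        for value, inside_nucleus in zip(image_row, mask_row):
            if not inside_nucleus:
                continue  # ignore pixels outside the nucleus
            total += value
            if value >= puncta_threshold:
                in_puncta += value
    return in_puncta / total if total > 0 else 0.0
```

      Because the puncta form and dissolve over the cell cycle, a single-frame value computed this way would be expected to vary with the timepoint chosen.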

      3) The authors do not report whether NHR-67, UNC-37, LSY-22, or POP-1 localization to puncta is interdependent, as implied in the model shown in Fig. 7.

      Response 2.4: It is difficult to test whether localization of these proteins to puncta is interdependent, as perturbation of UNC-37, LSY-22, and POP-1 results in ectopic ACs. Trying to determine if loss of puncta results in VU-to-AC transdifferentiation or vice versa becomes a chicken-egg argument. It is also possible that UNC-37 and LSY-22 are at least partially redundant in this context. We based our model, shown in Fig. 7E, on known or predicted protein-protein interactions, which we confirmed through yeast two-hybrid analyses (Fig. 7D; Fig. 7-figure supplement 1).

      4) The evidence that the "repressor condensates" suppress AC fate in VUs is presented in Fig. 4D where the authors deplete the presumed repressor LSY-22. First, the authors do not examine whether NHR-67 forms puncta under these conditions. Second, the authors rely on a single marker (cdh-3p::mCherry::moeABD) to score AC fate: this marker shows weak expression in cells flanking one bright cell (presumably the AC) which the authors interpret as a VU-to-AC transformation. The authors, however, do not identify the cells that express the marker by lineage analyses and dismiss the possibility that the marker-positive cells could arise from the division of an AC-committed cell. Finally, the authors did not test whether marker expression was dependent on NHR-67, as predicted by the model shown in Fig. 7.

      Response 2.5: For the auxin-inducible degron experiments, strains contained labeled AID-tagged proteins, a labeled TIR1 transgene, and a labeled AC marker. Thus, we were limited by the number of fluorescent channels we could covisualize and therefore could not also visualize NHR-67 (to assess for puncta formation) or another AC marker (such as LAG-2). We could have generated an AID-tagged LSY-22 strain without a fluorescent protein, but then we would not be able to quantify its depletion, which this reviewer points out is important to measure. We did visualize NHR-67::GFP expression following RNAi-induced knockdown of POP-1 and observed consistent loss of puncta in ectopic ACs. However, this again becomes a chicken-egg argument as far as whether cell fate change or loss of puncta causes the other.

      5) Interaction between NHR-67 and UNC-37 is shown using Y2H, but not verified in vivo. Furthermore, the functional significance of the NHR-67/UNC-37 interaction is not tested.

      Response 2.6: We attempted to remove the intrinsically disordered region found at the C-terminus of the endogenous nhr-67 locus, using CRISPR/Cas9, as this would both confirm the NHR-67/UNC-37 interaction in vivo and allow us to determine the functional significance of this interaction. However, we were unable to recover a viable line after several attempts, suggesting that this region of the protein is vital.

      6) Throughout the manuscript, the authors do not use lineage analysis to confirm fate transformation as is the standard in the field.

      Response 2.7: The timing between AC/VU cell fate specification and AC invasion (the point at which we look for differentiated ACs) is approximately 10-12 hours at 25 °C. With our imaging setup, we are limited to approximately 3-4 hours of live-cell imaging. Therefore, lineage tracing was not feasible for our experiments. Instead, we relied on visualization of established markers of AC and VU cell fate to determine how ectopic ACs arose. In Fig. 6B,C we show that two AC markers (cdh-3 and lag-2) turn on while a VU marker (lag-1) is downregulated within the same cell. In our opinion, live-imaging experiments that show changes in cell fate in real time via reporters were the most definitive way to observe the phenotype.

      There are 4 multipotential gonadal cells with the potential to differentiate into VUs or ACs. Which ones contribute to the extra ACs in the different genetic backgrounds examined was not determined, which complicates interpretation. The authors should consider and test the following possibilities: disruption of NHR-67 regulation causes 1) extra pluripotent cells to directly become ACs early in development, 2) VU cells to gradually trans-fate to an AC-like fate after VU fate specification (as implied by the authors), or 3) an AC to undergo extra cell division(s). In Fig. 1F, 5 cells are designated as ACs, which is one more than the 4 precursors depicted in Fig. 1A, implying that some of the "ACs" were derived from progenitors that divided.

      Response 2.8: When trying to determine the source of the ectopic ACs, we considered the three possibilities noted by the reviewer: (1) misspecification of AC/VU precursors, (2) VU-to-AC transdifferentiation, or (3) proliferation of the AC. We eliminated option 3 as a possibility, as the ectopic ACs we observed here were invasive and all of our previous work has shown that proliferating ACs cannot invade and that cell cycle exit is necessary for invasion (Matus et al., 2015; Medwig-Kinney & Smith et al., 2020; Smith et al., 2022). Specifically, NHR-67 is upstream of the cyclin dependent kinase CKI-1 and we found that induced expression of NHR-67 resulted in slow growth and developmental arrest, likely because of inducing cell cycle exit. For our experiment using hsp::NHR-67, we induced heat shock after AC/VU specification. For POP-1 perturbation, we explicitly acknowledged that misspecification of the AC/VU precursors could also contribute to ectopic ACs (Fig. 6A; lines 368-385). We could not achieve robust protein depletion through delayed RNAi treatment, so instead we utilized timelapse microscopy and quantification of AC and VU cell markers (Fig. 6B,C; see response 2.7 above).

      In conclusion, while the authors report on interesting observations, in particular the co-localization of NHR-67 with UNC-37/Groucho and POP-1 in nuclear puncta, the functional significance of these observations remains unclear. The authors have not demonstrated that the "repressive condensates" are functional and play a role in the suppression of AC fate in VU cells as claimed. The colocalization data suggest that NHR-67 interacts with repressors, but additional experiments are needed to demonstrate that these interactions are specific to VUs, impact VU fate, and sequester NHR-67 from its targets or transform NHR-67 into a transcriptional repressor.

      Response 2.9: We agree that, at this time, we cannot pinpoint the precise mechanism through which NHR-67 puncta function (i.e., by sequestering NHR-67 from DNA or switching the role of NHR-67 from activating to repressing). However, identification of NHR-67 puncta and their colocalization with UNC-37, LSY-22, and POP-1 in VU cells allowed us to discover an undescribed role for the Groucho/TCF complex in maintaining VU cell fate. This, combined with our evidence demonstrating that NHR-67 transcriptional regulation is important for distinguishing between AC and VU cell fate, are the main contributions of our study.

      Reviewer #1 (Recommendations For The Authors):

      I am not a C. elegans researcher and I find this paper fairly hard to follow. One major recommendation I would like to see is to improve the consistency of the labeling of the figures. There are many figures showing many things and I struggled to keep track of everything. For example, the thing that we are looking at in the microscope images (typically GFP tagged to a protein of interest) is sometimes labeled above the image, sometimes to the side, and sometimes within the panel. Experimental conditions are also formatted arbitrarily. As much as they can do so, could the authors try and make their labeling consistent? This would help me follow the data.

      Response 1.2: We thank the reviewer for this suggestion and have reorganized the figures (namely Figure 3, Figure 4, Figure 4–figure supplement 1, Figure 5, and Figure 6) such that the tagged allele or marker is labeled at the top, and the time, stage, and/or perturbation is labeled on the side.

      Is the yeast one-hybrid assay enough to confirm a direct interaction between HLH-2 and NHR-67? Obviously, it supports it, but since this is not a definitive test in C. elegans, I feel the description of this result should be modified to account for this.

      Response 1.3: We agree that the yeast one-hybrid assay identifies sequences that are capable of being bound to a protein and does not prove that a DNA-protein interaction occurs in vivo. We have modified our language describing this result in our resubmission (lines 222-224).

      NHR-67 and POP-1 eventually form two large spots. This observation supports the claims that these are condensates, but it is clearly different from the observations in Ciona where the condensates remain more or less stable until they quickly dissolve at the onset of mitosis. Do the authors have any idea why these condensates are behaving this way? Is it always two spots? This implies it is forming around some sort of diploid nuclear structure.

      Response 1.4: Hes.a puncta observed in Ciona were indeed shown to be dynamic, as puncta were captured fusing together (see Figure 6B of Treen et al., 2021). However, these puncta did not appear to coalesce into two puncta specifically, as is consistently observed with NHR-67 in C. elegans. We agree with the reviewer in that this observation is very interesting and likely correlates to a diploid nuclear structure, however we have yet to identify this.

      In Ciona, for the two examples of repressive condensates, it was shown that the removal of the C-terminal Groucho recruiting repressor domains of HesA and ERF disrupts condensate formation. Have the authors attempted a similar experiment for NHR-67 or Pop1?

      Response 1.5: We agree that this would have been an ideal experiment to perform. We attempted to remove the intrinsically disordered region found at the C-terminus of NHR-67 with CRISPR, but were unable to generate a stable line, suggesting that this region may be critical for NHR-67 function in other developmental stages or tissues.

      Other minor points:

      Fig 4D - I found the labeling of this figure the most confusing.

      Response 1.6: We thank the reviewer for bringing this to our attention. For this panel, in addition to the changes referenced above (Response 1.2), we simplified the labeling of the TIR1 transgene and now describe it in the figure legend instead.

      Line 354 - I think this is mislabeled. Is it supposed to be Figure 5H, not 5F, and 5B, not 5C?

      Response 1.7: We thank the reviewer for spotting this error. This reference to Figure 5F has been updated and now correctly references Figure 5H (line 338).

      Reviewer #2 (Recommendations For The Authors):

      The authors use several methods to overexpress NHR-67 including 1) an NHR-67 transgene (Fig. 1), 2) overexpression of the transcriptional activator HLH-2 or 3) removal of a factor that normally degrades HLH-2 in VU cells (Fig. 2). In all cases, the rate of VU-to-AC transformation is either very low (5%) or not reported but presumed to be zero, since other groups have done similar experiments and reported no such conversion (eg. Benavidez et al., 2022). What is the significance of this finding? Does this mean that high levels of NHR-67 are not sufficient to promote AC fate because NHR-67 is sequestered in puncta when expressed in VU cells? Fig. 2A suggests that NHR-67 is in puncta in all VUs where overexpressed. Would the inactivation of GROUCHO in that background result in extra ACs?

      Response 2.10: Indeed, we would expect that overexpression of NHR-67 may not normally be sufficient to induce cell fate transformation if the Groucho/TCF complex is still functional. Unfortunately we were unable to achieve strong depletion of UNC-37 and LSY-22 through RNAi, and thus relied on the auxin-inducible protein degradation system. Since we are limited by the number of fluorescent channels we can co-visualize, it would not be feasible to combine a heat-shock inducible transgene, a TIR1 transgene, an AID-tagged protein, and multiple cell fate markers.

      The data are often presented as numbers of animals with increased or decreased expression of a particular marker, but no quantification of expression is provided. For example, in Figure 1E, 32/35 animals are reported to exhibit ectopic expression of LIN-12 in the AC and reduced expression of LAG-2. What is the range of the increase/decrease in LIN-12/LAG-2 expression and how does this compare to natural variation in wild-type? The same concerns apply to Fig. 4D.

      Response 2.11: For resubmission, we have quantified the data shown in Figure 1E and now report expression levels of LIN-12::mNeonGreen and LAG-2::P2A::H2B::mTurquoise2 in Figure 1–figure supplement 2. We have also quantified the data in Figure 4D and now report expression levels of cdh-3p::mCherry::moeABD in Figure 4E. Quantification methods have been added to the Materials and Methods section (lines 612-617).

      The authors explain that it is difficult to study a repressive role for POP-1 as this protein functions in multiple developmental pathways and POP-1 depletion needs to be carefully timed for the data to be interpretable. The authors then go on to use RNAi to deplete POP-1 but do not describe in the methods how they achieve the needed precise temporal control.

      Response 2.12: We did indeed describe methods for the GFP-targeting nanobody, which we expressed under a uterine-specific promoter expressed after AC/VU specification. However, since the penetrance of phenotypes associated with this perturbation was low, we utilized RNA interference. We separated the cell fate specification and cell fate maintenance phenotypes by visualizing AC markers (Fig. 6A), which we would expect to be expressed at equal levels if ACs adopted their fate at the same time (via misspecification). We also paired these with a marker for VU cell fate and co-visualized them over time (Fig. 6B,C).

      The authors also do not report the efficiency of protein depletion by RNAi or Auxin treatment.

      Response 2.13: Auxin-induced depletion of mNeonGreen::AID::LSY-22 resulted in more than 90% decrease in expression (n > 75 uterine cells). The AID-tagged allele for UNC-37 was labeled with BFP, which was barely detectable by our imaging system and photobleached very quickly, so we did not quantify its depletion. However, considering that UNC-37 and LSY-22 are both expressed fairly uniformly and ubiquitously, and that LSY-22 is expressed at higher levels than UNC-37 at the L3 stage according to WormBase (31.9 FPKM vs. 23.5 FPKM), we would predict that auxin-induced depletion of UNC-37 would be just as potent, if not more so.

      Some of the work presented repeats previously published observations, and it is difficult at times to keep track of what is confirmatory and what is new. For example, this group already published on the enrichment of HLH-2 and NHR-67 in the AC, as well as the positive regulation of NHR-67 by HLH-2 (Medwig-Kinney et al 2020). Additionally, prior papers have already reported the interaction between HLH-2 and the nhr-67 locus.

      Response 2.14: The work presented in this manuscript does not repeat any previously published experiments. When we introduced the endogenously tagged NHR-67 and HLH-2 strains in previous work (Medwig-Kinney & Smith et al., 2020), we quantified expression of these proteins in the AC over time but did not compare expression between the AC and VU cells. Additionally, we previously showed that HLH-2 positively regulates NHR-67 in the AC (Medwig-Kinney & Smith et al., 2020), but never showed this is the case in the VU cells. Considering that this regulatory interaction was not observed in the AC/VU cell precursors, we believe that determining whether these proteins interact in the context of the VU cells was a valid question to address.

      Treen et al. 2021 are cited as prior evidence for the existence of "repressive condensates", however, that study does NOT experimentally demonstrate a function for these structures.

      Response 2.15: By “repressive condensates” we are referring to condensation of proteins known to be transcriptional repressors. While we agree that we were not able to demonstrate transcriptional repression of specific loci, our data showing that perturbation of the Groucho repressors UNC-37 and LSY-22 results in ectopic ACs is consistent with the hypothesis that these proteins repress the default AC fate. We have modified our title and text to more clearly distinguish our interpretations versus speculations.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank you for considering the above manuscript for publication in eLife and for sending it for review. We would like to thank the editors and reviewers for taking the time to read our manuscript and for their expert comments. These comments have been helpful and have improved our manuscript. We would like to address the following comments:

      eLife assessment

      This valuable study advances our knowledge of the effects of anxiety/depression treatment on metacognition, demonstrating that treatment increases metacognitive confidence alongside improving symptoms. The authors provide convincing evidence for the state-dependency of metacognitive confidence, based on a large longitudinal treatment dataset. However, it is unclear to what extent this effect is truly specific to treatment, as there was some improvement in metacognitive confidence in the control group.

      Thank you for this assessment of the paper. As the change in confidence was not significant among the control group, the last sentence is not factually correct – could we suggest that it be amended to the following: “However, it is unclear to what extent this effect is truly specific to treatment, as changes in metacognitive bias in the iCBT group were not statistically different from those in the control group.”

      Reviewer #1 (Public Review)

      1) It has been shown previously that there are relationships between a transdiagnostic construct of anxious-depression (AD), and average confidence rating in a perceptual decision task. This study sought to investigate these results, which have been replicated several times but only in cross-sectional studies. This work applies a perceptual decision-making task with confidence ratings and a transdiagnostic psychometric questionnaire battery to participants before and after an iCBT course. The iCBT course reduced AD scores in participants, and their mean confidence ratings increased without a change in performance. Participants with larger AD changes had larger confidence changes. These results were also shown in a separate smaller group receiving antidepressant medication. A similar sized control group with no intervention did not show changes.

      The major strength of the study is the elegant and well-powered data set. Longitudinal data on this scale is very difficult to collect, especially with patient cohorts, so this approach represents an exciting breakthrough. Analysis is straightforward and clearly presented. However, no multiple comparison correction is applied despite many different tests. While in general I am not convinced of the argument in the citation provided to justify this, I think in this case the key results are not borderline (p<0.001) and many of the key effects are replications, so there are not so many novel/exploratory hypothesis and in my opinion the results are convincing and robust as they are. The supplemental material is a comprehensive description of the data set, which is a useful resource.

      The authors achieved their aims, and the results clearly support the conclusion that the AD and mean confidence in a perceptual task covary longitudinally. I think this study provides an important impact to the project of computational psychiatry. Specifically, it shows that the relationship between transdiagnostic symptom dimensions and behaviour is meaningful within as well as across individuals.

      We thank the reviewer for their appraisal of our paper and positive feedback on the main manuscript and supplementary information. We agree with the reviewer that the lack of multiple comparison corrections can also be justified by the key findings being replications and not of borderline significance. We have added this additional justification to the manuscript (Methods, Statistical Analyses, page 15, line 568: “Adjustments for multiple comparisons were not conducted for analyses of replicated effects”).

      Reviewer #2 (Public Review)

      The authors of this study investigated the relationship between (under)confidence and the anxious-depressive symptom dimension in a longitudinal intervention design. The aim was to determine whether confidence bias improves in a state-like manner when symptoms improve. The primary focus was on patients receiving internet-based CBT (iCBT; n=649), while secondary aims compared these changes to patients receiving antidepressants (n=82) and a control group (n=88).

      The results support the authors' conclusions, and the authors convincingly demonstrated a weak link between changes in confidence bias and anxious-depressive symptoms (not specific to the intervention arm).

      The major strength and contribution of this study is the use of a longitudinal intervention design, allowing the investigation of how the well-established link between underconfidence and anxious-depressive symptoms changes after treatment. Furthermore, the large sample size of the iCBT group is commendable. The authors employed well-established measures of metacognition and clinical symptoms, used appropriate analyses, and thoroughly examined the specificity of the observed effects.

      However, due to the small effect sizes, the antidepressant and control groups were underpowered, reducing comparability between interventions and the generalizability of the results. The lack of interaction effect with treatment makes it harder to interpret the observed differences in confidence, and practice effects could conceivably account for part of the difference. Finally, it was not completely clear to me why, in the exploratory analyses, the authors looked at the interaction of time and symptom change (and group), since time is already included in the symptom change index.

      We thank the reviewer for their succinct summary of the main results and strengths of our study. We apologise for the confusion in how we described that analysis. We examine state-dependence, i.e., the relationship between symptom change and metacognition change, in two ways in the paper, perhaps somewhat redundantly: (1) by correlating change indices for both measures (e.g., as plotted in Figure 3D) and (2) by running a very similar regression-based repeated-measures analysis, i.e., mean confidence ~ time * anxious-depression score change, where mean confidence is entered with two datapoints, one for pre- and one for post-treatment (i.e., within-person), and anxious-depression change is a single value per person (between-person change score). This allowed us to test if those with the biggest change in depression had a larger effect of time on confidence. This has been added to the paper for clarification (Methods, Statistical Analysis, page 14, line 553-559: “To determine the association between change in confidence and change in anxious-depression, we used (1) Pearson correlation analysis to correlate change indices for both measures and, (2) regression-based repeated-measures analysis: mean confidence ~ time * anxious-depression score change, where mean confidence is entered with two datapoints (one for pre- and one for post-treatment i.e., within-person) and anxious-depression change is a single value per person (between-person change score)”).

      The analyses have also been reported as regression in the Results for consistency (Treatment Findings: iCBT, page 5, line 197-204: “To test if changes in confidence from baseline to follow-up scaled with changes in anxious-depression, we ran a repeated measure regression analyses with per-person changes in anxious-depression as an additional independent variable. We found this was the case, evidenced by a significant interaction effect of time and change in anxious-depression on confidence (β=-0.12, SE=0.04, p=0.002)… This was similarly evident in a simple correlation between change in confidence and change in anxious-depression (r(647)=-0.12, p=0.002)”).
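
      The two approaches described above can be illustrated with a minimal sketch, assuming plain per-person lists of pre- and post-treatment scores (the function and variable names are hypothetical, and this is not the analysis code used for the paper). It computes the change indices, their Pearson correlation, and the slope of confidence change on anxious-depression change. For an ordinary least-squares fit of mean confidence ~ time * anxious-depression change, with time coded 0/1 and both timepoints present for every person, that slope equals the interaction coefficient; a mixed-effects fit, if used, could differ in its standard errors.

```python
import math


def change_score_association(conf_pre, conf_post, ad_pre, ad_post):
    """Return (pearson_r, slope) relating confidence change to AD change.

    Hypothetical helper: each argument is a list with one value per
    person, in the same person order.
    """
    # Per-person change indices (post minus pre).
    d_conf = [post - pre for pre, post in zip(conf_pre, conf_post)]
    d_ad = [post - pre for pre, post in zip(ad_pre, ad_post)]
    n = len(d_conf)
    if n < 2 or len(d_ad) != n:
        raise ValueError("need matching lists with at least two people")

    mean_conf = sum(d_conf) / n
    mean_ad = sum(d_ad) / n
    # Sums of squares and cross-products of the centred change scores.
    sxy = sum((a - mean_ad) * (c - mean_conf) for a, c in zip(d_ad, d_conf))
    sxx = sum((a - mean_ad) ** 2 for a in d_ad)
    syy = sum((c - mean_conf) ** 2 for c in d_conf)
    if sxx == 0 or syy == 0:
        raise ValueError("change scores must vary across people")

    pearson_r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx  # matches the time x AD-change OLS interaction term
    return pearson_r, slope
```

      A negative value from either measure corresponds to the pattern reported here: larger reductions in anxious-depression accompany larger increases in confidence.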

      2) This longitudinal study informs the field of metacognition in mental health about the changeability of biases in confidence. It advances our understanding of the link between anxiety-depression and underconfidence consistently found in cross-sectional studies. The small effects, however, call the clinical relevance of the findings into question. I would have found it useful to read more in the discussion about the implications of the findings (e.g., why is it important to know that the confidence bias is state-dependent; given the effect size of the association between changes in confidence and symptoms, is the state-trait dichotomy the right framework for interpreting these results; suggestions for follow-up studies to better understand the association).

      Thank you for this comment. We have elaborated on the implications of our findings in the Discussion, including the relevance of the state-trait dichotomy to future research and how more intensive, repeated testing may inform our understanding of the state-like nature of metacognition (Discussion, Limitations and Future Directions, page 10, line 378-380: “More intensive, repeating testing in future studies may also reveal the temporal window at which metacognition has the propensity to change, which could be more momentary in nature.”).

      Reviewer #3 (Public Review):

      1) This study reports data collected across time and treatment modalities (internet CBT (iCBT), pharmacological intervention, and control), with a particularly large sample in the iCBT group. This study addresses the question of whether metacognitive confidence is related to mental health symptoms in a trait-like manner, or whether it shows state-dependency. The authors report an increase in metacognitive confidence as anxious-depression symptoms improve with iCBT (and the extent to which confidence increases is related to the magnitude of symptom improvement), a finding that is largely mirrored in those who receive antidepressants (without the correlation between symptom change and confidence change). I think these findings are exciting because they directly relate to one of the big assumptions when relating cognition to mental health - are we measuring something that changes with treatment (is malleable), so might be mechanistically relevant, or even useful as a biomarker?

      This work is also useful in that it replicates a finding of heightened confidence in those with compulsivity, and lowered confidence in those with elevated anxious-depression.

      One caveat to the interest of this work is that it doesn't allow any causal conclusions to be drawn, and only measures two timepoints, so it's hard to tell if changes in confidence might drive treatment effects (but this would be another study). The authors do mention this in the limitations section of the paper.

      Another caveat is the small sample in the antidepressant group.

      Some thoughts I had whilst reading this paper: to what extent should we be confident that the changes are not purely due to practice? I appreciate there is a relationship between improvement in symptoms and confidence in the iCBT group, but this doesn't completely rule out a practice effect (for instance, you can imagine a scenario in which those whose symptoms have improved are more likely to benefit from previously having practiced the task).

      We thank the reviewer for commenting on the implications of our findings and we agree with the caveats listed. We thank the reviewer for raising this point about practice effects. A key thing to note is that this task does not have a learning element with respect to the core perceptual judgement (i.e., accuracy), which is the target of the confidence judgment itself. While there is a possibility of increased familiarity with the task instructions and procedures with repeated testing, the task is designed to adjust the difficulty to account for any improvements, so accuracy is stable. We see that we may not have made this clear in some of our language around accuracy vs. perceptual difficulty and have edited the Results to make this distinction clearer (Treatment Findings: iCBT, pages 4-5, lines 184-189: “Although overall accuracy remained stable due to the staircasing procedure, participants’ ability to detect differences between the visual stimuli improved. This was reflected as the overall increase in task difficulty to maintain the accuracy rates from baseline (dot difference: M=41.82, SD=11.61) to follow-up (dot difference: M=39.80, SD=12.62), (β=-2.02, SE=0.44, p<0.001, r2=0.01)”.)

      However, it is true that there can be a ‘practice’ effect in the sense that one may feel more confident (despite the same accuracy level) due to familiarity with a task. One reason we do not subscribe to the proposed explanation for the link between anxious-depression change and confidence change is that the other major aspect of behaviour that improved with practice did so in a manner unrelated to clinical change. As noted above in the quoted text, participants’ discrimination improved from baseline to follow-up, reflected in the need for a higher difficulty level to maintain accuracy around 70%. Crucially, this was not associated with symptom change. This speaks against a general mechanism where symptom improvement leads to increased practice effects in general. Only changes in confidence specifically are associated with improved symptoms. We have provided more detail on this in the Discussion (page 9, lines 324-326: “This association with clinical improvements was specific to metacognitive changes, and not changes in task performance, suggesting that changes in confidence do not merely reflect greater task familiarity at follow-up.”).

      2) Relatedly, to what extent is there a role for general task engagement in these findings? The paper might be strengthened by some kind of control analysis, perhaps using (as a proxy for engagement) the data collected about those who missed catch questions in the questionnaires.

      Thank you for your comment. We included the details of data quality checks in the Supplement. Given the small number of participants that failed more than one attention check (1% of the iCBT arm) and that all those participants passed the task exclusion criteria, we made the decision to retain these individuals for analyses. We have since examined if excluding this small number of individuals impacts our findings. Excluding those that failed more than one catch item did not affect the significance of results, which has now been added to the Supplementary Information (Data Quality Checks: Task and Clinical Scales, page 5, lines 181-185: “Additionally, excluding those that failed more than one catch item in the iCBT arm did not affect the significance of results, including the change in confidence (β=0.16, SE=0.02, p<0.001), change in anxious-depression (β=-0.32, SE=0.03, p<0.001), and the association between change in confidence and change in anxious-depression (r(638)=-0.10, p=0.011)”).

      3) I was also unclear what the findings about task difficulty might mean. Are confidence changes purely secondary to improvements in task performance generally - so confidence might not actually be 'interesting' as a construct in itself? The authors could have commented more on this issue in the discussion.

      Thank you for this comment and sorry it was not clear in the original paper. As we discussed in a prior reply, accuracy, i.e. the proportion of correct selections (the target of confidence judgements), is different from the difficulty of the dot discrimination task that each person receives on a given trial. We had provided more details on task difficulty in the Supplement. Accuracy was tightly controlled in this task using a ‘two-down one-up’ staircase procedure, in which equally sized changes in dot difference occurred after each incorrect response and after two consecutive correct responses. The task is more difficult when the dot difference between stimuli is lower, and less difficult when the dot difference between stimuli is greater. Therefore, task difficulty refers to the average dot difference between stimuli across trials. Crucially, task accuracy did not change from baseline to follow-up, only task difficulty. Moreover, changes in task difficulty were not associated with changes in anxious-depression, while changes in confidence were, indicating confidence is the clinically relevant construct for change in symptoms.

      We appreciate that this may not have been clear from the description in the main manuscript, and have added more detail on task difficulty to the Methods (Metacognition Task, page 14, lines 540-542: “Task difficulty was measured as the mean dot difference across trials, where more difficult trials had a lower dot difference between stimuli.”) and Results (Treatment Findings: iCBT, pages 4-5, lines 184-186: “Although overall accuracy remained stable due to the staircasing procedure, participants’ ability to detect differences between the visual stimuli improved.”). We have also elaborated more on how improvements in symptoms are associated with change in confidence, not task performance in the Discussion (page 9, lines 324-326: “This association with clinical improvements was specific to metacognitive changes, and not changes in task performance, suggesting that changes in confidence do not merely reflect greater task familiarity at follow-up”).
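
      The ‘two-down one-up’ staircase described in this response can be sketched as follows. This is a minimal illustration, not the study's task code: the simulated observer, its psychometric function, the step size, the starting dot difference, and the names `update_staircase` and `simulate` are all assumptions made for the example. The point it shows is that the rule holds accuracy near 70% while the dot difference (task difficulty) is free to drift with the observer's ability.

```python
import math
import random


def update_staircase(dot_diff, streak, correct, step=1.0, min_diff=1.0):
    """Apply one trial of a two-down one-up rule.

    Returns (new_dot_diff, new_streak). Step size and floor are illustrative.
    """
    if not correct:
        # One incorrect response makes the task easier (larger dot difference).
        return dot_diff + step, 0
    streak += 1
    if streak == 2:
        # Two consecutive correct responses make the task harder.
        return max(min_diff, dot_diff - step), 0
    return dot_diff, streak


def simulate(n_trials=5000, start_diff=40.0, scale=60.0, seed=0):
    """Run a hypothetical observer through the staircase.

    The observer's probability of a correct response rises from 0.5 (chance)
    towards 1.0 as the dot difference grows; `scale` sets how quickly.
    Returns (overall accuracy, mean dot difference).
    """
    rng = random.Random(seed)
    dot_diff, streak = start_diff, 0
    n_correct = 0
    total_diff = 0.0
    for _ in range(n_trials):
        p_correct = 0.5 + 0.5 * (1.0 - math.exp(-dot_diff / scale))
        correct = rng.random() < p_correct
        n_correct += int(correct)
        total_diff += dot_diff
        dot_diff, streak = update_staircase(dot_diff, streak, correct)
    return n_correct / n_trials, total_diff / n_trials


if __name__ == "__main__":
    accuracy, mean_diff = simulate()
    print(f"accuracy={accuracy:.3f}, mean dot difference={mean_diff:.1f}")
```

      Under these assumptions, a more sensitive simulated observer (smaller `scale`) should settle at a lower mean dot difference with roughly the same accuracy, which is the distinction between difficulty and accuracy drawn above.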

      4) To make code more reproducible, the authors could have produced an R notebook that could be opened in the browser without someone downloading the data, so they could get a sense of the analyses without fully reproducing them.

      Thank you for your comment. We appreciate that an R notebook would be even better than how we currently share the data and code. While we will consider using Notebooks in future, we checked and converting our existing R script library into R Notebooks would require a considerable amount of reconfiguration that we cannot devote the time to right now. We hope that nonetheless the commitment to open science is clear in the extensive code base, commenting and data access we are making available to readers.

      5) Rather than reporting full study details in another publication I would have found it useful if all relevant information was included in a supplement (though it seems much of it is). This avoids situations where the other publication is inaccessible (due to different access regimes) and minimises barriers for people to fully understand the reported data.

      We agree this is good practice – the Precision in Psychiatry study is very large, with many irrelevant components with respect to the present study (Lee et al., BMC Psychiatry, 2023). For this reason, we tried to provide all that was necessary and only refer to the Precision in Psychiatry study methods for fine-grained detail. Upon review, the only thing we think we omitted that is relevant is information on ethical approval in the manuscript, which we have now added (Methods, Participants, page 11, lines 412-417: “Further details of the PIP study procedures that are not specific to this study can be found in a prior publication (21). Ethical approval for the PIP study was obtained from the Research Ethics Committee of School of Psychology, Trinity College Dublin and the Northwest-Greater Manchester West Research Ethics Committee of the National Health Service, Health Research Authority and Health and Care Research Wales”). If any further information is lacking, we are happy to include it here also.

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      The first line of the abstract refers to "metacognitive impairments", but the key result is a difference in the mean confidence rating - i.e. could be how participants are using the scale. It's not clear to me that lower mean confidence is necessarily an "impairment" (what's the "right" level of confidence 1-6 for a performance of 70% accuracy). The first line of discussion uses "metacognitive biases" which seems a more accurate description.

      We agree that the term bias is more appropriate to use in the Abstract, given that there is no set level to indicate any level of ‘impairment’ associated with under- or over-confidence. This has been changed to ‘biases’ as per the reviewer’s request (Abstract, page 2, line 49). Thank you for this suggestion.

      Reviewer #2 (Recommendations For The Authors):

      I would suggest being more cautious in the wording relating to the simple effect tests on changes across different treatment arms in the abstract - since no interaction was found it may suggest a difference between arms that is not found significantly. Also since comparison between arms was the secondary aim, first describe interaction effects before simple effects in results.

      Thank you for this suggestion, we agree that the lack of significant interaction effect of time and group on confidence is a key finding, which has now been included in the Abstract (page 2, lines 67-71). Additionally, we have rearranged the order of results so the interaction effects precede the simple effects (Results, Comparing iCBT, Antidepressant and Control Groups, page 7, lines 246 – 292:

      "When comparing the three groups directly, ANOVA analysis predicting anxious-depression scores with group and time as independent variables revealed a main effect of time (F(1, 1632)=62.99, p<0.001), a main effect of group (F(2, 1632)=249.74, p<0.001), and an interaction effect of group and time (F(2, 1632)=9.23, p<0.001). Examining simple effects in the antidepressant arm, there was a significant reduction in anxious-depression from baseline to follow-up (β=-0.61, SE=0.09, p<0.001). Among controls, levels of anxious-depression did not significantly change (β=0.10, SE=0.06, p=0.096). Further details of transdiagnostic clinical changes for the antidepressant and controls groups are presented in Figure 4A and Table S4.

      Predicting confidence scores using ANOVA analysis with group and time as independent variables revealed a main effect of time (F(1, 1632)=16.26, p<0.001), and no significant main effect of group (F(2, 1632)=2.35, p=0.096). The interaction effect of group and time on mean confidence was not significant (F(2, 1632)=0.60, p=0.550), suggesting that change in confidence did not differ across the three groups. Tests of simple effects revealed that mean confidence significantly increased from baseline (M=3.77, SD=0.88) to follow-up (M=4.07, SD=0.79) in the antidepressant arm (β=0.31, SE=0.08, p<0.001) (Figure 4B). Among controls, there was no significant change in confidence from baseline (M=3.68, SD=0.86) to follow-up (M=3.79, SD=0.92) (β=0.11, SE=0.07, p=0.103) (Figure 4B).

      With respect to task performance, there was a significant main effect of time (F(1, 1632)=15.17, p=0.001) and group (F(2, 1632)=4.56, p=0.011) on mean dot difference when the three groups were included in the model. The interaction effect of time and group on mean dot difference was not significant (F(2, 1632)=1.91, p=0.148), suggesting no differences across the groups in task difficulty changes. In the antidepressant arm, mean dot difference decreased from baseline (M=41.2, SD=13.3) to follow-up (M=35.3, SD=13.1) (β=-5.91, SE=1.25, p<0.001), indicating increased task difficulty. There was no significant change in task difficulty among controls from baseline (M=43.0, SD=11.8) to follow-up (M=41.4, SD=13.6) (β=-1.64, SE=1.30, p=0.210) (Figure 4C).

      While our sample was underpowered to examine individual differences, we conducted an exploratory analysis examining the connection between changes in anxious-depression symptoms and changes in confidence in the antidepressant and controls groups. When examining the effects of time, group and anxious-depression change on mean confidence, there was a significant interaction effect of time and anxious-depression change on mean confidence (F(1, 1626)=4.04, p=0.045), suggesting change in confidence is associated with change in anxious-depression. There was no significant three-way interaction of anxious-depression change, time and group on mean confidence when comparing the three groups (F(2, 1626)=0.08, p=0.928), indicating that the significant association between confidence change and anxious-depression change was not specific to any group. Although not significant, the association between change in confidence and change in anxious-depression was in the expected negative direction in the antidepressant arm (r(80)=-0.10, p=0.381), and among controls (r(86)=-0.17, p=0.111) (Figure 4D)."

      Reviewer #3 (Recommendations For The Authors):

      Some minor points:

      Intro

      1) Awkward wording on page 3: 'but little research on how it might impact on metacognition'

      We have amended this sentence to make it more clear that relatively less research has been conducted on metacognitive changes following iCBT. We have also provided more detail on a prior study that examined changes in metacognitive beliefs with iCBT, and how this differs from the current study (Introduction, page 3, lines 137-141: “Additionally, iCBT has demonstrated clinical effectiveness in terms of symptom improvement (22–24). While one study found that iCBT modified self-reported metacognitive beliefs (25), it remains unknown if metacognitive confidence in decision-making improves following successful iCBT”).

      2) On page 3 the authors note 'but studies typically lacked power to detect effects of antidepressants on cognitive abilities (30-33)' - however, surely this is a problem with this study too, and its relatively small sample of those taking antidepressants?

      Thank you for highlighting this. The power comment was in the reference to the larger iCBT arm in this study, but we can appreciate that its placement means that it could be interpreted as being in relation to our smaller antidepressant arm (which we acknowledge is also potentially underpowered). We have reworded this sentence to make it clearer that prior antidepressant studies have not examined the impact of changes in metacognition specifically (Introduction, page 4, lines 147-149: “However, studies examining the impact of antidepressants on cognition have typically focused on cognitive capacities other than metacognition (30–33)”).

      Results

      3) Fig 2 - please clarify what the error bars indicate.

      The error bars represent the standard error around the standardised beta coefficients, which I have added to the description of Figure 2 (page 4, lines 171-172: “The error bars represent the standard error around the standardised beta coefficient”).

      4) Awkward wording: 'though it went in the same direction (Figure 4B)'.

      This part of the sentence was removed to reduce confusion.

      5) This description of the results is somewhat overstated: 'suggesting change in confidence was dependent on change in anxious-depression' (page 7) - this could also be the other way around, or related to a third factor.

      We have changed this from ‘dependent’ to ‘is associated with’, which accounts for the unknown directionality and true dependency of confidence changes on changes in anxious-depression (Results, page 7, line 285: “…suggesting change in confidence is associated with change in anxious-depression”).

      Methods

      6) Please also show how the WSAS in a supplement.

      Although this comment is unclear, we have provided additional information on how each item of the WSAS was scored and the overall score range (Supplemental methods, page 2, lines 53-55: “Each WSAS item was scored from 0 ‘not at all’ to 8 ‘very severely’, with overall scores ranging from 0 to 40. Higher WSAS scores indicating higher levels of functional impairment (11)”).

    1. Reviewer #3 (Public Review):

      This study tackles an interesting topic from a new perspective. The manuscript is well-written, logical, and conceptually clear. The central topic regards the purpose of preparatory activity in motor & premotor cortex. Preparatory activity has long captured the imaginations of experimentalists because it is a window on an unknown internal process - a process that is informed by sensation and related to action but tied directly to neither. Preparatory activity was the first truly 'internal' form of activity to be studied in awake behaving animals. The meaning and nature of the internal preparatory process has long been debated. In the 1960's, it was thought to reflect the priming of reflex circuits and motoneurons. By the 1980's, it was understood to reflect 'motor programming', i.e., the readying of cortical movement-generating machinery. But why programming was needed, and might be accomplished during preparation, remained unclear. By the 2000s, preparatory activity was seen as initializing movement-generating dynamics, much as the initial state of a dynamical system governs its future evolution. This provided a mechanistic purpose for preparation, but didn't answer a fundamental question: why use that strategy at all? Why indirectly influence execution by creating a preparatory state when you could send inputs during execution and accomplish the same thing directly?

      The authors point out that the many neural network models presently in existence do not address this question because they already assume that preparatory inputs are used. Thus, those models show that the preparatory strategy works, and that it matches the data in multiple ways, but they don't reveal why it is the right strategy. An additional issue with existing networks is that they potentially create an artificial dichotomy where inputs are divided into two types: preparation-creating and movement-creating. It would be more elegant if one simply assumed that motor cortex receives inputs that attempt to serve the needs of the animal, with preparation being an emergent phenomenon rather than being baked in from the beginning. In some ways the field is already starting to shift in this direction, with preparation being seen as a special case of a general phenomenon: inputs that arrive in the null-space of network outputs. However, this shift is still nascent, and no paper to date has really addressed this issue. Thus, the present study can be seen as being the first to take a fully modern view of preparation, where it emerges as part of the solution to a more general problem.

      The study is clearly written and clearly presented, and I found both the results and the reasoning to be compelling, with some exceptions noted below. The authors demonstrate that many aspects of the empirical data can be accounted for as natural outcomes of a very simple assumption: that the inputs to motor cortex are optimized to create accurate motor-cortex output while being 'well-behaved' in the sense of remaining modest in magnitude. More broadly, the idea is that preparation emerges as a consequence of constraints on motor-cortex inputs. If upstream areas could magically control motor cortex any way they wanted, then there would be no need for preparation. The necessary patterns of execution activity could just be created directly by inputs at that time. However, when there exist constraints on inputs (i.e., on what upstream areas can do) preparation becomes a useful - perhaps necessary - strategy. By sending inputs early, upstream areas can leverage the dynamics of motor cortex in ways that would be harder to accomplish during movement.

      The authors illustrate how a very simple constraint on inputs - a high 'cost' to large inputs - makes preparation a good strategy. Preparation isn't strictly necessary, but it produces a lower-cost solution (reduced input magnitude for a given level of accuracy). Consequently, preparation appears naturally, with a time-course of ~300 ms before movement onset. This late rise in preparation doesn't match the longer plateau most people are used to from studies that use a randomized instructed delay, but that actually makes sense. In those studies, the animal does not know when the go cue will be given, and must be ready for it to occur at any time. In contrast, the present study considers the situation where the time of future movement is known internally and is part of the optimization process. This more closely matches situations where the animal chooses when to move, and in those situations, preparation does indeed appear late in most cases. So the predictions of their simulations are qualitatively correct (which is all that is desired, given uncertainty regarding things like the right internal time-constants). Their simulations also successfully predict two bouts of preparation during sequence tasks, matching recent empirical findings.

      The main strength of the study is its ability to elegantly explain well-known features of data in terms of simple normative principles. The study is thorough and careful in key ways. For example, they show that the emergence of preparation, in the service of satisfying the cost function, is a very general property that holds across a broad range of network types (including very simple toy networks and a variety of larger networks of different types). They also go to considerable trouble to show why cost is reduced by preparatory inputs, including illustrating different scenarios with different readout-vector orientations. The result is a conceptually clear study that conveys a fresh perspective on what preparation is and why it exists.

      The main limitation of the study is that it focuses exclusively on one specific constraint - magnitude - that could limit motor-cortex inputs. This isn't unreasonable, but other constraints are at least as likely, if less mathematically tractable. The basic results of this study will probably be robust with regard to such issues - generally speaking, any constraint on what can be delivered during execution will favor the strategy of preparing - but this robustness cuts both ways. It isn't clear that the constraint used in the present study - minimizing upstream energy costs - is the one that really matters. Upstream areas are likely to be limited in a variety of ways, including the complexity of inputs they can deliver. Indeed, one generally assumes that there are things that motor cortex can do that upstream areas can't do, which is where the real limitations should come from. Yet in the interest of a tractable cost function, the authors have built a system where motor cortex actually doesn't do anything that couldn't be done equally well by its inputs. The system might actually be better off if motor cortex were removed. About the only thing that motor cortex appears to contribute is some amplification, which is 'good' from the standpoint of the cost function (inputs can be smaller) but hardly satisfying from a scientific standpoint.

      The use of a term that punishes the squared magnitude of control signals has a long history, both because it creates mathematical tractability and because it (somewhat) maps onto the idea that one should minimize the energy expended by muscles and the possibility of damaging them with large inputs. One could make a case that those things apply to neural activity as well, and while that isn't unreasonable, it is far from clear whether this is actually true (and if it were, why punish the square if you are concerned about ATP expenditure?). Even if neural activity magnitude is an important cost, any costs should pertain not just to inputs but to motor cortex activity itself. I don't think the authors really wish to propose that squared input magnitude is the key thing to be regularized. Instead, this is simply an easily imposed constraint that is tractable and acts as a stand-in for other forms of regularization / other types of constraints. Put differently, if one could write down the 'true' cost function, it might contain a term related to squared magnitude, but other regularizing terms would be very likely to dominate. Using only squared magnitude is a reasonable way to get started, but there are also ways in which it appears to be limiting the results (see below).
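
      To make the object of this concern concrete, a cost of the kind discussed here can be written generically as below. This is an illustrative sketch of a standard quadratic form, not the authors' equation: the symbols $y$ (network output), $y^{*}$ (target output), $u$ (input to motor cortex), $\lambda$ (regularization weight) and $T$ (trial duration) are assumptions introduced only for this example.

```latex
J[u] \;=\; \underbrace{\int_{0}^{T} \lVert y(t) - y^{*}(t) \rVert^{2}\, dt}_{\text{output accuracy}}
\;+\; \lambda \underbrace{\int_{0}^{T} \lVert u(t) \rVert^{2}\, dt}_{\text{squared input magnitude}}
```

      In this notation, the question raised above is whether the second term should be supplemented or replaced by other penalties on $u$, for example ones that limit its complexity rather than its magnitude.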

      I would suggest that the study explore this topic a bit. Is it possible to use other forms of regularization? One appealing option is to constrain the complexity of inputs; a long-standing idea is that the role of motor cortex is to take relatively simple inputs and convert them to complex time-evolving inputs suitable for driving outputs. I realize that exploring this idea is not necessarily trivial. The right cost-function term is not clear (should it relate to low-dimensionality across conditions, or to smoothness across time?) and even if it were, it might not produce a convex cost function. Yet while exploring this possibility might be difficult, I think it is important for two reasons. First, this study is an elegant exploration of how preparation emerges due to constraints on inputs, but at present that exploration focuses exclusively on one constraint. Second, at present there are a variety of aspects of the model responses that appear somewhat unrealistic. I suspect most of these flow from the fact that while the magnitude of inputs is constrained, their complexity is not (they can control every motor cortex neuron at both low and high frequencies). Because inputs are not complexity-constrained, preparatory activity appears overly complex and never 'settles' into the plateaus that one often sees in data. To be fair, even in data these plateaus are often imperfect, but they are still a very noticeable feature in the response of many neurons. Furthermore, the top PCs usually contain a nice plateau. Yet we never get to see this in the present study. In part this is because the authors never simulate the situation of an unpredictable delay (more on this below) but it also seems to be because preparatory inputs are themselves strongly time-varying. More realistic forms of regularization would likely remedy this.

      At present, it is also not clear whether preparation always occurs even with no delay. Given only magnitude-based regularization, it wouldn't necessarily have to be. The authors should perform a subspace-based analysis like that in Figure 6, but for different delay durations. I think it is critical to explore whether the model, like monkeys, uses preparation even for zero-delay trials. At present it might or might not. If not, it may be because of the lack of more realistic constraints on inputs. One might then either need to include more realistic constraints to induce zero-delay preparation, or propose that the brain basically never uses a zero delay (it always delays the internal go cue after the preparatory inputs) and that this is a mechanism separate from that being modeled.

      I agree with the authors that the present version of the model, where optimization knows the exact time of movement onset, produces a reasonably realistic timecourse of preparation when compared to data from self-paced movements. At the same time, most readers will want to see that the model can produce realistic looking preparatory activity when presented with an unpredictable delay. I realize this may be an optimization nightmare, but there are probably ways to trick the model into optimizing to move soon, but then forcing it to wait (which is actually what monkeys are probably doing). Doing so would allow the model to produce preparation under the circumstances where most studies have examined it. In some ways this is just window-dressing (showing people something in a format they are used to and can digest) but it is actually more than that, because it would show that the model can produce a reasonable plateau of sustained preparation. At present it isn't clear it can do this, for the reasons noted above. If it can't, regularizing complexity might help (and even if this can't be shown, it could be discussed).

      In summary, I found this to be a very strong study overall, with a conceptually timely message that was well-explained and nicely documented by thorough simulations. I think it is critical to perform the test, noted above, of examining preparatory subspace activity across a range of delay durations (including zero) to see whether preparation endures as it does empirically. I think the issue of a more realistic cost function is also important, both in terms of the conceptual message and in terms of inducing the model to produce more realistic activity. Conceptually it matters because I don't think the central message should be 'preparation reduces upstream ATP usage by allowing motor cortex to be an amplifier'. I think the central message the authors wish to convey is that constraints on inputs make preparation a good strategy. Many of those constraints likely relate to the fact that upstream areas can't do things that motor cortex can do (else you wouldn't need a motor cortex) and it would be good if regularization reflected that assumption. Furthermore, additional forms of regularization would likely improve the realism of model responses, in ways that matter both aesthetically and conceptually. Yet while I think this is an important issue, it is also a deep and tricky one, and I think the authors need considerable leeway in how they address it. Many of the cost-function terms one might want to use may be intractable. The authors may have to do what makes sense given technical limitations. If some things can't be done technically, they may need to be addressed in words or via some other sort of non-optimization-based simulation.

      Specific comments

      As noted above, it would be good to show that preparatory subspace activity occurs similarly across delay durations. It actually might not, at present. For a zero ms delay, the simple magnitude-based regularization may be insufficient to induce preparation. If so, then the authors would either have to argue that a zero delay is actually never used internally (which is a reasonable argument) or show that other forms of regularization can induce zero-delay preparation.

      I agree with the authors that prior modeling work was limited by assuming the inputs to M1, which meant that prior work couldn't address the deep issue (tackled here) of why there should be any preparatory inputs at all. At the same time, the ability to hand-select inputs did provide some advantages. A strong assumption of prior work is that the inputs are 'simple', such that motor cortex must perform meaningful computations to convert them to outputs. This matters because if inputs can be anything, then they can just be the final outputs themselves, and motor cortex would have no job to do. Thus, prior work tried to assume the simplest inputs possible to motor cortex that could still explain the data. Most likely this went too far in the 'simple' direction, yet aspects of the simplicity were important for endowing responses with realistic properties. One such property is a large condition-invariant response just before movement onset. This is a very robust aspect of the data, and is explained by the assumption of a simple trigger signal that conveys information about when to move but is otherwise invariant to condition. Note that this is an implicit form of regularization, and one very different from that used in the present study: the input is allowed to be large, but constrained to be simple. Preparatory inputs are similarly constrained to be simple in the sense that they carry only information about which condition should be executed, but otherwise have little temporal structure. Arguably this produces slightly too simple preparatory-period responses, but the present study appears to go too far in the opposite direction. I would suggest that the authors do what they can to address these issues via simulations and/or discussion. I think it is fine if the conclusion is that there exist many constraints that tend to favor preparation, and that regularizing magnitude is just one easy way of demonstrating that. Ideally, other constraints would be explored. But even if they can't be, there should be some discussion of what is missing - preparatory plateaus, a realistic condition-invariant signal tied to movement onset - under the present modeling assumptions.

      On line 161, and in a few other places, the authors cite prior work as arguing for "autonomous internal dynamics in M1". I think it is worth being careful here because most of that work specifically stated that the dynamics are likely not internal to M1, and presumably involve inter-area loops and (at some latency) sensory feedback. The real claim of such work is that one can observe most of the key state variables in M1, such that there are periods of time where the dynamics are reasonably approximated as autonomous from a mathematical standpoint. This means that you can estimate the state from M1, and then there is some function that predicts the future state. This formal definition of autonomous shouldn't be conflated with an anatomical definition.

    1. Author Response

      We thank the reviewers for their helpful comments and suggestions.

      eLife assessment

      This is an important contribution that extends earlier single-unit work on orientation-specific center-surround interactions to the domain of population responses measured with Voltage Sensitive Dye (VSD) imaging and the first to relate these interactions to orientation-specific perceptual effects of masking. The authors provide convincing evidence of a pattern of results in which the initial effect of the mask seems to run counter to the behavioral effects of the mask, a pattern that reversed in the latter phase of the response. It seems likely that the physiological effects of masking reported here can be attributed to previously described signals from the receptive field surround.

      We thank the reviewers for bringing up the relation of our results to findings from previous orientation-specific center-surround interaction studies. In our revision, we will add a paragraph discussing this important issue. Briefly, for multiple reasons, we believe that the majority of the behavioral and neural masking effects that we observe may be from target-mask interactions at the target location rather than from the effect of the mask in the surround. First, in human subjects, perceptual similarity masking effects are almost entirely accounted for by target-mask interactions at the target location and are recapitulated when the mask has the same size and location as the target (Sebastian et al 2017). Second, in our computational model (Fig. 8), the effect of mask orientation on the dynamics of the response is qualitatively the same if the mask is restricted to the size and location of the target. Third, in our model, our results are qualitatively the same when the spatial pooling region for the normalization signal is the same as that for the excitation signal. These points will be elaborated in the revised manuscript and points 2 and 3 will be demonstrated in a supplementary figure.

      We would also like to point out some key differences between the stimuli that we use and the ones used in most previous center-surround studies. First, in our experiments, the target and the mask were additive, while in most previous center-surround studies the target occludes the background. Such studies therefore restrict the mask effect to the surround, while in our study we allow target-mask interactions at the center. Second, most center-surround studies have a sharp-edged target/surround, while in our experiments no sharp edges were present. Unpublished results from our lab suggest that such sharp edges have a large impact on V1 population responses. We will expand on these issues in the revised manuscript. A third key difference is that our stimuli were flashed for a short interval of 250 ms corresponding to a typical duration of a fixation in natural vision, while most previous center-surround studies used either longer-duration drifting stimuli or very short-duration random-order stimuli for reverse-correlation analysis.

      In addition, we would like to emphasize that our results go beyond previous studies in two important ways. First, we study the effect of similarity masking in behaving animals and quantitatively compare the effect of similarity masking on behavior and physiology in the same subjects and at the same time. Second, VSD imaging allows us to capture the dynamics of superficial V1 population responses over the entire population of millions of neurons activated by the target at two important spatial scales. Such results therefore complement electrophysiological studies that examine the activity of a very small subset of the active neurons.

      Reviewer #1 (Public Review):

      This is a clear account of some interesting work. The experiments and analyses seem well done and the data are useful. It is nice to see that VSDI results square well with those from prior extracellular recordings. But the work may be less original than the authors propose, and their overall framing strikes me as odd. Some additional clarifications could make the contribution more clear.

      Please see our reply above regarding the agreement with previous studies and framing.

      My reading is that this is primarily a study of surround suppression with results that follow pretty directly from what we already know from that literature, and although they engage with some of the literature they do not directly mention surround suppression in the text. Their major effect - what they repeatedly describe as a "paradoxical" result in which the responses initially show a stronger response to matched targets and backgrounds and then reverse - seems to pretty clearly match the expected outcome of a stimulus that initially evokes additional excitation due to increased center contrast followed by slightly delayed surround suppression tuned to the same peak orientation. Their dynamics result seems entirely consistent with previous work, e.g. Henry et al 2020, particularly their Fig. 3 https://elifesciences.org/articles/54264, so it seems like a major oversight to not engage with that work at all, and to explain what exactly is new here.

      We thank the reviewer for pointing out this previous work, which we will cite in the revised version of the manuscript. For the reasons discussed above, while this study is interesting and related to our work, we believe that our results are quite distinct.

      • In the discussion (lines 315-316), they state "in order to account for the reduced neural sensitivity with target-background similarity in the second phase of the response, the divisive normalization signal has to be orientation selective." I wonder whether they observed this in their modeling. That is, how robust were the normalization model results to the values of sigma_e and sigma_n? It would be useful to know how critical their various model parameters were for replicating the experimental effects, rather than just showing that a good account is possible.

      Thank you for this suggestion. In the revised manuscript we will include a supplementary figure that will show how the model’s predictions are affected by the orientation tuning and spatial extent of the normalization signal, and by the size of the mask.
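      To make the kind of parameter check requested above concrete, the following is a minimal toy sketch of a divisive normalization unit with an orientation-tuned gain signal. It is not the model of Fig. 8: the squaring nonlinearity, the contrast values, the semi-saturation constant `c50`, and the bandwidth parameters `exc_bw` and `norm_bw` are all assumptions chosen for illustration, and the "early" phase is simply modeled as the gain signal not yet being active.

```python
import math


def tuning(offset_deg, bandwidth_deg):
    """Gaussian tuning over orientation difference, wrapped to [-90, 90) deg."""
    d = (offset_deg + 90.0) % 180.0 - 90.0
    return math.exp(-0.5 * (d / bandwidth_deg) ** 2)


def respond(excitation, pool, c50):
    """Squared excitatory drive divided by a normalization term."""
    return excitation ** 2 / (c50 ** 2 + pool ** 2)


def target_evoked(mask_offset_deg, gain_active, c_target=0.1, c_mask=0.5,
                  exc_bw=20.0, norm_bw=20.0, c50=0.1):
    """Response to target+mask minus response to mask alone.

    The unit is tuned to the target orientation; mask_offset_deg is the
    orientation difference between mask and target. All parameter values
    are hypothetical.
    """
    e_mask = c_mask * tuning(mask_offset_deg, exc_bw)  # drive from the mask
    e_both = e_mask + c_target                         # additive target
    if gain_active:
        n_mask = c_mask * tuning(mask_offset_deg, norm_bw)
        n_both = n_mask + c_target
    else:
        n_mask = n_both = 0.0                          # gain signal not yet arrived
    return respond(e_both, n_both, c50) - respond(e_mask, n_mask, c50)


offsets = [0.0, 22.5, 45.0, 67.5, 90.0]
early = [target_evoked(o, gain_active=False) for o in offsets]
late = [target_evoked(o, gain_active=True) for o in offsets]

for o, e, l in zip(offsets, early, late):
    print(f"mask offset {o:5.1f} deg  early {e:8.3f}  late {l:6.3f}")
```

      With these assumed values, the target-evoked response before the gain signal arrives is larger for a matched mask than for an orthogonal one, and the ordering reverses once the tuned gain signal is active. Sweeping `norm_bw`, `exc_bw`, and `c50` in such a sketch is one simple way to see how sensitive the reversal is to the tuning of the normalization pool; it says nothing about the spatial parameters of the actual model.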

      • The majority of their target/background contrast conditions were collected only in one animal. This is a minor limitation for work of this kind, but it might be an issue for some.

      We agree that this is a limitation of the current study. These are challenging experiments and we were unable to collect all target/background contrast combinations from both monkeys. However, in the common conditions, the results appear similar in the two animals, and the key results seem to be robust to the contrast combination in the animal in which a wider range of contrast combinations was tested. We will add these points to the discussion in the revised manuscript.

      • The authors point out (line 193-195) that "Because the first phase of the response is shorter than the second phase, when V1 response is integrated over both phases, the overall response is positively correlated with the behavioral masking effect." I wonder if this could be explored a bit more at the behavioral level - i.e. does the "similarity masking" they are trying to explain show sensitivity to presentation time?

      We agree that testing the effect of stimulus duration on similarity masking is interesting, but unfortunately, it is beyond the scope of the current study. We would also like to point out that the duration of the presentation was selected to match the typical time of fixation during natural behaviors, so much shorter or much longer stimulus durations would be less relevant for natural vision.

      • From Fig. 3 it looks like the imaging ROI may include some opercular V2. If so, it's plausible that something about the retinotopic or columnar windowing they used in analysis may remove V2 signals, but they don't comment. Maybe they could tell us how they ensured they only included V1?

      We thank the reviewer for this comment. As part of our experiments, we extract a detailed retinotopic map for each chamber, so we were able to ensure that the area used for the decoding analysis lies entirely within V1. We will incorporate this information in the revised manuscript.

      • In the discussion (lines 278-283) they say "The positive correlation between the neural and behavioral masking effects occurred earlier and was more robust at the columnar scale than at the retinotopic scale, suggesting that behavioral performance in our task is dominated by columnar scale signals in the second phase of the response. To the best of our knowledge, this is the first demonstration of such decoupling between V1 responses at the retinotopic and columnar scales, and the first demonstration that columnar scale signals are a better predictor of behavioral performance in a detection task." I am having trouble finding where exactly they demonstrate this in the results. Is this just by comparison of Figs. 4E,K and 5E,K? I may just be missing something here, but the argument needs to be made more clearly since much of their claim to originality rests on it.

      We thank the reviewer for this comment. In the revised manuscript we will be more explicit and refer to the relevant figure panels (Fig 4D, E, J, & K vs. Fig 5D, E, J, & K) and report important values to substantiate this key claim.

      Reviewer #2 (Public Review):

      Summary

      In this experiment, Voltage Sensitive Dye Imaging (VSDI) was used to measure neural activity in macaque primary visual cortex in monkeys trained to detect an oriented grating target that was presented either alone or against an oriented mask. Monkeys' ability to detect the target (indicated by a saccade to its location) was impaired by the mask, with the greatest impairment observed when the mask was matched in orientation to the target, as is also the case in human observers. VSDI signals were examined to test the hypothesis that the target-evoked response would be maximally suppressed by the mask when it matched the orientation of the target. In each recording session, fixation trials were used to map out the spatial response profile and orientation domains that would then be used to decode the responses on detection trials. VSDI signals were analyzed at two different scales: a coarse scale of the retinotopic response to the target and a finer scale of orientation domains within the stimulus-evoked response. Responses were recorded in three conditions: target alone, mask alone, and target presented with mask. Analyses were focused on the target evoked response in the presence of the mask, defined to be the difference in response evoked by the mask with target (target present) versus the mask alone (target absent). These were computed across five 50 msec bins (total 250 msec, which was the duration of the mask alone (target-absent trials, 50% of trials) or the mask + target (target-present trials, 50% of trials)). Analyses revealed that in an initial (transient) phase the target evoked response increased with similarity between target and mask orientation. As the authors note, this is surprising given that this was the condition where the mask maximally impaired detection of the target in behavior. Target evoked responses in a later ('sustained') phase fell off with orientation similarity, consistent with the behavioral effect. When analyzed at the coarser scale, the target evoked response, integrated over the full 250 msec period, showed a very modest dependence on mask orientation. The same pattern held when the data were analyzed on the finer orientation domain scale, with the effect of the mask in the transient phase running counter to the perceptual effect of the mask and the sustained response correlating with the perceptual effect. The effect of the mask was more pronounced when analyzed at the finer scale.

      Strengths

      The work is on the whole very strong. The experiments are thoughtfully designed, the data collection methods are good, and the results are interesting. The separate analyses of data at a coarse scale that aggregates across orientation domains and a more local scale of orientation domains is a strength and it is reassuring that the effects at the more localized scale are more clearly related to behavior, as one would hope and expect. The results are strengthened by modeling work shown in Figure 8, which provides a sensible account of the population dynamics. The analyses of the relationship between VSDI data and behavior are well thought out and the apparent paradox of the anti-correlation between VSDI and behavior in the initial period of response, followed by a positive correlation in the sustained response period is intriguing.

      Points to Consider / Possible Improvements

      The biphasic nature of the relationship between neural and behavioral modulation by the mask and the surprising finding that the two are anticorrelated in the initial phase are left as a mystery. The paper would be more impactful if this mystery could be resolved.

      We thank the reviewer for the positive comments. In our view, while our results are surprising, there may not be a remaining mystery that needs to be resolved. As our model shows, the biphasic nature of V1’s response can be explained by a delayed orientation-tuned gain control. Our results are consistent with the hypothesis that perception is based on columnar-scale V1 signals that are integrated over an approximately 200 ms long period that incorporates both the early and the late phase of the response, since such decoded V1 signals are positively correlated with the behavioral similarity masking effect (Fig. 5D, J). We will explain this more clearly in the discussion of our revised manuscript.

      The finding is based on analyses of the correlation between behavior and neural responses. This appears in the main body of the manuscript and is detailed in Figures S1 and S2, which show the correlation over time between behavior and target response for the retinotopic and columnar scale.

      One possible way of thinking of this transition from anti- to positive correlation with behavior is that it might reflect the dynamics of a competitive interaction between mask and target, with the initial phase reflecting predominantly the mask response, with the target emerging, on some trials, in the latter phase. On trials when the mask response is stronger, the probability of the target emerging in the latter phase, and triggering a hit, might be lower, potentially explaining the anticorrelation in the initial phase. The sustained response may be a mixture of trials on which the target response is or is not strong enough to overcome the effect of the mask sufficiently to trigger target detection.

      It would, I think, be worth examining this by testing whether target dynamics may vary, depending on whether the monkey detected the target (hit trials) or failed to detect the target (miss trials). Unless I missed it I do not think this analysis was done. Consistent with this possibility, the authors do note (lines 226-229) that "The trajectories in the target plus mask conditions are more complex. For example, when mask orientation is at +/- 45 deg to the target, the population response is initially dominated by the mask, but then in mid-flight, the population response changes direction and turns toward the direction of the target orientation." This suggests (to this reviewer, at least) that the emergence of a positive correlation between behavioral and neural effects in the latter phase of the response could reflect either a perceptual decision that the target is present or perhaps deployment of attention to the location of the target.

      It may be that this transition reflected detection, in which case it might be more likely on hit trials than miss trials. Given the SNR it would presumably be difficult to do this analysis on a trial-by-trial basis, but the hit and miss trials (which each make up about 1/2 of all trials) could be averaged separately to see if the mid-flight transition is more prominent on hit trials. If this is so for the +/- 45 degree case it would be good to see the same analysis for other combinations of target and mask. It would also be interesting to separate correct reject trials from false alarms, to determine whether the mid-flight transition tends to occur on false alarm trials.

      If these analyses do not reveal the predicted pattern, they might still merit a supplemental figure, for the sake of completeness.

      We thank the reviewer for suggesting this interesting possibility. The analysis in the manuscript was based on both correct and incorrect trials, raising the possibility that our results reflect some contribution from decision- and/or attention-related signals rather than from the low-level nonlinear encoding mechanisms in V1 that we postulate in our model (Fig. 8). To explore this possibility, we re-examined our results while excluding error trials. We found that our key results from Figs 4 and 5 – namely that there is an early transient phase in which the neural and behavioral similarity effects are anti-correlated, and a later sustained phase in which they are positively correlated – hold even for the subset of correct trials, reducing the possibility that decision/attention-related signals play a major role in explaining our results. We will include the results of this analysis as a supplementary figure in the revised manuscript. This analysis, however, does seem to reveal interesting differences between correct and incorrect trials, which we will discuss in the revised manuscript.

      References

      Sebastian S, Abrams J, Geisler WS. 2017. Constrained sampling experiments reveal principles of detection in natural scenes. Proc Natl Acad Sci U S A 114: E5731-E5740

    1. Author Response

      The following is the authors’ response to the original reviews

      Reviewer #1 (Recommendations for the authors):

      Major Concerns:

      1) There are numerous grammatical issues throughout the manuscript, and too much awkward jargon is used, such as "status of energy stresses", "ES-acetate". The characterization of acetate as an "energy stress" gives a negative connotation, which is unnecessary and confusing. Ketones are produced under the same circumstances but are a vital adaptive response, except for ketoacidosis. The terminology used throughout the manuscript is also vague, and some methodology is not adequately described in the Methods section. For example, the meaning of "preprandial" and "postprandial" is unclear, and there is no explanation of the related methodology.

      Thank you for your comments. We have replaced "status of energy stresses" with "energy stresses" in our revised manuscript. We agree that acetate and ketone bodies are produced under the same circumstances and that their production is a vital adaptive response. It is well known that the production of large amounts of acetate and ketone bodies is an important physiological adaptation of the body to energy stresses such as prolonged starvation and untreated diabetes mellitus. In this context, we use “energy stress-acetate”, a term we coined to emphasize the conditions under which acetate is produced and its role under those conditions. In response to your concerns, we have addressed these issues and described the modifications thoroughly in the Methods section.

      2) The authors claim that acetate is a ketone body, which is incorrect. As the authors show, it is not produced by the ketogenic pathway or from the breakdown of ketones. Acetate is a carboxylic acid and specifically a short-chain fatty acid.

      We agree with you that our description of acetate as a ketone body is seemingly incorrect. Indeed, acetate is a short-chain fatty acid in terms of molecular structure. The classic ketone bodies include acetone, acetoacetate and beta-hydroxybutyrate; of these, acetone and acetoacetate contain a carbonyl group and can be considered ketones, whereas beta-hydroxybutyrate, which contains only hydroxyl and carboxyl groups, is actually not a ketone but a short-chain fatty acid. Notably, our description of acetate as an emerging novel “ketone body” is not meant to imply that it is structurally a ketone, but to emphasize the strong similarity between acetate and the classic ketone bodies in the organ (liver) and substrate (fatty acid-derived acetyl-CoA) of their production, the roles they play (as important sources of fuel and energy for many extrahepatic peripheral organs), their catabolism (conversion back to acetyl-CoA and degradation in the TCA cycle), and the physiological conditions of their production (energy stresses such as prolonged starvation and untreated diabetes mellitus). To prevent any potential misunderstanding, we have placed "ketone body" in double quotation marks in our revised manuscript.

      3) The human subjects are not sufficiently characterized, and it is unclear whether they are T1DM or T2DM subjects. No information is provided on morphometrics, how and when serum was collected, exclusion criteria, medicines, etc. Proper characterization of human subjects is necessary before publishing such data.

      Thank you very much for your comments. We have added the description of subjects you mentioned in the Methods section.

      4) While Figure 4 is an essential set of experiments that demonstrate that ACOT12 is necessary for the induction of acetate during starvation in mice, the authors do not explain the source of basal levels of acetate that persist in mice lacking ACOT12. It is unclear whether this source is from other tissue or microbiota. Since loss of ACOT by shRNA treatment resulted in ~25% reduction in acetate, it is very difficult to conceive how this produces the profound neurological and strength deficits presented in Supplemental Figure 8 (see last point below).

      Additionally, it is not clear how the control mice for the knockout studies were generated. Please clarify.

      Under normal conditions, the serum acetate level in mice is around 200 μM. The hepatic enzymes ACOT12 and ACOT8 each seem to contribute a serum acetate concentration of 60-90 μM (Figure 4). The intestinal microbiota contributes a serum acetate concentration of 60-80 μM (Figure 2 and Figure supplement 1).

      During energy stress, the protein levels of ACOT12 and ACOT8 in the mouse liver were significantly upregulated (Figure 3 and Figure supplement 1), resulting in a significant increase of the serum acetate level to approximately 400 μM. The acetate produced by ACOT12 (~200 μM) and ACOT8 (~200 μM) constitutes the main portion of the serum acetate concentration under such conditions (Figure 2), while the contribution of the intestinal microbiota to the serum acetate level is minimized (Figure 2 and Figure supplement 1). Elimination of either ACOT12 or ACOT8 reduces the serum acetate level by up to 50% (Figure 4). However, such an estimate is only a rough approximation and does not consider the possibility of compensatory upregulation of ACOT12 and ACOT8 in the kidney when ACOT12 or ACOT8 is knocked out in the liver.
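      The rough budget described above can be checked with a few lines of arithmetic. The sketch below only restates the approximate concentrations quoted in this response (in μM); the variable names are illustrative, and the calculation ignores any compensatory changes, as already noted.

```python
# Approximate serum acetate contributions quoted in the response (in uM).
basal_ranges_um = {
    "ACOT12": (60, 90),
    "ACOT8": (60, 90),
    "microbiota": (60, 80),
}
basal_low = sum(low for low, _ in basal_ranges_um.values())
basal_high = sum(high for _, high in basal_ranges_um.values())

# Under energy stress, the microbiota contribution is described as minimal.
stress_um = {"ACOT12": 200, "ACOT8": 200}
stress_total = sum(stress_um.values())

# Fraction of serum acetate lost if one enzyme is eliminated, no compensation.
fraction_lost = stress_um["ACOT12"] / stress_total

print(f"basal total: {basal_low}-{basal_high} uM")
print(f"stress total: {stress_total} uM, loss per enzyme: {fraction_lost:.0%}")
```

      The summed basal range brackets the quoted value of about 200 μM, and removing one of two equal ~200 μM contributions from a ~400 μM total corresponds to the stated upper bound of a 50% reduction.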

      Acetate serves as an important energy source when glucose utilization is reduced in diabetes. In this case, knockdown of ACOT12 or ACOT8 (shACOT12 or shACOT8) can markedly reduce acetate production and consequently affect the motor function of mice to a certain extent.

      5) The results presented in Figure 5 are confusing, and the authors' interpretation needs elaboration. The FAO assay detects water-soluble 3H-metabolites and 3H2O, and etomoxir or CPT1 knockout completely inhibits FAO. Therefore, it is unclear how peroxisomes can produce acetate without generating water-soluble intermediates that are detectable in the assay. Further explanation and rationale for the authors' interpretation are necessary.

      Mitochondria serve as the primary organelle for the catabolism of oleic acid. However, in certain instances, fatty acid oxidation (FAO) can occur in the peroxisome, resulting in the production of medium-chain fatty acids and acetyl-CoA. Nevertheless, these medium-chain fatty acids cannot undergo further oxidation within the peroxisome. Instead, they must be transported out of the peroxisome and then into the mitochondria through CPT1 (carnitine palmitoyltransferase 1) for further oxidation.

      To assess FAO, we utilized a detection method based on 3H labeling in H2O in cells treated with [9,10-3H(N)]-oleic acid. The introduction of [9,10-3H(N)]-oleic acid leads to the production of 3H-labeled medium-chain fatty acids and acetyl-CoA within the peroxisome. The further oxidation of 3H-labeled medium-chain fatty acids in the mitochondria was blocked by inhibiting CPT1, which eventually decreased 3H-labeled H2O. However, acetyl-CoA can still be converted to acetate by ACOT8. As a result, knockdown or etomoxir inhibition of CPT1 decreased U-13C-palmitate-derived U-13C-acetate production by more than one-half, even though mitochondrial β-oxidation was almost completely abolished.

      6) Figure 6F, which shows various fatty acyl-CoAs in MPHs, is not helpful on its own. It would be useful to compare this data to loss of function MPH data and to measure these acyl-CoAs in knockout liver. Additionally, since it is normal for liver acetyl-CoA concentration to change by several-fold in fasted and fed liver, this data from snap frozen liver tissue of ACOT12/8 KO mice would help prove the authors' point.

      We are grateful for your valuable advice. As you mentioned there are indeed several outstanding questions that require further clarification. To address these questions, we are currently in the process of developing an experimental mouse model in which ACOT12 and ACOT8 are conditionally knocked out. By virtue of this approach, we aim to acquire more substantial evidence to substantiate the aforementioned conclusions.

      7) Figure 7 suggests that loss of ACOT inhibits ketogenesis by decreasing HMGCS2 expression and increasing its acetylation. However, it is difficult to imagine that this is the main mechanism considering the extraordinary ability of liver to handle high rates of acetyl-CoA conversion to ketones during fasting which, as the authors know, is the canonical mechanism by which mitochondrial CoA is preserved during elevated FAO. The manuscript (Figure 6 and 7) argues that it is the conversion of acetyl-CoA to acetate which is more important. A critical limitation of this argument is that ACOT12 is in cytosol (Figure 5), so while it spares CoA for fatty acid activation, it does not spare CoA for beta oxidation in mitochondria. That latter function is carried out by the ketogenic pathway. A second limitation is that the mechanism relies on citrate transport and ACLY activity, which is not generally thought to be very active in the ketogenic states of fasting and T1DM studied here. In essence, the mechanism relies on circular logic, whereby mitochondrial acetyl-CoA accumulates in the setting of impaired FAO, which then impairs ketogenesis and depletes CoA which then impairs FAO without lowering acetyl-CoA. I don't have a solution, but I think it is important to acknowledge the flaws in this proposed mechanism.

      As the Reviewer suggested, ACLY indeed plays a crucial role in fatty acid synthesis. Acetyl-CoA is transported out of the mitochondria in the form of citrate, which is subsequently broken down into acetyl-CoA by ACLY. Under conditions of sufficient nutrition, acetyl-CoA carboxylase 1 further activates acetyl-CoA to participate in fatty acid synthesis.

      In the context of an energy crisis resulting from low glucose utilization, we propose that ACLY might serve another pivotal role in addressing this energy deficit. In conditions such as untreated diabetes or prolonged starvation, glucose utilization is significantly reduced, so the body relies on fatty acid oxidation in the liver to generate ketone bodies and acetate, which fuel extrahepatic peripheral tissues and thus help cope with the energy crisis. However, excessive fatty acid oxidation disrupts the balance between oxidized and reduced CoA, necessitating the production of both acetate and ketone bodies to restore this equilibrium. Conventionally, fatty acid synthesis is inhibited during this period, as AMPK is activated in low-energy states and suppresses acetyl-CoA carboxylase 1 activity via phosphorylation. Based on our preliminary experimental results, ACLY and the citrate transporter still appear to be active. It is possible that the citrate-ACLY-ACOT12-acetate pathway is important for lowering the level of mitochondrial acetyl-CoA during an energy crisis. According to previous studies, cytosolic reduced CoA can be transported into the mitochondria, thereby replenishing the acetyl-CoA pool within the mitochondria (PMID: 32234503). It is important to note that this remains a hypothesis requiring further testing.

      8) Figure 8 presents some deceptively complex MS data following a 13C-acetate injection. The data is presented in an unorthodox manner, as 13C-metabolite intensities, making it nearly impossible to properly interpret. Enrichments of TCA cycle intermediates are not always easy to interpret, but at minimum, this data needs to be presented as MIDs or fractional enrichments. If the data is not modeled, then it might be useful to at least perform a rudimentary precursor-product analysis (i.e. normalized to plasma acetate enrichment).

      Supplemental Figure 8 also introduces evidence for neurological and strength deficits in shACOT12/8 knockdown mice. It is an interesting observation, but there is no direct link to the metabolic studies in the main figure, which does not present data in the loss of function mice. Nor is this part of the story investigated in liver specific knockout mice. Figure 8 is the least developed part of the manuscript and could be removed without losing the impact of the story.

      We deeply appreciate your valuable suggestions. As mentioned previously, we are currently engaged in the development of an experimental mouse model where ACOT12 and ACOT8 are selectively knocked out. Subsequent experiments will be conducted to validate this model, and the resulting data will be presented in the form of MIDs or fractional enrichments, as per your suggestion.
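      As an illustration of the presentation requested by the reviewer, the sketch below converts raw isotopologue intensities (M+0 ... M+n) into a mass isotopomer distribution (MID), a fractional enrichment, and a simple precursor-normalized enrichment. The function names and the example intensities are hypothetical and are not data from the manuscript, and the sketch assumes the intensities have already been corrected for natural isotope abundance.

```python
def mass_isotopomer_distribution(intensities):
    """Normalize M+0..M+n intensities to fractions that sum to 1."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    return [x / total for x in intensities]


def fractional_enrichment(intensities):
    """Average labeled fraction of carbons: sum(i * m_i) / n for M+0..M+n."""
    mid = mass_isotopomer_distribution(intensities)
    n_carbons = len(mid) - 1
    if n_carbons < 1:
        raise ValueError("need at least M+0 and M+1")
    return sum(i * m for i, m in enumerate(mid)) / n_carbons


def precursor_normalized(product_intensities, precursor_intensities):
    """Product enrichment relative to precursor (e.g. plasma acetate) enrichment."""
    return (fractional_enrichment(product_intensities)
            / fractional_enrichment(precursor_intensities))


# Hypothetical, made-up intensities for illustration only.
citrate = [700.0, 100.0, 200.0, 0.0, 0.0, 0.0, 0.0]  # M+0..M+6
plasma_acetate = [500.0, 0.0, 500.0]                  # M+0..M+2

print(mass_isotopomer_distribution(citrate))
print(fractional_enrichment(citrate))
print(precursor_normalized(citrate, plasma_acetate))
```

      Normalizing each tissue metabolite to the plasma acetate enrichment of the same animal is the rudimentary precursor-product comparison mentioned in the review; it does not replace flux modeling.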

      The evaluation of anxiety-related behavior is commonly done using the Elevated Plus Maze Test (EPMT), while working memory and cognitive functions are assessed through the Y-maze Test (YMZT) and Novel Object Recognition (NOR) Test. Measures such as forelimb strength and running time in the rotarod test, total distance in YMZT, total entries in YMZT, and total distance in the NOR test are indicators of muscle force and movement ability. Our data demonstrate that acetate plays a significant role in enhancing muscle force and facilitating coordinated neuromuscular movement. Interestingly, we found that ACOT12/8 knockdown in the early stages of diabetes mellitus does not have a pronounced impact on psychiatric, memory, and cognitive behaviors (Figure 8 and figure supplement 2). However, it is important to note that our study primarily focuses on elucidating the utilization of acetate during energy crises, such as untreated diabetes and chronic hunger. Our findings suggest that acetate is primarily utilized to enhance motor capacity rather than cognitive or neural activity.

      Reviewer #2 (Recommendations for the authors):

      The statement that acetate is an emerging ketone body is not correct. It is not a ketone, it is a carboxylic acid or a short-chain fatty acid. In my opinion, to avoid confusion this should be clarified.

      We agree with you that our description was not clear enough. Indeed, acetate is a short-chain fatty acid in terms of molecular structure.

      The classic ketone bodies include acetone, acetoacetate and beta-hydroxybutyrate; of these, acetone and acetoacetate contain a carbonyl group and can be considered ketones, whereas beta-hydroxybutyrate, which contains only hydroxyl and carboxyl groups, is actually not a ketone but a short-chain fatty acid.

      Notably, our description of acetate as an emerging novel “ketone body” is not meant to imply that it is structurally a ketone, but to emphasize the strong similarity between acetate and the classic ketone bodies in the organ (liver) and substrate (fatty acid-derived acetyl-CoA) of their production, the roles they play (as important sources of fuel and energy for many extrahepatic peripheral organs), their catabolism (conversion back to acetyl-CoA and degradation in the TCA cycle), and the physiological conditions of their production (energy stresses such as prolonged starvation and untreated diabetes mellitus). To prevent any potential misunderstanding, we have placed "ketone body" in double quotation marks in our revised manuscript.

      The reason for increased fatty acid delivery to the liver is explained by insulin resistance rather than by reduced carbohydrate availability.

      Patient characteristics should be provided.

      Thank you for your suggestions. We have revised our manuscript accordingly.

      Reviewer #3 (Recommendations for the authors):

      • Please include the rationale for having data from both C57BL/6 and BALB/c. In metabolic research, C57BL/6 is more commonly studied. The data between these two strains are similar, and one could be easily removed to limit redundancy.

      Thank you for bringing this issue to our attention. In metabolic research, C57BL/6 mice are indeed more commonly used as a model organism than BALB/c mice. In this study we try to elucidate a characteristic that may be shared among different mammalian species, namely the ability to produce a substantial amount of acetate during energy crises. However, given the constraints of our experimental setup, we opted to employ C57BL/6 mice as the main animal model to investigate the underlying mechanism. BALB/c mice were used to confirm the underlying mechanisms governing acetic acid production.

      • In the experiments where ACOT8 and ACOT12 are selectively knocked out or knocked down, please include the levels of other ketone bodies, such as 3-HB and AcAC, from these experiments. While acetate production is diminished, there might or might not be a compensatory increase in the production of these metabolites. This would include experiments related to Figures 3, 4, and 5.

      Thank you for your valuable comments. As you mentioned, in diabetic mice where ACOT12 and ACOT8 are knocked down in liver, there is a significant down-regulation of 3-HB and AcAc (Figure 7B, C). Based on this observation, we hypothesize that ACOT12 and ACOT8 might also play a regulatory role in the formation and metabolism of ketone bodies during an energy crisis. However, the precise regulatory mechanism underlying this phenomenon requires further investigation.

      • From Figure 1 (source data 1), two patients with diabetes have concurrent cancer. Cancer cells have altered metabolism compared to native cells. Thus, it is possible that circulating acetate levels may be altered in these cancer patients, regardless of the presence of diabetes. This should be acknowledged. Otherwise, these two subjects should be taken out.

      Thank you for your suggestions. We have taken out these two subjects in our revised manuscript.

      • Can the authors expand on their thoughts on why some results from the behavioral tests are statistically significant while others are not? For example, many motor tasks such as forelimb strength, running time, total distance, and total entries significantly differ with ACOT8 and ACOT12 knockdown. However, more anxiety-based measures such as time in open arms, correct alternation, and object recognition are not statistically different.

      Thank you for your comments. The evaluation of anxiety-related behavior is commonly done using the Elevated Plus Maze Test (EPMT), while working memory and cognitive functions are assessed through the Y-maze Test (YMZT) and Novel Object Recognition (NOR) Test. Measures such as forelimb strength and running time in the rotarod test, total distance in YMZT, total entries in YMZT and total distance in the NOR test are indicators of muscle force and movement ability. Our data demonstrate that acetate plays a significant role in enhancing muscle force and facilitating coordinated neuromuscular movement. Interestingly, we found that ACOT12/8 knockdown in the early stages of diabetes mellitus does not have a pronounced impact on psychiatric, memory, and cognitive behaviors (Figure 8 and figure supplement 2). However, it is important to note that our study primarily focuses on elucidating the utilization of acetate during energy crises, such as untreated diabetes and chronic hunger. Our findings suggest that acetate is primarily utilized to enhance motor capacity rather than cognitive or neural activity.

    1. Author Response

      Reviewer #1 (Public Review):

      The study was conducted in laboratory conditions with a local population of Cx. quinquefasciatus from Argentina. I'm not sure if there is any evidence for a seasonal shift in the host use pattern in Cx. quinquefasciatus populations from the southern latitudes.

      Unfortunately, studies conducted in South America to understand host use by Culex mosquitoes are very limited, and there are virtually no studies on the seasonal pattern of host use. In Argentina, there is some evidence (Stein et al., 2013; Beranek, 2018) regarding the seasonal change in host use by Culex species, including Culex quinquefasciatus, where the inclusion of mammals during the autumn has been observed. As part of a comprehensive study on characterizing bridge vectors for SLE and WN viruses, our research group is currently working on the molecular identification of blood meals from engorged females to gain deeper insights into the seasonal host use by Culex mosquitoes.

      While the seasonal change in host use by Culex quinquefasciatus has not been reported in Argentina so far, there has been an observed increase in reported cases of SLE virus in humans between summer and autumn (Spinsanti et al., 2008). It is based on this evidence that we hypothesize there is a seasonal change in host use by Culex quinquefasciatus, similar to what occurs in the United States. This is also considering that both countries (Argentina and the United States) have regions with similar climatic conditions (temperate climates with thermal and hydrological seasonality).

      I think the authors need to discuss more about the bigger question they were addressing. I think that the discussion section can be strengthened greatly by elaborating on whether there is evidence for a seasonal shift in host use pattern in Cx. quinquefasciatus in the southern latitudes. If yes, what alternate mechanisms they believe could be driving the seasonal change in host use in this species in the southern latitudes now that they show the 'deriving reproductive advantages' hypothesis to be not true for those populations.

      We will restructure our discussion to align it with our results, as suggested.

      Grammar and writing

      The manuscript will be grammatically revised.

      Reviewer #2 (Public Review):

      There is no replication built into this study. Egg lay is a highly variable trait, even within treatments, so it is important to see replication of the effects of treatment across multiple discrete replicates. It is standard practice to replicate mosquito fitness experiments for this reason. Furthermore, the sample size was particularly small for some groups (e.g. 15 egg rafts for the second gonotrophic cycle of mice in the autumn, which was the only group for which a decrease in fecundity and fertility was detected between 1st and 2nd gonotrophic cycles). Replicates also allow investigators to change around other variables that might impact the results for unknown reasons; for example, the incubators used for fall/summer conditions can be swapped, ensuring that the observed effects are not artifacts of other differences between treatments. While most groups had robust sample sizes, I do not trust the replicability of the results without experimental replication within the study.

      We agree that egg lay is a variable trait, and for this reason we used high numbers of mosquitoes and egg rafts in our experiments relative to studies on the same topics. Evaluating variables such as fecundity, fertility, or other types of variables (collectively referred to as "life tables") is challenging and depends on several intrinsic and extrinsic factors. As a result, sample sizes in some experiments may not be very large, and lower sample sizes can be found in several articles. For instance, in Richards et al. (2012), for Culex quinquefasciatus, some experiments had 13 or even 6 egg rafts during the second gonotrophic cycle. For species like Aedes aegypti, the sample size for life table analysis is also usually small. As an example, Muttis et al. (2018) reported between 1 and 4 engorged females (without replicates). For these reasons, we consider our sample sizes quite robust for our results.

      We also agree that repeating the experiments would make the study more robust. However, a review of the literature (articles cited in the original manuscript) shows that similar experiments are rarely repeated as such. Examples are the studies of Richards et al. (2012), Demirci et al. (2014) and Telang & Skinner (2019): although they handle several cages at a time as “replicates”, these are not true replicates, because all data are summarised and analysed together and the experiment is not repeated several times. We see these “replicates” as a way of obtaining a greater N.

      As the reviewer stated, repetition is a resource- and time-consuming activity that we are not able to undertake. Replicating the experiment poses a significant time challenge. The original experiment took over three months to complete, and a similar timeframe would be necessary for each replication (6 months in total for two more replicates). Given our existing commitments and obligations, dedicating such an extensive period solely to this would impede progress on other crucial projects and responsibilities. Given the limitations of resources and time and the infrequent use of experimental repetition in this type of study, we suggest performing a simulation-based analysis. This approach involves generating synthetic data that mimics the expected characteristics of the original experiment and subsequently subjecting it to the same analysis routine. The main goal of this simulation will be to evaluate the potential spuriousness and randomness of the results that might arise due to the experimental conditions. We will introduce this simulation-based analysis in the next revised version of the manuscript.
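
      A minimal sketch of what such a simulation-based analysis could look like is given below. It is illustrative only and is not the analysis the authors will run: the function names, the normal model for eggs per raft, and all numeric values other than the group size of 15 egg rafts mentioned by the reviewer are assumptions. The idea is to generate many synthetic experiments with no true treatment effect and ask how often a difference as large as the observed one appears by chance.

```python
import random
import statistics


def simulate_null_differences(n_a, n_b, mean_eggs, sd_eggs, n_sims=2000, seed=0):
    """Simulate the difference in mean eggs per raft between two groups
    drawn from the SAME distribution (i.e. no true treatment effect).

    The normal model and its parameters are assumptions for illustration.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_sims):
        # Egg counts cannot be negative, so draws are truncated at zero.
        group_a = [max(0.0, rng.gauss(mean_eggs, sd_eggs)) for _ in range(n_a)]
        group_b = [max(0.0, rng.gauss(mean_eggs, sd_eggs)) for _ in range(n_b)]
        diffs.append(statistics.mean(group_a) - statistics.mean(group_b))
    return diffs


def empirical_p_value(observed_diff, null_diffs):
    """Two-sided fraction of simulated differences at least as extreme
    as the observed one (with the usual +1 correction)."""
    extreme = sum(1 for d in null_diffs if abs(d) >= abs(observed_diff))
    return (extreme + 1) / (len(null_diffs) + 1)


if __name__ == "__main__":
    # Hypothetical values: 15 rafts in the small group, 40 in the other,
    # about 150 eggs per raft with a standard deviation of 40.
    null = simulate_null_differences(15, 40, mean_eggs=150, sd_eggs=40)
    print(empirical_p_value(observed_diff=-25.0, null_diffs=null))
```

      A small empirical p-value would suggest that the observed difference is unlikely to arise from sampling variability alone under the assumed model; it does not replace true experimental replication.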

      Considering the hypothesis is driven by the host switching observed in the field, this phenomenon is discussed very little. I do not believe Cx. quinquefasciatus host switching has been observed in Argentina, only in the northern hemisphere, so it is possible that the species could have an entirely different ecology in Argentina. It would have been helpful to conduct a blood meal analysis prior to this experiment to determine whether using an Argentinian population was appropriate to assess this question. If the Argentinian populations don't experience host switching, then an Argentinian colony would not be the appropriate colony to use to assess this question. Given that this experiment has already been conducted with this population, this possibility should at least be acknowledged in the discussion. Or if a study showing host switching in Argentina has been conducted, it would be helpful to highlight this in the introduction and discussion.

      We are aware that few studies regarding host shifting in South America are available; some, such as those conducted by Stein et al. (2013) and Beranek (2018), reported a moderate host switch for Culex quinquefasciatus in Argentina. We have already performed a study of seasonal host feeding patterns for this species. As you suggested, we could mention it in the discussion to highlight our partial findings. However, even though there are few studies regarding host shifting, our hypothesis is based mainly on the seasonality of human cases of WNV and SLEV, a pattern that has been demonstrated for our region; see, for example, the study of Spinsanti et al. (2008).

      The impacts of certain experimental design decisions are not acknowledged in the manuscript and warrant discussion. For example, the larvae were reared under the same conditions to ensure adults of similar sizes and development timing, but this also prevents mechanisms of action that could occur as a result of seasonality experienced by mothers, eggs, and larvae.

      We understand the confusion that may have arisen due to a lack of further details in the methodology. If we are not mistaken, you are referring to our oversight regarding the consideration of carry-over effects of larvae rearing that could potentially impact reproductive traits. When investigating the effects of temperature or other environmental factors on reproductive traits, it is possible to acclimate either larvae or adults. This is due to the significant phenotypic plasticity that mosquitoes exhibit throughout their entire ontogenetic cycle. In our study, we followed an approach similar to that of other authors where the adults are exposed to experimental conditions (temperature and photoperiod). For a similar approach you can refer to the studies conducted by Ferguson et al. (2018) for Cx. pipiens, Garcia Garcia & Londoño Benavides (2007) for Cx. quinquefasciatus and Christiansen-Jucht et al. (2014, 2015) for Anopheles gambiae.

      Beyond the issue of lack of replication limiting trust in the conclusions in general, there is one conclusion reached at the end of the discussion that would not be supported, even if additional replicates are conducted. The results do not show that physiological changes in mosquitoes trigger the selection of new hosts. Host selection is never measured, so this claim cannot be made. The results don't even suggest that fitness might trigger selection because the results show that physiological changes are in the opposite direction as what would be hypothesized to produce observed host switches. Similarly, the last sentence of the abstract is not supported by the results.

      We agree with this observation. We did not evaluate the impact of fitness on host selection in this study. Instead, we aimed to investigate the potential influence of seasonality on mosquito fitness as a potential trigger for a shift in host selection. We agree that we incorrectly used the term “host selection” when we should have been discussing “host use change”. Our results indicate a seasonal alteration in mosquito fitness in response to temperature and photoperiod changes. Building upon this observation, we will discuss our hypotheses and theoretical model to explain this seasonal shift in host use.

      Grammar and writing

      The manuscript will be grammatically revised by a professional translator.

    1. Data and dataset

      One reason we may be having difficulty with this section is that "data" can be literally anything from basic facts, which is what lay people think of as "data" to professional works of fiction or visual art, which lay people think of as more than mere data. And those lay intuitions are legally relevant.

      As a result of these distinctions, and the many other rights issues that may arise with certain types of data - e.g., health information or nude pictures of real people, this area seems like the most ethically complicated.

      I don't have a good answer here, I'm just trying to call out why this feels more challenging. Maybe that will help shake some ideas loose as we continue to iterate.

    1. Author Response

      Thank you for your thorough critique and thoughtful suggestions for improving our manuscript, "Homeostatic Synaptic Plasticity of Miniature Excitatory Postsynaptic Currents in Mouse Cortical Cultures Requires Neuronal Rab3A.” The reviewers’ detailed comments suggest that showing multiple types of graphs to demonstrate the presence of divergent scaling of mEPSC amplitudes in cultures from Rab3A wild type, and its disruption in cultures from Rab3A knockout mice, had the unintended consequence of obscuring the major results of our study. Furthermore, our proposal that the difference in characteristics of scaling of GluA2 receptor expression compared to that of mEPSC amplitudes, based on the ratio plots, indicated that a mechanism other than postsynaptic receptors likely contributes to the homeostatic increase in mEPSC amplitude was not convincing to the reviewers. Reviewers 2 and 3 point out these results might be explained by differences in the limitations and artifacts of the two very distinct techniques, electrophysiology and fluorescence imaging. In the revision we will acknowledge that a greater variability in the signal, or, more issues with signal over noise, might be present in imaging experiments compared to electrophysiology. This could explain the lack of identical effects on GluA2 receptors compared to mEPSC amplitudes in the matched experiments, but we maintain it is also possible that a greater variability in GluA2 responses is biologically meaningful. Further, an issue with the accuracy of imaging experiments to report the true receptor effects would also call into question the conclusion that receptors always increase after activity blockade. Finally, the graphs illustrating the detailed characteristics of scaling with rank order and ratio plots required pooling multiple samples per cell, which precludes application of standard statistical methods to determine whether effects or differences reach statistical significance. 
Therefore, we will remove the cumulative distribution functions, rank order plots, and ratio plots, and show only analyses that involve a single sample per cell. This major change will simplify and clarify the main findings, that homeostatic plasticity of both mEPSC amplitude and GluA2 receptor expression in mouse cortical cultures involves the synaptic vesicle protein Rab3A operating in neurons rather than astrocytes. We will focus our comparison between mEPSC amplitudes and receptors in the same cultures on differences between the magnitude of effects on the mean or median, and will make clear that overall, our data can be explained by two possibilities: 1) the presynaptic vesicle protein is acting via regulation of postsynaptic receptors alone, or 2) it is regulating both postsynaptic receptors and another contributor to mEPSC amplitude, possibly the amount of transmitter released by a single vesicle. Either way, it is very surprising that this presynaptic protein is involved in postsynaptic changes, so our results represent a novel contribution to the field of homeostatic plasticity. In sum, the changes we propose should go a long way towards addressing the majority of the reviewers’ major critiques.

      A related issue raised by the reviewers was that the model describing potential presynaptic mechanisms of Rab3A in homeostatic plasticity was not supported by direct evidence (Figure 10). We meant the model to introduce the possibility of a presynaptic contribution to mEPSC amplitude and to stimulate future research, but clearly did not communicate its speculative nature, either in the figure legend or in our discussion of potential mechanisms. In the revision, we will restrict the model to the direct findings in this study. Additionally, we will state, where appropriate, that while previous findings at the mouse NMJ are consistent with a presynaptic role for Rab3A (Wang et al., 2011), in the current study there is no direct evidence for this idea in cortical cultures other than the quantitative differences in the fold increases in mEPSC amplitudes and GluA2 receptors, which were assayed in the same cultures.

      We will submit a revised version addressing each of the reviewers’ concerns and suggestions as described above and below; these major modifications will greatly improve the readability of the manuscript and clarify the main results.

      Reviewer #1

      Koesters and colleagues investigated the role of the presynaptic small GTPase Rab3A in homeostatic scaling of miniature synaptic transmission in primary mouse cortical cultures using electrophysiology and immunohistochemistry. The major finding is that TTX incubation for 48 hours does not induce an increase in the amplitude of excitatory synaptic miniature events in neuronal cultures derived from Rab3A KO and Rab3A Earlybird mutant mice. NASPM application had comparable effects on mEPSC amplitude in control and after TTX, implying that Ca2+-permeable glutamate receptors are unlikely modulated during synaptic scaling. Immunohistochemical analysis revealed an increase in GluA2 puncta size and intensity in wild type, but not Rab3A KO cultures. Finally, they provide evidence that loss of Rab3A in neurons, but not astrocytes, blocks homeostatic scaling. Based on these data, the authors propose a model in which presynaptic Rab3A is required for homeostatic scaling of synaptic transmission through GluA2-dependent and independent mechanisms.

      While the title of the manuscript is mostly supported by data of solid quality, many conclusions, as well as the final model, cannot be derived from the results presented. Importantly, the results do not indicate that Rab3A modulates quantal size on both sides of the synapse. Moreover, several analysis approaches seem inappropriate.

      The following points should be addressed:

      1) The model shown in Figure 10 is not supported by the data. The authors neither provide evidence for two different functional states of Rab3A being involved in mEPSC amplitude modulation, nor for a change in glutamate content of vesicles. Furthermore, the data do not fully support the conclusion of a presynaptic role for Rab3A in homeostatic scaling.

      We will revise the model, removing presynaptic mechanisms for Rab3A and restricting it to the direct findings in this study.

      2) The analysis of mEPSC data using quantile sampling followed by ratio calculation is not meaningful under the tested experimental conditions because of the following reasons:

      (i) The analysis implicitly assumes that all events have been detected. The prominent mEPSC frequency increase after TTX suggests that this is not the case, i.e., many (small) mEPSCs are likely missed under control conditions.

      We explicitly addressed the potential contribution of missed mEPSCs that are below threshold in (Hanes et al., 2020). We found that even simulating a threshold of 7 pA, applied to data artificially modified by uniformly multiplying the control data set, did not generate a ratio plot with the increasing ratio over 75% of the data that we observe in the experimental data. Overall, the findings from simulating a threshold and a uniform multiplicative factor illustrate that the threshold issue does not cause major changes to the data. Furthermore, in cultures from Rab3A+/+ mice from the Rab3AEbd/+ colony, the mEPSC amplitudes were significantly smaller than those recorded in cultures from Rab3A+/+ mice from the Rab3A+/- colony (lines 327-329, 11 pA vs 13 pA), indicating that if there were smaller mEPSCs occurring in the Rab3A+/+ data set, we would have detected them. Although for these reasons we feel it is unlikely our ratio plot analysis is invalid, to clarify the result that homeostatic plasticity of mEPSC amplitude requires functioning Rab3A, we will remove the ratio plots.

      (ii) The analysis is used to conclude how events of a certain size are altered by TTX treatment. However, this analysis compares the smallest mEPSCs of the TTX condition with the smallest control mEPSCs, but this is not a pre-post experimental design. Variation between cells and between coverslips will markedly affect the results and lead to misleading interpretations.

      The rank order plot is a well-established plot to examine the mathematical transformation caused by homeostatic plasticity, first used in (Turrigiano et al., 1998). We included it here to facilitate comparison of our findings with previous results. We introduced the ratio plot in (Hanes et al., 2020), finding it shows more clearly differences occurring in the range of small mEPSC values. The reviewer is correct in that we are assuming the smallest mEPSCs before treatment should be matched with the smallest mEPSCs after treatment. It is almost impossible to do a pre-post experimental design for mEPSCs. Even when applying a treatment, for example acute perfusion with a receptor antagonist, to a single cell and recording mEPSCs before and after the treatment, it is not a true pre-post design at the level of mEPSC amplitudes, which come from many different inputs. The power of the method is that different characteristic mathematical transformations for different experimental conditions (e.g., genotype or activity protocol) support the idea that those conditions either involve different mechanisms or have altered the mechanism. Such differences might be missed by only comparing means or medians. However, we found no evidence that loss of Rab3A or expression of the Rab3A Earlybird mutant altered the mathematical transformation due to homeostatic plasticity, other than to reduce its magnitude across all amplitudes. Therefore, including these complex analyses is not adding anything to the finding that Rab3A plays a role in homeostatic plasticity of mEPSC amplitudes and they will be removed in the revision.
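
      To make the rank-matching assumption concrete, the ratio plot can be sketched as follows. This is an illustrative Python sketch, not the code used in (Hanes et al., 2020); the function names, the evenly spaced quantile-sampling rule, and the example amplitudes are all assumptions. Both data sets are sorted, the same number of quantile samples is taken from each, and the treated/control ratio is plotted against the matched control value.

```python
def quantile_sample(values, n_quantiles):
    """Return n_quantiles evenly spaced values from the sorted data
    (n_quantiles must be at least 2)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered[round(i * last / (n_quantiles - 1))] for i in range(n_quantiles)]


def ratio_plot_points(control, treated, n_quantiles):
    """Pair the k-th ranked control value with the k-th ranked treated value
    and return (control value, treated/control ratio) points."""
    c = quantile_sample(control, n_quantiles)
    t = quantile_sample(treated, n_quantiles)
    return [(ci, ti / ci) for ci, ti in zip(c, t)]


if __name__ == "__main__":
    control = [5.0, 8.0, 10.0, 12.0, 20.0, 30.0]      # hypothetical amplitudes (pA)
    uniform = [1.25 * x for x in control]              # uniform multiplicative scaling
    divergent = [x * (1 + x / 100) for x in control]   # factor grows with amplitude
    print(ratio_plot_points(control, uniform, 4))
    print(ratio_plot_points(control, divergent, 4))
```

      Under uniform multiplicative scaling every ratio equals the scaling factor, so the plot is flat; a ratio that rises with control amplitude is what the text calls divergent scaling. Because the pairing is by rank across different cells, the sketch also shows why this is not a pre-post comparison of individual events.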

      (iii) The ratio (TTX/control) vs. control plots seem to suffer from a division by small value artifact (see Figure 6F).

      The reviewer is referring to findings on the ratio plot for receptor cluster area. Because the large ratios for the smallest control areas occur in the cultures prepared from wild type mice, and to a much lower extent in cultures prepared from Rab3A knockout mice, we think there is a biologically relevant increase in the TTX/CON ratio, since an artifact due to division by small values should be present in both data sets. However, we cannot rule out that the differences in ratio plot behavior between receptors and mEPSC amplitudes result from the different limitations in detection of receptor clusters vs. the limits of detection of mEPSCs, so we will remove the ratio plots and focus on comparison of means or medians.

      Correspondingly, ratio-analysis differs considerably for different control conditions (Fig. 1Giii, Fig. 2Giii, Fig. 6C, Fig. 9A).

      The reviewer is correct to point out that the ratio plot shows differences across control conditions (note, these differences are not obvious with the more standard rank order plot). The magnitude of the 50th percentile ratio differs across control conditions, and behaviors of the largest mEPSCs also differ, with some ratios going down at the highest control amplitudes (1Giii, 6C), and others continuing to increase with increasing control amplitude (2Giii, 9A). They all share the divergent increasing ratio from smallest mEPSC amplitude to around the 20 pA level. We attribute the differences in magnitude to the differences in experimental conditions: 1Giii is Rab3A+/+ from the +/+ colony; 2Giii is Rab3A+/+ from the Ebd/+ colony; 6C is a set of Rab3A+/+ cultures assayed several years after the set in 1Giii; 9A is a different culture condition altogether, with neurons being plated onto an already formed bed of astrocytes. Effects on the largest mEPSCs are likely attributable to the small number and high variability of amplitudes in this range. Since the variability in the very sensitive ratio plot has taken away from the main findings of homeostatic plasticity being disrupted in the absence of functioning Rab3A in neurons, we will remove the rank-order and the ratio plots from the manuscript.

      3) As noted by the authors in a previous publication (Hanes et al. 2020), statistical analysis of CDFs suffers from n-inflation. In addition, the quantile sampling method chosen violates an important assumption of the K-S test. Indeed, p-values for these comparisons are typically several orders of magnitude smaller. Given that the statistical N most likely corresponds to the number of cultures (see, e.g., https://doi.org/10.1371/journal.pbio.2005282), CDF comparisons are not informative and should thus not be used to draw conclusions from the data. The plots can be informative, though.

      As the reviewer acknowledges, we were very careful in (Hanes et al., 2020) to state that the p values could not be used to determine significance in the KS test of cumulative distributions for pooled data because the KS test assumes a single sample per cell. We also suggested in that study that the p values could be used in a comparative way for looking at data sets with similar (inflated) n values to say something about bigger or smaller differences. We failed to reiterate those caveats here. In reviewing the article “What is N” by (Lazic et al., 2018) (which we very much appreciate being shown by the reviewer), we agree that in the current study where we are attempting to show how the effect of homeostatic plasticity is or is not altered by loss of Rab3A function, it is imperative that we be able to make conclusions about statistical significance. The pooling approach is essential for having some sense of the mEPSC amplitude distributions, but that is not necessary for looking at the effect of Rab3A. Therefore, we will remove all analyses that involve pooling of multiple mEPSC amplitudes per cell.
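
      The difference between pooling events and using a single sample per cell can be illustrated with a short sketch. The data layout (a mapping from a cell identifier to its mEPSC amplitudes), the function names, and the values are hypothetical and chosen only for demonstration.

```python
import statistics


def pooled_n(events_by_cell):
    """Number of observations if every event is treated as independent."""
    return sum(len(events) for events in events_by_cell.values())


def per_cell_means(events_by_cell):
    """Collapse each cell to one value, so that n equals the number of cells."""
    return {cell: statistics.mean(events) for cell, events in events_by_cell.items()}


if __name__ == "__main__":
    # Hypothetical recordings: amplitudes in pA for three cells.
    recordings = {
        "cell_1": [10.0, 12.0, 14.0],
        "cell_2": [20.0, 22.0],
        "cell_3": [9.0, 11.0, 13.0, 15.0],
    }
    print(pooled_n(recordings))             # pooled events: inflated n
    print(len(per_cell_means(recordings)))  # one value per cell
```

      A test applied to the pooled events treats every event as an independent observation, which is the source of the inflated n; a test applied to the per-cell values uses one observation per cell, and the same collapsing step could be applied per culture if the culture is taken as the experimental unit.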

      4) How does recording noise and the mEPSC amplitude threshold affect "divergent scaling"?

      We addressed this in our 2020 paper (Hanes et al., 2020) where we showed that the experimental homeostatic increase in mEPSC amplitude cannot be simulated with uniform, multiplicative synaptic scaling whether we included or excluded distortion caused by a detection threshold.

      5) What is the justification for the line fits of the ratio data/how was the fit range chosen?

      We are assuming the reviewer is referring to the line fits for the rank-order data. If so, the fit range is the entire range of the data. This issue will be addressed by the removal of the rank-order plots from the manuscript.

      6) TTX application induces a significant increase in mEPSC amplitude in Rab3A-/- mice in two out of three data sets (Figs. 1 and 9). Hence, the major conclusion that Rab3A is required for homeostatic scaling is only partially supported by the data.

      Based on the p-values for comparison of means with a Kruskal-Wallis test, we would argue that TTX application does not show a significant increase in mEPSC amplitude in Rab3A-/- neurons (Figure 1 p-value = .318; Figure 9 p-value = .125) when comparing to untreated control mEPSC amplitude means. It is only when we use the KS test and the inflated n’s that we get a barely significant result, p = 0.042. Based on the Lazic article (Lazic et al., 2018), we would now conclude that we cannot use the KS p value in that analysis. We have tried to be clear that the effect of TTX application on mEPSC amplitude in Rab3A-/- neurons is not completely abolished, but rather is dramatically reduced, which we acknowledge in the manuscript (line 279). This issue will be addressed by removal of CDFs from the manuscript.

      7) Line 289: A comparison of p-values between conditions does not allow any meaningful conclusions.

      Although we feel that comparison of magnitude of effects can be stated in a qualitative way for similar sized pooled data sets with larger or smaller p-values, we agree that statistical significance has no meaning. This issue will be addressed by removing the CDF plots from the manuscript.

      8) There is a significant increase in baseline mEPSC amplitude in Rab3AEbd/Ebd (15 pA) vs. Rab3Aebd/+ (11 pA) cultures, but not in Rab3A-/- (13.6 pA) vs. Rab3A+/- (13.9 pA). Although the nature of scaling was different between Rab3AEbd/Ebd vs. Rab3AEbd/+, and Rab3AEbd/Ebd with vs. without TTX, the question arises whether the increase in mEPSC amplitude in Rab3AEbd/Ebd is Rab3A dependent. Could a Rab3A independent mechanism occlude scaling?

      We have acknowledged in the manuscript that one explanation for a failure to exhibit homeostatic plasticity in the cultures from Rab3A Earlybird mutant mice is that the already large basal amplitude occludes any further increase (line 366). In the revision we will make sure the occlusion possibility is highlighted, but we will also discuss other proteins that have been implicated in homeostatic plasticity that have caused an increase in mEPSC amplitude and/or AMPA receptors at baseline, for example, Arc/Arg3.1 KO (Shepherd et al., 2006; Beique et al., 2011); Homer KO (Hu et al., 2010) and inhibition of mir-186-5p (Silva et al., 2019).

      9) Figure 4: NASPM appears to have a stronger effect on mEPSC frequency in the TTX condition vs. control (-40% vs. 15%). A larger sample size might be necessary to draw definitive conclusions on the contribution of Ca2+-permeable AMPARs.

      We will acknowledge that Ca2+-permeable AMPARs could be contributing to the frequency increase following activity blockade and will also include analyses of frequency throughout the manuscript.

      10) The authors discuss previous papers showing changes in VGLUT1 intensity. Was VGLUT intensity altered in the stainings presented in the manuscript?

      We will perform analyses of VGLUT1 intensity and include them in the manuscript.

      11) The change in GluA2 area or fluorescence intensity upon TTX treatment in controls is modest. How does the GluA2 integral change?

      The changes in GluA2 integrals look exactly like the changes in cluster size and were not included to simplify the results. But with the removal of the CDFs, rank order, and ratio plots, we can easily include integral measurements. What we did not observe was an additive effect with intensity and size such that the effects on integral were of greater magnitude or statistical significance than either alone. We will include the integral plots in the revised manuscript.

      12) The quantitative comparison between physiology and microscopy data is problematic. The authors report a mismatch in ratio values between the smallest mEPSC amplitudes and smallest GluA2 receptor cluster sizes (l. 464; Figure 8). Is this comparison affected by the fluorescence intensity threshold?

      What was the rationale for a threshold of 400 a.u. or 450 a.u.?

      We have acquired AOIs of receptor clusters at multiple threshold levels, and can examine whether the results are altered when using a low, medium or high threshold level.

      How does this threshold compare to the mEPSC threshold of 3 pA?

      The issue with values being below threshold in untreated cultures has been the concern in interpreting effects on mEPSC amplitudes, specifically, whether this mismatch contributes to divergent scaling. A problem of values being below a too highly set threshold in the control and becoming detectable after the homeostatic plasticity produces a lower ratio than expected, because now there are values in the treated condition that were not present in the control condition. Instead, for GluA2 receptor cluster size, we observed higher TTX/CON ratios at the low end of the data set. So, based on this, the thresholds chosen for imaging are not having the same effect, if that is what is being asked. This issue will be addressed by removing ratio plots.
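
      The threshold effect described above can be sketched with a toy calculation. All numbers here are hypothetical apart from the 3 pA detection threshold mentioned by the reviewer: the amplitude range, the uniform 1.2-fold scaling, and the quantile-matching scheme are assumptions chosen for illustration, not the procedure used in the manuscript.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list, with 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

THRESHOLD_PA = 3.0   # detection threshold (pA)
SCALE = 1.2          # hypothetical uniform multiplicative scaling

# Hypothetical "true" control amplitudes from 2.0 to 20.0 pA; some fall below threshold.
true_control = [2.0 + 0.1 * k for k in range(181)]
true_treated = [SCALE * a for a in true_control]

# Only events at or above threshold are detected in each condition.
detected_control = sorted(a for a in true_control if a >= THRESHOLD_PA)
detected_treated = sorted(a for a in true_treated if a >= THRESHOLD_PA)

# Treated/control ratio at matched quantiles (0%, 10%, ..., 100%).
ratios = [quantile(detected_treated, q / 10) / quantile(detected_control, q / 10)
          for q in range(11)]
for q, r in zip(range(0, 101, 10), ratios):
    print(f"{q:3d}th percentile: ratio = {r:.3f}")
```

      In this toy case the ratio at the low end sits near 1 and approaches the true scaling factor only at the high end, because small events that scale above threshold appear in the treated set without a detected counterpart in the control set.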

      The conclusion that an increase in AMPAR levels is not fully responsible for the observed mEPSC increase is mainly based on the rank-order analysis of GluA2 intensity, yielding a slope of ~0.9. There are several points to consider here: (i) GluA2 fluorescence intensity did increase on average, as did GluA2 cluster size. (ii) The increase in GluA2 cluster size is very similar to the increase in mEPSC amplitude (each approx. 18-20%). (iii) Are there any reports that fluorescence intensity values are linearly reporting mEPSC amplitudes (in this system)?

      We agree that our data show GluA2 receptors increase as based on cluster size, and did not mean to imply otherwise. Our conclusion that there is another contributor to mEPSC amplitude other than receptors is based on two main findings, 1) that the ratio plots for mEPSC amplitudes and receptor cluster size have distinctively different behaviors, and 2) that there are differences in either magnitude or direction of the TTX effect across 6 matched cultures, 3 from WT animals and 3 from Rab3A KO animals (see more explanation of this point below, in response to Reviewer 3). To our knowledge, no one has reported homeostatic plasticity effects on a culture by culture basis, and no one has compared imaging results and physiological results for the same cultures. We will remove the ratio plots and the conclusions based on the differences in behavior for mEPSC amplitudes and receptor cluster size. We will acknowledge in the revision that the differences in magnitude and direction across the 6 matched cultures could be due to the differences in limitations and artifacts of imaging fluorescent antibody staining vs. the limitations and artifacts of detecting mEPSCs electrophysiologically. However, we will continue to state that our results could also be due to the possibility that mEPSC amplitude is not changing in lockstep with receptor levels in every situation. To support this proposal, we will discuss those articles that include both measurements, and point out where mEPSC amplitude measurements and receptor levels match and where they do not.

      Antibody labelling efficiency, and false negatives of mEPSC recordings may influence the results. The latter was already noted by the authors.

      We will add the caveat that antibody labeling efficiency can vary between coverslips. Although we prepared single solutions that were applied to all coverslips in an experiment, this was not possible for the primary antibody to GluA2, which was added to live cultures in individual wells.

      (iv) It is not entirely clear if their imaging experiments will sample from all synapses.

      We will add to Materials and Methods that we sample from all the synapses that could be detected by the researcher on the primary dendrite of the pyramidal cell.

      Other AMPAR subtypes than GluA2 could contribute, as could kainate or NMDA receptors.

      This is true, other AMPARs (GluA3 and/or GluA4) could be contributing, but we only looked at the receptors well established to be contributing to homeostatic plasticity (GluA1 and GluA2). We will acknowledge the possible contribution of other AMPARs in the revised manuscript.

      Furthermore, the statement "complete lack of correspondence of TTX/CON ratios" is not supported by the data presented (l. 515ff). First, under the assumption that no scaling occurs in Rab3A-/- , the TTX/CON ratios show a 20-30% change, which indicates the variation of this readout. Second, the two examples shown in Figure 8 for Rab3A+/+ are actually quite similar (culture #1 and #2), particularly when ignoring the leftmost section of the data, which is heavily affected by the raw values approaching zero.

      We will remove the ratio plots from the manuscript and the arguments about differences between GluA2 receptors and mEPSC amplitudes that were based on them. However, we maintain that we have demonstrated a lack of consistent effect for GluA2 receptors and mEPSCs in the matched culture experiments. Yes, the readout of homeostatic plasticity in ratio plots for mEPSCs in the Rab3AKO reaches over 1.1 in Figure 1, and as high as 1.2 in the cultures where Rab3AKO neurons were plated on Rab3AWT glia (Figure 9). Our point is that if we had measured GluA2 receptor responses to TTX in those same experiments, the ratios should have been above 1. However, in the experiments in which we measured both mEPSCs and GluA2 receptors, the ratios do not match. In culture #1, the ratio for mEPSCs was at 1 for more than 50% of the data, but for GluA2 receptors, was below 1 for more than 50% of the data. In culture #3, the ratio for mEPSCs was below 1 for more than 50% of the data, but for GluA2 receptors was close to 1.2 for 50% of the data. Only for culture #2 do the ratios appear to match. In the revised manuscript, the evidence that GluA2 receptors and mEPSCs are not changing in parallel will be based on the behavior of means or medians in untreated vs TTX-treated cultures, rather than ratio plots. It could be argued that we need a greater number of matched experiments to make conclusions, but the whole point of a matched experiment is that it should always show the same result; we are no longer dealing with the variability in the homeostatic plasticity itself. We will add a statement that the only three explanations left for the failure of mEPSC amplitudes and GluA2 receptors to change in parallel are 1) a true mismatch, 2) a sampling issue, or 3) technical artifacts that occur in one culture and not another.

      13) Figure 7A: TTX CDF was shifted to smaller mEPSC amplitude values in Rab3A-/- cultures. How can this be explained?

      Figure 7A depicts the pooled data that are shown separately for 3 cultures in Figure 8. We observed mEPSC amplitudes being smaller after TTX treatment in some range of the data for all three Rab3AKO cultures, suggesting that this may be a biological result rather than random variation around no change (which would be a ratio of 1). However, this effect is not significant at the level of means, nor in the KS test (which has the issue of inflated n in any case), so we did not highlight this point. This issue will be addressed by the removal of the CDF plots from the manuscript.

      Reviewer #2

      Technical concerns:

      1) The culture condition is questionable. The authors saw no NMDAR current present during spontaneous recordings, which is worrisome since NMDARs should be active in cultures with normal network activity (Watt et al., 2000; Sutton et al., 2006).

      The (Watt et al., 2000) study recorded mEPSCs in 0 Mg2+ (Figure 1). The (Sutton et al., 2006) study also shows an average mEPSC waveform (Figure 1D) that was recorded in 0 Mg2+. Our extracellular recording solution contains Mg2+ (1.3 mM) so we likely are not observing NMDA-mediated currents because they are blocked with Mg2+ when strong depolarizations are prevented with TTX in the recording solution. We will add the idea that the NMDA currents are blocked by Mg2+ to Materials and Methods.

      It is important to ensure there is enough spiking activity before doing any activity manipulation.

      We agree that it would be best if network spiking activity were monitored alongside mEPSC recordings, for example by culturing on multi-electrode arrays. Data from these measurements might explain culture to culture variability in homeostatic responses. To our knowledge, most other studies investigating homeostatic plasticity do not monitor network spiking activity in the same cultures that assay mEPSC amplitudes. This is something that the field should move towards. We will add the caveat that activity was not directly measured to the manuscript.

      Similarly, it is also unknown whether spiking activity is normal in Rab3A KO/Ebd neurons.

      Since we did not measure spiking activity, we cannot address whether the disruption in homeostatic plasticity in cultures prepared from Rab3A KO and Rab3AEbd/Ebd mutant mice is due to an alteration in network activity. If activity were already low in cultures prepared from these genetically altered mice, we would expect mEPSC amplitudes to be increased, compared to those measured in cultures from WT animals. That is not the case in cultures from Rab3A KO mice, so it is unlikely that network activity is reduced. However, mEPSC amplitudes are increased in Rab3AEbd/Ebd cultures, leaving open this possibility. It would have to be a defect unique to neurons in culture, since the Rab3AEbd/Ebd mouse appears normal in every way, suggesting action potential activity is occurring in the brains of these animals in vivo. We will add the possibility that activity is altered in the cultures from Rab3AKO and Rab3AEbd/Ebd to the manuscript.

      2) Selection of mEPSC events is not conducted in an unbiased manner. Manually selecting events is insufficient for cumulative distribution analysis, where small biases could skew the entire distribution. Since the authors claim their ratio plot is a better method to detect the uniformity of scaling than the well-established rank-order plot, it is important to use an unbiased population to substantiate this claim.

      MiniAnalysis (a standard program used for mEPSC event detection and analysis) selects many false positives with the automated feature (due to the very small sizes of events that are close to the noise level) so manual re-evaluation of the automated process is necessary to eliminate false positives. As soon as there is a manual step, bias is introduced. Interestingly, a manual re-evaluation step was applied in a recent study that describes their process as ‘unbiased’ (Wu et al., 2020). The alternative is to apply a very large threshold, reducing or eliminating false positives. However, this has the effect of biasing the data towards large events. In sum, we do not believe it is currently possible to perform a completely unbiased detection process. We feel that it is important to include as many small events as possible to reduce the problem of having events in the TTX experimental group that were not matched by events in the control experimental group, for the rank order and ratio plots, so setting the threshold low and manually detecting events accomplishes this. We will add to the Materials and Methods section that the person selecting events did not have information on whether the record was from an untreated or a TTX-treated cell at the time of selection. All of these issues, the potential for skewing the CDFs, and bias potentially interfering in the true rank order and ratio relationships, are addressed by removal of the CDFs, ratio and rank-order plots from the manuscript.

      3) Immunohistochemistry data analysis is problematic. The authors only labeled dendrites without doing cell-fills to look at morphology, so it is questionable how they differentiate branches from pyramidal neurons and interneurons. Since glutamatergic synapses on these two types of neuron scale in the opposite directions, it is crucial to show that only pyramidal neurons are included for analysis.

      MAP2, in addition to labeling dendrites, also labels the cell body, and we used the cell structure revealed by MAP2 staining to select pyramidal-shaped neurons. The selection of the primary dendrite of a pyramidal neuron was stated in lines 239-240 in Materials and Methods and line 1094 in the figure legend, but we had not explicitly stated how we knew it was a pyramidal neuron. We will include a low power picture of each of the selected pyramidal neurons in the revision.

      Conceptual concerns:

      The only novel finding here is the implicated role for Rab3A in synaptic scaling, but insights into mechanisms behind this observation are lacking. The author claims that Rab3A likely regulates scaling from the presynaptic side, yet there is no direct evidence from data presented. In its current form, this study's contribution to the field is very limited.

      We acknowledge that the involvement of a presynaptic mechanism in the regulation of homeostatic plasticity by Rab3A is not supported by direct evidence in cortical cultures in this study. But we disagree that the study’s contribution is very limited.

      The revised manuscript will emphasize that there are only two possible mechanisms by which Rab3A is acting in homeostatic plasticity. Either this presynaptic vesicle protein is regulating postsynaptic receptors (an extremely surprising result for which we do have direct evidence), or, it is regulating quantal size from both sides of the synapse (supported by direct evidence from our previous study at the mouse neuromuscular junction in vivo, where receptors are not being upregulated during homeostatic plasticity, and, by indirect evidence in the current study, that receptors and mEPSCs are not being identically regulated in the same cultures). Furthermore, the first idea that follows from the effect of Rab3A on receptors is that it would be regulating release of factors from astrocytes, since this is a mechanism that has been shown to be involved in homeostatic plasticity, and we clearly disprove this hypothesis.

      1) Their major argument for this is that homeostatic effects on mEPSC amplitudes and GluA2 cluster sizes do not match. This is inconsistent with reports from multiple labs showing that upscaling of mEPSC amplitude and GluA2 accumulation occur side by side during scaling (Ibata et al., 2008; Pozo et al., 2012; Tan et al., 2015; Silva et al., 2019).

      We agree with the reviewer that many studies show an increase in receptors and mEPSC amplitudes after activity blockade. This is why we were very surprised in our initial experiments to find that there was not a consistent robust increase in receptors in our cultures. At that point we were only imaging, and we assumed that it was homeostatic plasticity that was not always robust. We decided it was essential to measure mEPSC amplitudes and image receptors in the same cultures. We expected to observe larger and smaller effects on mEPSC amplitudes from culture to culture that were paralleled by larger and smaller effects on receptors, but this is not what happened. We have gone back to the literature to look more closely at whether variability across cultures has ever been shown for mEPSC amplitudes, receptors, or both. In a survey of 14 studies, none report results culture by culture. To our knowledge, we are the first to report this variability in the receptor response, and the lack of correlation between mEPSC amplitudes and receptor responses, in the same cultures. That said, for the 4 examples provided by the reviewer, only 1 reports evidence relevant to our study that receptors and mEPSC amplitudes ‘occur side by side,’ which is the (Ibata et al., 2008) study. Here, 24 hr of TTX treatment of rat cortical cultures causes synaptically localized GluA2 receptors in confocal imaging, and mEPSC amplitudes, to both increase to around 130%. The (Pozo et al., 2012) study is not a study of activity blockade but of the effects of overexpressing beta-integrins in rat hippocampal cultures, and this causes both GluA2 receptors and mEPSC amplitudes to increase, but the GluA2 level is not restricted to synaptic sites, and, is expressed as the surface fraction (surface receptor/total receptor—total receptor being surface intensity plus internalized intensity) which increases from 0.5 to 0.55, or to 110%, while mEPSC amplitude increases to ~180%. 
The (Tan et al., 2015) study only provides Western blot data to show an increase of receptors to 125% in mouse cortical cultures in response to 48 hr TTX, with mEPSC amplitudes increased to ~140%, but the Western blot technique measures synaptic and nonsynaptic receptors on excitatory and inhibitory neurons, as well as receptors on astrocytes. Finally, in (Silva et al., 2019), the culture conditions for the imaging data and the mEPSC amplitude data are markedly different, with ‘low-density’ Banker cultures being used for the former, and ‘high-density’ cultures used for the latter, and the protocol to induce activity blockade is different from ours (noncompetitive AMPA and NMDA blockers); synaptic GluA2 receptors are increased to ~280% and mEPSC amplitudes to ~170%. In the revision we will carefully summarize the previous evidence for receptors and mEPSC amplitude responses to activity blockade. Since it is known that different protocols trigger different molecular mechanisms, for example, TTX + APV triggers a homeostatic plasticity that can be completely reversed by acute application of blockers of Ca-permeable receptors, whereas TTX alone triggers a plasticity that is insensitive to these blockers ((Sutton et al., 2006), Figure 4E; (Soden and Chen, 2010), Figure 4A), we will keep our discussion restricted to studies using TTX alone for at least 24 hr. We will acknowledge that our finding that GluA2 receptors and mEPSC amplitudes are not varying in lockstep from culture to culture suggests there is another contributor to mEPSC amplitude, but that we cannot rule out it is due to a greater variability in signal, or more issues with signal over noise, in imaging experiments compared to electrophysiology experiments.
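
      As a rough descriptive aid, the approximate percentages quoted above can be put side by side. The sketch below uses only the rounded values stated in this response; the helper name and the choice to compare increases above 100% of control are assumptions made for illustration, and because the four studies differ in preparation, protocol, and receptor measure, the resulting ratios are not directly comparable across studies.

```python
# Approximate post-treatment levels (% of control) as quoted in the response above.
reported = {
    "Ibata et al., 2008": {"receptor": 130, "mepsc": 130},
    "Pozo et al., 2012": {"receptor": 110, "mepsc": 180},
    "Tan et al., 2015": {"receptor": 125, "mepsc": 140},
    "Silva et al., 2019": {"receptor": 280, "mepsc": 170},
}

def increase_ratio(receptor_pct, mepsc_pct):
    """mEPSC increase divided by receptor increase, each taken above 100% of control."""
    return (mepsc_pct - 100) / (receptor_pct - 100)

ratios = {study: increase_ratio(v["receptor"], v["mepsc"])
          for study, v in reported.items()}

# Surface fraction in Pozo et al. rises from 0.5 to 0.55, i.e. to 110% of control.
pozo_receptor_pct = round(0.55 / 0.5 * 100)

for study, r in ratios.items():
    print(f"{study}: mEPSC increase / receptor increase = {r:.2f}")
```

      A ratio of 1 corresponds to matching increases; values far from 1 flag the studies in which the two measures diverge in magnitude.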

      Studies surveyed about reporting results by culture:

      (Ju et al., 2004; Stellwagen et al., 2005; Shepherd et al., 2006; Sutton et al., 2006; Cingolani and Goda, 2008; Hou et al., 2008; Ibata et al., 2008; Chang et al., 2010; Hu et al., 2010; Jakawich et al., 2010; Beique et al., 2011; Tatavarty et al., 2013; Diering et al., 2014; Sanderson et al., 2018)

      Further, because the acquisition and quantification methods for mEPSC recordings and immunohistochemistry imaging are entirely different (each with its own limitations in signal detection), it is not convincing that the lack of proportional changes must signify a presynaptic component.

      We agree with the reviewer that there is no way to compare absolute levels from one type of experimental technique to another, but whatever differences in technical issues there are for the two techniques, they should cause systemic errors and should not contribute to the differences between experiments. Most of the issues with imaging come down to variability in the intensity of fluorescence from experiment to experiment, since the antibody solutions are made anew each time, as is the fixation solution. In addition, the confocal microscope function can vary over time and give brighter or dimmer images. But those kinds of artifacts are addressed by using the same solutions on control and TTX-treated coverslips, and imaging control and TTX-treated coverslips in the same single 2-3 hour imaging session, so that whatever issues there are, they cannot contribute to the TTX effect itself. Therefore when we compare the TTX effect (TTX measurements compared to untreated measurements) from culture to culture and find that in one WT culture there was no increase in receptors but there was in mEPSC amplitude, it is difficult to explain how a limitation specific to the antibody imaging technique could produce such a result. Similarly, when we get the opposite result, that in one KO culture, receptors increased but mEPSC amplitudes did not, it is unclear how limitations in signal detection would produce such a result in one culture but not another. The one exception to this is that the primary GluA2 antibody has to be added individually to each coverslip before returning the dishes to the incubator in order to avoid the disruption to live cells that a complete removal of media would have had. The only remaining ‘artifact’ that could explain the results would be a greater variability in the imaging experiments due to limitations in the signal or the signal to noise ratio. 
In the revision we will report additional characteristics of imaging experiments, such as average intensity for each coverslip, and for each experiment, to address whether variability in fluorescence levels could explain the variability in TTX effects we observe. We will include the possibility that the mismatches in GluA2 receptors and mEPSCs could be caused by greater variability in the imaging experiments.

      2) The authors also speculate in the discussion that presynaptic Rab3A could be interacting with retrograde BDNF signaling to regulate postsynaptic AMPARs. Without data showing Rab3A-dependent presynaptic changes after TTX treatment, this argument is not compelling. In this retrograde pathway, BDNF is synthesized in and released from dendrites (Jakawich et al., 2010; Thapliyal et al., 2022), and it is entirely possible for postsynaptic Rab3A to interfere with this process cell-autonomously.

      In the revision, the model will focus on the direct findings of the manuscript and tone down the speculation about BDNF signaling, but in the Discussion we will add the possibility that a Rab3A-BDNF interaction could occur either presynaptically or postsynaptically. Interestingly, these articles suggest the postsynaptic BDNF is affecting presynaptic function, namely mEPSC frequency. It is conceivable it could presynaptically affect the vesicle’s release of transmitter.

      3) The authors propose that a change in AMPAR subunit composition from GluA2-containing ones to GluA1 homomers may account for the distinct changes in mEPSC amplitudes and GluA2 clusters. However, their data from the Naspm wash-in experiments clearly show that GluA1 homomer contributions have not changed before and after TTX treatment.

      Our apologies to the reviewer that we were not clear on this point. In lines 396 to 400 we were describing the significant effects that NASPM had on mEPSC frequency on both untreated and TTX-treated cells, despite having only modest, and not quite significant effects on mEPSC amplitude. We conclude from these results that there are synaptic sites that have only GluA1 homomers, and the mEPSCs from these sites are blocked 100% by NASPM. There may be an increase in such GluA1-only synapses after activity blockade, but nevertheless, these events do not contribute to the amplitude increase. So we did not mean to suggest that there is a shift from GluA2-containing to GluA1-containing receptors that leads to the amplitude increase and fully agree with the reviewer that the GluA1 homomer contributions to amplitude have not changed before and after TTX. We will clarify the difference between the contribution of GluA1 homomers to amplitude and frequency in the revised manuscript.

      Reviewer #3

      Summary: The authors clearly demonstrate that Rab3A plays a role in HSP at excitatory synapses, with substantially less plasticity occurring in the Rab3A KO neurons. There is also no apparent HSP in the Earlybird Rab3A mutation, although baseline synaptic strength seems already elevated. In this context, it is unclear if the plasticity is absent or just occluded by a ceiling effect due to the synapses already being strengthened. The authors do appropriately discuss both options. There are also differences in genetic background between the Rab3A KO and Earlybird mutants that could also impact the results, which are also noted. The authors have solid data showing that Rab3A is unlikely to be active in astrocytes. Finally, they attempt to study the linkage between synaptic strength during HSP and AMPA receptor trafficking, and conclude that trafficking is largely not responsible for the changes in synaptic strength.

      Strengths: This work adds another player into the mechanisms underlying an important form of synaptic plasticity. The plasticity is only reduced, suggesting Rab3A is only partially required and perhaps multiple mechanisms contribute. The authors speculate about some possible novel mechanisms.

      Weaknesses: However, the rather strong conclusions on the dissociation of AMPAR trafficking and synaptic response are made from somewhat weaker data. The key issue is the GluA2 immunostaining in comparison with the mEPSC recordings. Their imaging method involves only assessing puncta clearly associated with a MAP2 labeled dendrite. This is a small subset of synapses, judging from the sample micrographs (Fig 5). To my knowledge, this is a new and unvalidated approach that could represent a particular subset of synapses not representative of the synapses contributing to the mEPSC change (they are also sampling different neurons for the two measurements; an additional unknown detail is how far from the cell body were the analyzed dendrites for immunostaining). While the authors acknowledge that a sampling issue could explain the data, they still use this data to draw strong conclusions about the lack of AMPAR trafficking contribution to the mEPSC amplitude change. This apparent difference may be a methodological issue rather than a biological one, and at this point it is impossible to differentiate these. It will unfortunately be difficult to validate their approach. Perhaps if they were to drive NMDA-dependent LTD or chemLTP, and show alignment of the imaging and ephys, that would help. More helpful would be recordings and imaging from the same neurons but this is challenging. Sampling from identified synapses would of course be ideal, perhaps from 2P uncaging combined with SEP-labeled AMPARs, but this is more challenging still. But without data to validate the method, it seems unwarranted to make such strong conclusions such as that AMPAR trafficking does not underlie the increase in mEPSC amplitude, given the previous data supporting such a model.

      We chose the primary dendrite to ensure we were not assaying dendrites from inhibitory neurons or on axons, but we will add in the revision that it is a limitation of our methods that we are not sampling all the synapses for each neuron. The majority of previous studies that establish that receptors are increased side by side with mEPSCs did not measure receptors and mEPSCs in the same cells, nor even in the same cultures. There is a recent study which employs dual recordings, transfection of GluA2 and VGlut1 constructs, and infusion of dyes to highlight cell morphology (Letellier et al., 2019), so in principle an experiment could be done in which synaptic GluA2 sites are imaged in a cell in which the mEPSCs are also measured. It would be difficult to make these measurements in the same cells before and after TTX treatment, since there is a high likelihood of damaging the cell upon electrode withdrawal and with the imaging process itself. In theory, only a few such experiments would be necessary to establish whether receptors and mEPSC amplitudes are varying in lockstep, and we will consider this for a future study. As stated in response to conceptual concern #1 in Reviewer 2’s comments, we will review the literature on previous studies’ demonstrations of increases in receptors and mEPSC amplitudes following activity blockade in more detail, including how the synaptic sites to be imaged were chosen, to address whether our selection of sites touching the primary dendrite is unvalidated.

      A sample from 3 articles:

      (Ibata et al., 2008), only information is that ‘distal dendrites’ were examined. The authors do not use a dendritic label. (Jakawich et al., 2010), ‘neurons with pyramidal-like morphology were selected for imaging,’ and ‘principal dendrite of each neuron was linearized’—but how these were identified is not clear, since MAP2 or other cellular labels are not described.

      (Silva et al., 2019), ‘dendrites with similar thickness and appearance were randomly selected using MAP2 staining,’ which suggests synaptic sites with GluA2 and VGLUT1 were selected on the basis of being close to or touching the MAP2 positive dendrite, although this is not stated explicitly.

      We can perform length measurements on the dendrites imaged and report this information in the revision, but the primary dendrite is the closest dendrite to the cell body.

      We have addressed the potential contribution of technical artifacts arising from the two distinct methods of measurement, imaging and electrophysiology, in our response to conceptual concern #1 of Reviewer 2.

      Other questions arise from the NASPM experiments, used to justify looking at GluA2 (and not GluA1) in the immunostaining. First, there is a frequency effect that is quite unclear in origin. One would expect NASPM to merely block some fraction of the post-synaptic current, and not affect pre-synaptic release or block whole synapses. It is also unclear why the authors argue this proves that the NASPM was at an effective concentration (lines 399-400).

      We observed a clear effect of NASPM reducing mEPSC frequency. We will state more clearly that we infer from the loss of mEPSCs after NASPM that such mEPSCs were from synaptic sites that had only GluA1 homomers, and acknowledge that this is an interpretation. We will also clarify that if our inference is correct, it would indicate that the dose of NASPM we used was 100% effective at blocking GluA1 homomers. The alternative explanation would be a presynaptic effect of NASPM, which has never been reported, to our knowledge.

      Further, the amplitude data show a strong trend towards smaller amplitude. The p value for both control and TTX neurons was 0.08 - it is very difficult to argue that there is no effect. And the decrease is larger in the TTX neurons. Considering the strong claims for a pre-synaptic locus and the use of this data to justify only looking at GluA2 by immunostaining, these data do not offer much support of the conclusions. Between the sampling issues and perhaps looking at the wrong GluA subunit, it seems premature to argue that trafficking is not a contributor to the mEPSC amplitude change, especially given the substantial support for that hypothesis. Further, even if trafficking is not the major contributor, there could be shifts in conductance (perhaps due to regulation of auxiliary subunits) that do not necessitate a pre-synaptic locus. While the authors are free to hypothesize such a mechanism, it would be prudent to acknowledge other options and explanations.

      We did not mean to suggest that there is no effect of NASPM on mEPSC amplitude. We will clarify that our data indicate that there is no effect of NASPM on the TTX effect on mEPSC amplitude. We agree with the reviewer that the effect of NASPM on frequency is of larger magnitude after TTX treatment, although the p value is larger than that for untreated cells, likely due to greater variability. We interpret this to mean that TTX treatment increases the proportion of synapses that have only GluA1 homomers. Nevertheless, the increase in GluA1 homomer sites does not appear to contribute to the overall increase in amplitude following TTX treatment, and we wanted to find the mechanism of the amplitude increase. That is why we focused on GluA2 receptors. We will acknowledge the limitation of basing our conclusions on only GluA2 receptors in the revision, as well as the possibility that there is a change in conductance. As stated in our response to Reviewer 2, we do not mean to state that GluA2 receptors do not go up after activity blockade; we find that they do. We are proposing an additional mechanism contributing to mEPSC amplitude to explain the different responses for GluA2 receptors vs. mEPSC amplitudes in some of the 6 matched experiments (3 WT and 3 KO).

      The frequency data are missing from the paper, with the exception of the NASPM dataset. The mEPSC frequencies should be reported for all experiments, particularly given that Rab3A is generally viewed as a pre-synaptic protein regulating release. Also, in the NASPM experiments, the average frequency is much higher in the TTX treated cultures. Is this statistically above control values?

      We will report frequency measurements for all experiments shown. Following TTX treatment, frequency variability increases enormously: some cells have more than 10 mEPSCs per second, while other TTX-treated cells have fewer than 1 mEPSC per second. As a result, the TTX effect on frequency, and whether this effect is present in Rab3A KO and Rab3AEbd/Ebd, is not completely clear, which is why we did not include those results previously.

      Unaddressed issues that would greatly increase the impact of the paper:

      1) Is Rab3A acting pre-synaptically, post-synaptically or both? The authors provide good evidence that Rab3A is acting within neurons and not astrocytes. But where it is acting (pre or post) would aid substantially in understanding its role (and particularly the hypothesized and somewhat novel idea that the amount of glutamate released per vesicle is altered in HSP). They could use sparse knock-down of Rab3A, or simply mix cultures from KO and WT mice (with appropriate tags/labels). The general view in the field has been that HSP is regulated post-synaptically via regulation of AMPAR trafficking, and considerable evidence supports this view. The more support for their suggestion of a pre-synaptic site of control, the better.

      We agree with the reviewer that this is the most important question to answer next. The approach suggested by the reviewer would be to record from Rab3A KO neurons in a culture where the majority of their inputs are Rab3A positive. If the TTX effect is absent from these cells, it would strongly indicate that postsynaptic Rab3A is required for homeostatic plasticity. There are currently no transgenic mice expressing GFP forms of Rab3A, so we would have to create one or transiently transfect Rab3A-GFP into Rab3A KO neurons. Given that under our experimental conditions we require a very high density of neurons to observe the increase in mEPSC amplitude, it would be difficult to get the ratio of Rab3A-expressing neurons high enough using transfection to be sure that a given postsynaptic cell lacking Rab3A had a normal number of Rab3A-positive inputs and almost no Rab3A-negative inputs. It may be that the opposite experiment is more doable: an isolated Rab3A-positive neuron in a sea of Rab3A-negative neurons, which could be accomplished with a very low transfection efficiency. Another approach would be to use the fast off rate antagonist gamma-DGG, which is more effective against low glutamate concentrations than high glutamate concentrations (see Liu et al., 1999; Wu et al., 2007). If gamma-DGG were less effective at reducing mEPSC amplitude in TTX-treated cells, compared to untreated cells, it would support the hypothesis that activity blockade leads to an increase in the amount of transmitter per vesicle fusion event. Further, if the change in gamma-DGG sensitivity after activity blockade were disrupted in cultures from Rab3A KO cells, it would support a presynaptic role for Rab3A in homeostatic plasticity of mEPSC amplitude. We have begun these experiments but are finding the surprising result that within a single recording, small mEPSCs and large mEPSCs appear to be differentially sensitive to gamma-DGG. To confirm that this is a biological characteristic, rather than an issue with the detection threshold, we will be repeating our experiments with a slow off rate antagonist that has the same effect regardless of transmitter concentration. The complexity of these results precludes including them in the current manuscript.

      2) Rab3A is also found at inhibitory synapses. It would be very informative to know if HSP at inhibitory synapses is similarly affected. This is particularly relevant as at inhibitory synapses, one expects a removal of GABARs and/or a decrease of GABA-packaging in vesicles (ie the opposite of whatever is happening at excitatory synapses). If both processes are regulated by Rab3A, this might suggest a role for this protein more upstream in the signaling; an effect only at excitatory synapses would argue for a more specific role just at these synapses.

      The next question, after it is determined where Rab3A is acting, is whether it is required for other forms of homeostatic plasticity. This includes plasticity of GABA mIPSCs on pyramidal neurons, but also mEPSCs on inhibitory neurons, and, the downscaling of mEPSCs (and upscaling of mIPSCs) when activity is increased, by bicuculline for example. We will add a statement about future experiments examining other forms of plasticity to the discussion, and include examples where a molecular mechanism has mediated multiple forms, and those that have been shown to be very specific.

      Beique JC, Na Y, Kuhl D, Worley PF, Huganir RL (2011) Arc-dependent synapse-specific homeostatic plasticity. Proc Natl Acad Sci U S A 108:816-821.

      Chang MC, Park JM, Pelkey KA, Grabenstatter HL, Xu D, Linden DJ, Sutula TP, McBain CJ, Worley PF (2010) Narp regulates homeostatic scaling of excitatory synapses on parvalbumin-expressing interneurons. Nat Neurosci 13:1090-1097.

      Cingolani LA, Goda Y (2008) Differential involvement of beta3 integrin in pre- and postsynaptic forms of adaptation to chronic activity deprivation. Neuron Glia Biol 4:179-187.

      Diering GH, Gustina AS, Huganir RL (2014) PKA-GluA1 coupling via AKAP5 controls AMPA receptor phosphorylation and cell-surface targeting during bidirectional homeostatic plasticity. Neuron 84:790-805.

      Hanes AL, Koesters AG, Fong MF, Altimimi HF, Stellwagen D, Wenner P, Engisch KL (2020) Divergent Synaptic Scaling of Miniature EPSCs following Activity Blockade in Dissociated Neuronal Cultures. J Neurosci 40:4090-4102.

      Hou Q, Zhang D, Jarzylo L, Huganir RL, Man HY (2008) Homeostatic regulation of AMPA receptor expression at single hippocampal synapses. Proc Natl Acad Sci U S A 105:775-780.

      Hu JH, Park JM, Park S, Xiao B, Dehoff MH, Kim S, Hayashi T, Schwarz MK, Huganir RL, Seeburg PH, Linden DJ, Worley PF (2010) Homeostatic scaling requires group I mGluR activation mediated by Homer1a. Neuron 68:1128-1142.

      Ibata K, Sun Q, Turrigiano GG (2008) Rapid synaptic scaling induced by changes in postsynaptic firing. Neuron 57:819-826.

      Jakawich SK, Nasser HB, Strong MJ, McCartney AJ, Perez AS, Rakesh N, Carruthers CJ, Sutton MA (2010) Local presynaptic activity gates homeostatic changes in presynaptic function driven by dendritic BDNF synthesis. Neuron 68:1143-1158.

      Ju W, Morishita W, Tsui J, Gaietta G, Deerinck TJ, Adams SR, Garner CC, Tsien RY, Ellisman MH, Malenka RC (2004) Activity-dependent regulation of dendritic synthesis and trafficking of AMPA receptors. Nat Neurosci 7:244-253.

      Lazic SE, Clarke-Williams CJ, Munafo MR (2018) What exactly is 'N' in cell culture and animal experiments? PLoS Biol 16:e2005282.

      Liu G, Choi S, Tsien RW (1999) Variability of neurotransmitter concentration and nonsaturation of postsynaptic AMPA receptors at synapses in hippocampal cultures and slices. Neuron 22:395-409.

      Pozo K, Cingolani LA, Bassani S, Laurent F, Passafaro M, Goda Y (2012) beta3 integrin interacts directly with GluA2 AMPA receptor subunit and regulates AMPA receptor expression in hippocampal neurons. Proc Natl Acad Sci U S A 109:1323-1328.

      Sanderson JL, Scott JD, Dell'Acqua ML (2018) Control of Homeostatic Synaptic Plasticity by AKAP-Anchored Kinase and Phosphatase Regulation of Ca(2+)-Permeable AMPA Receptors. J Neurosci 38:2863-2876.

      Shepherd JD, Rumbaugh G, Wu J, Chowdhury S, Plath N, Kuhl D, Huganir RL, Worley PF (2006) Arc/Arg3.1 mediates homeostatic synaptic scaling of AMPA receptors. Neuron 52:475-484.

      Silva MM, Rodrigues B, Fernandes J, Santos SD, Carreto L, Santos MAS, Pinheiro P, Carvalho AL (2019) MicroRNA186-5p controls GluA2 surface expression and synaptic scaling in hippocampal neurons. Proc Natl Acad Sci U S A 116:5727-5736.

      Soden ME, Chen L (2010) Fragile X protein FMRP is required for homeostatic plasticity and regulation of synaptic strength by retinoic acid. J Neurosci 30:16910-16921.

      Stellwagen D, Beattie EC, Seo JY, Malenka RC (2005) Differential regulation of AMPA receptor and GABA receptor trafficking by tumor necrosis factor-alpha. J Neurosci 25:3219-3228.

      Sutton MA, Ito HT, Cressy P, Kempf C, Woo JC, Schuman EM (2006) Miniature neurotransmission stabilizes synaptic function via tonic suppression of local dendritic protein synthesis. Cell 125:785-799.

      Tan HL, Queenan BN, Huganir RL (2015) GRIP1 is required for homeostatic regulation of AMPAR trafficking. Proc Natl Acad Sci U S A 112:10026-10031.

      Tatavarty V, Sun Q, Turrigiano GG (2013) How to scale down postsynaptic strength. J Neurosci 33:13179-13189.

      Turrigiano GG, Leslie KR, Desai NS, Rutherford LC, Nelson SB (1998) Activity-dependent scaling of quantal amplitude in neocortical neurons. Nature 391:892-896.

      Wang X, Wang Q, Yang S, Bucan M, Rich MM, Engisch KL (2011) Impaired activity-dependent plasticity of quantal amplitude at the neuromuscular junction of Rab3A deletion and Rab3A earlybird mutant mice. J Neurosci 31:3580-3588.

      Watt AJ, van Rossum MC, MacLeod KM, Nelson SB, Turrigiano GG (2000) Activity coregulates quantal AMPA and NMDA currents at neocortical synapses. Neuron 26:659-670.

      Wu XS, Xue L, Mohan R, Paradiso K, Gillis KD, Wu LG (2007) The origin of quantal size variation: vesicular glutamate concentration plays a significant role. J Neurosci 27:3046-3056.

      Wu YK, Hengen KB, Turrigiano GG, Gjorgjieva J (2020) Homeostatic mechanisms regulate distinct aspects of cortical circuit dynamics. Proc Natl Acad Sci U S A 117:24514-24525.

    1. Reviewer #3 (Public Review):

      The authors evaluate the effect of high-resolution 2D template matching on template bias in reconstructions, and provide a quantitative metric for overfitting. It is an interesting manuscript that made me reevaluate and correct some mistakes in my understanding of overfitting and template bias, and I'm sure it will be of great use to others in the field. However, its main point is to promote high-resolution 2D template matching (2DTM) as a more universal analysis method for in vitro and, more importantly, in situ data. While the experiments performed to that end are sound and well-executed in principle, I fail to make that specific conclusion from their results.

      The authors correctly point out that overfitting is largely enabled by the presence of false-positives in the data set. They go on to perform their in situ experiments with ribosomes, which provide an extremely favorable amount of signal that is unrealistic for the vast majority of the proteome. This seems cherry-picked to keep the number of false-positives and false-negatives low. The relationship between overfitting/false-positive rate and the picking threshold will remain the same for smaller proteins (which is a very useful piece of knowledge from this study). However, the false-negative rate will increase a lot compared to ribosomes if the same high picking threshold is maintained. This will limit the applicability of 2DTM, especially for less-abundant proteins.

      I would like to see an ablation study: Take significantly smaller segments of the ribosome (for which the authors already have particle positions from full-template matching, which are reasonably close to the ground-truth), e.g. 50 kDa, 100 kDa, 200 kDa etc., and calculate the false-negative rate for the same picking threshold. If the resulting number of particles does plummet, it would be very helpful to discuss how that affects the utility of 2DTM for non-ribosomes in situ.
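      The dependence of the false-negative rate on template mass at a fixed picking threshold can be sketched numerically. The following is a minimal illustration only, not the authors' method: it assumes that the matching score of a true particle is normally distributed with unit variance, and that the mean score scales with the square root of the template mass. The function names, the full-template score (30), the full-template mass (4000 kDa) and the threshold (7.5) are hypothetical values chosen for illustration.

```python
import math


def normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def scaled_snr(full_snr: float, full_mass_kda: float, fragment_mass_kda: float) -> float:
    """Mean score of a fragment, ASSUMING the score scales with sqrt(mass)."""
    return full_snr * math.sqrt(fragment_mass_kda / full_mass_kda)


def false_negative_rate(mean_snr: float, threshold: float) -> float:
    """Fraction of true particles scoring below the threshold,
    ASSUMING scores are distributed as N(mean_snr, 1)."""
    return normal_cdf(threshold - mean_snr)


if __name__ == "__main__":
    FULL_SNR = 30.0        # hypothetical mean score for the full template
    FULL_MASS_KDA = 4000.0  # hypothetical mass of the full template
    THRESHOLD = 7.5         # hypothetical picking threshold

    for mass in (50.0, 100.0, 200.0, 1000.0, 4000.0):
        snr = scaled_snr(FULL_SNR, FULL_MASS_KDA, mass)
        fnr = false_negative_rate(snr, THRESHOLD)
        print(f"{mass:7.0f} kDa  mean score {snr:6.2f}  false-negative rate {fnr:.3f}")
```

      Under these assumptions the false-negative rate rises steeply once the mean score of the fragment approaches the threshold, which is the behaviour the proposed ablation study would test against real data.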

      Another point of concern is the dramatic resolution decrease to 8 Å after multiple iterations of refinement against experimental reconstructions described in line 159. Was this a local search from the poses provided by 2DTM, or something more global? While this is not a manifestation of overfitting as the authors have conclusively shown, I think it adds an important point to the ongoing "But do we really need tomograms, or can we just 2D everything?" debate in the field, which is also central to the 2D part of 2DTM. Reaching 8 Å with 12k ribosome particles would be considered a rather poor subtomogram averaging result these days. Being in the "we need tilt series to be less affected by non-Gaussian noise" camp myself, I wonder if this indicates 2D images are inherently worse for in situ samples. If they are, the same limitations would extend to template matching. In that case, shouldn't the authors advocate for 3DTM instead of 2DTM? It may not be needed for ribosomes, but could give smaller proteins the necessary edge.

      Right now, this study is also an invitation to practitioners who do not understand the picking threshold used here and cannot relate it to other template-matching programs to do a lot of questionable template matching and claim that the results are true because templates are "unoverfittable". I think such undesirable consequences should be discussed prominently.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      I - General criticisms

      Reviewer #1: My main criticism is unfortunately inherent to the approach: comparative studies are absolutely critical, but they can only provide a very sparse sampling of diversity. Fortunately, thanks to high-throughput sequencing, bioinformatic analyses can now be performed on a large number of species, but experimental validation is typically restricted to two or three species. The consequence of this for the present manuscript is that while the functional conservation of the Gwl site is convincingly shown, the exact mechanisms responsible for the reduced effect of PKA phosphorylation remain relatively vaguely defined. Indeed, in their Discussion the authors list a number of experimental approaches to address this - but I understand that these would all involve substantial efforts to address. In particular, testing chimeric constructs around the consensus PKA site and from multiple species could be very informative.

      We completely agree with the reviewer that comparative approaches are critical to understanding biological mechanisms, and are excited by the increasing possibilities to perform not only sequence and descriptive comparisons but functional studies across a range of emerging model organisms. We hope that more and more researchers in cell and molecular biology will profit from the experimental tools and techniques now available in such species, and will pioneer new ones. Of course, as the reviewer rightly points out, conclusions are currently limited by the number of species studied, but comparisons between two judiciously chosen species can already be very informative. Thus, in our study, the use of Xenopus and Clytia allowed us to make significant progress towards our main objective of understanding the cAMP-PKA paradox in the control of oocyte maturation, specifically by showing both that PKA phosphorylation of Clytia ARPP19 is lower in efficiency and that the phosphorylated protein has a lower effect on oocyte maturation than the Xenopus protein. As the reviewer points out, unravelling the exact mechanisms underlying these differences will require a large amount of additional work and is beyond the scope of the current study. We have in fact embarked on several series of experiments to this end using some of the approaches listed in the Discussion. Specifically, we are testing the biochemical and functional properties of chimeric constructs containing the consensus PKA site from various species. This is a substantial undertaking which will require one to two years to complete, but is already giving some very interesting findings.

      Reviewer #1: The figures and text could be slightly condensed down to about 6 figures.

      We have reduced the number of figure panels, but we prefer to maintain the number of figures because the experimental data presented in them are essential to the interpretation of our results and the overall conclusions of the article. If the journal editor would like us to reduce the number of figures, we could do this by moving Figure 4 and some panels of other figures to supplementary material and then merging some of the remaining figures, but this would be a pity.

      II - Abstract

      As recommended by Reviewer #2, we have reworked the Abstract to make it more accessible to new readers, attempting to bring out more clearly and simply the main results and conclusions of the study. We correspondingly simplified and shortened the title of the article. Changes: Page 2.

      III - Introduction points

      Reviewer #2: I believe that it would be interesting to include some time-references when introducing the prophase arrest of Clytia and Xenopus oocytes. How long is prophase arrest in Xenopus compared to Clytia or other organisms? How can this affect the prophase arrest mechanisms? It seems that the prophase arrest in Xenopus oocytes is found to be significantly more prolonged compared to Clytia and various other organisms, and also meiotic maturation proceeds much more rapidly in Clytia than in Xenopus. This should be indicated in the introduction with a short introduction of why, and not others, were these species chosen for this study.

      Differences in timing of oocyte prophase arrest and in maturation kinetics across animals are indeed highly relevant in relation to the underlying biochemical mechanisms. Unfortunately, not enough information is currently available concerning the duration of the successive phases of oocyte prophase arrest across species to make any meaningful correlations with PKA regulation of maturation initiation. We have nevertheless expanded the Introduction to cover this issue as follows:

      • We start the introduction by mentioning how the length of the prophase arrest varies across species. Changes: Page 3, lines 5-11.
      • We have added examples of species which likely have similar durations of prophase arrest but show cAMP-stimulated vs cAMP-inhibited release. Changes: Page 4, lines 28-35.
      • We have specified the temporal differences in meiotic maturation in Xenopus (3-7 hrs) and Clytia (10-15 min). Changes: Page 5, lines 32-33.

      Reviewer #2: why, and not others, were these species [Xenopus, Clytia] chosen for this study. A brief justification is included in lines 1-page 5 "..a laboratory model hydrozoan species well suited to oogenesis studies", but it does not explain why this and not other hydrozoan species like Hydra, that has also been used for meiosis studies.

      As requested by Reviewer #2, fuller details are now included about the advantages of Clytia compared to other hydrozoan species, citing several articles and recent reviews here and also in the Discussion. Changes: Page 5, lines 21-32 & 37-39.

      Hydra is a classic cnidarian experimental species and has proved an extremely useful model for regeneration and body patterning, but is not suitable for experimental studies on oocyte maturation because spawning is hard to control and fully-grown oocytes cannot easily be obtained, manipulated or observed. In contrast many hydromedusae (including Clytia, Cytaeis, and Cladonema) have daily dark/light induced spawning and accessible gonads, so provide great material for studying oogenesis and maturation. Of these, Clytia has currently by far the most advanced molecular and experimental tools.

      Reviewer #2: The protein MAPK is not introduced properly, as it is first mentioned in the results section in line 12. Given the importance of the results provided with it, it should be presented in the introduction prior to the results section.

      As requested by Reviewer #2, the involvement of MAPK activation during Xenopus oocyte meiotic maturation is now introduced, explaining how its phosphorylation serves as a marker of Cdk1 activation. Changes: Page 5, lines 1-5.

      Reviewer #2: These sentences need a more elaborate explanation: Page 4 Lines 16-17 "... no role for cAMP has been detected in meiotic resumption, which is mediated by distinct signaling pathways" Which pathways?

      We now give the example of the well-characterized Gβγ-PI3K pathway for oocyte maturation initiation in the starfish. Changes: Page 4, lines 1-15.

      Reviewer #2: Page 4 line 34-39. Introduction indicates that the phosphorylation of ARPP19 on S67 by Gwl is a poorly understood molecular signaling cascade (line 34). However, the positive role of ARPP19 on Cdk1 activation, through the S67 phosphorylation by Gwl, appears to be widespread across all eukaryotic mitotic and meiotic divisions studied (lines 36-37). These two sentences seem a little contradictory. If the general pathway has been identified but the signaling cascade is still not well described, please indicate that in a clearer way.

      We apologise that the wording we used was not clear and implied that the mechanisms of PP2A inhibition by Gwl-phosphorylated ARPP19 were poorly understood. On the contrary, they are very well studied. The part that remains mysterious concerns the upstream mechanisms. We have reworded the paragraph to make this point unambiguous. Changes: Page 5, lines 1-8.

      IV - Results

      Reviewer #2: The text of the results is generally well described; however, all the sections start with a long introductory paragraph. I believe this facilitates the contextualization of the experiments, but please try to summarize when possible. For example, in page 5 lines 12-25, or page 7 lines 30-37, are all introduction information.

      As requested by Reviewer #2, we have shortened or removed the introductory passages of the Results section paragraphs, which were redundant with the information given in the Introduction. We did not restrict ourselves to the two examples cited by the reviewer, but have shortened all the Results passages that repeat information already provided in the Introduction. Changes: Page 7, lines 3-4 & 14-16 & 36-37 - Page 8, lines 12-15 - Page 8, lines 37-40 & Page 9, lines 1-6.

      Reviewer #2: Page 7, Lines 14-19 present a general conclusion of the findings explained in lines 20-27. I think these results are important and they should be explained better, in my opinion they are slightly poorly described.

      We have followed the reviewer's recommendation. The explanation of the experiments and the results are more detailed and the paragraph ends with a general conclusion which came too early in the previous version. Changes: Page 8, lines 22-24 & 32-34.

      Reviewer #2: Page 8, lines 16-17: "It was not possible to increase injection volumes or protein concentrations without inducing high levels of non-specific toxicity". What are the non-specific toxicity effects? How was this addressed? What fundaments this conclusion?

      Clytia oocytes are relatively fragile. Sensitivity of oocytes to injection varies between batches, while in general increasing injection volumes or protein concentrations increases the levels of lysis observed. We do not know exactly what causes this but lysis can happen either immediately following injection or during the natural exaggerated cortical contraction waves that accompany meiotic maturation, suggesting that it relates to mechanical trauma. We have expanded this paragraph and the legend of Fig. 3C to explain these injection experiments more fully in the text and to clarify these issues. Changes: Page 9, lines 16-29 - Page 32, lines 34-41 & Page 33, lines 1-11 - Supplementary Table 1.

      Same paragraph: Lines 25-27 of page 8. Text reads, "These results suggest that PP2A inhibition is not sufficient to induce oocyte maturation in Clytia, although we cannot rule out that the quantity of OA or Gwl thiophosphorylated ARPP proteins delivered was insufficient to trigger GVBD.". Please provide evidence if higher concentrations of OA or Gwl were tested to state this conclusion.

      As explained above, we could not increase the concentrations of ARPP19 protein beyond 4 mg/ml. It is important to note that at the same concentration, both Clytia and Xenopus proteins induce activation of Cdk1 and GVBD in the Xenopus oocyte.

      Concerning OA, it is well documented in many systems, including Xenopus, starfish and mouse oocytes as well as mammalian cell cultures, that high concentrations lead to cell lysis/apoptosis as a result of a massive deregulation of protein phosphorylation (Goris et al, 1989; Rime & Ozon, 1990; Alexandre et al, 1991; Boe et al, 1991; Gehringer, 2004; Maton et al, 2005; Kleppe et al, 2015). Specific tests in Xenopus oocytes have shown that injecting 50 nl of 1 or 2 mM OA specifically inhibits PP2A, while injecting 5 mM also targets PP1 and higher OA concentrations inhibit all phosphatases. For these reasons, we did not increase OA concentrations over 2 mM. When injected into Xenopus oocytes at 1 or 2 mM, OA induces Cdk1 activation and GVBD, but the cell then dies because PP2A has multiple substrates essential for cell life. When injected at 2 mM into Clytia oocytes, OA induces neither Cdk1 activation nor GVBD but promotes cell lysis. This supports the conclusion that 2 mM OA is sufficient to inhibit PP2A (and possibly other phosphatases) but that PP2A inhibition is not sufficient to induce oocyte maturation in Clytia.

      We have reworded the relevant text to make these points clearer. The previous statement that “we cannot rule out that the quantity of OA or Gwl thiophosphorylated ARPP proteins delivered was insufficient to trigger GVBD” has been removed because it was unnecessarily cautious in the context of the literature cited above, as now fully explained. Changes: Page 9, lines 31-35 - Page 32, lines 34-41 & Page 33, lines 1-11 - Supplementary Table 1.

      References: Alexandre et al, 1991, doi: 10.1242/dev.112.4.971; Boe et al, 1991, doi: 10.1016/0014-4827(91)90523-w; Gehringer, 2004, doi: 10.1016/s0014-5793(03)01447-9; Goris et al, 1989, doi: 10.1016/0014-5793(89)80198-x; Kleppe et al, 2015, doi: 10.3390/md13106505; Maton et al, 2005, doi: 10.1242/jcs.02370; Rime & Ozon, 1990, doi: 10.1016/0012-1606(90)90106-s

      Reviewer #2: Lines 12-13: the sentence "This in vitro assay thus places S81 as the sole residue in ClyARPP19 for phosphorylation by PKA." is overstated. As not all residues had been tested, please indicate that "it is likely that" or "among the residues tested", in contrast to "the sole residue in ClyARPP19".

      We realise that we had not explained clearly enough how the thiophosphorylation assay works. In this assay, γ-S-ATP will be incorporated into any amino acid of ClyARPP19 phosphorylatable by PKA. The observed thiophosphorylation of the wild-type protein demonstrates that one or more residues are phosphorylated by PKA. This thiophosphorylation was completely prevented by mutation of a single residue, S81. This experiment thus shows that S81 is entirely responsible for phosphorylation by PKA in this assay. We have rewritten this section more clearly. Changes: Page 10, lines 18-28.

      V - Figures and text related to the figures

      Figure 1A

      Reviewer #2: Why is mouse not included in Figure 1A? Although it might be very similar to human, given that mouse is the species that is most commonly used as a mammalian model, I believe it could be included. However, this is optional upon decision by the authors.

      We have replaced the human sequence in Figure 1A with the mouse sequence as suggested. The sequences of the ENSA and ARPP19 proteins are indeed virtually identical across mammals, including mouse and human. Changes: Fig. 1A.

      Figure 1C

      Reviewer #2: There should be a better explanation in the text of the results sections for the image included in Fig. 1C. Note that Clytia is not a commonly used species, therefore images should be properly explained for general readers. Please indicate in the text that ClyARPP19 mRNA is expressed in previtellogenic oocytes and not in vitellogenic, plus any additional information needed to understand the image. In addition, the detection of ARPP19 in the nerve rings is intriguing. This is mentioned in the discussion section, any idea of its function there? Please include some additional information or additional references, if they exist.

      We have expanded the explanations of Fig. 1C in the text and in the figure legend. We have also added cartoons to the figure to help readers understand the organisation of the Clytia jellyfish and gonad. As now explained, ClyARPP19 mRNA is detected in oocytes at all stages, but the signal is much stronger in pre-vitellogenic oocytes because all cytoplasmic components, including mRNAs, are significantly diluted by the high quantity of yolk proteins as the oocytes grow to full size. Changes: page 7, line 40 & page 8, lines 1-9 - Fig. 1C - Legend page 31, lines 19-31.

      Nothing is known about the function of ARPP19 in the Clytia nervous system. The only data linking ARPP19 and the nervous system concerns mammalian ARPP16, an alternatively spliced variant of ARPP19. ARPP16 is highly expressed in medium spiny neurons of the striatum and likely mediates effects of the neurotransmitter dopamine acting on these cells (Andrade et al, 2017; Musante et al, 2017). This point is included in the Discussion in relation to the hypothesis that PKA phosphorylation of ARPP19 proteins in animals first arose in the nervous system and only later was coopted into oocyte maturation initiation. Changes: page 16, lines 12-13 & 17-20 - page 19, lines 6-9.

      Figure 2A

      Reviewer #1: Fig. 2A (and similar plots in subsequent figures): is it really necessary to cut the x axis? Would it be possible to indicate the number of oocytes for each experiment (maybe in the legend in brackets)?

      As requested by reviewer #1, the x-axis is no longer cut. The number of oocytes for each experiment is now provided in the legend of Fig. 2A and in similar plots of Fig. 5A and 5D. Changes: Fig. 2A - Legends page 31, lines 37-38 (Fig. 2A), page 33, line 25 (Fig. 5A) - page 33, line 34 (Fig. 5D).

      Figure 2D-E (as well as Figure 6C-D and Figure 8B-C)

      Reviewer #1: Fig. 2D (and all similar plots below): I am lacking the discrete data points that were measured. Without these it is impossible to evaluate the fits. The half-times shown in 2E are somewhat redundant, and the information could be combined on a single plot.

      We added all the data points to the concerned plots: 2D, 6C and 8B. As recommended by reviewer #1, we combined on a single plot the phosphorylation levels and the half-times. 2D-E => 2D, 6C-D => 6C and 8B-C => 8B. Changes: Figs 2D, 6C and 8B - Legends page 32, lines 9-14 (Fig. 2D), page 34, lines 24-30 (Fig. 6C) - page 35, lines 13-18 (Fig. 8B).

      Figure 3A and 3B

      Reviewer #1: Fig. 3: why is the blot for PKA substrates cut into 3 pieces? It would be clearer to show the entire membrane.

      In western blot experiments using Clytia oocytes, the amount of material was limited so the membranes were cut into three parts. The central part was incubated sequentially in distinct antibodies. We finally incubated all three parts of the membrane with the anti-phospho-PKA substrate antibody to reveal the full spectrum of proteins recognized by this antibody. The 3 pieces in Fig. 3A therefore together make up the same original membrane. We had separated them on the figure to make it clear that the membrane had been cut. In the new presentation, the 3 pieces are shown next to each other, making it clear that all the membrane is present, with dotted lines indicating the cut zone as explained in the legend. Changes: Fig. 3A and 3B - Legend page 32, lines 22-25 (Fig. 3A), lines 30-33 (Fig. 3B) - Page 24, lines 3-6 (Methods).

      Figure 3C

      Reviewer #2: Fig. 3C needs a better explanation in the text. The way these graphs are presented is somewhat confusing. The meaning of the dots is not self-explanatory in the graph, and it seems that each experiment was done independently but then the complete set of results is presented. Legend says that "each dot represents one experiment" but this is difficult to read as in every analysis the figure also indicates the average and the total number of oocytes. If authors wish so, they can keep the figure as it is, but then please explain this graph better in the text, and please include statistical analysis. These results are very robust, but a comparison between the number of oocytes that go through spontaneous GVBD or lysis in the different conditions will benefit their understanding.

      This figure is intended to provide an overview of all the Clytia oocyte injection experiments that we performed, for which full details are given in Supplementary Table 1. Since these experiments were not equivalent in terms of exact timing and types of observation (or films) made and oocyte sensitivity to injection -as ascertained by buffer injections-, it is not justified to make statistical comparisons between groups. We apologise that the presentation was misleading in this respect and hope that the new version is easier to understand. We removed from the figure the average percentage of maturation for each condition between experiments to avoid any misunderstanding of the nature of the data, and rather represent the values of each experiment independently. We also now explain the data included in the figure fully in the text and figure legend. Changes: Page 9, lines 16-39 - Fig. 3C and Supplementary Table 1 - Legend page 32, lines 34-41 & page 33, lines 1-11.

      Reviewer #2: Also, please provide in the text a plausible explanation for the cause of oocyte lysis for all experimental conditions (Fig 3C). Given that in the control experiments with buffer this effect is also observed in some oocytes, please explain if this is caused by a mechanical disruption of the oocyte during the injection. In contrast, okadaic acid induces the lysis in all the 14/14 oocytes analyzed, is this due also to the mechanical approach? Or is there other reason more related to the PP2A inhibition? Please explain.

      These points are treated above in the response to this reviewer concerning the Results section.

      Figure 5

      Reviewer #2: In Figure 5 D-F, cited in page 9 lines 35-35. Can you provide an explanation of why the time course of meiosis resumption was delayed?

      The binding partners/effectors of XeARPP19-S109D that are involved in maintaining the prophase arrest have not yet been identified. The most probable explanation of the delay in meiotic maturation induced by ClyARPP19-S109D is that the Clytia protein recognizes these unknown ARPP19 effectors, which mediate the prophase arrest, less efficiently. As a result, maturation would be delayed, but not blocked. This explanation was provided in the Discussion (page 17, lines 14-17) and is now mentioned in the Results section. Changes: page 11, lines 16-19.

      VI - Discussion

      Reviewer #2: Although it presents highly interesting suggestions, discussion may border on being overly speculative, especially from line 37 of page 15 till the end.

      We agree and have reduced the speculation in this part of the discussion, in particular regrouping and reformulating ideas about evolutionary scenarios in a single paragraph. Changes: page 17, lines 37-41 - page 18, lines 1-41 - page 19, lines 1-18.

      SUMMARY - Point by point responses to individual reviewers’ comments in their order of appearance.

      Reviewer 1

      • The figures and text could be slightly condensed down to about 6 figures.

      We have reduced the number of figure panels but we prefer to maintain the number of figures, because the experimental data presented in them is essential to the interpretation of our results and the overall conclusions of the article. If the journal editor would like us to reduce the number of figures, we could do this by displacing Figure 4 and some panels of other figures (to then fuse some of them) to supplementary material, but this would be a pity.

      • The exact mechanisms responsible for the reduced effect of PKA phosphorylation remain relatively vaguely defined. Indeed, in their Discussion the authors list a number of experimental approaches to address this - but I understand that these would all involve substantial efforts to address. In particular, testing chimeric constructs around the consensus PKA site and from multiple species could be very informative.

      As the reviewer points out, unravelling these exact mechanisms will require a large amount of additional work and is beyond the scope of the current study.

      • 2A (and similar plots in subsequent figures): is it really necessary to cut the x axis? Would it be possible to indicate the number of oocytes for each experiment (maybe in the legend in brackets)?

      Fig. 2A has been changed in line with the reviewer's request (as well as similar plots in Fig. 5A and 5D). Changes: Fig. 2A - Legends page 31, lines 37-38 (Fig. 2A), page 33, line 25 (Fig. 5A) - page 33, line 34 (Fig. 5D).

      • 2D (and all similar plots below): I am lacking the discrete data points that were measured. Without these it is impossible to evaluate the fits. The half-times shown in 2E are somewhat redundant, and the information could be combined on a single plot.

      Fig. 2D has been changed in line with the reviewer's request (as well as similar plots in Figs 6C-D and 8B-C). Changes: Fig. 2D, 6C and 8B - Legends page 32, lines 9-14 (Fig. 2D), page 34, lines 24-30 (Fig. 6C) - page 35, lines 13-18 (Fig. 8B).

      • 3: why is the blot for PKA substrates cut into 3 pieces? It would be clearer to show the entire membrane.

      In western blot experiments using Clytia oocytes, the amount of material was limited so the membranes were cut into three parts. The central part was incubated sequentially in distinct antibodies. We finally incubated all three parts of the membrane with the anti-phospho-PKA substrate antibody to reveal the full spectrum of proteins recognized by this antibody. The 3 pieces in Fig. 3A therefore together make up the same original membrane. In the new presentation, the 3 pieces are shown next to each other, making it clear that all the membrane is present, with dotted lines indicating the cut zone as explained in the legend. Changes: Fig. 3A and 3B - Legend page 32, lines 22-25 (Fig. 3A), lines 30-33 (Fig. 3B) - Page 24, lines 3-6 (Methods).

      Reviewer 2

      • Abstract needs to be simplified if it is to reach a broader range of readers.

      We have reworked the Abstract to make it more accessible to new readers. Changes: Page 2.

      • It would be interesting to include some time-references when introducing the prophase arrest of Clytia and Xenopus oocytes. This should be indicated in the introduction with a short introduction of why, and not others, were these species chosen for this study.

      We have expanded the Introduction to cover the issue of time-references. Fuller details are now included about the advantages of Clytia compared to other hydrozoan species. Changes: Page 3, lines 5-11, page 4, lines 28-35, page 5, lines 32-33, page 5, lines 21-32 & 37-39.

      • The protein MAPK is not introduced properly, as it is first mentioned in the results section.

      The involvement of MAPK activation during Xenopus oocyte meiotic maturation is now introduced. Changes: Page 5, lines 1-5.

      • Page 4 Lines 16-17 "... no role for cAMP has been detected in meiotic resumption, which is mediated by distinct signaling pathways" Which pathways?

      We now give the example of the well-characterized Gβγ-PI3K pathway for oocyte maturation in starfish, also mentioning that in many species the pathways are still unknown. Changes: Page 4, lines 1-15.

      • Page 4 line 34-39. Introduction indicates that the phosphorylation of ARPP19 on S67 by Gwl is a poorly understood molecular signaling cascade (line 34). However, the positive role of ARPP19 on Cdk1 activation, through the S67 phosphorylation by Gwl, appears to be widespread across all eukaryotic mitotic and meiotic divisions studied (lines 36-37). These two sentences seem a little contradictory.

      The mechanisms of PP2A inhibition by Gwl-phosphorylated ARPP19 are very well studied. The part that remains mysterious concerns the upstream mechanisms. We have reworded the paragraph to make this point unambiguous. Changes: Page 5, lines 1-8.

      • Why is mouse not included in Figure 1A?

      We have replaced the human sequence in Figure 1A with the mouse sequence. Changes: Fig. 1A.

      • 1C: There should be a better explanation in the text of the results sections for the image included in Fig. 1C. Please indicate in the text that ClyARPP19 mRNA is expressed in previtellogenic oocytes and not in vitellogenic.

      We have expanded the explanations of Fig. 1C in the text. We have also added cartoons to the figure to help readers understand the organisation of the Clytia jellyfish and gonad. As now explained, ClyARPP19 mRNA is detected in oocytes at all stages, but the signal is much stronger in pre-vitellogenic oocytes because all cytoplasmic components are significantly diluted by the high quantity of yolk proteins. Changes: page 7, line 40 & page 8, lines 1-9 - Fig. 1C - Legend page 31, lines 19-31.

      • In addition, the detection of ARPP19 in the nerve rings is intriguing. Any idea of its function there?

      The only data linking ARPP19 and the nervous system concerns a mammalian variant of ARPP19 that is highly expressed in the striatum. This point is included in the Discussion. Changes: page 16, lines 12-13 & 17-20 - page 19, lines 6-9.

      • Figure 3C. The way these graphs are presented is somewhat confusing. If authors wish so, they can keep the figure as it is, but then please explain this graph better in the text, and please include statistical analysis. Also, please provide in the text a plausible explanation for the cause of oocyte lysis for all experimental conditions.

      This figure is intended to provide an overview of all the Clytia oocyte injection experiments, for which full details are given in Supplementary Table 1. We have modified the figure and now clarified this fully in the text and figure legend. Clytia oocytes are relatively fragile. Sensitivity of oocytes to injection varies between batches, while in general increasing injection volumes or protein concentrations increases the levels of lysis observed. We do not know exactly what causes this but it probably relates to mechanical trauma. We now explain these injection experiments more fully in the text. Changes: Page 9, lines 16-39 - Fig. 3C and Supplementary Table 1 - Legend page 32, lines 34-41 & page 33, lines 1-11.

      • In Figure 5 D-F, cited in page 9 lines 35-35. Can you provide an explanation of why the time course of meiosis resumption was delayed?

      The most probable explanation is that the Clytia protein recognizes the unknown ARPP19 effectors that mediate the prophase arrest in Xenopus less efficiently. This explanation is provided in the Results section. Changes: page 11, lines 16-19.

      • All the sections start with a long introductory paragraph. I believe this facilitates the contextualization of the experiments, but please try to summarize when possible.

      As requested, we have shortened or removed the introductory passages of the Results section paragraphs, which were redundant with the information given in the introduction. Changes: Page 7, lines 3-4 & 14-16 & 36-37 - Page 8, lines 12-15 - Page 8, lines 37-40 & Page 9, lines 1-6.

      • Page 7, Lines 14-19 present a general conclusion of the findings explained in lines 20-27. I think these results are important and they should be explained better, in my opinion they are slightly poorly described.

      The explanation of the experiments and the results are now more detailed and the paragraph ends with a general conclusion which came too early in the previous version. Changes: Page 8, lines 22-24 & 32-34.

      • Page 8, lines 16-17: "It was not possible to increase injection volumes or protein concentrations without inducing high levels of non-specific toxicity". What are the non-specific toxicity effects? How was this addressed? What fundaments this conclusion?

      As explained above, increasing injection volumes or protein concentrations increases the levels of lysis observed due probably to mechanical trauma. But it is important to note that at the same concentration, both Clytia and Xenopus proteins induce activation of Cdk1 and GVBD in the Xenopus oocyte. Changes: Page 9, lines 16-29 - Page 32, lines 34-41 & Page 33, lines 1-11 - Supplementary Table 1.

      • Lines 25-27 of page 8. "These results suggest that PP2A inhibition is not sufficient to induce oocyte maturation in Clytia, although we cannot rule out that the quantity of OA or Gwl thiophosphorylated ARPP proteins delivered was insufficient to trigger GVBD." Please provide evidence if higher concentrations of OA or Gwl were tested to state this conclusion.

      High OA concentrations lead to cell lysis/apoptosis as a result of a massive deregulation of protein phosphorylation. For these reasons, we cannot increase OA concentrations over 2 µM. When injected in Xenopus oocyte at 1 or 2 µM, OA induces Cdk1 activation, but then the cell dies because PP2A has multiple substrates essential for cell life. When injected at 2 µM in Clytia oocytes, OA does not induce Cdk1 activation but promotes cell lysis. This supports the conclusion that 2 µM OA is sufficient to inhibit PP2A but that PP2A inhibition is not sufficient to induce oocyte maturation in Clytia. We have reworded the relevant text. Changes: Page 9, lines 31-35 - Page 32, lines 34-41 & Page 33, lines 1-11 - Supplementary Table 1.

      • Lines 12-13: the sentence "This in vitro assay thus places S81 as the sole residue in ClyARPP19 for phosphorylation by PKA." is overstated. As not all residues had been tested, please indicate that "it is likely that" or "among the residues tested", in contrast to "the sole residue in ClyARPP19".

      The observed thiophosphorylation of the wild-type protein demonstrates that one or more residues are phosphorylated by PKA. This thiophosphorylation was completely prevented by mutation of a single residue, S81. This experiment thus shows that S81 is entirely responsible for phosphorylation by PKA in this assay. We have rewritten this section more clearly. Changes: Page 10, lines 18-28.

      • Some parts of the discussion are a bit speculative.

      We have reduced the speculation in this part of the discussion, in particular regrouping and reformulating ideas about evolutionary scenarios into a single paragraph. Changes: page 17, lines 37-41 - page 18, lines 1-41 - page 19, lines 1-18.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary of the main findings of the study.

      This work presents very interesting data about the maintenance and release of the prophase arrest of oocytes during sexual reproduction. Authors approach some of the remaining questions about oocyte maturation in animals by taking a comparative approach between two species (Clytia and Xenopus) that use opposing cAMP/PKA signaling pathways to trigger oocyte maturation. To do it they focused on phosphorylation characteristics and function of the regulatory protein ARPP19 from the amphibian Xenopus and its orthologue in the hydrozoan Clytia. Results suggest that Clytia ARPP19 has a low capacity to be phosphorylated by PKA. Moreover, Clytia ARPP19 is inherently a poorer PKA substrate than Xenopus ARPP19 both in vivo and in vitro, despite the presence of a functional PKA site. In addition, the absence of functional interactors mediating its negative effects on Cdk1 activation may provide a double security allowing induction of meiosis resumption in Clytia by elevated PKA activity despite the presence of ARPP19, while additional and yet unidentified mechanisms ensure the Clytia oocyte prophase arrest.

      Minor comments: read detailed review below. Figure 1 and Figure 3 need a better explanation of the results. Abstract needs to be simplified if it is to reach a broader range of readers. Some parts of the discussion are a bit speculative.

      Overall, this work used a robust set of molecular experiments that strongly support the conclusions of the study.

      Significance

      Strengths and limitations of this work:

      The primary strength of this work lies in its innovative use of two distinct species and the integration of molecular experiments to extract conclusions from their different signaling pathways. The well-designed and executed experiments, particularly those of figures 5-9, contribute to an elaborated exploration of the topic, elucidating the underlying mechanisms with clarity. The explanation of each experiment in the results section further adds to the clarity and depth of the study.

      The abstract requires improvement, particularly from lines 10 to 21, as it becomes fully understood only after reading the entire manuscript. To make the work more accessible to new readers, it would be good to present the abstract in a more approachable manner. Figures 1C and 3C need a better explanation in the text. Additionally, some sentences would benefit from citations or further clarification in the results or discussion section. Although it presents highly interesting suggestions, discussion may border on being overly speculative, especially from line 37 of page 15 till the end.

      Detailed review

      Introduction:
      I believe that it would be interesting to include some time-references when introducing the prophase arrest of Clytia and Xenopus oocytes. How long is prophase arrest in Xenopus compared to Clytia or other organisms? How can this affect the prophase arrest mechanisms? It seems that the prophase arrest in Xenopus oocytes is found to be significantly more prolonged compared to Clytia and various other organisms, and also meiotic maturation proceeds much more rapidly in Clytia than in Xenopus. This should be indicated in the introduction with a short introduction of why, and not others, were these species chosen for this study. A brief justification is included in lines 1-page 5 "..a laboratory model hydrozoan species well suited to oogenesis studies", but it does not explain why this and not other hydrozoan species like Hydra, that has also been used for meiosis studies.
      The protein MAPK is not introduced properly, as it is first mentioned in the results section in line 12. Given the importance of the results provided with it, it should be presented in the introduction prior to the results section.

      These sentences need a more elaborate explanation:
      Page 4 Lines 16-17 "... no role for cAMP has been detected in meiotic resumption, which is mediated by distinct signaling pathways" Which pathways?

      Page 4 line 34-39. Introduction indicates that the phosphorylation of ARPP19 on S67 by Gwl is a poorly understood molecular signaling cascade (line 34). However, the positive role of ARPP19 on Cdk1 activation, through the S67 phosphorylation by Gwl, appears to be widespread across all eukaryotic mitotic and meiotic divisions studied (lines 36-37). These two sentences seem a little contradictory. If the general pathway has been identified but the signaling cascade is still not well described, please indicate that in a clearer way.

      Results section: this review will first comment on the figures, and then the text.
      Figure 1
      Why is mouse not included in Figure 1A? Although it might be very similar to human, given that mouse is the species that is most commonly used as a mammalian model, I believe it could be included. However, this is optional upon decision by the authors.
      There should be a better explanation in the text of the results sections for the image included in Fig. 1C. Note that Clytia is not a commonly used species, therefore images should be properly explained for general readers. Please indicate in the text that ClyARPP19 mRNA is expressed in previtellogenic oocytes and not in vitellogenic, plus any additional information needed to understand the image. In addition, the detection of ARPP19 in the nerve rings is intriguing. This is mentioned in the discussion section, any idea of its function there? Please include some additional information or additional references, if they exist.

      Figure 3
      The way these graphs are presented is somewhat confusing. The meaning of the dots is not self-explanatory in the graph, and it seems that each experiment was done independently but then the complete set of results is presented. Legend says that "each dot represents one experiment" but this is difficult to read as in every analysis the figure also indicates the average and the total number of oocytes. If authors wish so, they can keep the figure as it is, but then please explain this graph better in the text, and please include statistical analysis. These results are very robust, but a comparison between the number of oocytes that go through spontaneous GVBD or lysis in the different conditions will benefit their understanding.

      Also, please provide in the text a plausible explanation for the cause of oocyte lysis for all experimental conditions (Fig 3C). Given that in the control experiments with buffer this effect is also observed in some oocytes, please explain if this is caused by a mechanical disruption of the oocyte during the injection. In contrast, okadaic acid induces the lysis in all the 14/14 oocytes analyzed, is this due also to the mechanical approach? Or is there other reason more related to the PP2A inhibition? Please explain.

      Figure 5
      In Figure 5 D-F, cited in page 9 lines 35-35. Can you provide an explanation of why the time course of meiosis resumption was delayed?

      • The text of the results is generally well described; however, all the sections start with a long introductory paragraph. I believe this facilitates the contextualization of the experiments, but please try to summarize when possible. For example, in page 5 lines 12-25, or page 7 lines 30-37, are all introduction information.
      Page 7, Lines 14-19 present a general conclusion of the findings explained in lines 20-27. I think these results are important and they should be explained better, in my opinion they are slightly poorly described.

      Page 8, lines 16-17: "It was not possible to increase injection volumes or protein concentrations without inducing high levels of non-specific toxicity". What are the non-specific toxicity effects? How was this addressed? What fundaments this conclusion?

      Lines 25-27 of page 8. Text reads, "These results suggest that PP2A inhibition is not sufficient to induce oocyte maturation in Clytia, although we cannot rule out that the quantity of OA or Gwl thiophosphorylated ARPP proteins delivered was insufficient to trigger GVBD.". Please provide evidence if higher concentrations of OA or Gwl were tested to state this conclusion.

      Lines 12-13: the sentence "This in vitro assay thus places S81 as the sole residue in ClyARPP19 for phosphorylation by PKA." is overstated. As not all residues had been tested, please indicate that "it is likely that" or "among the residues tested", in contrast to "the sole residue in ClyARPP19".

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the Reviewers for their helpful and constructive comments. In response to these suggestions we have performed new experiments and amended the manuscript, as we describe in our detailed response below.

      Reviewer #1:

      1. The Reviewer notes that while our analysis of centrosome size was comprehensive, we provided no analysis of centrosomal MTs, pointing out that while centrosome size declines as the embryos enter mitosis, the ability of centrosomes to organise MTs might not. This is a good point, and we now provide an analysis of centrosomal-MT behaviour (Figure 2). We find that there is a dramatic decline in centrosomal MT fluorescence at NEB, although the pattern of centrosomal MT recruitment prior to NEB is surprisingly complex.

      2. The Reviewer questions how PCM client proteins can be recruited in different ways by the same Cdk/Cyclin oscillator. We apologise for not explaining this properly. It is widely accepted that Cdk/Cyclins drive cell cycle progression, in part, by phosphorylating different substrates at different activity thresholds (e.g. Coudreuse and Nurse, Nature, 2010; Swaffer et al., Cell, 2016). Moreover, it is also clear that Cdk/Cyclins can phosphorylate the same protein at different sites at different activity thresholds (e.g. Koivomagi et al., Nature, 2011; Asafa et al., Curr. Biol., 2022; Ord et al., Nat. Struct. Mol. Biol., 2019). Thus, we hypothesise that rising Cdk/Cyclin cell cycle oscillator (CCO) activity phosphorylates multiple proteins at different times and/or at different sites to generate the complicated kinetics of centrosome growth. We now explain this point more clearly throughout the manuscript.

      3. The Reviewer is puzzled as to how we conclude that Cdk/Cyclins phosphorylate Spd-2 and Cnn at all the potential Cdk/Cyclin phosphorylation sites we mutate in our study. The Reviewer is right that we cannot make this conclusion, and we did not intend to make this claim. As we now clarify (p11, para.1), although it is unclear if Cdk/Cyclins phosphorylate Spd-2 or Cnn on all, some, or none of these sites, if either protein can be phosphorylated by Cdk/Cyclins, then these mutants should not be able to be phosphorylated in this way, allowing us to address the potential significance of any such phosphorylation. We now also note that several of these sites have been shown to be phosphorylated in embryos in mass spectrometry screens (Figure S6).

      4. The Reviewer highlights differences in how Spd-2 and Cnn help recruit γ-tubulin to centrosomes (Figure 6). They ask for a more detailed description, and are puzzled as to how this is compatible with direct regulation by a single oscillator. We now explain our thinking on this important point in much more detail. It appears that Spd-2 helps recruit γ-tubulin throughout S-phase, while Cnn has a more prominent role in late S-phase (Figure 6). This is consistent with our overall hypothesis of CCO regulation, as we postulate that low-level CCO activity promotes the Spd-2/γ-tubulin interaction in early S-phase, while higher CCO activity promotes the Cnn/γ-tubulin interaction in late-S-phase, potentially explaining the increase in the rate of γ-tubulin (but not γ-TuRC) recruitment we observe at this point (see minor comment #1, below, for an explanation of the various γ-tubulin complexes in flies). This is consistent with recent literature showing that CCO activity promotes γ-tubulin (but not γ-TuRC) recruitment by Cnn/SPD-5 in worms and flies (Ohta et al., 2021; Tovey et al., 2021).

      5. The Reviewer was not convinced by our model (Figure 8, now Figure 9), raising two major concerns. First, they were unsure how a single oscillator could generate different patterns of protein recruitment. We addressed this in point #2 and #4, above, where we explain how different thresholds of CCO activity trigger different events, so there is no expectation that we should observe steady changes in recruitment over time as CCO activity rises. Second, they questioned how modest levels of Cdk/Cyclin activity can promote recruitment, while high levels of activity can inhibit recruitment. In point #1, above, we cite several examples where such positive and negative regulation by different Cdk/Cyclin activity levels have been described. We also now explain throughout the manuscript why this hypothesis provides a plausible explanation for our results: with moderate CCO activity promoting Spd-2-dependent PCM-client recruitment in early S-phase; higher CCO activity promoting a decrease in Spd-2 recruitment in mid-late-S-phase (so centrosomal Spd-2 levels decline); and even higher levels of CCO activity leading to a decrease in the interactions between the client proteins and the Spd-2/Cnn scaffold as the embryos enter mitosis (so the client proteins are rapidly released from the centrosome).

      The Reviewer also raised the important point here that our model does not explain why the mutant forms of Spd-2 and Cnn accumulate to higher levels at the start of S-phase, and not just at the end of S-phase/entry into mitosis. We apologise for not explaining this properly. The accumulation of the mutant proteins (particularly Spd-2, Figure 5C) in early-S-phase occurs because the excess mutant protein that accumulates at centrosomes in late-S-phase/mitosis is not removed properly from centrosomes during mitosis (presumably because there is insufficient time). Thus, centrosomes still have too much mutant Spd-2 at the start of the next S-phase. We show this in Reviewer Figure 1 (attached to this letter), which tracks Spd-2 behaviour further into mitosis, and now explain this in more detail in the text (p12, para.1).

      6. The Reviewer questions how the CCO can both induce centrosome growth and also switch it off, as it is unclear how an oscillator that only phosphorylates sites to decrease centrosome binding could also promote growth. They ask if we can identify and mutate any Cdk/Cyclin sites in centrosome proteins that promote centrosome recruitment. As we now clarify, we did not intend to claim that the CCO only phosphorylates sites that decrease the centrosome binding of proteins, although we do hypothesise that such phosphorylation is important for switching off centrosome growth in mitosis. In addition, we hypothesise that moderate levels of CCO initially promote centrosome growth, and our data suggests that the CCO does this, at least in part, by promoting Polo recruitment (Figure 8). We speculate that the CCO phosphorylates specific Polo-box-binding sites in Ana1 and Spd-2, the main proteins that recruit Polo to centrioles. We agree that identifying these sites is an important next step, but it is complicated as our studies indicate that multiple sites contribute in a complex manner. Importantly, it is well established that the CCO triggers centrosome growth as cells prepare to enter mitosis, so our hypothesis that moderate levels of CCO activity initiate centrosome growth is not new or controversial.

      Minor Comments

      1. The reviewer asks how we explain the different incorporation profiles we observe for the different subunits of the γ-tubulin ring complex. We apologise for not discussing this point. In flies there is a “core” γ-tubulin-small complex (γ-TuSC) and a larger γ-tubulin-ring complex (γ-TuRC) that contains the Grip71, Grip75 and Grip128 subunits we analyse here (Oegema et al., JCB, 1999). The γ-TuSC functions independently of the γ-TuRC so γ-tubulin and γ-TuRC components can behave differently.

      2. The Reviewer questions why we claim an “inverse-linear” relationship between S-phase length and the centrosome growth rate when the relationship is not linear (Figure 3, now Figure S3). I was originally confused by this as well but, mathematically, a linear relationship means y is proportional to x, whereas an inverse-linear relationship means y is proportional to 1/x. Thus, an inverse-linear relationship between x and y does not plot as a straight line, but rather as the curves we show on the graphs. We now explain this in the text (p9, para.2).
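
      The distinction can be checked numerically. The following is a minimal sketch using made-up values (the constant `k` and the S-phase lengths are illustrative assumptions, not data from the manuscript): when y is proportional to 1/x, the slope between neighbouring points changes when y is plotted against x, but is constant when y is plotted against 1/x.

```python
# Illustrative only: k and the S-phase lengths are arbitrary, hypothetical values,
# not measurements from the manuscript.
k = 12.0                                   # hypothetical proportionality constant
s_phase_lengths = [4.0, 6.0, 8.0, 12.0]    # hypothetical S-phase lengths (x)

# Inverse-linear relationship: growth rate (y) is proportional to 1/x
growth_rates = [k / x for x in s_phase_lengths]


def consecutive_slopes(xs, ys):
    """Slope between each pair of neighbouring points.

    The slopes are all equal only if the points lie on a straight line.
    """
    points = list(zip(xs, ys))
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(points, points[1:])]


# y against x: the slopes differ, so the plot is a curve
slopes_vs_x = consecutive_slopes(s_phase_lengths, growth_rates)

# y against 1/x: the slopes are all (approximately) k, so the plot is a straight line
inverse_lengths = [1.0 / x for x in s_phase_lengths]
slopes_vs_inverse_x = consecutive_slopes(inverse_lengths, growth_rates)

print(slopes_vs_x)
print(slopes_vs_inverse_x)
```

      Plotting the growth rate against the reciprocal of S-phase length is therefore one simple way to test for an inverse-linear relationship by eye.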

      Reviewer #2:

      This Reviewer found the manuscript hard to follow, so we are very grateful that they took the time to try to understand it. We agree that the subject matter is complicated, and that our presentation was not always helpful. The Reviewer’s comments have been very useful in helping us to identify (and hopefully improve) areas of particular difficulty.

      Major points:

      1. The Reviewer highlights that the two experimental approaches underpinning our main conclusions are problematic: (1) Experiments with mutants of Spd-2 and Cnn that theoretically cannot be phosphorylated by Cdk/Cyclins are hard to interpret as these mutations may have other effects; (2) It is unclear whether reducing Cyclin B levels reduces peak CDK activity or simply slows the time it takes to reach peak levels. They suggest a more direct test of our model would be to analyse PCM recruitment in embryos arrested in S-phase or mitosis. (1) We agree that the mutations designed to prevent Cdk/Cyclin phosphorylation could perturb function in other ways, but this is true for any such mutation, and there are many papers that infer a function for Cdk/Cyclin phosphorylation from such experiments. Importantly, the centrosomal accumulation of the phospho-null mutants actually slightly increases compared to WT (Figure 5C and I), and we now show that the centrosomal accumulation of a phospho-mimicking Spd-2-Cdk20E mutant slightly decreases (Figure S8). We now acknowledge the potential caveat of a non-specific perturbation of protein function, but feel that the reciprocal behaviour of the phospho-null and phospho-mimicking mutants somewhat mitigates this concern (p12, para.2). (2) Fortunately, and as we now clarify, it has recently been shown that reducing Cyclin levels does not reduce peak Cdk activity, but rather slows the time it takes to reach peak activity (Figure 2A, Hayden et al., Curr. Biol., 2022). Thus, the cyclin half-dose experiments provide an excellent alternative test of our hypothesis as they show that the WT proteins can exhibit similar behaviour to the mutants if the rate of Cdk/Cyclin activation is slowed. We feel the evidence supporting our hypothesis is strong enough that it warrants serious consideration.

      The suggestion to look at PCM recruitment in embryos arrested in either S-phase or M-phase is a good one, but these experiments produce complicated data. In M-phase arrested embryos, for example, Cnn levels continue to rise (see Figure 1G, Conduit et al., Dev. Cell, 2014), but the other PCM proteins do not (unpublished); in S-phase arrested embryos (arrested by mitotic cyclin depletion) centrosomes continue to duplicate, but now do so asynchronously, greatly complicating the analysis (McCleland and O’Farrell, Curr. Biol., 2008; Aydogan et al., Cell, 2020). The centrosomes that don’t duplicate, however, reach a constant steady-state size (where the rate of centrosome protein addition is balanced by the rate of loss). These observations are consistent with our recent mathematical modelling of mitotic PCM assembly (Wong et al., 2022) if we additionally account for cell cycle regulation (which was not considered in our original model). We believe such analyses are beyond the scope of the current paper and we plan to publish a second paper incorporating our new hypothesis into our mathematical modelling.

      1. The Reviewer questions whether our methods accurately measure centrosomal protein accumulation, pointing out that γ-tubulin and Grip128 occupy different centrosomal areas—which should not be possible if they are part of the same complex. They suspect that our use of different transgenes with different promoters could explain these differences. As we should have described (see point #1 in our response to the minor comments of Reviewer #1), γ-tubulin exists in two complexes in flies, only one of which contains Grip128, so γ-tubulin and Grip128 exhibit different localisations. Moreover, as we now show (Figure S2), using different promoters does not seem to make a difference to overall recruitment kinetics. Thus, we are confident that our methods measure centrosome protein recruitment dynamics accurately.

      2. The Reviewer is concerned that our measurements of centrosome size based on fluorescence intensity (Figure 1) and centrosomal area (Figure S1) do not always match. They suggest a potential reason for this is that proteins are not uniformly distributed within centrosomes, and this may impact our ability to measure protein accumulation based on 2D projections (noting, for example, that Polo and Spd-2 are concentrated at centrioles and in the PCM, potentially explaining the different shape of their growth curves compared to the client proteins). When the centrosome-fluorescence-intensity and centrosome-area recruitment profiles of a protein do not match, the average “centrosome-density” of that protein must be changing over time. In some cases, we understand why density changes. Cnn, for example, stops flaring outwards on the centrosomal MTs during mitosis so its centrosomal area decreases even as its fluorescence intensity increases (leading to an increase in its centrosomal-density). We agree (and now discuss—p19, para.3) that the prominent accumulation of Spd-2 and Polo at centrioles could help to explain why Spd-2 and Polo accumulation dynamics differ from the client proteins.

      Other points:

      1. The Reviewer suggests it would be good to know how much Polo at the centrosome is active. We agree, but although commercial antibodies against PLK1 phosphorylated in its activation loop work in cultured fly cells, we cannot get them to work in embryos. Moreover, the recruitment of Polo/PLK1 to its site of action by its Polo-Box Domain is sufficient to partially activate the kinase independently of phosphorylation (Xu et al., NSMB, 2013). Thus, it seems likely that all the Polo/PLK1 recruited to centrosomes will be at least partially activated, even if it is not necessarily phosphorylated on its activation loop.

      2. The Reviewer asks if it is clear that less Spd-2 and Cnn are recruited to centrosomes in the half gene-dosage embryos. We apologise for not mentioning that this is indeed the case. We showed this previously for Cnn (Conduit et al., Curr. Biol., 2010) and we now state that this is also the case for Spd-2. We do not show the Spd-2 data as we plan to publish a comprehensive dose-response curve of Spd-2 (and Cnn) recruitment in our next modelling paper.

      3. Would it not be relevant to examine Polo ½ dosage embryos? We do have this data (Reviewer Figure 2), attached to this letter, but it is quite complicated to interpret (as we explain in the legend). We feel it would be more appropriate to include this in our next modelling paper where we can properly explain the behaviours we observe. Publishing this data here would distract from our main message without changing any of our conclusions.

      4. The Reviewer asks why the non-phosphorylatable Spd-2 protein is also present at higher levels on centrosomes at the start of S-phase (not just the end of S-phase). This was also raised by Reviewer #1 (point #5), so please see the second paragraph of our response there.

      Minor/Discussion Points:

      1. We thank the Reviewer for highlighting that absolute and relative centrosome size control are different things and we have amended the manuscript accordingly.

      2. The Reviewer questions whether it is accurate to describe Spd-2 and Polo as scaffold proteins, noting that only Cnn has been shown to have scaffolding properties. There is strong evidence that Spd-2 has Cnn-independent scaffolding properties in flies (e.g. Conduit et al., eLife, 2014), but this is a fair point for Polo. We think it is justified to separate Polo from other client proteins as Polo is essential for scaffold assembly, whereas other client proteins are not. We now define our scaffold/client terminology to avoid confusion (p4, para.3).

      3. The Reviewer highlights several points related to differences in recruitment kinetics (also touched on in points #2 and #3, above), noting we don’t discuss properly the idea of two different modes of PCM recruitment. These are all good points, largely addressed in our response to points #2 and #3, above. We now discuss much more prominently the two different modes of client protein recruitment throughout the manuscript.

      4. As we now clarify, in all our experiments we use centrosome separation and nuclear envelope breakdown (NEB) to define the start and end of S-phase, respectively.

      5. The Reviewer quotes the landmark Woodruff paper (Cell, 2017) as showing that the ability to concentrate client proteins (including ZYG-9, the worm homologue of Msps) is an intrinsic property of the PCM scaffold, so how do we explain that Msps departs prior to NEB while Cnn continues to accumulate? It is indeed a striking observation of our study that all PCM client proteins (not just Msps) start to leave the centrosome prior to NEB, even as Cnn levels continue to accumulate. Our hypothesis is that this ‘leaving’ event is triggered by a threshold level of Cdk/Cyclin activity—explaining why these client proteins all start to leave the PCM at the same time (just prior to NEB) irrespective of nuclear cycle length. This is not incompatible with the Woodruff paper, which did not attempt to reconstitute any potential regulation by Cdk/Cyclins in their in vitro studies.

      6. The Reviewer questions why Spd-2 that cannot be phosphorylated by Cdk/Cyclins (Spd-2-Cdk20A) accumulates abnormally at centrosomes in late S-phase, yet γ-tubulin (which is recruited by Spd-2) seems to leave centrosomes more slowly in the presence of the mutant protein. As we now explain more clearly, there is no contradiction here. Spd-2-Cdk20A accumulates to abnormally high levels in late-S-phase/early mitosis (Figure 5C), and this reduces the γ-tubulin dissociation rate, as we would predict (Figure 7B, right most graph). It does not “prevent” dissociation, however, (as the Reviewer seems to suggest it should?), but this is probably because these experiments have to be performed in the presence of large amounts of the WT Spd-2 (Figure 5A).

      7. The referencing error has been corrected.

      8. The Reviewer asks why in Figure 1 not all of the centrosome proteins could be followed for the full time period (as we mention in the legend, but do not explain). There are different reasons for different proteins: (1) Polo cannot be followed in mitosis as it binds to the kinetochores, making it impossible to accurately track centrosomes (so the data for mitosis is missing for Polo); (2) Cnn exhibits extensive flaring at the end of mitosis/early S-phase (Megraw et al., JCS, 1999), so we cannot track individual separating centrosomes labelled with NG-Cnn in early S-phase until they have moved sufficiently far-apart (so the early S-phase time-points are missing for Cnn); (3) In addition, several of the client proteins bind to the mitotic spindle, so although we can still track and measure the centrosomes in late mitosis in the graphs, we don’t show pictures of these late mitosis centrosomes in the montage in Figure 1A as the images look a bit odd. We now explain these reasons in the Materials and Methods.

      9. We now indicate that nuclear cycle 12 (NC12) is being analysed in Figures 4-8.

      10. The reviewer questions why we don’t show the decrease rate for γ-tubulin in Figure 6 (the Spd-2 and Cnn half-dose experiments), when we do show it in Figure 7 (the Spd-2 and Cnn Cdk-mutant experiments), suspecting that it is slowed in both cases. The reviewer is correct and we now show this data for both sets of experiments.

      11. We have corrected the labelling error in Figure S1.

      12. The Reviewer suggests moving some of the data from the main Figures, and the entirety of Figures 2 and 3 to the Supplemental Information. We understand this point, and agree that the amount of data presented in Figures 1-3 is somewhat overwhelming. We have played around with the Figures a lot—in particular trying to show a few examples of the data and moving the rest to Supplementary—but it is hard to pick a “typical” example, and the power of comparing the behaviour of so many different centrosome proteins is somewhat lost. We have tidied up several Figures and, as a compromise, we keep Figure 2 (now Figure 3) in the main text, but have moved Figure 3 to Supplementary (now Figure S5).

      13. The Reviewer suggests that we should repeat the analysis of Spd-2, Polo and Cnn dynamics that we show here, as we already presented this data in a previous publication (Wong et al., EMBO J., 2022). We understand this point, but feel this would be a less accurate comparison, as essentially all of the data shown in Figure 1 was obtained several years ago during a contiguous ~6-month period. Since then, the lasers and software on our microscope system have been updated, so it would probably be a less fair comparison to obtain new data for a subset of these proteins (and it seems overkill to perform the entire analysis again). We clearly state that this data has been presented previously, so we hope the Reviewer will agree that it is acceptable to present it again here so readers can more easily compare the data.

      Reviewer #3:

      This Reviewer is broadly supportive of the manuscript, but to publish in a prestigious journal they think additional experimental evidence will be required to support our hypothesis.

      The Reviewer notes that our only evidence that Cdk/Cyclins directly phosphorylate Spd-2 comes from our analysis of the Spd-2-Cdk20A mutant, as the effect of reducing Cyclin B dosage on WT Spd-2 behaviour is very modest. They request that we analyse the behaviour of a Spd-2-Cdk20E phospho-mimicking mutant. The effect of halving the dose of Cyclin B on Spd-2 behaviour is modest, but this is what we would predict as all we are doing in this experiment is slowing S-phase by ~15%, so Spd-2 should accumulate at centrosomes for a slightly longer time and to a slightly higher level (as we observe, Figure 5E). A great advantage of the early fly embryo system is that we can compare the behaviour of many hundreds of centrosomes, so even subtle differences like this are usually meaningful. To illustrate this point, we have now repeated the Spd-2 analysis in WT and CycB1/2 embryos (but now using a CRISPR/Cas9 Spd-2-NG knock-in line) and we see the same subtle differences (Figure S9). In addition, as requested, we have now analysed the behaviour of a Spd-2-Cdk20E mutant protein using an mRNA injection assay (as it would have taken too long to generate and test new transgenic lines). In this assay we injected embryos with mRNA encoding either WT Spd-2-GFP, Spd-2-Cdk20A-GFP or Spd-2-Cdk20E-GFP. The mRNA is quickly translated, and we computationally measured the fluorescence intensity of the centrosomes in mid-S-phase (i.e. at the Spd-2 peak) (Figure S8). This analysis confirms that Cdk20A accumulates to slightly higher levels, and reveals that Cdk20E accumulates to slightly lower levels, than the WT protein. Together, these new experiments strongly support our original conclusions.

      The Reviewer notes that we propose that the CCO initially promotes centrosome growth by stimulating Polo recruitment to centrosomes, but states that we only provide indirect evidence for this by showing that centrosomal Polo levels are strongly reduced in Cyclin B half-dose embryos. They suggest we determine Spd-2 levels in Polo half-dose embryos, and/or the centrosome levels of mutant forms of Spd-2 that cannot be phosphorylated by Polo. We believe the Cyclin B half-dose experiments provide direct support for our hypothesis that Cdk/Cyclin activity influences Polo recruitment (Figure 8), although, clearly, we have not identified the mechanism. We do, however, suggest a plausible mechanism: Ana1 and Spd-2 are largely responsible for recruiting Polo to centrosomes, and we have previously shown that several of the potential phosphorylation sites in these proteins that help recruit Polo to centrosomes are Cdk/Cyclin or Polo phosphorylation sites (Alvarez-Rodrigo et al., eLife, 2020 and JCS, 2021; Wong et al., EMBO J., 2022). We are currently testing this hypothesis, but progress is slow as it is clear that multiple sites in both proteins can influence this process.

      As the Reviewer requests, we have now also examined how Spd-2 and Cnn behave in Polo half-dose embryos (Reviewer Figure 2, attached to this letter). As we describe in the Figure legend, this data is informative, but is complicated. With relatively minor, but mechanistically important, tweaks to our previous mathematical modelling we can explain these behaviours, but introducing such a significant mathematical modelling element would be beyond the scope of this paper. As described above, these findings will form the basis of a follow-up paper that is more mathematically oriented.

      It is a great idea to look at mutant forms of Spd-2 that cannot be phosphorylated by Polo, but the consensus Polo phosphorylation site (N/D/E-X-S, with the N/D/E at -2 and the S at 0 being preferences, rather than a strict rule) is less well-defined than the consensus Cdk/Cyclin phosphorylation site (where the Pro at -1 is essentially invariant). Thus, we cannot accurately predict which sites would need to be mutated to generate such a mutant.

      The Reviewer requests that we analyse the behaviour of TACC in embryos expressing the Spd-2-Cdk20A and Cnn-Cdk6A mutants (as we do in Figure 7 for γ-tubulin). This is a reasonable request, but we prefer not to show this data as we have recently identified an interesting interaction between TACC, Spd-2 and Aurora A that will be the subject of another paper we hope to submit shortly. This data is hard to interpret without explaining these interactions properly, which is beyond the scope of the current manuscript.

      We hope the Reviewers will agree that these changes have improved the manuscript substantially, and that it is now suitable for publication. We would like to thank them again for taking the time to read this rather complicated paper so thoroughly.

      We look forward to hearing from you.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      In their manuscript “Live-cell super-resolution nanoscopy reveals modulation of cristae dynamics in bioenergetically compromised mitochondria”, Golombek et al. tested the effects of different mitochondrial toxins on cristae dynamics. The main focus of their work lies on live STED imaging, which they use to visualize cristae merging and splitting. They found swelling of mitochondria and reduced cristae density in response to most toxins, but cristae dynamics remained largely unaffected. Depletion of the membrane potential by administration of CCCP increased cristae dynamics, while inhibition of ANT had a negative effect on cristae dynamics at least in a subset of mitochondria.

      1. The authors state that the used concentrations of mitochondrial toxins commonly result in a change in oxygen consumption. While this is believable, it is not guaranteed that the specific chemicals used for the experiments were working properly (freeze/thawing or simply incorrect storage or aliquotation may have an effect on the compounds). This is even more important in the case of results where no significant change after the administration of the toxins is seen. In Figure 5, the authors report no change in membrane potential after oligomycin administration, this is unexpected. I therefore suggest to include a supplementary figure, in which the functionality of the compounds is verified. This could be done by respiratory measurements (e.g. Seahorse). A Mito Stress Test was performed for Figure 6, but this was done using the Seahorse kit chemicals, which were probably different from the chemicals used in the microscopy experiments.

      Response: We appreciate the valid concerns of the reviewer in this point.

      A) In order to show the functionality of the compounds used in our experiments, including STED imaging, we have now performed respiratory measurements using the mitochondrial toxins (Oligomycin A, CCCP, rotenone/antimycin A) at the concentrations used under imaging conditions, as well as the commercially available mitochondrial toxins (Oligomycin A, FCCP, rotenone/antimycin A) at the standard concentrations of the Mito Stress Kit. The new figures are included in Fig S1A & B. HeLa cells treated with the Seahorse compounds or with those used under imaging conditions showed similar results for basal, maximal and spare respiratory capacity. Further, to avoid any loss of toxin activity due to freeze/thaw cycles, we used fresh aliquots (stored at -20°C) as a general strategy. The activity of the toxins is clearly shown by a drastic reduction of ΔΨm upon treating HeLa cells with CCCP, antimycin A or rotenone (Fig S6A & B). A reduction of mitochondrial ATP levels was also observed with rotenone, antimycin A and oligomycin A, confirming that active mitochondrial toxins were used. These experiments demonstrate that the mitochondrial toxins employed throughout our manuscript are functional as expected.

      New Figure S1A & B

      B) The Fig 6 (now Fig 5 due to Reviewer #2, Point 7) respirometry experiments, which initially employed Seahorse compounds and BKA, have now been replaced with new experiments in which we used the same mitochondrial toxins as for STED imaging. Needless to say, the results are similar to those observed with the Seahorse compounds. The new panels are shown in Fig 5A & 5B.

      New Figure 5A & B

      C) Oligomycin A inhibits ATP synthase which results in decreased ATP synthesis as observed (Fig 4A & B). Further, oligomycin A is expected to hyperpolarise mitochondria (2). In Fig S6, despite some cells having more ΔΨm, there was no overall significant change when compared to untreated cells. Previous publications also show that there is no significant difference in ΔΨm upon treatment with oligomycin (1) demonstrating that the ΔΨm depends on the concentration of oligomycin, treatment time and cell type.

      1. Figure 1 would benefit from a more detailed description of merging/splitting events. Maybe a cartoon plus a zoomed in image of an exemplary event?

      Response: Thank you for the suggestion. To explain cristae merging and splitting events more clearly, we added a cartoon in Fig 1B. The green and magenta arrows show sites of imminent merging and splitting, and the green and magenta asterisks mark the corresponding events in the subsequent frames. The zoomed-in images from Fig 1A (leftmost panel) are shown to the right as time-lapse images.

      New Figure 1B

      1. Could the reduced cristae density be an effect of mitochondrial swelling? It is curious that all toxins appear to have the same effect on mitochondrial architecture. What is the fate of an enlarged mitochondrion over time? Mitophagy? And does the percentage of enlarged mitochondria change with increasing treatment time?

      Response: Thank you for the comment.

      A) We agree that the reduced cristae density is due to mitochondrial swelling. We added the relevant text in the results section ‘Cristae structure is altered in a subset of mammalian cells treated with mitochondrial toxins’. Treatment of HeLa cells with each of the mitochondrial toxins mentioned uniformly results in around 50% of mitochondria undergoing enlargement (Fig 2B). In enlarged mitochondria, where the mitochondrial width is ≥ 650 nm, there is no change in the cristae area occupied per mitochondrion (Fig S3C & D), and cristae density is consequently reduced (Fig 2H). This indicates that the reduced cristae density results from mitochondrial enlargement.

      Figure 2B-F

      Figure S3C and D

      B) In order to address the fate of mitochondria with increasing time upon treatment with various mitochondrial toxins, we treated the HeLa cells for 4 hrs with mitochondrial toxins. Untreated cells maintained normal mitochondrial morphology while cells treated with various mitochondrial toxins displayed fragmented and swollen mitochondrial morphology. The new Fig S5 is included in the supplementary. Cristae morphology was abnormal displaying interconnected cristae in swollen mitochondria. Since mitochondrial fragmentation is already observed at 4 hours and accompanied by interconnected cristae, the number of cristae merging and splitting were severely reduced.

      Our imaging performed within 30 mins of addition of respective toxins overcomes the additional aberrancy of mitochondrial fragmentation which would not allow a reliable analysis of cristae dynamics as too few cristae would be visible within one mitochondrion.

      New Figure S5

      1. Figure 4C: How was the mitochondrial width determined in the LSM images? Especially in the perinuclear area it will be difficult to determine this parameter without the super-resolution provided by STED. Was this parameter determined manually for selected mitochondria? In the methods part it says that only a maximum of two mitochondria per cell were analyzed. How were these chosen? Was the process blinded?

      Response: Thank you for the comment. We can see how the original description was ambiguous.

      A) For LSM confocal images involving FRET-based microscopy to determine the ATP levels, we classified each cell as belonging to either the normal or the enlarged category. The confocal images of HeLa cells displayed clear separation of mitochondria even in the perinuclear area (representative images are shown in Fig 4A) and thus it was possible to measure the width of individual mitochondria. The methods section ‘FRET-based microscopy to measure ATP levels’ describes that ‘the cut off for swollen mitochondria was set to 650 nm in congruence with STED SR nanoscopy. If 85% of the mitochondrial population featured enlarged mitochondria, the cells were designated as swollen. Similarly, if 85% of the mitochondrial population featured mitochondria whose width was less than 650 nm, the cell was considered as having normal mitochondria’.

      Figure 4A
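
      The classification rule quoted above can be written out as a short sketch. The function name, the treatment of the 85% threshold as "at least 85%", and the "unclassified" label for cells that meet neither criterion are assumptions made for illustration; the quoted methods text does not state how such cells were handled.

```python
WIDTH_CUTOFF_NM = 650.0      # cut-off for enlarged (swollen) mitochondria, from the methods text
POPULATION_FRACTION = 0.85   # fraction of the mitochondrial population required (assumed ">=")


def classify_cell(widths_nm):
    """Classify one cell from the measured widths (in nm) of its mitochondria.

    Returns "swollen", "normal", or "unclassified" (the last label is an
    assumption for cells that meet neither 85% criterion).
    """
    if not widths_nm:
        raise ValueError("at least one mitochondrial width is required")

    total = len(widths_nm)
    enlarged = sum(1 for width in widths_nm if width >= WIDTH_CUTOFF_NM)
    normal = total - enlarged

    if enlarged / total >= POPULATION_FRACTION:
        return "swollen"
    if normal / total >= POPULATION_FRACTION:
        return "normal"
    return "unclassified"


# Hypothetical example cells (widths in nm)
print(classify_cell([700.0] * 9 + [500.0]))      # 90% enlarged
print(classify_cell([500.0] * 9 + [700.0]))      # 90% below the cut-off
print(classify_cell([500.0] * 5 + [700.0] * 5))  # mixed population
```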

      B) The cristae morphology of various mitochondria is fairly uniform in individual cells. Thus, the mitochondria are representative of the individual cells. Therefore, in order to increase the coverage of various cells, we considered a maximum of two mitochondria from each cell which were randomly chosen. This part is modified in the methods section ‘Quantification of various parameters related to cristae morphology’ to make it clear. Thus, while the quantification of various parameters including dynamics involved individual mitochondria, various cells were classified as belonging to normal or enlarged category while measuring ATP levels.

      1. What is the average size of all mitochondria per cell? Is this addressed in Figure 2B or are only analyzed mitochondria included? Please clarify. Were the mitochondria chosen for analysis representative of the respective cell?

      Response: The data obtained by super-resolution imaging of mitochondria are used for quantifying cristae dynamics, which is a very challenging and time-consuming analysis performed in a blinded manner. As mentioned in response 4B, the cristae morphology is fairly uniform within individual cells; therefore, we only included the mitochondria that were analysed for the various cristae parameters, which already constitute very large data sets. Thus, the average size of all mitochondria per cell is not represented in the analysis of images obtained with STED SR imaging. Please also see response 4B.

      1. Explain the mt-Go-ATeam2; what is GFP (green fluorescent protein) and OTP (?)

      Response: GFP is green fluorescent protein and OFP is orange fluorescent protein; both are now defined in the revised text.

      1. As far as I understand, the graphs (e.g. Fig. 1B, 3B-E) show events/mitochondrion, not events per crista.

      Response: Thank you for pointing this out. It is actually the average number of events per crista per mitochondrion. We have changed the Y-axis label to events/cristae/mito in Fig 1C (previously 1B), Fig 3B-E and, wherever applicable, in other figures throughout the manuscript.

      Figure 1C

      Figure 3B-E
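
      One plausible reading of the events/cristae/mito measure is sketched below: for each mitochondrion, the number of merging or splitting events is divided by the number of cristae in that mitochondrion, and these values are then averaged. The counts are invented for illustration and the exact normalisation used in the manuscript is an assumption here.

```python
from statistics import mean

# Hypothetical records, one per mitochondrion:
# (number of merging or splitting events, number of cristae)
mitochondria = [(6, 12), (4, 10), (9, 15)]


def events_per_crista(n_events, n_cristae):
    """Events normalised to the number of cristae in a single mitochondrion."""
    if n_cristae <= 0:
        raise ValueError("a mitochondrion must contain at least one crista")
    return n_events / n_cristae


# One normalised value per mitochondrion, then the average across mitochondria
per_mitochondrion = [events_per_crista(events, cristae) for events, cristae in mitochondria]
average_events = mean(per_mitochondrion)

print(per_mitochondrion)
print(average_events)
```

      Normalising within each mitochondrion before averaging keeps a mitochondrion with many cristae from dominating the result, which would not be the case for a simple events/mitochondrion count.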

      1. I would recommend changing the legend of the x-axis of Fig.2B-F to mito-width (y-axis could be probability density function, PDF).

      Response: We have now changed the X-axis label to mito width (originally width) in Fig 2B-F. The Y-axis is still retained as percentage of mitochondria because cells treated with a few of the mitochondrial toxins do not show a Gaussian distribution of mitochondrial width.

      Figure 2B-F

      Referees cross-commenting

      Both expert opinions address similar concerns and therefore a revision should be requested.

      Reviewer #1 (Significance):

      The study is thorough and the experiments and results are well described. Overall, however, it remains a descriptive study and does not provide mechanisms. There is also no discussion of how MMP-dependent proteins, such as Opa1, which was previously studied by the Reichert group, might be affected. For swelling mechanisms, the opening of the mitochondrial permeability transition pore was discussed. This could be tested using inhibitors, but perhaps not within the scope of this publication. Nevertheless, the information provided by the study is of interest to the bioenergetics community and should be made available.

      Response: Thank you for the overall inputs.

      We tested the processing of OPA1 forms and found that after 30 mins, only CCCP treatment led to the processing of long isoforms to short forms (Fig S6C). We now included in the discussion that it is possible that short OPA1-forms are correlative to increased cristae merging as well as splitting events upon treatment with CCCP.

      New Figure S6C

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary: The authors investigated cristae merging and splitting events using ultra-resolution STED. The goal was to test if cristae membrane remodeling is dependent on OXPHOS complexes, mitochondrial membrane potential (ΔΨm), and the ADP/ATP nucleotide translocator. To do this the authors utilized several mitochondrial toxins with known mechanisms of action. Interestingly, many changed overall cristae density but did not change the cristae remodeling events. Inhibition of ANT did change cristae morphology and cristae dynamics.

      Major Concerns

      1. Many conclusions and concepts need more clarification. For example, a major take-home from the abstract is that various ETC inhibitors and protonophores reduce cristae density but did not change cristae remodeling events. If cristae density is reduced, how can this occur without cristae remodeling events? Remodeling events need to be clearly defined in the introduction and abstract.

      Response: Thank you for pointing out this lack of sharpness in our terminology, which can indeed cause a misunderstanding. To avoid this, we now include ‘changes in cristae morphology’ as well as ‘dynamic merging and splitting events of cristae’ under the broader term cristae remodelling. Accordingly, we have changed the wording ‘cristae remodeling’ to ‘cristae dynamics’ in the abstract and wherever appropriate in the manuscript text.

      The cristae morphology analysis showed no change in cristae area (Fig S3C) which was accompanied by mitochondrial enlargement. Therefore, cristae density was reduced. For the purpose of clarity, we added a sentence in the introduction section while giving a peek into our results that ‘cristae dynamic events are ongoing despite reduced cristae density’. In addition, we have now included in the results section the following statement: ‘Cristae membrane remodeling has been used to describe cristae dynamic events (i.e. cristae merging and splitting) as well as overall changes in cristae morphology within a single mitochondrion in this manuscript’.

      Figure S3C and D

      2. Other interpretations are also unclear, such as how ETC inhibitors which reduce ATP levels did not impact cristae remodeling events, yet inhibiting ATP/ADP exchange did greatly impact this phenomenon. It seems likely that the inhibition of ANT has nothing to do with ATP/ADP exchange since most of the ETC inhibitors no doubt greatly impact overall ATP/ADP exchange. This interpretation needs clarification.

      Response: We agree that further clarification is needed, in particular to explain why ATP/ADP exchange continues even when OXPHOS inhibitors are applied, and why reduced ATP levels do not mean that ATP/ADP exchange has stopped. Treatment of HeLa cells with various mitochondrial toxins inhibiting the function of OXPHOS complexes leads to decreased ATP levels due to ongoing ATP consumption within the cell (Fig 4). Two things can and do happen regarding ATP exchange when most of these toxins are applied. First, the ATPase can act in reverse mode, a (partial) compensatory mechanism that restores ΔΨm and further decreases ATP levels (note: not in the presence of oligomycin). Second, under these conditions ADP/ATP exchange is still ongoing in order to transport ATP derived from glycolysis in the cytosol to the mitochondrial matrix, which also causes a (partial) compensatory increase in membrane potential. After import, ATP is hydrolysed to ADP for reverse proton pumping via the F1FO-ATPase or, alternatively, by the F1 part alone without proton pumping. In all these cases it is essential and possible to exchange ADP with ATP constantly. Therefore, the overall exchange of ADP and ATP is not necessarily expected to differ grossly from that in untreated cells (due to compensatory glycolysis and subsequent ATP import and hydrolysis in the matrix). On the other hand, BKA treatment, which clearly impairs the exchange of ADP and ATP, leads to a completely different situation compared to treatment with OXPHOS inhibitors alone. With BKA, the mitochondrial matrix can no longer be resupplied with ATP derived from glycolysis, and metabolite flux is grossly hampered. Consistent with this, BKA treatment is accompanied by a strong reduction in ΔΨm and oxygen consumption (Fig. 5AB & SFig 7F).

      Thus, with respect to cristae dynamic events, in the time frame we used for imaging, a reduction of ATP levels does not impede the occurrence of cristae merging and splitting events, while BKA treatment does (Fig S7). We discuss this interesting and unexpected finding in the discussion section. We propose that ongoing metabolite flux (ATP/ADP exchange) is critical for maintaining cristae dynamics and that blocking it is detrimental. We adapted the discussion accordingly to make this clearer.

      Figure S7A, B and D

      3. Why did the authors wait 30 min to image after the addition of mitochondrial toxins? I would have guessed there is a more rapid change in response to these inhibitors. Is there a chance the authors missed the most dramatic events?

      Response: Since we aimed to observe early responses, cells were imaged within the first 30 min after addition of the respective mitochondrial toxins (please see Methods, ‘cell culture transfection and mitochondrial toxin treatment’). Thus, to answer this question, we want to emphasize that we did not wait 30 min but restricted our time frame of analysis to 30 min. Therefore, we think that we did not miss any rapid changes occurring early on. Regarding this point, Reviewer #1 (Query 3) asked for responses at a later time-point. Please see Reviewer #1, Response 3B.

      4. How do these mitochondrial toxins that are known to cause mitochondrial swelling not induce changes in cristae density?

      Response: Thank you for the question. There is probably a misunderstanding. In Fig S3E, we show that as mitochondrial width increases in cells after treatment with mitochondrial toxins, cristae density clearly decreases. In fact, the reduced cristae density is observed exclusively in enlarged mitochondria.

      Figure S3E-I

      5. It's interesting that inhibition of the ANT translocator by BKA treatment led to increased percentage of mitochondria with abnormal cristae morphology. It's accepted that inhibition of ANT profoundly reduces mitochondrial swelling. Do the authors have any data suggesting that abnormal cristae morphology actually is a mechanism for reducing cell death events such as permeability transition? Did the authors utilize cyclosporin A concomitantly with any of the mitochondrial toxins?

      Response: This is a very interesting question! As the reviewer might be aware, there is evidence connecting cristae remodelling to the induction of apoptosis (3). Cristae transitioned to a highly interconnected state within minutes after tBID treatment. However, it is unclear what the contribution of cristae dynamics is in this regard. Within 30 min, there were no visual signs of cell death in our experiments as observed under a microscope. Hence, we did not use cyclosporin A in our experiments. In our opinion, this question will form part of a very interesting future study and is currently beyond the scope of this manuscript.

      6. Are the authors confident in the data given that many of the experiments utilized quantification of 10-20 mitochondria? How are you sure this sampling is sufficient for the phenomenon being studied?

      Response: Please see Reviewer #1, Response 4B. As pointed out in the response to Reviewer #1, the cristae morphology is fairly uniform in individual cells. Therefore, in order to maximise the cell population covered, we randomly used a maximum of two mitochondria from each cell. In addition, we included cristae analysis from at least three biological replicates in order to assess the reproducibility of the data. Taking these factors into consideration, we are confident that our results reflect a sufficient sample size. Further, we would like to point out that, while our group performs STED super-resolution imaging routinely, the quantification of cristae merging and splitting events, done in a blind yet manual manner, is a laborious and time-consuming process. In the future, we aim to optimise this in at least a semi-automated manner.
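
      The sampling scheme described in this response (at most two randomly chosen mitochondria per cell, pooled over biological replicates) can be sketched as follows. This is only an illustrative sketch, not the authors' pipeline: the record layout `(replicate_id, cell_id, mito_id)`, the function name `sample_mitochondria` and the fixed seed are assumptions introduced here.

```python
import random
from collections import defaultdict


def sample_mitochondria(records, max_per_cell=2, seed=0):
    """Randomly keep at most `max_per_cell` mitochondria from each cell.

    `records` is an iterable of (replicate_id, cell_id, mito_id) tuples.
    This layout is hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible

    # Group mitochondria by the cell (within a replicate) they belong to.
    by_cell = defaultdict(list)
    for replicate_id, cell_id, mito_id in records:
        by_cell[(replicate_id, cell_id)].append(mito_id)

    sampled = []
    for replicate_id, cell_id in sorted(by_cell):
        mitos = by_cell[(replicate_id, cell_id)]
        k = min(max_per_cell, len(mitos))
        # random.sample draws k distinct items without replacement.
        for mito_id in rng.sample(mitos, k):
            sampled.append((replicate_id, cell_id, mito_id))
    return sampled
```

      Capping the draw per cell spreads a fixed number of analysed mitochondria over as many cells as possible, which matches the stated aim of maximising the cell population covered.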

      7. Figures 4 and 5 merely confirm current dogma and don't really contribute to the overall conclusions and can be moved to supplemental data.

      Response: We agree that Fig 5 confirms the current dogma. Therefore, we moved it to Fig S6. Regarding Fig 4, we would like to highlight that there is a decrease of ATP levels before mitochondria enlarge. Thus, we would like to retain it as part of the main figure.

      8. It's interesting that BKA dose-dependently decreased ATP-linked respiration and all doses limited maximal respiratory capacity. It would be interesting to know whether the BKA normal vs. abnormal mitochondria have differential membrane potential.

      Response: Thank you for the interesting question. Overall, BKA treatment leads to a significant decrease of ΔΨm in the whole cell population (Fig S7). Further, the abnormal cristae morphology is only seen in one-third of the population of mitochondria (Fig shown in Response 2). Thus, a drop in ΔΨm seems to be a very early response upon exposure to BKA and independent of cristae morphology. An ideal experiment to address this question would be to image cristae dynamics and ΔΨm simultaneously using super-resolution imaging, which is challenging given the current state of the art and the available chemicals.

      Figure S7E and F

      9. Overall, this is an interesting study and seems appropriately performed but the results and conclusions are unclear. More discussion should include physiological relevance and impact and how this data influences previous work. Some physiological perturbations beyond the mitochondrial toxins and/or utilization of genetic models would strengthen the interpretation and overall impact.

      Response: Thank you. We added an OPA1 blot showing the different L-OPA1 and S-OPA1 forms (see Reviewer #1, response in the Significance section), where we observed that S-OPA1 cleavage is selectively enhanced in CCCP-treated cells, which could be correlated with enhanced cristae dynamics. We also included these results in the main text.

      New Figure S6C

      Referees cross-commenting

      Yes, I conclude that, given the significant overlap in reviewer comments and the general need for clarification of concepts and data, a revision is in order.

      Reviewer #2 (Significance):

      Overall, a highly specialized study with audience limited to mitochondriacs. Although I'll note this is a hot area of study and there is high interest in the field. Some of the data interpretation is difficult to understand and overall more context is needed to explain the results, impact and relevance. Defining exactly what a cristae remodeling event is and how this differs from cristae density and how the two aren't directly connected is unclear.

      Review by a mitochondrial biologist specializing in mitochondrial signaling and connection to physiology.

      References:

      1. Baker MJ, Lampe PA, Stojanovski D, Korwitz A, Anand R, et al. 2014. Stress-induced OMA1 activation and autocatalytic turnover regulate OPA1-dependent mitochondrial dynamics. EMBO J 33: 578-93
      2. Farkas DL, Wei MD, Febbroriello P, Carson JH, Loew LM. 1989. Simultaneous imaging of cell and mitochondrial membrane potentials. Biophys J 56: 1053-69
      3. Scorrano L, Ashiya M, Buttle K, Weiler S, Oakes SA, et al. 2002. A distinct pathway remodels mitochondrial cristae and mobilizes cytochrome c during apoptosis. Dev Cell 2: 55-67
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      Major comments

      In the paper "Microtubules under mechanical pressure can breach dense actin networks", the authors show clear evidence that pressure plays an important role in microtubule breaching into dense actin networks, using elegant in vitro reconstitution assays. They argue that the pressure results from the polymerization force of microtubules, which builds up when microtubules are immobilized at the end opposite to the breaching end by means of the actin-microtubule crosslinking factor Tau.

      Authors answer:

      We thank the reviewer for his/her positive comments on our manuscript.

      It would definitely be interesting to see a lack of breaching in the presence of a crosslinking-deficient Tau construct in order to rule out an off-target effect of Tau on microtubule and actin architecture which may possibly facilitate breaching.

      Authors answer:

      This is an interesting suggestion. Unfortunately, we do not have such a crosslinking-deficient Tau construct in hand. However, please note that we showed two independent ways to demonstrate the role of pressure. One is indeed by crosslinking microtubules to actin bundles with Tau, but the other is by blocking the two opposite ends of microtubules with two dense actin networks. So, we think our conclusion about the role of pressure is solid.

      The authors have also observed microtubule breaching into dense actin networks in living cells. However, in Figure 1C, better cell/ image processing might have been chosen to increase the visibility of actin structures that microtubules encounter on their way to breaching. In Figure S1D, for example, the similar actin structures in lamellipodia are very nicely visible.

      Authors answer:

      We apologize, but we don’t understand the reviewer’s comment. In Figure 1C, images of actin networks are shown in black and white and are more visible than in Figure S1D, where they are shown in magenta and overlaid with microtubules. In any case, we increased the contrast of the images to make fine actin structures at the cell edge clearer.

      It is also interesting that in Figure 6A, actin bundles look different from those in the other figures in the paper. It almost looks like the actin bundles become branched, whereas in the other figures actin bundles are either single or two or three bundles joined together at a point very close to the edge of the micropatterned lipid bilayer.

      Authors answer:

      This is correct. In this experiment several bundles co-aligned. As mentioned by the reviewer, this can also be seen in other conditions without Tau (such as in Figure 4E) and, as shown below, this bundle structure was not visible in all the fields we looked at. So we don’t think this structure is responsible for the changes we measured in the ability of microtubules to penetrate the actin network in the presence of Tau.

      Minor comments

      In the legend of Figure 4E, it should be written "white arrow" instead of "yellow arrow".

      In the Results section "crosslinking between microtubules and actin bundles increase piercing frequency", in the sentence number 7, it should be written "backwards" instead of "reaward".

      Authors answer: We modified the text and legend according to the reviewer’s suggestions.

      Reviewer #1 (Significance):

      The experimental setup of the paper is quite significant in the field, given the difficulty of observing dynamics of dense cytoskeletal structures in living cells. Moreover, the paper gives insight into how microtubule behavior can vary depending on different morphological states of actin network.

      Authors answer: We thank the reviewer for his/her overall very positive feedback on our manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity):

      The authors developed a novel in vitro system to investigate the interaction of dynamic microtubules with the F-actin network. While this system does produce some interesting results, it is unclear how exactly this replicates or explains what might happen near a cell's leading edge. There is a limited characterization of the produced F-actin networks. For example, it is unclear to what extent the F-actin networks are similar or different to cell lamellipodial networks. What is the density / expected mesh size of these networks and could that be varied / manipulated? The bottomline observation that microtubules can grow into F-actin networks if they have nowhere else to go does not seem particularly ground-breaking, and the discussion is very shallow. Overall writing could be improved; there are lots of typos and grammatical inconsistencies. The second paragraph of the introduction is a bit convoluted.

      Authors answer:

      We thank the reviewer for his/her comments. Figure 1 was used to illustrate the behavior of microtubules encountering actin networks in cells and the fact that they struggle to penetrate these networks. This is only a way to argue that the penetration of actin networks is a relevant question that cannot be easily addressed in cells. It is correct that our in vitro system, as is the case for all in vitro reconstituted systems, cannot exactly reproduce a lamellipodial cellular network. But it offers a better way to modulate actin network architecture. We have used in vitro systems to characterize the different behaviors of microtubules when they encounter dense actin networks in different conditions, guided or not by actin bundles, constrained or not at the two ends.

      The observation that microtubules can penetrate actin networks when pressurized might not be “ground breaking”; still, it contradicts previous work showing that microtubules under pressure tend to depolymerize (Janson et al, J Cell Biol, 2003), which would obviously prevent them from penetrating actin networks. So, our conclusion was somewhat unexpected.

      We found it important to discuss the fact that although the microtubule polymerizing force is sufficient to breach dense actin networks, it must be counteracted by another mechanism immobilizing microtubules. This means that in cells, the expression level of actin-microtubule crosslinkers modulates the penetration of microtubules into the lamellipodium.

      However, we agree that the second paragraph of the introduction is not absolutely necessary and removed it.

      Specific comments:

      Fig. 1 seems a bit anecdotal. The authors revisit an observation that has been made before. I can see how it is used as rationale for the in vitro system, but not sure that this adds much to the overall story. Clearly different cell types are different, but without some sort of quantification this remains meaningless. It should also be noted in the discussion maybe that there are large differences between cells in 2D and 3D. Microtubules much more frequently grow to the cell edge compared with 2D (see Akhmanova SLAIN2 paper from some years ago).

      Authors answer:

      We agree with these comments. Indeed, Figure 1 is used only as an illustration of the behavior of microtubules encountering actin networks in cells. As the reviewer said, microtubule penetration and actin architectures will both vary a lot from one cell type to another. So we believe that quantification for these particular cases will not extend the illustrative purpose of this figure, where it is already clear that some microtubules can penetrate and others can’t.

      Fig. 2: While Arp2/3 certainly promotes branched F-actin networks, from the data provided it is not clear to me to what extent the produced F-actin networks replicate F-actin organization at the cell edge. If this is the point the authors are trying to make, the ultrastructure of their in vitro networks needs some additional characterization. As far as it is possible to discern from the data provided, the F-actin meshwork on the stripes in E looks pretty much identical in both panels (and not really like a dendritic network that in a cell also would have a certain polarity with barbed ends facing out), and the bundles on the left don't look like anything that normally occurs in a cell.

      Authors answer:

      We also agree with these comments. The networks we assembled are not lamellipodial-like networks; they are branched networks of various densities, with or without bundles. It is true that bundles of filaments do not grow out of lamellipodial networks in cells. However, bundles of aligned and linear filaments exist in cells, in the form of radial fibers or transverse arcs, along which microtubules tend to align. And these structures might guide microtubules toward cell protrusions, as is the case in growth cones, for example.

      Fig. 4 It is unclear what is going on here. Given that without F-actin bundles, polymerizing microtubules are freely moving around, it does not come as a surprise that they would never penetrate the F-actin network because, as the authors correctly state, the growing end will push back from the barrier. So, then why do they sometimes penetrate when bundles are present? In 4A it appears that microtubule growth into F-actin only happens once the microtubule minus end gets stuck between F-actin bundles on the other side. 4D is the same as 4A; so that makes me think this really does not occur that often. Does the microtubule plus end only penetrate the F-actin meshwork when the minus end gets stuck on the other side? This seems important and also means microtubule penetration may not have anything to do with the F-actin network architecture at the plus end. This needs to be quantified.

      Authors answer:

      This is perfectly correct. In figure 4 the two actin networks are distant, and the microtubules only rarely penetrate them because they are rarely in contact with them at both ends. This occurs only when bundles orient microtubules perpendicular to the edges of the actin network, since in this configuration the distance between the two actin networks is shorter. Hence our motivation to bring actin networks closer to each other in figure 5.

      Fig. 5 I guess that sort of solves my confusion with Fig. 4. The quantification graphs in 5B and 5C are flipped with respect to the figure legend (?).

      Authors answer:

      Indeed, in this figure we distinguished the role of pressure (when both microtubule ends are in contact with actin networks) from the role of alignment with actin bundles. We found that the presence of bundles is useless and that only pressure matters.

      I understand the rationale for not considering microtubules that grow at a shallow angle, but there does not seem to be that much of a difference between 5B and 5D. Wouldn't a better quantification simply compare microtubules that contact F-actin at both ends with microtubules where the minus end is free? In this case, I would expect a very large difference in penetration.

      Authors answer:

      This is also correct. The difference is so large that when one end is free the microtubule never penetrates. We mention it in the text but did not plot these data. This is why we measured only microtubules with both ends contacting an actin network and did not consider those at shallow angles.

      We added the illustration of the condition with short distance and actin bundles (shown below) to make this more clear in the figure.

      The small difference between 5B and 5D shows that, once those microtubules are eliminated, there is no longer a difference between the conditions with or without bundles. Thus, the contribution of bundles to microtubule penetration was to favor the optimal orientation for microtubules to get pressurized at the two ends, rather than to offer a favorable network organization at their base. However, we agree with the reviewer that the absence of difference between the two populations, with or without actin bundles, is not very striking when considering only microtubules interacting with actin at angles greater than 30°. We tested all angles (see below) and found that the absence of difference is actually more obvious when considering microtubules interacting at angles greater than 60°. The analysis of the angle distribution, now reported in Figure 5D, showed that in both conditions most microtubules interact at angles greater than 60°, so we exclude only a few outliers by applying this threshold. We therefore changed the presentation of our data in Figure 5C by raising the threshold from 30° to 60°.
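
      The thresholding logic described in this answer (keep only microtubules whose contact angle exceeds a cut-off, then compute the fraction that penetrate) can be sketched as below. This is a hedged illustration only: the data layout `(angle_deg, penetrated)` and the function name `penetration_fraction` are assumptions made here, and no values from the manuscript are reproduced.

```python
def penetration_fraction(observations, min_angle_deg):
    """Fraction of microtubules that penetrate, among those contacting
    the actin network at an angle strictly greater than `min_angle_deg`.

    `observations` is a list of (angle_deg, penetrated) pairs, where
    `penetrated` is a bool. This layout is hypothetical.
    Returns None when no microtubule passes the angle filter.
    """
    kept = [penetrated for angle_deg, penetrated in observations
            if angle_deg > min_angle_deg]
    if not kept:
        return None
    # True counts as 1 and False as 0, so the sum is the number of penetrations.
    return sum(kept) / len(kept)
```

      Running the same function with thresholds of 30 and 60 on the observations from each condition (with or without bundles) gives the kind of comparison discussed in this answer; a statistical test on the resulting proportions would be a separate step.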

      Do microtubules under pressure ever bend/buckle in this in vitro situation? As the authors state, in cells, that happens frequently. This difference is interesting. Why?

      Authors answer:

      In vitro, microtubules buckle homogeneously between their two ends. These long buckling wavelengths are not very spectacular. In cells, microtubules are crosslinked to actin filaments or other structures over shorter distances (see quantification below). This leads to buckling with shorter wavelengths, which is more striking.

      It is customary to refer to polymerized actin as F-actin.

      The supplementary videos are not referenced in the manuscript.

      Authors answer:

      We apologize and have now referenced the supplementary videos in the manuscript.

      Reviewer #2 (Significance):

      The manuscript describes results from a novel assay to study interactions between F-actin networks and dynamic microtubules in vitro. While of interest to a specialized audience, the overall finding that microtubules can grow into an F-actin meshwork is somewhat incremental especially because of the limited characterization of the F-actin networks used. It remains unclear to what extent this is relevant to a physiological context in cells.

      My field of expertise is related to cytoskeleton dynamics and quantitative microscopy in live cells.

      Authors answer:

      Although intuitive, the demonstration that the density of the actin network can prevent microtubule penetration is novel. More importantly, the demonstration that anchoring of microtubules is sufficient to increase the pressure to the point that microtubules can penetrate those networks is also novel, and it is significant for understanding when and how they do so in cells.

      Reviewer #3 (Evidence, reproducibility and clarity):

      In this paper, the authors present an in vitro assay designed to explore the dynamic interaction between growing microtubules and pre-existing actin networks. Notably, the presence of linear actin bundles facilitated the movement of polymerizing microtubules along actin filaments. When microtubules were immobilized to two spatially separated actin networks, they exhibited the ability to breach and penetrate dense actin meshworks. This penetration was attributed to the mechanical pressure generated by microtubule polymerization. The authors tested tau as a microtubule-actin crosslinking protein in this process and found that tau promoted microtubule penetration into dense actin meshwork. Although the findings in this paper are potentially significant, the work is still in its preliminary stage and the scope is limited.

      Authors answer:

      We thank the reviewer for properly summarizing the main findings of our manuscript.

      1. The authors observed that the inclusion of tau, a microtubule-associated protein known for its role in promoting microtubule polymerization, significantly facilitated microtubule penetration into dense actin meshworks. This enhancement is likely attributed to tau's ability to promote microtubule polymerization, generating stronger forces within the microtubules that enable them to breach the actin meshworks. To validate the involvement of the crosslinking function in the process, the authors should explore the effects of other microtubule-actin crosslinking proteins in their assay.

      Authors answer:

      We thank the reviewer for this interesting suggestion regarding the role of Tau in our experiments. To address this comment, we have analyzed the microtubule growth rate in our experiments in the presence and absence of Tau (see quantification below). We found that the Tau construct we used reduced the microtubule growth rate. Therefore, we believe that microtubule growth was not responsible for the improved penetration of microtubules into dense actin networks in our assay, and that it was rather the crosslinking ability of Tau that played a significant role.

      1. The paper highlights the importance of anchoring both ends of microtubules to two adjacent actin networks for successful penetration into the actin meshworks. However, the precise mechanisms by which these microtubule ends are anchored to actin filaments are not elaborated upon. Providing detailed insights into this anchoring process would enhance the readers' comprehension of the experimental setup and its relevance to the observed results.

      Authors answer:

      We apologize for this lack of clarity. We don’t think that microtubule ends are “anchored” to the actin network. They are simply embedded in it. This embedding prevents them from moving rearward and thus leads to a pressure increase as they polymerize.

      1. Additional information on the experimental methods is warranted to improve the reproducibility and clarity of the study. Specifically, the authors should elucidate the process through which nucleation-promoting factors were grafted onto lipid bilayers. This detail is crucial for researchers seeking to replicate or build upon the study's findings.

      Authors answer:

      We apologize for this lack of clarity. There was indeed an error in our description of SUV preparation with lipid-biotin. We have now revised our Materials and Methods section. In particular, we have described more accurately the various steps we used to micropattern WA-streptavidin onto lipid-biotin.

      1. In Fig. 5D, the authors observed no significant difference in the breaching probability between microtubules that contacted the actin meshwork at an angle higher than 30°, with or without actin bundles. To ensure a better comparison, it is advisable to focus on quantifying the microtubules that are contacting two actin meshworks at both ends (the immobilized microtubules), as they would have similar probabilities of being pressurized by their growth. Moreover, further justification is required to explain the choice of 30° as the threshold angle and its significance in the context of microtubule behavior.

      Authors answer:

      We thank the reviewer for this comment and apologize for the confusion. The quantification we made is precisely the one described by the reviewer. We made this clearer by adding further illustrations of the two conditions and of the measurements made.

      1. Fig. 5C appears to depict the "Distribution of the angle of the interaction of microtubules in the presence (10 nM of Arp2/3 complex) or absence (100 nM of Arp2/3 complex) of actin bundles" instead of the "proportion of microtubules piercing the branched actin network." The alphabet labels in the figure should be updated accordingly. Additionally, the authors should clarify whether a comparison was conducted between the means of the angles in the two conditions and whether any observed differences were statistically significant.

      Authors answer:

      We apologize for this confusion. We updated the figure legend in which 5C and 5D were inverted.

      1. Investigating the potential significant difference in the mean interaction angles between the absence and presence of actin bundles would be intriguing. The presence of actin bundles might indeed influence the interaction angle or contact position, potentially increasing penetration frequency. This insight would further enrich the findings and provide valuable context for understanding the interplay between microtubules and actin networks.

      Authors answer:

      We apologize for this confusion. We now report the statistical difference. Indeed, it accounts for the difference in the penetration frequency, as shown by the absence of difference when we consider only microtubules that are more or less perpendicular to the network. This is one of the most significant conclusions of our work. We added some schematics to make this clearer.

      1. More comprehensive information about the statistical analyses should be provided. This would be important for the validity and reliability of the study's conclusions.

      Authors answer:

      We apologize for this lack of clarity. The statistical analyses we performed are described in each figure legend rather than in the Materials and Methods section.

      Reviewer #3 (Significance):

      The work represents an advance in understanding the mechanism by which microtubules navigate dense actin meshworks.

      Authors answer:

      We thank the reviewer for this positive evaluation of our work.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary:

      In this study, the authors delineate the association of paralog dispensability with the frequency of homozygous deletions (HDs) and thereby show that paralog dispensability can play a significant role in shaping tumor genomes. The authors analyzed the strength of negative selection on the paralogs relative to the singletons using frequencies of the homozygous deletions (HD). The study focused on HDs because they ensure a complete loss of function, unlike other mutational aberrations that can be masked because of haplo-sufficiency. While accounting for potential confounding factors, authors find that paralogs tend to have a relatively high frequency of HDs, suggesting a relaxed negative selection. Furthermore, the authors specifically attribute this association to the dispensable paralogs by analyzing gene inactivation data generated from multiple experimental systems. Overall, the findings of this study can potentially have significant implications in the cancer biology field and specifically to the researchers studying cancer genome evolution.

      We thank the reviewer for the careful reading and positive assessment of our manuscript.

      Major comments:

      1. To dissect further which dispensable paralogs are more likely to be associated with a high HD frequency, synthetic lethal paralogs could be compared with non-synthetic lethal ones.

      In the section titled 'Homozygous deletion frequency of paralog passengers is influenced by paralog properties' (begins from line #289), authors have shown that paralogs with a high frequency of HDs are more likely to have the properties of dispensability (in Figure 4). It seems that all of those properties are also associated with synthetic lethality as the authors identified in their previous study (De Kegel et al. 2021). Furthermore, as shown in the subsequent section ('Essential paralogs are less frequently homozygously deleted than non-essential paralogs', begins from line #344), the high HD is associated with the dispensable paralogs. Some of those dispensable paralogs are expected to be synthetic lethal. Therefore, the association of paralogs with a high frequency of HDs with experimentally validated or predicted sets of synthetic lethal paralogs could be tested. This may help authors to contextualize their findings in terms of genetic interactions between paralogs.

      We thank the reviewer for highlighting the potential relationship with our previous work. We agree that many of these properties are associated with synthetic lethality, but we note that they are also associated with single gene essentiality. This makes the relationship between synthetic lethality, essentiality, and deletion frequency somewhat difficult to dissect.

      Nonetheless we have tested, in a number of ways, whether there is a relationship between a paralog having a reported/predicted synthetic lethality and being homozygously deleted. We find no obvious connection between the two.

      We first tested using a set of synthetic lethal interactions identified by integrating molecular profiling data with genome wide CRISPR screens in a large panel of cancer cell lines (the data used to train the classifier in De Kegel et al, 2021). As there is an ascertainment bias in this dataset (paralogs must have frequent loss of function alterations / silencing to be tested) we restricted our analysis to only those paralog pairs tested for synthetic lethality. We identified no clear pattern (p>0.05, Fisher's Exact Test).
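
      As an illustration of this kind of test (not the code used for the analysis), a two-sided Fisher's exact test on a 2x2 table of synthetic lethality versus deletion status can be sketched with the Python standard library alone. The function name and the counts below are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Hypothetical counts, for illustration only:
# rows = synthetic lethal / not synthetic lethal,
# columns = ever homozygously deleted / never deleted.
p_value = fisher_exact_two_sided(12, 88, 30, 270)
print(f"p = {p_value:.3f}")
```

      Restricting the table to paralog pairs that were actually tested for synthetic lethality, as described above, only changes which genes are counted, not the test itself.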

      We next tested using an integrated set of four combinatorial CRISPR screens (aggregated in De Kegel et al) where we considered a pair to be synthetic lethal if it was a hit in any screen and not synthetic lethal if it was screened at least once and never identified as a hit. Again we restricted our analysis to paralogs that were present in this dataset to prevent issues with ascertainment bias. We found no clear association.

      We further tested using a consensus dataset derived from the same combinatorial screens, where a pair was marked as synthetic lethal if it was identified as a hit in at least two screens and not synthetic lethal if it was screened at least twice and never identified as a hit. Again we restricted our analysis to paralogs that were present in this dataset and found no clear association.
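
      The two labelling rules used for the combinatorial screens (a hit in any screen versus a consensus across screens) can be expressed as one small function. This is a minimal sketch for clarity; the function and parameter names are hypothetical and the input is assumed to hold one boolean per screen in which the pair was tested.

```python
from typing import Optional, Sequence


def label_pair(screen_hits: Sequence[bool], min_hits: int, min_screens: int) -> Optional[bool]:
    """Label a paralog pair from its per-screen results.

    Returns True (synthetic lethal), False (not synthetic lethal),
    or None when the pair does not meet either criterion.
    """
    n_hits = sum(screen_hits)
    if n_hits >= min_hits:
        return True
    if len(screen_hits) >= min_screens and n_hits == 0:
        return False
    return None


# "Any screen" rule: hit at least once, or screened at least once and never hit.
print(label_pair([True, False], min_hits=1, min_screens=1))
# Consensus rule: hit at least twice, or screened at least twice and never hit.
print(label_pair([True, False], min_hits=2, min_screens=2))
```

      Under the consensus rule a pair with a single hit is left unlabelled, which is why the consensus dataset is smaller than the any-screen dataset.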

      We finally tested using our predicted synthetic lethal interactions – annotating the top 3% of predictions as synthetic lethal and the remainder as non-synthetic lethal. The 3% threshold is similar to the observed frequency of synthetic lethality in the training set. In this case, as this dataset covers all paralogs considered, no restriction was necessary.

      None of the above analyses revealed a clear relationship between deletion frequency and synthetic lethality. A caveat of these analyses is that none of the experimental datasets are complete (covering only a minority of all paralog pairs) and they are all somewhat noisy. Furthermore, as we show in our modelling analysis (Fig S3) the observed homozygous deletions are far from saturating.

      However we think there may be a simpler explanation, beyond limitations of the data, for why we do not observe a relationship between HDs and synthetic lethality.

      As the reviewer notes, there is evidence in cell lines that one reason paralogs are more dispensable than singletons is because of buffering / redundant relationships as revealed by synthetic lethal interactions. These relationships therefore provide an explanation for why some paralogs are dispensable. As our primary claim is that paralogs are more frequently deleted because they are more dispensable we might anticipate a relationship between deletion frequency and synthetic lethality. However, by definition, synthetic lethal interactions can only be observed for non-essential (dispensable) genes. Therefore when analysing the overlap with synthetic lethal interactions we are primarily restricting our analyses to genes that are already individually dispensable. Consequently we might not anticipate observing any enrichment. The buffering relationship revealed by synthetic lethality provides an explanation for why a paralog is dispensable but once we are restricting our analysis to dispensable paralogs we do not necessarily expect to see further enrichment.

      We think that an ideal way to explore this question further would be to look at selection on deletions of pairs of paralogs – we anticipate that if a gene is dispensable because of paralog buffering then both paralogs should not be deleted simultaneously. However, the current copy number datasets are too small to evaluate such pairwise relationships. This is discussed in the manuscript as follows:

      Analyzing the frequency with which two members of a paralog family are lost would provide more direct insight into the contribution of paralog redundancy, but due to the overall rarity of passenger gene HDs, we cannot make a comprehensive assessment of co-deletions here – e.g. among paralog pairs where both genes are non-drivers, and not on the same chromosome, only two pairs are co-deleted in at least one TCGA sample. Larger cohorts would also allow us to search for patterns of mutual exclusivity of HDs to identify genetic interactions – this has been applied for identifying interactions between driver genes [57,58], but is more challenging for interactions between non-driver genes, which are much less frequently altered.
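
      For illustration, counting co-deleted paralog pairs under the restrictions described in this passage (both genes non-drivers and on different chromosomes) could be sketched as follows. This is not the code used for the manuscript; all names, gene identifiers and data structures are hypothetical.

```python
def co_deleted_pairs(pairs, deleted_by_sample, chrom, drivers):
    """Return the set of eligible paralog pairs deleted together in at least one sample.

    pairs:             iterable of (gene_a, gene_b) paralog pairs
    deleted_by_sample: dict mapping sample id -> set of homozygously deleted genes
    chrom:             dict mapping gene -> chromosome
    drivers:           set of known driver genes
    """
    eligible = [
        (a, b)
        for a, b in pairs
        if a not in drivers and b not in drivers and chrom[a] != chrom[b]
    ]
    found = set()
    for deleted in deleted_by_sample.values():
        for a, b in eligible:
            if a in deleted and b in deleted:
                found.add((a, b))
    return found


# Toy data with made-up gene names.
pairs = [("G1", "G2"), ("G3", "G4"), ("G5", "G6")]
chrom = {"G1": "1", "G2": "7", "G3": "2", "G4": "2", "G5": "9", "G6": "12"}
drivers = {"G5"}
deleted_by_sample = {"s1": {"G1", "G2", "G3", "G4"}, "s2": {"G5", "G6"}}
print(co_deleted_pairs(pairs, deleted_by_sample, chrom, drivers))
```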

      Minor comments:

      1. The number of TCGA and ICGC tumor samples analyzed:

      As mentioned in the Results section (line #106), 9966 tumor samples were analyzed. However, the sample size mentioned in Figure 2A is 9951. Is the lower number shown in the figure due to the filtering procedure mentioned in the Methods section (line #455)? The change in sample sizes could be explained. A similar difference in sample sizes exists for the ICGC data also.

      The difference was indeed due to the filtering process, but the numbers were only provided in the Methods. We have now addressed this in the main text:

      After removing a small number of ‘hyper-deleted’ samples (see Methods) we retained 9,951 samples for further analysis.

      1. The rationale behind setting the threshold at 100 HD genes to classify 'hyper-deleted' samples for TCGA (line #462) and ICGC data (line #473) could be explained.

      We excluded hyper-deleted samples to avoid any individual sample having undue influence on the genes observed to be ever deleted or on the overall patterns observed. It is also common in analyses of selection in tumours that make use of mutational profiles (rather than copy number profiles) to exclude hypermutated samples (e.g. Martincorena et al, Cell 2017; Lopez et al, Nature 2020). However the exact threshold of 100 HD genes was somewhat arbitrary and this query prompted us to assess whether it had any significant impact on the results.

      We therefore repeated all analyses using a more stringent threshold (50 HD genes) and also without thresholding. Although the exact percentages and odds-ratios vary somewhat with the different thresholds, all major conclusions are still supported.
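
      The filtering step can be sketched as below, treating the threshold as a maximum number of homozygously deleted genes per sample, as in the reviewer's comment. This is an illustrative sketch only: the function name and toy data are hypothetical, and whether a sample sitting exactly at the threshold is kept is an assumption.

```python
def drop_hyper_deleted(hd_genes_by_sample, max_hd_genes=None):
    """Remove samples with more homozygously deleted genes than max_hd_genes.

    hd_genes_by_sample: dict mapping sample id -> set of HD genes
    max_hd_genes:       threshold; None disables filtering
    """
    if max_hd_genes is None:
        return dict(hd_genes_by_sample)
    return {
        sample: genes
        for sample, genes in hd_genes_by_sample.items()
        if len(genes) <= max_hd_genes  # assumption: samples at the threshold are kept
    }


# Toy cohort: sample ids and gene counts are made up.
cohort = {
    "s1": {f"g{i}" for i in range(3)},
    "s2": {f"g{i}" for i in range(60)},
    "s3": {f"g{i}" for i in range(150)},
}
for threshold in (100, 50, None):
    kept = drop_hyper_deleted(cohort, threshold)
    print(threshold, sorted(kept))
```

      Re-running the downstream analyses on each filtered cohort is then a matter of looping over the thresholds, as in the last three lines.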

      We appreciate that this was a minor comment and that the reviewer did not request this new analysis, but in the absence of a strong justification for a single threshold we felt it appropriate to assess multiple thresholds (and none).

      1. Citation for DepMap is missing (caption of Figure 5).

      We have added the text below to the legend for Figure 5:

      Essential genes for the DepMap dataset (Meyers et al, 2017) are obtained from a version of the data reprocessed in (De Kegel et al, 2021) to reduce off-target sgRNA effects (see Methods).

      CROSS-CONSULTATION COMMENTS

      Along the lines of Reviewer #3's second major comment, I have a suggestion, the possible benefits of which would depend on the target audience to which the authors intend to communicate their study.

      I would suggest including a brief comparison of the findings of this study which deal with human paralogs, with the findings in model organisms such as yeast, perhaps in the discussion section. To facilitate such a comparison, authors could try measuring the enrichments of, for example, molecular functions, gene families, types of genetic interactions, etc., among the paralogs associated with a high frequency of HDs and then discussing their comparison with what is known in the literature for paralogs in other model organisms that tend to be frequently deleted.

      Such a comparison could be of interest to the community of researchers working on other model organisms and put this study in a much broader context. However, as I said before, this would depend on the authors' intended target audience.

      We thank the reviewer for the suggestion. We have added an additional section to the discussion highlighting differences and similarities to the observations from yeast as follows:

      Much of our understanding of the factors that influence gene dispensability comes from studies in model organisms, in particular the budding yeast Saccharomyces cerevisiae [3,9,10,43,44]. Analyses of the yeast gene deletion collection, a set of gene deletion mutants systematically generated in a single S. cerevisiae strain, revealed that paralogs were less likely to be essential than singleton genes [3,45]. Furthermore, more detailed analyses of yeast paralogs revealed that paralogs from large families were less likely to be essential as were genes with highly sequence similar paralogs [43,44]. Previous analyses, including our own, demonstrated that many of these trends are also evident when analyzing gene essentiality from CRISPR screens in cancer cell lines [12,13,15,35]. Our results here are also consistent with these findings – many of the features that are associated with paralog dispensability in yeast are also associated with gene deletion frequency in tumor genomes.

      The connection between the budding yeast observations and those in cancer is less clear when it comes to the relative dispensability of WGDs and SSDs. Analyses of the yeast gene deletion collection revealed that SSDs are more likely to be essential than WGDs in the single genetic background studied [43,44]. In our previous analyses of gene essentiality in hundreds of cancer cell lines we found that SSDs were more likely to be broadly essential (essential in most cell lines) than WGDs but that WGDs were less likely to be never essential (i.e. more likely to be essential in at least one cell line) [13]. As the analyses of gene essentiality in budding yeast were generated in a single genetic background the concordance with our cancer cell line results was difficult to assess, but as gene deletion collections are now being generated in additional yeast strains it should become possible to perform a more direct comparison [46–48].

      Here we found that WGDs are less likely to be deleted than SSDs in tumors. This is surprising in light of the yeast gene deletion collection results, where SSDs were more likely to be essential than WGDs in the strain studied, but less so in light of the cancer cell line results, where WGDs were less likely to be never essential. It is also worth noting that experimental evolution studies in yeast found that SSDs accumulate protein-altering mutations at a higher rate than WGDs [49,50]. These results are perhaps especially relevant when analyzing the influence of paralog features on selection in tumors.

      We note that there are many additional differences in the features of WGDs and SSDs in budding yeast that may alter their relative dispensability in tumors. An obvious large scale difference is that in the ancestor of humans there were two rounds of whole genome duplication compared to a single duplication event in yeast [51,52]. Less obvious, but potentially of importance for cancer, is that the two classes of paralogs are enriched in pathways in humans that do not have obvious counterparts in yeast. For example, WGDs are highly enriched in signaling pathways involved in development while SSDs are enriched in immune response genes [53]. How the membership of these pathways influences the dispensability and selection of genes in tumors and cancer cell lines warrants further study.

      Reviewer #1 (Significance):

      As the authors note in their manuscript, it is expected that paralog dispensability could be associated with the relaxed negative selection in tumor genomes because (1) paralogs are prevalent in the human genome, and (2) many of them are dispensable, as apparent from the large-scale gene inactivation screens in hundreds of cancer cell lines (Blomen et al. 2015, Wang et al. 2015, Dandage and Landry 2019, De Kegel and Ryan 2019). However, direct mapping of this association, while importantly accounting for potential confounding factors, has been lacking.

      As a researcher with prior experience in the research topics such as gene duplication and genetic interactions, it appears to me that this study presents formal proof of the important association between paralog dispensability and tumor genome evolution which could be of major implication for the research community of cancer biology field and specifically to the researchers dealing with the topics such as cancer evolution, copy number alterations in cancer genomes, and synthetic lethality-based precision oncology therapeutics.

      Thank you again for the positive assessment.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary

      Here, De Kegel & Ryan analyse thousands of tumour samples from the TCGA and ICGC projects to identify homozygously deleted genes, finding that about 40% of protein-coding genes are deleted in at least one sample. They find homozygously deleted genes to be enriched for paralogous genes, and further, more frequently deleted genes are increasingly likely to be paralogs. The authors then test the influence of several factors on the likelihood of being deleted, including gene length, distance to a fragile site or chromosomal region, and distance to a recurrently deleted tumour suppressor gene (TSG). They find that proximity of a TSG, telomere, centromere, and fragile site all increase likelihood of being deleted in a sample, as does gene length. Having a paralog also remains an important predictor of deletion after accounting for these other factors. Additionally, the more similar in sequence the closest paralog is to the gene and having a larger gene family size are also predictive of deletion. Conversely, if a gene is a whole genome duplicate as opposed to a small-scale duplicate, it is less likely to be deleted. Finally, the authors test the hypothesis that paralogs that are deleted in cancer are less likely to be essential and find that this is indeed the case.

      Comments

      The authors have done a good job of identifying trends of paralog deletion in cancer samples and the factors influencing them. The results are well described and presented and support the conclusions. I like the inclusion of the saturation analysis as an estimate of what to expect given current and potential future sample sizes, and I appreciate the inclusion of a WGD/SSD paralog distinction. The data and methods are sufficiently detailed. I have a few minor comments below.

      We thank the reviewer for the careful reading and positive assessment of our manuscript.

      1. Around line 160 in the text and supplemental figure 4A, the authors test if the trends they see are observed across individual cancer types. With 9 of 33 cancer types reaching a sample size threshold, 8 of 9 comparisons are significant. The authors do not state correcting for multiple testing.

      We have now also assessed the significance of the results after performing a Holm-Bonferroni correction for multiple hypothesis testing and find that the same 8 of 9 cancer types remain significant.
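
      For readers unfamiliar with the procedure, the Holm-Bonferroni step-down correction can be sketched in a few lines. This is a generic illustration, not the code used for the analysis; the function name and the p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    P-values are visited from smallest to largest; the k-th smallest
    (k starting at 0) is compared against alpha / (m - k). Testing stops
    at the first p-value that fails its threshold.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are also not rejected
    return reject


# Hypothetical p-values for four comparisons.
print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))
```

      Unlike a plain Bonferroni correction, which compares every p-value against alpha / m, the threshold here relaxes as hypotheses are rejected, so the procedure is never less powerful.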

      1. I initially misunderstood the hemizygous deletion analysis, thinking the analysis in supplemental figure 4B/C was asking if a sample has any singleton or any paralog deleted and comparing the number of samples with any deletion of either - given the number of genes deleted per sample this wouldn't make sense as a good test. I think the authors are actually comparing the number of loss-of-hemizygosity events per gene and grouping by paralog/singleton. I think this is a good analysis, but I think it would be helpful to clarify this in the text and figure legend e.g. "Samples w/ gene LOH" could be "LOH events per gene" or something similar.

      As suggested we have now updated the y-axis label in these charts to ‘LOH events per gene’. We note that there are now two additional panels in this figure to address copy neutral LOH, per Reviewer 3’s request.

      1. Occasionally, I wanted some more detail in the text for context, which was sometimes later provided - e.g. I noted when reading about line 125 that I was curious at this point how often TSGs occurred on segments, and this was later provided on line 241. Similarly, around line 114 I was curious how many genes are typically deleted per HD segment, for which the median value was provided on line 206 (and distribution in supplemental figure 1), and again for hemizygous deletions. I think sometimes it would be helpful to provide this context earlier in the text to aid interpretation of the results.

      We thank the reviewer for these suggestions which we have now incorporated into the text.

      On line 115 (previously 114) the relevant sentence now reads:

      Typically an HD that results in the loss of a protein coding gene also results in the loss of several chromosomally adjacent genes – in the TCGA dataset a median of three genes are lost per gene-deleting HD segment.

      On line 124 the relevant sentence now reads:

      We found that almost half (49%) of the HDs that result in the loss of at least one protein coding gene overlap a known tumor suppressor.

      1. In the discussion, on line 420, the authors include the point that a paralog might not be required at all in a tumour cell and therefore easily deleted. I think this possibility could be expanded on here and in the introduction/results section, as it is an important point. I think it would be helpful to include more about the possibility that a paralog might be deleted in a tumour cell because it is simply not required or that is more likely to have less of a phenotypic impact compared to a singleton, and that this could be a reason for the observed enrichment of paralogs in deleted genes. A citation to support this point could be Áine N O'Toole, Laurence D Hurst, Aoife McLysaght, Faster Evolving Primate Genes Are More Likely to Duplicate, Molecular Biology and Evolution, Volume 35, Issue 1, January 2018, Pages 107-118, https://doi.org/10.1093/molbev/msx270. Duplicate genes can be duplicates because copy number variation of them has minimal impact.

      We thank the reviewer for raising this important point.

      We have briefly addressed this in the introduction as follows:

      In multiple model organisms, paralogs have been demonstrated to be more dispensable than singletons (genes without a paralog) [3–5]. There are a number of reasons for why a paralog might be more dispensable than a singleton gene, including preferential retention of duplications of non-essential genes [6,7], but perhaps the most obvious explanation is buffering between paralogs.

      Where references 6 and 7 are as follows:

      1. O’Toole ÁN, Hurst LD, McLysaght A. Faster Evolving Primate Genes Are More Likely to Duplicate. Mol Biol Evol. 2018;35: 107–118.
      2. He X, Zhang J. Higher duplicability of less important genes in yeast genomes. Mol Biol Evol. 2006;23: 144–151.

      We discuss this more comprehensively in the discussion as follows:

      In both yeast and cancer there are a number of reasons for why paralogs might be more dispensable than singleton genes. Perhaps the most obvious is the existence of buffering relationships between paralog pairs, such that when one paralog is lost the other paralog can compensate for this loss. Such buffering relationships between paralogs can be revealed through synthetic lethality screens and a number of recurrently deleted paralogs in cancer have already been reported to display synthetic lethal interactions with their paralog (recently reviewed in [54]). Supporting this model, in previous work analysing essentiality in cancer cell lines we found that buffering relationships between paralogs could explain 13-17% of cases where a paralog was essential in some cell lines but not others [13]. This suggests that at least some of the increased dispensability of paralogs in cancer cells can be attributed to buffering relationships between paralog pairs. However this is not the only explanation for paralogs displaying increased dispensability in tumour cells. An additional explanation is that paralogs may perform essential functions in specific contexts (e.g. within specific tissues or at specific developmental stages) but are not required within the specific context of a tumour. Consistent with this model, human paralogs are more likely to display tissue-specific expression patterns [55]. Finally we note that there is evidence to suggest that genes whose perturbation has a lower phenotypic impact may be more ‘duplicable’ – i.e. rather than paralogs being under weaker selection because they are duplicated, their duplication was tolerated because they were already under weaker selection [6,7]. Teasing apart the relative contributions of these factors to the increased dispensability of paralogs in cancer will require further research and potentially new data resources such as gene essentiality profiles in diverse non-cancer cell types [56].

      CROSS-CONSULTATION COMMENTS

      I agree, that's a helpful suggestion from reviewer 1.

      Reviewer 3's suggestion regarding age of the two whole genome duplication events is quite difficult to unpick as the duplication events seem to have happened relatively close in time to each other while rediploidisation of the first was occurring. Additionally, paralogs from SSDs tend to be more similar in sequence simply because the two WGD events are relatively old while SSDs can occur at any time up to present. They're therefore biased by young duplicates that have not had the opportunity to diverge much and decrease in sequence similarity.

      We appreciate these comments.

      Reviewer #2 (Significance):

      This is a novel study as it examines the frequency of paralog deletion in cancer samples and the factors influencing it, building upon work already conducted in cancer cell lines. This study extends the knowledge of the field confirming previous trends observed in cell lines, this time in actual cancer samples. It confirms that paralogs are more dispensable than singletons, likely because they have a similar counterpart that can provide some level of functional redundancy. The finding that the more similar the closest paralog is, the more likely a gene is to be deleted supports this.

      It is certainly limited by the number of samples currently available in the two cancer sample projects included but the authors attempt to quantify how limiting this sample size is by conducting a saturation analysis using down-sampling to estimate how many gene deletions one can expect from different numbers of samples. This is important as the lack of observance of many gene deletions is likely due to the limited sample size and not due to negative selection. This low observance of gene deletions disappointingly limits further testing beyond single paralogs to consider the deletion effects of multiple gene family members and more directly test evidence of functional redundancy between paralogs. The authors provide a good discussion of the limitations of their study.

      The results are of interest to evolutionary biologists and cancer biologists. Those with an interest in duplicate genes, and/or factors affecting gene loss in tumours will be interested in this work.

      My field of expertise is molecular evolution, gene duplication and copy number variation.

      We thank the reviewer for the positive assessment of the significance of our work.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Thank you for the opportunity to review "Paralog dispensability shapes homozygous deletion patterns in tumor genomes" by De Kegel et al. This manuscript uses TCGA and ICGC tumor data to show evidence for paralog dispensability. They analyze the rate of homozygous deletions and show that it is higher for paralogs compared to singletons. Their findings are robust to a number of confounding variables that they take into account e.g. distance to tumor suppressor, telomere, centromere or fragile site. They show that paralogs that belong to large families and have higher sequence identity tend to show more dispensability and these dispensable paralogs are less likely to be WGD.

      We thank the reviewer for the time taken to review our manuscript.

      Major comments.

      1. Does the finding pertaining to the lack of enrichment of paralogs in regions of LOH take into account whether the LOH is copy neutral or not, i.e. how does dosage affect this finding? Is it possible that there is a difference in paralog rate in LOH that results in a total copy number of 1 and that the presence of copy neutral LOH masks the effect? Also, integration of a gene expression dataset would be helpful to identify paralogs that compensate for the lack of their sister by upregulating their gene expression.

      In the submitted manuscript we focussed solely on LOH events where the copy number of one allele was 0 and the other allele was ≥1. These include copy loss events (total copy number = 1), copy neutral events (total copy = 2), as well as amplifications (total copy number > 2). The rationale for this approach was that we were interested in understanding whether the mechanism that was generating deletions was preferentially generating deletions in paralog-rich regions.

      However, we agree that understanding the influence of dosage is of interest here. We have therefore expanded the analysis in the paper to separately assess the enrichment of paralogs in copy neutral LOH regions (total copy number = 2) and copy loss LOH regions (total copy number = 1).
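
      The LOH categories described above can be written down explicitly in terms of allele-specific copy number. The following is a minimal sketch for clarity rather than the code used in the analysis; the function name and category labels are hypothetical.

```python
def classify_loh(major_cn, minor_cn):
    """Classify a segment by allele-specific copy number.

    An LOH event is a segment where one allele has copy number 0 and the
    other has copy number >= 1. Returns None for segments that are not LOH
    (including homozygous deletions, where both alleles are 0).
    """
    if minor_cn != 0 or major_cn < 1:
        return None
    total = major_cn + minor_cn
    if total == 1:
        return "copy_loss"      # total copy number = 1
    if total == 2:
        return "copy_neutral"   # total copy number = 2
    return "amplified"          # total copy number > 2


for major, minor in [(1, 0), (2, 0), (4, 0), (1, 1), (0, 0)]:
    print((major, minor), classify_loh(major, minor))
```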

      As shown in the updated Figure S4B, we do not find an enrichment of paralogs in genes subject to either copy neutral LOH or copy loss LOH.

      The relevant section of the text on page 6 now reads:

      We do not find that paralogs are more frequently subject to LOH than singletons in either the TCGA or ICGC cohort (Fig. S4B-C); when considering all LOH segments we even see that singletons are slightly more frequently subject to LOH in the ICGC cohort (Fig. S4C, left), but when considering only focal LOH segments – i.e. segments whose length is less than half of the chromosome arm’s length, which is the case for all HD segments – there is no significant difference between paralog and singleton LOH frequency in either cohort. To assess whether gene dosage influenced the observed LOH frequency we further restricted our analysis to copy neutral LOH events (total copy number = 2) and copy loss LOH events (total copy number = 1) and again found no significant increase in deletion frequency of paralogs compared to singletons (Fig. S4B-C).

      Regarding the integration of gene expression to identify dosage compensation between paralogs – we agree that this is extremely interesting. However, it is quite challenging to address properly. Most paralogs are only observed to be homozygously deleted a single time and so statistically identifying how loss of one gene impacts the mRNA abundance of another is challenging. In the minority of cases where a paralog is recurrently deleted, often these deletions occur in samples from different cancer types and so integrating transcriptomic data still presents some technical challenges. Given this complexity, and as the question of dosage compensation is not central to our key observations, we have not integrated transcriptomic data here.

      1. Is the finding that paralogs are depleted among WGD influenced by the age of WGD since there are 2 WGD events? Do SSD tend to be more or less similar in sequence than WGD? This should be explored further since this observation is the opposite of what is observed in model organisms such as yeast whereby SSD are less functionally similar than WGD and often show properties more similar to singletons than WGD.

      As noted by reviewer 2 in the cross commentary, it is extremely challenging to age the duplicates that arose from the WGD due to the close temporal proximity of the two whole genome duplication events. In the dataset of paralogs analysed here, SSDs have lower average sequence identity than WGDs. However we note that both sequence identity and duplication type are included in our regression analysis (Figure 4D) and both are significantly associated with homozygous deletion frequency.

      This should be explored further since this observation is the opposite of what is observed in model organisms such as yeast whereby SSD are less functionally similar than WGD and often show properties similar to singletons than WGD.

      We do not actually think that our results are in opposition to the findings from model organisms. The bulk of studies on the functional consequences of deletions of SSDs/WGDs in model organisms are derived from analyses of the budding yeast gene deletion collection, which is generated in a single strain and grown in lab conditions. Consequently, these studies report on which genes can be lost in a single genetic background when grown in rich media. We think it is not fully clear how these findings will apply in the context of a panel of genetically heterogeneous tumours derived from multiple different cell types. We note that there are additional complexities when analysing human genes (tissue types, two rounds of WGD, metazoan specific pathways enriched in either WGDs/SSDs) that make a straightforward comparison with yeast challenging. We also note that although the results of analyses of the yeast gene deletion collection suggest that SSDs are more likely to be essential than WGDs, experimental evolution studies have demonstrated that SSDs are more likely to accumulate protein altering mutations than WGDs (Keane et al, Genome Research 2014; Fares et al, PLoS Genetics 2013). This is not what one would expect based on the analyses of the yeast gene deletion collection, but is closer to what we observe for tumour genomes where SSDs are more likely to be homozygously deleted.

      We agree that we did not adequately discuss these issues in the previous version of our manuscript and so have added a new section to the discussion where we compare our results here with those from budding yeast:

      Much of our understanding of the factors that influence gene dispensability comes from studies in model organisms, in particular the budding yeast Saccharomyces cerevisiae [3,9,10,43,44]. Analyses of the yeast gene deletion collection, a set of gene deletion mutants systematically generated in a single S. cerevisiae strain, revealed that paralogs were less likely to be essential than singleton genes [3,45]. Furthermore, more detailed analyses of yeast paralogs revealed that paralogs from large families were less likely to be essential as were genes with highly sequence similar paralogs [43,44]. Previous analyses, including our own, demonstrated that many of these trends are also evident when analyzing gene essentiality from CRISPR screens in cancer cell lines [12,13,15,35]. Our results here are also consistent with these findings – many of the features that are associated with paralog dispensability in yeast are also associated with gene deletion frequency in tumor genomes.

      The connection between the budding yeast observations and those in cancer is less clear when it comes to the relative dispensability of WGDs and SSDs. Analyses of the yeast gene deletion collection revealed that SSDs are more likely to be essential than WGDs in the single genetic background studied [43,44]. In our previous analyses of gene essentiality in hundreds of cancer cell lines we found that SSDs were more likely to be broadly essential (essential in most cell lines) than WGDs but that WGDs were less likely to be never essential (i.e. more likely to be essential in at least one cell line) [13]. As the analyses of gene essentiality in budding yeast were generated in a single genetic background the concordance with our cancer cell line results was difficult to assess, but as gene deletion collections are now being generated in additional yeast strains it should become possible to perform a more direct comparison [46–48].

      Here we found that WGDs are less likely to be deleted than SSDs in tumors. This is surprising in light of the yeast gene deletion collection results, where SSDs were more likely to be essential than WGDs in the strain studied, but less so in light of the cancer cell line results, where WGDs were less likely to be never essential. It is also worth noting that experimental evolution studies in yeast found that SSDs accumulate protein-altering mutations at a higher rate than WGDs [49,50]. These results are perhaps especially relevant when analyzing the influence of paralog features on selection in tumors.

      We note that there are many additional differences in the features of WGDs and SSDs in budding yeast that may alter their relative dispensability in tumors. An obvious large scale difference is that in the ancestor of humans there were two rounds of whole genome duplication compared to a single duplication event in yeast [51,52]. Less obvious, but potentially of importance for cancer, is that the two classes of paralogs are enriched in pathways in humans that do not have obvious counterparts in yeast. For example, WGDs are highly enriched in signaling pathways involved in development while SSDs are enriched in immune response genes [53]. How the membership of these pathways influences the dispensability and selection of genes in tumors and cancer cell lines warrants further study.

      Minor comments

      1. There is a missing reference on line 55.

      We thank the reviewer for catching this oversight. We have now added a reference to Zerbino et al, NAR 2018 on this line.

      CROSS-CONSULTATION COMMENTS

      That's a good suggestion by reviewer 1. Homozygous deletion collection is available in yeast so these data can be used directly in addition to the haploid gene deletion collection data.

      Since authors of this manuscript included in their analysis the comparison of WGD and SSD then they should do it more thoroughly. It is not sufficient what they presented here especially given that it contradicts the findings from model organisms.

      As noted above, we have now added a significant discussion of the yeast findings and also of the SSD/WGD observations.

      Reviewer #3 (Significance):

      This work provides the first systematic assessment of paralog dispensability specifically looking at homozygous deletions of paralogs across primary tumor samples and builds on the existing findings in cancer cell lines. It will be broadly interesting to those studying duplicated gene evolution and genome robustness. My expertise is in complex genetic networks in yeast and human cancer as well as genome evolution.

      We thank the reviewer for the positive assessment of our manuscript.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We would like to thank all reviewers for taking the time to evaluate our manuscript. Many helpful suggestions and discussion points were raised. These comments were instrumental in providing more data that strengthen our conclusion about the relevance of centrin condensation in vivo, expanding our findings to other organisms, and improving the manuscript in general. Details are given in the following individual replies.

      Reviewer #1 (Evidence, reproducibility and clarity):

      Voss and colleagues show calcium-dependent assembly of Plasmodium falciparum centrins in vitro and in parasites. This assembly is dependent on the EF-hands of centrin and an N-terminal disordered region.

      Major concerns:

      1. The very definitive title is not wholly supported by the data. This should be qualified by specifying the conditions under which the centrins can accumulate in this way.

      We understand this comment by the reviewer. There are multiple dimensions to the potential of centrins to condensate, such as the specific centrin family member, in vivo vs in vitro situation, and media conditions. Naturally it is difficult to represent these various conditions in a concise and compelling title but in line with the suggestion by Reviewer 2 we are changing the title to “Malaria parasite centrins can assemble by Ca2+-inducible condensation” to reflect the conditionality of this process.

      1. A major concern is whether this behaviour of centrins represents a biologically relevant mechanism in centriolar plaque formation. Is this limited to high overexpression conditions or in vitro high concentrations? Or is it a result of the tagging of the P. falciparum centrins?...

      Centrin accumulation at the centriolar plaque and assembly of the centriolar plaque itself must be differentiated. Although the connection is compelling, we are already very careful in the text about extrapolating our findings about centrin accumulation in cells to centriolar plaque or centrosomal assembly in general. We, however, thank the reviewer for this important comment and have now carried out hexanediol treatment of wild type parasites to test the effect on centrin in a native context. After IFA staining we failed to detect any centrin foci at the centriolar plaques, suggesting that they can be resolved by inhibiting weak hydrophobic interactions that are typical for phase separation (now Fig. 6, lines 283ff).

      Concerning the effect of tagging we have generated new data of cells overexpressing an untagged version of PfCen1 in parasites, which still shows formation of ECCAs as revealed by IFA (now Fig. 4H-K, lines 243ff). This significantly alleviates the concern that the observed phenomenon is only a consequence of GFP-tagging. Our in vitro data already showed that native and tagged PfCentrin1 & 3 can undergo condensation.

      Concerning the critical concentration of our in vitro assay, we find it to be around 10-15 µM without the addition of crowding agents such as PEG (now Fig. S3C, lines 120ff). To our understanding it is challenging to select an in vitro concentration that is adequate to define a threshold for “biological relevance” due to so many additional factors playing a role in vivo. Those factors can also favor a phase separation locally when total saturation concentration is not reached, as we now discuss in more detail (lines 440ff). For reference, the critical concentration of FUS, which is one of the most studied phase separating proteins in model systems, is around 2 µM, but concentrations below 15 µM are well within the range of what is observed for in vitro LLPS. Additionally, it is important to consider that we find Cen1/3 and HsCen2 LLPS is inducible and reversible and that highly homologous proteins, i.e. Cen2 and 4, serve as an adequate internal control.

      … A convincing approach to addressing this issue would be to knock-in a fluorescent tag to the centrin loci. Roques et al. (ref. 12 in this submission) report the GFP tagging of centrin-4 in P. berghei, although they note that centrins-1 to -3 were refractory to tagging in this organism. It is unclear whether Voss et al. attempted this tagging in P. falciparum. This should be clarified and relevant data presented.

      We indeed attempted several unsuccessful iterations of tagging Cen1/3 with HA and GFP tags and now explain this in the text more clearly (lines 81ff). We did not attempt tagging Cen2 and 4 as they do not display phase separation in vitro or carry IDRs.

      If the tagged molecules used in the biochemical parts of this study are functional, it is challenging to understand why the centrins cannot be tagged in P. falciparum. If the tags render the P. falciparum centrins dysfunctional, the study becomes significantly less useful.

      Our data show that in vitro Cen1-GFP can undergo Ca2+-inducible and reversible LLPS and that GFP-tagged centrins can still localize to the centriolar plaque. Centrin function, however, certainly goes beyond its ability to condensate and localize. It is easily conceivable that interaction with critical binding partners at the centriolar plaque is inhibited by tagging a protein as small as centrin, which prohibits tagging the endogenous version, while its ability to phase separate remains unaltered. To dynamically study a protein in cells, tagging is, however, unavoidable. Even though tagging affects any protein's function to a highly variable degree, we are convinced that studying tagged proteins still provides useful information. Our mutant versions of PfCen1 in vivo show that non-condensating versions display a different localization. Importantly, as mentioned above, we now provide images of cells overexpressing an untagged Cen1 version, which still causes ECCA formation (Fig. 5H-K). Ultimately, even though tagged versions might not be fully functional, our observations are compatible with the ability of centrins to condensate in vivo.

      1. If a knock-in cannot be achieved, it must be shown that the transgenic expression of tagged Plasmodium centrins does not confound the analysis of centrin behaviour. It is known that these proteins can behave anomalously when overexpressed (Yang et al. 2010, PMID: 20980622; Prosser et al. 2009, PMID: 19139275), at least in other species.

      Thank you for this comment. Transgenic expression of proteins can in principle influence their behavior. In the context of this study the overexpression is, however, used intentionally since protein concentration correlates with the phase separation. Here, transgenic overexpression is used as a tool, rather than being a confounding factor, and ECCA formation can be used as a quantifiable phenotype. The observation that ECCAs appear significantly earlier the more highly PfCen1-GFP is expressed is in our opinion one of the stronger points of evidence that this results from phase separation in vivo. Yet centrins maintain their centriolar plaque localization and no significant impact on growth is observed. To definitively answer whether phase separation of endogenous centrin is occurring during centriolar plaque accumulation is challenging. These challenges and limitations are now addressed in the significantly extended discussion. As explained above, untagged Cen1 also forms ECCAs.

      A previous description of centriolar plaque from the authors' lab (Simon et al. 2021, PMID: 34535568) shows an organized structure of an established size. It should be demonstrated whether the structures formed with the GFP tagged centrins show the same dimensions and dynamics as those in wild-type parasites. The extent of the overexpression of the GFP-tagged centrins should also be demonstrated.

      We thank the reviewer for this suggestion. We have now added spatial measurements of the centrin signal dimensions at the centriolar plaque of mitotic spindle containing nuclei in PfCen1-GFP overexpressing vs non-induced cell lines. We found that the width of the centrin-signal at the centriolar plaque was unaltered while the height only increased by 11% (Fig. S9). Further, we found no significant growth phenotype in overexpressing parasites, which indicates that the centriolar plaque is functional.

      Due to several confounding factors, we were, unfortunately, unable to clearly quantify the extent of overexpression. Most notably, the induction of overexpression only works in about 50% of the cells (Fig. S6). The mean intensity after induction further displays quite some variability. Furthermore, the expression kinetics along the IDC of endogenous centrin and our overexpression system that we use as a tool differ. Lastly, our centrin antibodies display cross-reactivity (see also Fig. S12), making it impossible to identify how much of the endogenous pool we are labeling in comparison to the GFP-tagged Cen1 protein.

      1. It would also be useful to remove the His tag from the recombinantly expressed and purified centrins for the in vitro analyses, particularly if concern remains about the impact of tags on Plasmodium centrin behaviour.

      Based on the published in vitro studies on other centrins, we did not anticipate the His-tag to change LLPS properties. Also, Cen1 and 3 and Cen2 and 4 would need to be differentially affected by the tag. We have further experimented with N-terminally tagged 6His-Cen3 protein and found no significant differences in our turbidity assays. Nevertheless, we expressed new versions of the recombinant PfCen1-4 proteins with a TEV cleavage site inserted after the His-tag to purify untagged proteins and found no fundamental differences in our LLPS assay aside from some slight variation in the kinetics (Fig. S3E).

      1. The discussion is very short and does not consider the findings presented here in the context of the literature, with respect to centrins, Plasmodium MTOC assembly mechanisms, or to general considerations around biological condensates. Andrea Musacchio's recent commentary (ref. 44 in the current submission) advocates caution in ascribing phase separation as an assembly mechanism for organelles in vivo, particularly on the basis of in vitro experiments with high concentrations of homogeneous protein. It is not clear that the concentration dependence of extracentrosomal centrin accumulations (ECCAs) at the onset of schizogony provides sufficient justification of a phase separation model in vivo. The authors' recent description of the involvement of an SFI1-like protein, SIp (Wenz et al. 2023 PMID: 37130129), in the centriolar plaque makes a case for non-homotypic interactions also driving assembly and alternative models for ECCA are not convincingly excluded. The absence of a robust discussion of such considerations is unhelpful to the reader.

      We very much thank the reviewer for this suggestion, which helped to significantly improve the manuscript. We have purposefully included the commentary by Andrea Musacchio to highlight a different (possibly the most antipodal) point of view on the role of biomolecular condensation in membraneless organelle formation for unfamiliar readers who might be just getting to know the field of phase separation. In the absence of word limitations, the reviewer is right to point out the lack of more extensive discussion. We have now significantly extended this section and address the suggested points, including the potential role of the novel centriolar plaque protein Slp, which was not published upon submission of our previous version (lines 450ff).

      1. It is also unclear whether the analysis of human centrin is suggested to indicate a phase separation mechanism for centrins in human cells. As this is readily testable, this notion could be considered further. Although its experimental examination may lie outside the theme of this study, one would expect some discussion of the significance of the data presented in the study.

      Since it is the first description of phase separation of centrin, it would indeed be interesting to explore the functional relevance in other organisms such as humans. We are considering approaching this in the future. We have, as requested above, significantly extended the discussion and now also include this aspect. Earlier reports have e.g. shown centriole overduplication in human cells upon centrin overexpression.

      Minor points

      1. There are only three centrins in humans. Centrin 4 is a pseudogene (Gene ID: 729338 on NCBI).

      Thank you for detecting this error, which we have now corrected (line 60). Centrin 4 seems only to be an expressed gene in mice.

      1. Line 175 should say 'temporally', rather than 'temporarily'. The Abstract should say 'evolutionarily conserved', rather than 'evolutionary conserved'. 'To condensate' is not ideal as a phrase- 'to form a condensate' would be clearer.

      Thank you for those suggestions. The text has been modified accordingly.

      Referees cross-commenting

      I think the other 2 reviewers have made fair, cogent and constructive points. There is good convergence between the reviewers on the significant issues around the study. These concern in vivo and in vitro effects of tagging and of high concentrations.

      Reviewer #1 (Significance):

      The biology of the Plasmodium centriolar plaque is of great interest as an alternative MTOC structure, with obvious additional interest deriving from the role of this organism in malaria. Much remains to be learned about this structure, so the topic of this paper is likely to attract a broad readership. Furthermore, the centrins are a widely-expressed and evolutionarily conserved family of eukaryotic proteins, with multiple roles; a new model for their behaviour, such as is suggested here, would be of interest to many cell biologists.

      With that in mind, significant additional data should be provided to substantiate the model proposed by the authors.

      We appreciate that the reviewer considers our manuscript of interest for a broad audience. We feel that our modifications of the text, including a more thorough contextualization and the addition of some new experimental data, now sufficiently support our claims.

      Reviewer #2 (Evidence, reproducibility and clarity):

      The authors analyzed the properties of the four Centrin proteins of the malaria parasite using a combination of in vitro and in vivo approaches. Their findings indicate that two of the four Plasmodium Centrin proteins, PfCen1 and PfCen3, as well as the human Centrin protein HsCen2, exhibit features of biomolecular condensates. Moreover, analysis of cells overexpressing PfCen1 indicates that such biomolecular condensates become more numerous as cells approach mitosis and are dissolved thereafter.

      Major comments

      A) A critical point that requires clarification is how the protein concentrations used in the in vitro and in vivo assays (20-200 microM in vitro, and not estimated in vivo) compare to that of the endogenous components. This is important because it may well be that 6His-tagged PfCen1, PfCen3 and HsCen2 can form biomolecular condensates when present in vast excess, but not when present in physiological concentrations. The authors should report the estimated cellular concentration of PfCen1-4, as well as that achieved upon PfCen1-GFP overexpression (on top of endogenous PfCen1), for instance using semi-quantitative immunoblotting analysis. Given this limitation, the authors may also want to temper their title by introducing the word "can" after "centrins".

      In the context of phase separation, protein concentration is of course a critical metric. However, in vitro and in vivo concentrations cannot be directly compared as the composition of the surrounding solute has a significant impact on the effective saturation concentration. In vitro we find a saturation concentration for Cen1 of 10-15 µM (Fig. S3C), which is within a range that is frequently found in other in vitro studies as listed in the in vitro LLPS database (PMID: 35025997). We now more explicitly discuss this in the text (lines 422ff). At this point, unfortunately, we have no means of investigating the absolute concentrations of centrin in vivo and to our knowledge no such data are available for apicomplexans. Additionally, one has to keep in mind the presence of other centrin family members and other centriolar plaque proteins, like PfSlp, in the cell, which can interact and co-condensate with centrin but whose contributions are difficult to separate through analysis. Further, we now discuss several contexts that modify the saturation concentration in vivo (lines 440ff).

      As explained above in a response to Reviewer 1, we were not able to produce a satisfactory quantification of the overexpression levels. We are repasting the previous response here:

      “Due to several confounding factors, we were, unfortunately, unable to clearly quantify the extent of overexpression. Most notably, the induction of overexpression only works in about 50% of the cells (Fig. S6). The mean intensity after induction further displays quite some variability. Furthermore, the expression kinetics along the IDC of endogenous centrin and our overexpression system that we use as a tool differ. Lastly, our centrin antibodies display cross-reactivity (see also Fig. S12), making it impossible to identify how much of the endogenous pool we are labeling in comparison to the GFP-tagged Cen1 protein.”

      Concerning the title, as explained above, we followed the suggestion and added the word “can”.

      B) Movies S1 and S2 (and the related Fig. 1D and 1E) are not the most convincing to support the notion that the observed assemblies are biomolecular condensates, as not much activity is going on during the recordings. Likewise, Movie S3, and even more so Movie S4, are out of focus for a large fraction of the time, making it difficult to assess what happens at the beginning of the process. Moreover, it appears that fusion events, while occurring, are rather rare. The movies should be exchanged for ones that are in focus, and ideally a rough quantification of fusion events as a function of biomolecular condensate size provided.

      We thank the reviewer for requesting clarification. Movies S1 and S2 are by no means direct evidence for biomolecular condensation and we do not claim them to be but rather say that they are “…reminiscent of biomolecular condensates…”. We think that this is an appropriate entry into the subsequent analyses. For Movie S1 it is noteworthy that the shape of the accumulation, which can only be resolved by super-resolution microscopy in live cells, is round as would be expected for a liquid condensate in the absence of forces and on these short time scales. Nevertheless, the centriolar plaque must be duplicated, which might be the process partly depicted in Movie S2. The observation that centrin can still change its shape at least suggests that it is not a solid aggregate. In the context of centriolar plaque biology and the technological advance of applying live cell STED in P. falciparum, we think these data are still worth reporting.

      Concerning Movies S3 and S4 we have carefully selected the focal plane to highlight all the hallmarks of LLPS. Since the protein droplets freely move in 3D throughout the entire imaged liquid volume there is no z-plane that is in focus. Our positioning of the focal plane presents the best compromise between showing round droplet shape, droplet fusion events, and surface wetting. All those observations demonstrate the liquid nature of the condensates. Fusion events are indeed relatively rare, and we do not go beyond this qualitative statement that it can be seen.

      C) An important control is missing from Fig. 2, namely assaying PfCen1-4 without the 6His tag, to ensure that the tag does not contribute to the observed behavior (although it can of course not be sufficient as evidenced by the lack of biomolecular condensates for PfCen2 and PfCen4).

      Thank you for this suggestion. Since reviewer 1 made a similar comment, I’m reiterating our previous reply here: Generally speaking, and based on the published in vitro studies on other centrins, we didn’t anticipate the very small His-tag to change LLPS properties. Also, Cen1 and 3 and Cen2 and 4 would need to be differentially affected by the tag. We further have experimented with N-terminally tagged 6xHis-Cen3 protein and found no significant differences in our turbidity assays. However, we expressed new versions of the recombinant PfCen1-4 proteins with a TEV cleavage site inserted after the His-tag to purify untagged proteins and found no significant differences in our LLPS assay (Fig. S3E).

      D) The authors should test whether the assemblies formed by PfCen1 and PfCen3 are sensitive to 1,6-hexanediol treatment, as expected for biomolecular condensates.

      This is an interesting and helpful suggestion. We now tested 1,6-hexanediol addition to recombinant PfCen1 and wildtype parasites (now Fig. 6). Interestingly the dissolving effect of hexanediol on PfCen1 in vitro was moderate, which we attribute to the polar component in centrin assembly, which has been documented earlier (Tourbez et al. 2004). In vivo, however, only 5 min of treatment caused a striking dissolution of most centrin foci in wild type parasites, which is compatible with the interpretation that centrin or centriolar plaque assembly could be driven by biomolecular condensation.

      E) The fact that HsCen2 also forms biomolecular condensates is very intriguing, but further investigation would be needed to assess the generality of these findings. For instance, the authors could test in vitro also S. cerevisiae Cdc31, the founding member of the Centrin family of proteins to further enhance the impact of their study.

      We thank the reviewer for this suggestion. It would of course be exciting to investigate in more detail how widely this biochemical property of some centrins is conserved. To take a first step in that direction, we have recombinantly expressed centrins containing some N-terminal IDRs from C. reinhardtii, T. brucei and S. cerevisiae to represent organisms of significant evolutionary distance. Using our in vitro phase separation assays, we found a very similar behavior to PfCen1 for two centrins while yeast Cdc31, although forming droplets, had a much higher saturation concentration, which could be explained by the significantly lower intrinsic disorder in its sequence (now new Fig. 3).

      Minor comments

      1) For the experiments reported in Fig. 3D, the same concentrations as those used in Fig. 3A-C (namely 10 microM, and not 30 microM as in Fig. 3D) should be used. Moreover, it would be informative to test whether PfCen2 and PfCen4 behave like PfCen3 when added to PfCen1.

      Unfortunately, this experiment is not feasible since Cen3 does not produce droplets at 10 µM. Hence, in Fig. 3D we aimed to test if Cen1 is incorporated into preformed droplets, i.e. whether there is still some interaction between them. We have, however, tested the addition of Cen2 to Cen1 and Cen3 and, as expected from the inability of PfCen2 to condensate, we did not find the same synergistic effect as for Cen1 and 3 together (now Fig. S6). The combination of Cen1/2/3 still enabled co-condensation while addition of Cen4 did not further improve droplet formation. Taken together, this strongly suggests that only Cen1 and 3 contribute to the phase separation in vitro (lines 184ff).

      2) The authors mention that the effect of Calcium in inducing biomolecular condensates is specific, as Magnesium was not effective (lines 94-95). However, an examination of Fig. S3B indicates that the Magnesium also exhibits some activity, albeit less potent than Calcium. The authors should discuss this point and rectify the wording in the main text.

      Thank you for pointing this out. While PfCen1 is not reactive to Magnesium, PfCen3 and HsCen2 do display a small reaction, which we now more clearly mention in the text (lines 118ff). Of note, Mg2+ and other divalent cations are known to generally promote phase separation.

      3) Do the authors think that PfCen2 and PfCen4 localize to the centriolar plaque in vivo using another mechanism than that deployed by PfCen1 and PfCen3? It would be good to discuss this point.

      This is indeed a point worth discussing. Centrins can of course still interact in the absence of biomolecular condensation and their localization to the centriolar plaque is not dependent on their ability to phase-separate, as seen for PfCen2 and 4. We have recently described a novel centriolar plaque protein, PfSlp, that interacts with centrins and might assist recruitment (Wenz et al. 2023). Cellular condensates are, however, often separated into scaffold proteins, which actually phase separate, and client proteins, which get recruited into those condensates. It is easily conceivable that Cen1 and 3 participate in formation of the biomolecular condensate into which Cen2 and 4 as well as other centriolar plaque proteins might be recruited. Unfortunately, we were not yet able to establish a recruitment hierarchy by e.g. dual-labeling of centrins to test whether PfCen1 and 3 might appear prior to PfCen2 and 4. We now include those aspects in the extended discussion.

      4) Given that the EFh-dead mutant exhibits no activity in vitro and fails to localize in vivo, one potential concern is that the protein is misfolded. The authors should conduct a CD spectrum to investigate this.

      Thank you for suggesting this relevant control experiment. We have carried out CD spectroscopy of wild type and EFh-dead PfCen1 and find no difference in secondary structure distribution. We now added these data to the supplemental information (now Fig. S14).

      5) It is not entirely clear from the main text in lines 103-104, as well as from the legend, what Fig. S3B shows. When was EDTA added in this case?

      Thank you for requesting clarification. We will assume the reviewer is referring to Fig S4B. We wanted to show that, in contrast to PfCen3, PfCen1 droplets can still be resolved after an extended period of incubation with calcium, but forgot to mark the timepoint of EDTA addition at 180 min in the graph. We have now corrected this and further reworded the sentence for more clarity (lines 132ff).

      6) Fig. S7: the correlation between PfCen1-GFP expression levels and ECCA appearance is modest at best. What statistical test was applied? This should be spelled out. Moreover, the authors should combine the two data sets, as this will provide further statistical power to assess whether a correlation is truly present.

      Indeed, the correlation is modest but statistically significant, which is why we decided to place this data in the supplemental information. The statistical test used was an F-test provided by Prism, which compares two competing regression models; we now mention this in the legend. Combining the two data sets is unfortunately not possible since they arise from two independent sets of measurements where different imaging settings had to be used to adjust for the very different fluorescent protein levels in both lines after induction.
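
      For readers unfamiliar with this kind of model comparison, the following is a minimal sketch of an extra-sum-of-squares F statistic that compares a horizontal line (no correlation) against a straight-line fit. It is not the authors' Prism analysis: the function name and the example values are hypothetical, and which two models Prism compared is an assumption based on the description above.

```python
def extra_sum_of_squares_f(x, y):
    """Compare a horizontal-line fit (null) with a least-squares line (alternative).

    Returns (F statistic, numerator df, denominator df).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Null model: y = mean(y), one fitted parameter.
    ss_null = sum((yi - mean_y) ** 2 for yi in y)

    # Alternative model: y = intercept + slope * x, two fitted parameters.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_alt = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

    df_null = n - 1
    df_alt = n - 2
    f_stat = ((ss_null - ss_alt) / (df_null - df_alt)) / (ss_alt / df_alt)
    return f_stat, df_null - df_alt, df_alt


if __name__ == "__main__":
    # Hypothetical values standing in for expression level (x) and a timing readout (y).
    x = [1.0, 2.0, 3.0, 4.0]
    y = [1.0, 3.0, 2.0, 4.0]
    print(extra_sum_of_squares_f(x, y))
```

      A p-value would then be read from the F distribution with the returned degrees of freedom; that step is left out here to keep the sketch dependency-free.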

      7) The authors may want to discuss how their findings can be reconciled with the notion that Centrin assembles into a helical polymer on the inside of the centriole (doi: 10.1126/sciadv.aaz4137).

      This is an interesting point. Although centrin does localize to the inside of the centriole (https://doi.org/10.15252/embj.2022112107), more precisely one pool at the distal part and one pool at the core, there is no evidence that it is itself part of the helical inner scaffold described by the authors even though it might localize in close proximity to it. Further, there are several examples where polymers such as microtubules act as seeding points for biomolecular condensates or the other way around, and our work suggests this could be a potential working model for centrins. We have discussed our results extensively with the two corresponding authors of the aforementioned study (i.e. Virginie Hamel and Paul Guichard) and agreed that our data are not conflicting. Nevertheless, we include the inner centriole localization and potential association with polymer structures of centrin in our extended discussion.

      9) Likewise, the authors may want to speculate regarding what their findings signify for the role of Centrin proteins in detection of nucleotide excision repair (doi: 10.1083/jcb.201012093).

      We appreciate the comment by the reviewer. Centrins seem to have many different potential roles that remain to be clarified. While we are excited about this, we think it is too early to speculate about the impact of centrin condensation on less well studied aspects of centrins such as nucleotide excision repair. We, however, now cite this study in the discussion to highlight the functional diversity of centrins.

      Small things

      • Fig. 1A: change color for microtubules as red on red is difficult to discern.

      Throughout our publications we use this shade of magenta to label microtubules in schematics and have therefore opted to use a slightly brighter shade of red for the RBCs instead to improve visibility.

      • Fig. 1C: the indicated boxes in the top row do not seem to correspond exactly to the insets shown in the bottom row.

      We have verified the position of the boxes and found them to be accurate. Possibly the different imaging modality used for both panels (confocal vs STED) creates this impression.

      • line 266: typo, promotor > promoter.

      Has been corrected.

      • line 360: a reference should be provided for the GFP-booster, including the concentration at which it was used.

      Has been added.

      • line 363: "an" missing before "HC".

      Has been corrected.

      • line 428: it would be best to deposit the macros on Github or an analogous repository.

      Macros have been deposited on https://github.com/SeverinaKlaus/ImageJ-Macros (line 737)

      • line 461: "to the" is duplicated.

      Has been corrected.

      • Fig. S5A: maybe draw the lines in red (as red in Fig. S5B correspond to the proteins that do not have IDRs).

      Since we cannot easily change the line colors of the IDR graphs, we have inverted the font color for Fig. S5B instead.

      • Movie S7, legend: left frames shows PfCen1-GFP, not microtubules as currently stated.

      Has been corrected.

      Reviewer #2 (Significance):

      This is a provocative study that extends initial observations regarding self-assembly properties of Centrin proteins, and posits that some members of this evolutionarily conserved family can form biomolecular condensates. After the above outstanding issues have been properly addressed, these data could have important implications for understanding Centrin function in centriole biology and DNA repair. Therefore, these findings will be of interest to a cell biology audience.

      Field of expertise: cell biology.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      The authors have provided a comprehensive characterisation of centrin proteins in Plasmodium falciparum. Through expression of episomal GFP-tagged centrin in vitro, they were able to observe co-localisation of centrin with centriolar plaques during the replicative stage of the parasite. They also utilised live cell STED microscopy to track dynamic changes in centrin morphology. They have also demonstrated calcium-dependent phase separation dynamics in bacterially-expressed P. falciparum centrin and human centrin 2. The formation of liquid-liquid phase separation in PfCen1, 3 and HsCen2 tied well with IUPred3 predictions of intrinsically disordered regions in these proteins. Using an inducible DiCre overexpression system with two promoters of varying strengths, the authors have shown accumulation of centrin1 outside of centrosomes and premature appearance of centriolar plaques. Finally, changes to the centrin1 protein, i.e., N-terminal deletion and mutations in calcium binding sites in the EFh domains, have shown a reduction in the formation of ECCAs during overexpression and inability to form LLPS in vitro, respectively.

      Major comments:

      1. Given that parasites cannot tolerate endogenous C-terminal tagging of some centrins (but not all, as PbCen4 was successfully tagged), has N-terminal tagging been attempted either by the authors or in previous publications? Note that this is not a request for further experimentation; rather, maybe this can be noted in the manuscript; and line 62 can be rephrased for transparency.

      We have not attempted N-terminal tagging ourselves but through personal communication with Rita Tewari we were informed that neither N- nor C-terminal tagging for PbCen1-3 was successful in the context of the study published by Roques et al. 2018. We have only unsuccessfully attempted C-terminal tagging in several iterations. Due to the importance of the N-terminus for interaction and function in other organisms, it is plausible that N-terminal tagging is even more unlikely to work. Since we have not exhaustively attempted every tagging strategy on every centrin we, as suggested, rephrased the text accordingly (lines 81ff).

      1. Is there a possibility that by adding a C-terminal tag, centrin may lose a specific function or cause change in the physicochemical properties of the protein (thus making C-terminal tagging lethal)? Was His tag removal attempted so the native protein can be used in the LLPS experiments? IUPred3 analysis showed potential IDR at the C-terminal end of PfCen4. Could the C-terminal tag have caused the protein to not form droplets in the presence of Ca2+?

      As we could show for PfCen1-GFP, the tag did not impair its ability to undergo LLPS, which is at least partly mediated by the N-terminus, and it could still properly localize to the centriolar plaque. The fact that some endogenous centrins cannot be tagged suggests that there is a functional relevance to the C-terminus that could e.g. be an interaction with other essential centriolar plaque components. As suggested in a reply to Reviewer 1, we consider a substantial and centrin-specific effect of the small His-tag on phase separation unlikely. To be sure, we have repeated our turbidity assays with tag-free versions of PfCen1-4 and found no change in phase separation properties (now Fig. S3E).

      1. It has been shown by the authors that different tagged centrins co-condense which may support the localisation data (Figure 1C). However, is there a way to show that the episomally- and endogenously-expressed centrin co-localise with each other (e.g., confocal microscopy with anti-centrin vs anti-gfp in PfCen-GFP lines, that is if the authors have access to anti-centrin antibodies)? Has endogenous centrin been demonstrated to form ECCAs (in previous publications or by the authors)?

      These are important questions by the reviewer. Due to the high sequence homology, centrin antibodies, even if raised against a specific centrin (such as PfCen3 in this study), will likely cross-react with other centrins. So far, we have not been able to produce a staining where the anti-GFP-positive foci are devoid of anti-centrin3 staining, which limits the interpretation of these data. The outer centriolar plaque compartment containing centrin is, however, well defined by now and the localization pattern of endogenous centrin and Centrin1 and 4-GFP seems identical. In a more recent study from our lab, Cen1-GFP IP has identified other endogenous centrins as interaction partners (Wenz et al. 2023), like the Roques et al. 2018 study did for PbCen4-GFP, indicating that the tag does not abolish interaction between centrins. So far, we have never detected any ECCAs, nor have we identified any similar structure in the literature. This suggests that this is indeed a consequence of excessive centrin concentration. Importantly, we have now added data from a new parasite line overexpressing untagged PfCen1 using the T2A skip peptide (pFIO+_GFP-T2A-Cen1), which displays ECCAs upon induction, showing that this effect is not a mere consequence of tagging (now Fig. 5H-K).

      Minor comments:

      1. How were the times (post addition of Ca2+) presented in Figure 2A determined?

      We noted down the time of calcium addition and cross-referenced it with the timestamps available in the metadata of the movie files (e.g. file creation timepoint marks the start of the movie). We now mention this in the legend.

      1. Line 126: Figure 1B should be Figure 1C

      2. Line 145: Figure 1C-D should be Figure 1D-E

      3. Line 151: Figure 3A should be Figure 4A

      Thank you for spotting these mistakes, which now have been corrected.

      1. Line 152: Suggest rephrasing "placing the gene of interest in front of the promoter" to "placing the gene of interest immediately downstream of the promoter" or something similar

      Thank you for this good suggestion.

      1. Any growth phenotype changes observed in the overexpressors?

      The parasite lines seem to silence the Cen1-4-GFP expression plasmids readily, which suggests that there might be a growth disadvantage. However, repeated attempts to quantify a growth phenotype were unsuccessful due to high variability in the data, which might be partly connected to the fact that the fraction of GFP-positive cells after induction can vary between lines and replicates.

      1. How often are ECCAs observed in pARL strains, or are they not observed at all? This might be good to mention.

      ECCAs in the pARL strains have been observed in very few instances but are too rare to be quantified. We now mention this in the text (lines 217ff).

      1. Line 192 and Figure S8: n ≤ 33 (either a typographical error and should have been ≥, otherwise, it may be expressed as a range)

      It was indeed a typographical error that was now corrected.

      1. Line 258: Methods on the generation of FIO/FIO+ was a bit difficult to understand. Maybe a simple plasmid schematic with the restriction sites (at least for the original plasmid) in the supplementary may help clarify this.

      Cloning strategy has been expanded with additional information for clarity.

      1. Line 295: include abbreviation of cRPMI here rather than in Line 303

      Has been corrected.

      1. Line 322: typographical error on WR99210 working concentration?

      Has been corrected.

      1. Line 372: Last sentence on area and raw integrated density measurement is unclear.

      We have reformulated the sentence for more clarity.

      1. Line 461: typographical error in last sentence

      Has been corrected.

      1. Line 532: Figure 4E should be Figure 4F

      Has been corrected.

      Reviewer #3 (Significance):

      DNA replication is vital to the survival of malaria parasites. A deeper understanding of their unusual form of replication may be exploited to find drug targets uniquely directed to the parasite. Biological insights from this work can also provide a jump-off point for unravelling unusual replication in other organisms. Data on the physicochemical analysis of centrin is not just of great interest for those in the field of parasitology, but also for those in the much wider fields of biology, physics and chemistry. Techniques presented in this work (e.g., DiCre overexpression with different promoters) can definitely be utilised for the elucidation of protein function within and outside the field of parasitology.

      My field of expertise is in Plasmodium spp., particularly in parasite replication, molecular and cellular biology, and epigenetics.

      We thank the reviewer for the appreciation of our work in terms of insight and technology development.

    1. What is this I hear of sorrow and weariness, Anger, discontent and drooping hopes? Degenerate sons and daughters, Life is too strong for you– It takes life to love Life.

      Putting this into my perspective there are many older adults in my life who say we are foolishly discontent and have drooping hopes about the future. I believe they are different from this section. I interpreted this as: Life is full of pain and we can be discontent but do not think that it will feel this way forever, with time we may learn to love the complexities of being alive

    1. In reading a novel, any novel, we have to know perfectly well that the whole thing is nonsense, and then, while reading, believe every word of it. Finally, when we’re done with it, we may find—if it’s a good novel—that we’re a bit different from what we were before we read it, that we have been changed a little, as if by having met a new face, crossed a street we never crossed before. But it’s very hard to say just what we learned, how we were changed.

      This article says that you cannot find truth in fiction. I strongly disagree: fiction is based on our realities; it holds the fears and fantasies that we harbor in our minds and can reflect certain aspects of our society. To say that fiction and art are to be taken as lies and nothing more suggests that we are incapable of understanding and interpreting the metaphor that represents real life. Racism in space is still racism. Using current events to try to peek at the future is not going to be accurate. The insight it can provide is, however, very telling: it shows how the future looks to the individual writing the story, and through this we can find the present. The lenses they see through can color the future, and this glimpse of life from their perspective is important. Think about all the classic books we read in life. We don't read them for their accuracy; we read them for the insight and thought-provoking ideas present.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary: This manuscript describes molecular mechanisms by which ACBD3 is recruited to the Golgi complex. ACBD3 recruits PI4KIIIb which is required to generate PI4P, a phosphoinositide which is key for the recruitment of essential Golgi proteins and hence is key to Golgi identity. The authors have used a combination of mass spectrometry, high quality fluorescence imaging, transient CRISPR knockdowns, and biochemical approaches such as IPs to identify the key determinant for recruitment of ACBD3 to the Golgi complex. They map the interaction between ACBD3 and the Golgi as a unique region (UR) upstream of its GOLD domain, identifying, in particular, an MWT motif as key for this recruitment. Using mass spectrometry they identify several novel interactors of ACBD3 as well as some established binding partners. Knockdown of these interactors reveals a key role for the SNARE, SCFD1, where reduced levels lead to complete loss of ACBD3 localisation to the Golgi without apparent disruption of Golgi structure. They further validate this interaction and that of another SNARE (Sec22b), which is part of the same SNARE complex as SCFD1, mapping the interaction to the longin domain of Sec22b. Surprisingly, however, they demonstrate that the UR domain does not mediate the interaction between ACBD3 and these SNAREs, suggesting an alternative mechanism of recruitment. Previously identified ACBD3 interactors, Golgi proteins giantin and golgin-45, were also identified in the mass spectrometry screen and the authors demonstrate that these two proteins can recruit ACBD3 to the Golgi and this is dependent on the MWT motif identified in the UR domain. By knocking down SCFD1, they show reduced recruitment of ACBD3, leading them to propose a model of sequential recruitment of ACBD3 by SCFD1 followed by interactions with the golgins.

      Major points: This study is a well-executed and rigorous study of the molecular requirements for the recruitment of ACBD3 to the Golgi. The experimental approaches are state-of-the-art and the data are clean and convincing. The only caveat, raised by the authors themselves, is their interpretation that there are two sequential steps for Golgi recruitment of ACBD3. While they show that loss of SCFD1 reduces the interaction of ACBD3 with giantin and golgin 45, their model depends on doing the reverse experiment, i.e. assessing the effects of knocking down either giantin or golgin-45. This is especially relevant given the demonstration that golgin-45 is sufficient to recruit ACBD3 to mitochondria. It may well be that recruitment involves a tripartite complex, which is not uncommon in vesicular transport mechanisms. Giantin is not an essential protein so it should be feasible to perform this experiment. The authors are equipped for the quantitative fluorescence microscopy which would be required and which would help resolve whether sequential or redundant mechanisms are required for ACBD3 recruitment.

      We thank the reviewer for the positive comments and are glad that they consider our study "well-executed and rigorous". We totally agree with the reviewer that our conclusions regarding the sequential aspect of the recruitment of ACBD3 in the original submission could be better supported. We have worked to strengthen this in our resubmission. As the reviewer states, this limitation was already discussed in the original submission. To further support our model, we have performed the experiment suggested by the reviewer, in which we test the effects of knocking down both giantin and golgin45 (double knockdown) on the binding of ACBD3 to SCFD1.

      The results of this experiment further support our sequential model, with little to no effect of loss of the Golgins on ACBD3. As we already knew, a large effect of SCFD1 KO on the binding of the Golgins to ACBD3 was also observed here. We should note that this was performed in a different cell line than before (HeLa cells rather than HEK cells), as the efficiency of multiple knockdowns was much lower in HEK cells, as determined by qPCR. Taken together, the new data in Figure 7 support a sequential model for Golgi recruitment. We also agree that other, less likely models could explain our data and have included this openly in the discussion. In conclusion, we thank the reviewer for their comments and have revised the manuscript with a new experiment with the relevant repeats, which supports our model.

      Reviewer #1 (Significance):

      Significance: PI4P is a phosphoinositide that is important for the recruitment of Golgi proteins. As with most PIs it is likely to act by coincidence detection in that Golgi associated proteins will recognise PI4P as well as other factors on Golgi membranes. This results in different local membrane environments which will be specific for particular functions. PI4KIIIb is key for PI4P production although the absolute levels of PI4P are likely to be determined by a balance of lipid kinases and phosphatases. However, since ACBD3 is key for the recruitment of PI4KIIIb, it is important to understand the molecular mechanisms by which it is recruited. The manuscript thus makes a significant contribution to understanding one of the underlying mechanisms for PI4KIIIb recruitment although, as indicated above, stops short of establishing a clear model for the roles of SCFD1 and Sec22b versus golgin 45 and giantin. For the future it will be of interest to determine why either a sequential or a redundant mechanism is required for the recruitment of ACBD3 as a scaffold protein.

      We thank the reviewer for this set of positive comments on the manuscript and for agreeing that this is a significant contribution. Our revised version further supports our sequential model of ACBD3 recruitment to the Golgi apparatus, and the comments here have helped us further to strengthen the quality and clarity of the manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary: This is a very interesting and potentially important paper for the field of membrane biology and membrane trafficking, in which the authors have studied the molecular mechanisms by which ACBD3 (and consequently PI4KIIIb) is recruited to the cis-Golgi membranes. The authors suggest that this recruitment is based on a two-step process, mediated by interactions with, on the one hand, SCFD-1 (SLY1) and, on the other hand, two redundant golgins (golgin-45 and giantin).

      We once again thank the reviewer for the positive comments and are glad that they consider our manuscript important.

      Comments:

      • Pg. 1: arfaptins, as far as I know, have not been shown to be involved in intra-Golgi trafficking but rather in Golgi export (see e.g. ref. 12)

      We thank the reviewer for pointing this out. We have corrected the text accordingly.

      • Pg. 1: reigon --> region

      We thank the reviewer for noticing this typo. We have corrected the text accordingly.

      • Arf1 also recruits PI4KIIIb right?

      This is correct. The De Matteis lab has shown that PI4KIIIβ associates with the Golgi complex in an Arf1-dependent manner (Godi et al. 1999). We think this is excellent work. However, Arf1 is somewhat of a master regulator of the Golgi, affecting the recruitment and localisation of many different Golgi proteins. It has also previously been reported that Arf1 does not directly interact with PI4KIIIβ (Klima et al. 2016). Overall, the molecular relationship between Arf1 and the kinase remains unclear. We do not exclude, however, that there are factors other than ACBD3 important for recruiting and regulating PI4KIIIβ levels at the Golgi. We have changed the wording in the manuscript to reflect that there are multiple ways that PI4KIIIβ is recruited to the Golgi apparatus.

      Fig. S1: the information about the number of cells per experiment is missing. Also, please add the information about what exactly is represented in the box plots (is it the distribution of the mean value of R per experiment? or the total distribution on a cell-by-cell basis of a representative experiment?)

      For each experiment, a minimum of 100 cells per condition were imaged. The Pearson's correlation was then calculated, and the average was taken for each biological repeat. The plot in Fig. S1B represents 3 independent biological repeats. We have included this information in the revised manuscript.
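
      The averaging scheme described in this reply can be sketched as follows. This is an illustrative outline only, not the authors' analysis code: the function names are hypothetical, and the assumption that each cell contributes a pair of per-pixel intensity sequences (one per channel) is made for the example.

```python
from statistics import mean


def pearson_r(channel_a, channel_b):
    """Pearson correlation coefficient between two equal-length intensity sequences."""
    mean_a, mean_b = mean(channel_a), mean(channel_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    return cov / (var_a * var_b) ** 0.5


def mean_r_per_repeat(repeats):
    """Average the per-cell Pearson's R within each biological repeat.

    `repeats` is a list of repeats; each repeat is a list of cells; each cell
    is a (channel_a, channel_b) pair of pixel intensities (assumed layout).
    Returns one mean R per repeat, i.e. one plotted point per repeat.
    """
    return [mean(pearson_r(a, b) for a, b in cells) for cells in repeats]


if __name__ == "__main__":
    # Two toy cells in a single toy repeat; real data would have >= 100 cells.
    toy_repeat = [([1, 2, 3], [2, 4, 6]), ([1, 2, 3], [3, 2, 1])]
    print(mean_r_per_repeat([toy_repeat]))
```

      Averaging within each repeat first means that the plotted distribution reflects variation between biological repeats and not between individual cells.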

      • The definition of Avg. Golgi int/avg. cell int. (a.u.) in Fig 1E,F is a bit difficult for me to understand. If I understand correctly, the total fl. int in the Golgi mask was computed and divided by the area of the Golgi mask (this is the av. Golgi intensity). A similar computation is done for the entire cell (including the Golgi), i.e., total fl. intensity in the cell mask is computed and divided by the area of the cell mask. Then the two av. intensities are divided (ratio = av. Golgi int / av. cell int.). This ratio, for a protein that is enriched in the Golgi area, should be larger than 1. For a protein that is equally distributed all over the cell, it should be 1, and for a protein that is excluded from the Golgi area, smaller than 1. Then, from this value, the authors subtract the value of the ratio found for an inert construct (GFP or Halo alone), which I imagine should have an original ratio value of the order of 1, and hence, after this subtraction, norm. ratio values larger than 0 mean that they are more enriched at the Golgi area than GFP/HaloTag themselves. Is this correct? In principle, I don't see anything entirely wrong with this way of thought, but I just found it a bit difficult to understand, and in general one has to be careful when computing ratios (quotients) and then subtracting another ratio. Also, the units are not a.u.; the value is dimensionless. What is "arbitrary" is the definition of the 0 value and, based on this definition, also the actual value. I think it would probably be much clearer for the readers to compute something like the relative enrichment in the Golgi area as compared to the rest of the cell (excluding the Golgi area). That is, a value r'=(Int. Golgi mask / Area Golgi mask) / [(Int. Cell mask - Int. Golgi mask)/(Area cell mask - Area Golgi mask)]. This can be computed directly or by defining a mask that is the cell mask minus the Golgi mask.
Also, some maths (unless I made a mistake) give that this r'= r (1-aG)/(1-r aG); where r is the ratio (before subtraction) defined by the authors, and aG=Area Golgi mask/Area cell mask. In any case, I'd suggest that the authors either adopt this other quantitation (without subtraction of the GFP/HaloTag), which gives directly the fold-enrichment in the intensity density in the Golgi area with respect to the rest of the cell; or explain in more detail the maths of the value they are plotting now.

      We thank the reviewer for these well-reasoned and thoughtful suggestions for our imaging analysis. These are issues that we have also considered when quantifying this dataset. At the heart of it, the second method of calculation (Golgi/outside of Golgi), results in a non-linear distribution, as the pool of proteins re-distribute from inside the Golgi to the cytosol. This is why we have chosen to use the first method of Golgi/total, as it provides a linear distribution.

      The reviewer is also correct that the GFP (inert protein) ratio is 1 without adjustment. We have chosen to normalise to GFP/HaloTag (inert protein) as we think this is the clearest way of conveying our conclusions from these experiments. We have included the non-normalised graph here for the reviewer to see; however we thought that this conveys the key result less clearly. Overall, we agree this was poorly communicated in the manuscript and we have clarified it in the revised version.
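
      The two quantifications discussed in this exchange can be compared with a short numerical sketch. It is illustrative only and not the authors' analysis pipeline: the function names and the toy numbers are hypothetical. The conversion between the two ratios assumes that the area term is the Golgi mask area expressed as a fraction of the cell mask area, which is the definition under which the relation r' = r (1 - a) / (1 - r a) holds.

```python
def golgi_over_cell(int_golgi, area_golgi, int_cell, area_cell):
    """r: mean Golgi intensity divided by mean whole-cell intensity."""
    return (int_golgi / area_golgi) / (int_cell / area_cell)


def golgi_over_rest(int_golgi, area_golgi, int_cell, area_cell):
    """r': mean Golgi intensity divided by mean intensity outside the Golgi mask."""
    rest_density = (int_cell - int_golgi) / (area_cell - area_golgi)
    return (int_golgi / area_golgi) / rest_density


def rest_ratio_from_cell_ratio(r, area_fraction):
    """Convert r to r' via r' = r (1 - a) / (1 - r a), with a = area_golgi / area_cell."""
    return r * (1 - area_fraction) / (1 - r * area_fraction)


if __name__ == "__main__":
    # Toy cell: the Golgi mask covers 5% of the cell area; total intensity is fixed.
    area_golgi, area_cell, int_cell = 50.0, 1000.0, 10000.0
    for fraction_in_golgi in (0.05, 0.25, 0.50, 0.75):
        int_golgi = fraction_in_golgi * int_cell
        r = golgi_over_cell(int_golgi, area_golgi, int_cell, area_cell)
        r_prime = golgi_over_rest(int_golgi, area_golgi, int_cell, area_cell)
        print(f"fraction in Golgi={fraction_in_golgi:.2f}  r={r:.2f}  r'={r_prime:.2f}")
```

      In this toy setting r scales in direct proportion to the fraction of signal inside the Golgi mask, whereas r' rises increasingly steeply as the signal outside the mask is depleted, which is the linear versus non-linear behaviour referred to in the reply.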

      • Fig. 1C&F: Besides the MWT mutant, the FKE mutant also seems to have a somewhat compromised Golgi localization. Have the authors followed on that, or what is the reason that they have just focused on the MWT mutant?

      In contrast to the MWT mutant, the FKE mutant does not affect ACBD3 localisation significantly. In addition, when having a close look at the pdb structure of the GOLD domain of ACBD3 with 3A protein of Aichivirus A (5LZ3), the MWT patch, in particular residues M and T, make clear contact with protein 3A, which is not the case for FKE residues. Therefore we focused on the MWT residues, which we hypothesised to interact with a Golgi resident protein which competes with protein 3A to interact with ACBD3.

      • Very minor point, and without wanting to sound pedantic at all, but I think (I might be wrong of course, so apologies if I am) that the plural of apparatus in Latin is not apparati, but apparatus (fourth declension). So, I'd change the word in page 2 (or just rephrase the sentence: e.g. "resulting in Golgi fragmentation"). But of course, I'd leave this to the authors' discretion.

      We thank the reviewer for this precision, do not consider it pedantic, and have made the suggested change to the text.

      • Fig. 3A: have the authors tried or been able to perform IF of the endogenous SCFD1 protein?

      As suggested by the reviewer, we attempted to perform IF of endogenous SCFD1, as shown below. Despite trying several different antibodies, we were not satisfied that we were detecting real SCFD1 signal as there was no change in this staining upon SCFD1 CRISPR KO. Please see an example of this IF below (ProteinTech, 12569-1-AP). We have contacted the antibody manufacturers to inform them of this issue.

      • Similarly to what has been done for other panels, could you quantify Fig. 3C? Are PI4KIIIb protein levels affected upon the different KOs?

      As suggested by the reviewer, we are now showing in Figure S2D the percentage of cells with a partial or total loss of PI4KIIIβ at the Golgi in CRISPR-Cas9 KO cells of either PI4KIIIβ, ACBD3 or SCFD1. 3 independent biological repeats were performed and approximately 150 cells were quantified (~50 cells per condition). The results show that the PI4KIIIβ antibody used (BD Bioscience, 611816) is specific (93.22% of cells lose the antibody signal) and that ACBD3 and SCFD1 KO affects PI4KIIIβ recruitment to the Golgi in 88% and 73% of the cells, respectively.

      The last paragraph of the "SCFD1 and ACBD3 interact upstream of PI4KIIIβ recruitment to the Golgi apparatus" section reads a bit odd placed there. I think it is more appropriate for the discussion or for the intro part on SCFD1.

      Many thanks to the reviewer for pointing this out. We simplified that paragraph to describe the relationship between SCFD1 and SEC22B.

      • I am confused on Fig. 5A/B. The labels in the blots show that 390-528 (without UR) does not bind sec22 or scfd1, but the 368-529 does? Or I guess, judging by the MW seen in the middle blots, that there's some error in the labelling?

      Many thanks to the reviewer for noticing this, which was clearly a labelling error. We corrected this accordingly in Figures 5A and B. We apologise for this oversight.

      Also, the IP efficiency of the MWT mutant in the panel A blot is quite low, yet sec22 still seems to be very efficiently pulled down. Can the authors comment on that please? Would co-IPing against endogenous sec22 and scfd1 work (so you don't need to rely on HaloTag+ligand)?

      We know that the MWT residues of ACBD3 are important for recruiting ACBD3 to the Golgi (Figure 1C and F). We also know that ACBD3 interacts with SEC22B and SCFD1 (Figure 3B and 4A) and that SCFD1 is important for ACBD3 Golgi recruitment. Therefore we initially speculated that ACBD3 interacts with SEC22B and SCFD1 through the MWT residues. However, as the reviewer points out, Figure 5 shows the opposite. Mutating MWT residues makes the interaction of ACBD3 with SEC22B and SCFD1 stronger. For this reason, we hypothesised that another player(s) also contributes to ACBD3 recruitment through interactions with the MWT residues. We have shown that the second recruitment factors are the 2 golgins, golgin-45 and giantin (Figure 6C). In short, whilst we agree that the IP efficiency is low, the binding is actually stronger, supporting our conclusions. No interaction of ACBD3 with endogenous SEC22B could be detected due to a lack of a sufficiently sensitive antibody (we tried Abcam ab181076 and ProteinTech 14776-1 AP).

      • I really like the experiment 6B. Have the authors tested whether SEC22 is also recruited to mitochondria in those conditions? But not SCFD1?

      We thank the reviewer for the positive comment. We have performed the suggested experiment and are now including this as an additional figure (Figure S3). Ectopic expression of golgin-45 targeted to the mitochondria is not sufficient to redistribute SCFD1-HaloTag or HaloTag-SEC22B to the mitochondria (Figure S3A and B, respectively). We, therefore, speculate that the fraction of ACBD3 that gets redirected in Figure 6B must be the small fraction of ACBD3 that is spontaneously in an open conformation and compatible for interaction with golgin-45.

      • The results shown in Fig 7 might show a partial depletion in the interactions, but to be fully trusted they would need to be quantified and a statistical test used to compare the values. I think this part is important to show very clearly, because even with low binding to golgins (remember, single knockouts do not prevent Golgi localization of ACBD3), one could expect that ACBD3 still localized to the Golgi but it does not in the absence of SCFD1 as shown in this paper. A prediction of the proposed model is that in cells depleted of the two Golgins, SCFD1 and ACBD3 should still bind to one another, right? Did the authors test this?

      We fully agree with the reviewer. As discussed in the replies to reviewer 1, we have repeated this experiment, including both sets of KO. This was not trivial, as a double transient KO is technically challenging and involves validation with qPCR and switching cell types (HEK cells to HeLa). The new data supports our current model and suggests some additional regulatory mechanisms at play.

      • The model presented here (fig 8) seems to suggest that only the conformational variation of ACBD3 that binds Golgins is able to recruit (bind) PI4KIIIb. Is this known, or is there any experimental evidence for that?

      HDX-MS experiments show that the ACBD and GOLD domains undergo conformational changes in the presence of 3A proteins (McPhail et al. 2017). Demonstrating this would require a complicated reconstitution experiment which is technically very challenging and would involve purifying various complex proteins, including SNAREs, SM proteins and golgins. This could perhaps be the subject of several future studies.

      • Have the authors thought about testing the FKE mutant in the experiments shown in Fig. 5?

      As mentioned above, since the FKE residues are not making any contact with the protein 3A and since the loss of ACBD3 recruitment to the Golgi is not statistically significant (Figure 1F), we haven't tested the FKE mutant for the binding to SEC22B and SCFD1. We do, however, agree with the reviewer that there might be something interesting happening here. We would like to experimentally interrogate this in future studies and develop more sensitive assays to test if there is a significant effect with the FKE mutant.

      In general, I think the title might be a bit misleading because of the use of PI4Kiiib. I understand what the authors mean, but because they have not thoroughly tested PI4Kiiib recruitment in their experiments, I think they should focus rather on the mechanism of recruitment of ACBD3 that the authors have found.

      We thank the reviewer for their advice regarding the manuscript title, and this is something that we have discussed internally. We chose that title as it highlights the key mechanistic impact of our findings and note that we did include a figure on the recruitment of PI4KIIIβ. However, we remain open to discussing this with advice from the journal editorial team.

      Reviewer #2 (Significance):

      I think, as said above, that this is potentially an important paper for the field of membrane trafficking and membrane biology. Most of the experiments are in general well performed and well controlled, and the paper is clearly written and follows a logical line.

      We once again thank the reviewer for their comments and overall thoughtful and considered review. We believe that the suggestions here have improved the manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Stalder and colleagues report experiments designed to identify interactors of the Golgi-localized protein ACBD3 (a.k.a. GCP60), and to delineate mechanisms that allow ACBD3 to localize at Golgi compartments. ACBD3 is a 528aa protein with diverse previously reported interactions and functions, both in normal physiology and as a host factor in viral assembly processes. Stalder et al. first map which domains of ACBD3 are required for Golgi localization in HeLa cells, concluding that residues 368-528 are sufficient for localization. This region includes a GOLD (GOLgi Dynamics) domain previously reported to interact with Golgin tethering proteins. Alanine scanning identifies the motif MWT just upstream of the GOLD motif as necessary for Golgi localization. Acute CRISPR knockout identifies two Golgins, Golgin45 and Giantin, as necessary for ACBD3 Golgi localization, and IP indicates that the MWT motif breaks this interaction. These data are a bit scattered around the paper but taken together are reasonably persuasive, particularly when viewed in context with published work. This reader would have found the manuscript easier to follow had the Golgin and MWT motif data been presented en bloc.

      We thank the reviewer for these comments and have considered presenting and rewriting the data as the reviewer suggested. On reflection, we have decided to present it in the original order. We feel that this allows us to highlight the two independent mechanisms individually, bringing them together at the end. In addition, as the experiments were performed in the order presented, it allows for more appropriate controls for each experiment rather than trying to combine them. We hope the reviewer accepts our preferred order.

      In a second set of experiments, IP-mass spec is used to identify ACBD3 interactors that might assist in the protein's localization. The MS data presented are filtered to exclude proteins not already identified as Golgi-localized. This is, I think, a mistake. Even if the authors choose to focus on known Golgi interactors as candidates for a localization function, the biological functions of ACBD3 are far from fully understood, and the full dataset would be of value to both cell biologists and virologists.

      We agree with the reviewer that there are many interesting mysteries surrounding ACBD3 and have therefore included an additional table (table S1) in the revised manuscript, showing the dataset of newly identified ACBD3 interactors before applying the Golgi localisation filter.

      Hits in the filtered dataset include the R-SNARE Sec22B, and the SNARE chaperone Sly1/SCFD1. Acute CRISPR inactivation of Sec22 decreases ACBD3 localization to the Golgi and SCFD1 inactivation more or less abolishes localization. Co-IP experiments are used to argue that ACBD3 interacts with the N-terminal regulatory Longin domain of SEC22B, as well as with SCFD1. The Sec22 data are more detailed and persuasive. No experiments with purified proteins are presented to establish that the detected interactions are direct rather than mediated through a bridging factor or factors. Importantly, SCFD1 is likely to have multiple different client SNARE complexes that operate at different stages of ER and Golgi traffic. Hence its inactivation is likely to be pleiotropic and consequently phenotypes arising must be interpreted with caution.

      We completely agree that studying membrane trafficking in an interconnected system is challenging. We also agree that direct binding experiments in reconstituted systems would be key to proving our model. Our data uses multiple different experimental approaches, including co-localisation, co-immunoprecipitation, CRISPR-KO, and biochemistry, to support our model. In the future, we agree full reconstitution would be necessary to examine this further, and we hope that either ourselves or others can do this in further studies.

      Lastly, the authors perform IP experiments which show that ACBD3-Golgin co-IP efficiency is lower in cells with acute inactivation of SCFD1. This epistatic relationship is used to argue for a sequential model of recruitment with SCFD1 and perhaps client SNARE proteins operating upstream of ACBD3-Golgin interaction. This argument is not persuasive because we do not know whether SCFD1 and its downstream activities increase the rate of ACBD3-Golgin complex assembly, or alternatively stabilize ACBD3-Golgin complexes, decreasing the rate of their dissociation.

      We agree with this weakness in our original submission, and it is a comment shared among all reviewers. Overall, we feel that we have chosen the model that best summarises our data. We, of course, accept that there are still components of this pathway that need clarification and are open for further study. This includes the issue raised here by the reviewer, as well as the intriguing observation that both golgins are transcriptionally upregulated upon SCFD1 KO in HeLa cells. In the revised manuscript, we have more clearly laid out the weaknesses of our model in the discussion and suggested future experiments to help clarify some of these issues. We have also modified the model to reflect some of these potential additional regulatory mechanisms.

      In general the methods are fairly clear, but there is room for improvement. The "high throughput" imaging pipeline is not clearly described.

      We agree with the reviewer, and apologise for not clearly explaining this. We feel that this unbiased approach of quantification is particularly rigorous and we have clarified this in the methods section of the updated manuscript.

      Each figure legend should specify the microscopy methods used, and for each result the number of biological replicates and cells analyzed should be specified.

      We agree with the reviewer and have included these details appropriately in the revised manuscript.

      The statistical methods (Student, Tukey, etc.) used for each experiment should be specified. Saying that statistics were calculated using Python 3.7 is useless without additional details. e.g. at least the libraries and codebase used should be indicated or deposited.

      We agree with the reviewer and have updated the manuscript accordingly. In short, all comparisons were made using either Student's t-test or Multiple Comparison of Means - Tukey HSD, FWER=0.05. These were conducted in Python 3.9 using pandas, matplotlib, seaborn and scipy. We used the MultiComparison class (which is provided by statsmodels rather than scipy) and its comp.tukeyhsd method for the post-hoc adjustment.

      Many figure labels (e.g. Fig. 2) use absurdly small fonts.

      We apologise for this. We believe that this is because we submitted it with in-line formatting. Our resubmission has full-page figures, and we feel the text is clearer now.

      The mass spec hits obtained should be provided both with and without exclusion of non-Golgi-localized proteins.

      We agree with the reviewer. Please see the new Table S1.

      Reviewer #3 (Significance):

      In general I think this is a useful and well controlled set of experiments producing useful insights. However, the interpretations need to be more carefully considered, and alternative interpretations must be laid out as clearly as possible. Specifying the limitations of the study will make it more, not less, useful to the field. If the authors want to make the case more robustly that the interactions described are mediated through direct binding, or that SCFD1 and Golgins operate sequentially to recruit ACBD3, additional wet bench work will be required which will of course take time to complete.

      We once again thank the reviewer for the thoughtful and critical comments. These have helped to strengthen the manuscript. We have performed the additional bench work requested by the reviewer, which has further supported the paper and our model.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We would like to thank all reviewers for taking the time to evaluate our manuscript fairly and critically. Many helpful suggestions and discussion points were raised. One important group of comments raised concerns whether our proposed timer and counter models were the appropriate conceptual framework to discuss nuclear multiplication in schizogony, whether they were mutually exclusive, and whether other alternatives should be considered. These comments were instrumental for us to uncover some inconsistencies in our previous modeling approach. In the new manuscript, we now define the counter and timer models much more rigorously in the context of Plasmodium cell division. Based on these refined models we now provide a new statistical analysis that goes beyond the previous analysis, significantly improving the statistical support for our conclusions. Details are given in the following individual replies.

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      Malaria parasites replicating in human red blood cells show a striking diversity in the number of progeny per replication cycle. Variation in progeny number can be seen between different species of malaria parasites, between parasite isolates, even between different cells from the same isolate. To date, we have little understanding of what factors influence progeny number, or how mechanistically it is controlled. In this study, the authors try to define how the mechanism that determines progeny number works. They propose two mechanisms, a 'counter' where progeny number is determined by the measurement of some kind of parasite parameter, and a 'timer' where parasite lifecycle length would be proportional to progeny number. Using a combination of long-term live-cell microscopy and mathematical modelling, the authors find consistent support for a 'counter' mechanism. Support for this mechanism was found using both Plasmodium falciparum, the most prominent human malaria parasite, and P. knowlesi, a zoonotic malaria parasite. Of the parameters measured in this study, the only thing that seemed to predict progeny number was parasite size around the onset of mitosis. The authors also found that during their replication inside red blood cells, malaria parasites drastically increase their nuclear to cytoplasmic ratio, a cellular parameter that remains consistent in the vast majority of cell-types studied to date.

      Major Comments

      It is stated a few times in this study that P. knowlesi has an ~24 hour lifecycle, and while this is the case for in vivo P. knowlesi, it was established in the study when P. knowlesi A1-H1 was adapted to human RBCs (Moon et al., 2013) that this significantly extended the lifecycle to ~27 hours, which should be made clear in the text. As much of this study revolves around lifecycle length and timing, the authors should consider some of their findings with the context that in vitro adaption can significantly alter lifecycle length.

      The reviewer raises an important point that we didn’t discuss for P. knowlesi. We now mention this directly in the introduction chapter (line 67) and in the discussion (lines 470ff). We are aware that P. knowlesi takes about 27 hours in the lab, which was also communicated by the Moon lab. We now cite relevant studies again in this context. We further address the issue of modified cell cycle time in vitro in the discussion in the sense that absolute values must be taken with caution and the focus of this study is about the relative ratio and correlation between the different cell cycle metrics.

      • The dichotomous distinction between 'timer' and 'counter' as mutually exclusive mechanisms seems to be a drastic oversimplification. Considering the drastic variation we see in merozoite number across species, between isolates, and between cells, it seems much more likely that there are factors controlled by both time-sensed and counter-sensed mechanisms that both influence progeny number.

      The study of progeny regulation in malaria parasites is very much in the early stages. We can agree that our models are simplifications, as is the case with all models. Our choice of just the two models timer and counter was driven by the number of cellular parameters we measure, i.e., duration of division phase and progeny number. These data essentially allow us to test the two competing models we presented. As we quantify more and more cellular parameters, based on the quantitative live cell imaging protocols established here, we will be able to test more complex cell cycle models. With our current data, we believe more complex models are not warranted.

      However, this valuable criticism, in conjunction with related remarks by other reviewers, made us reevaluate the constraints of our model more precisely. We noticed that the criteria used in the previous version in the manuscript contained unnecessary additional assumptions. Briefly, the previous counter model also required that final merozoite number was tightly controlled, while the previous timer model required the growth rate to be tightly controlled. These side assumptions were not made explicit in the manuscript and could bias the support towards one or the other model.

      We have now improved the modeling approach substantially by removing implicit side assumptions, and clearly defining timer and counter models in terms of their correlations. The refined formulation of the timer posits that between individual parasites the target duration and the nuclear multiplication rate vary in a statistically independent way; while in a counter, target number and nuclear multiplication rate are statistically independent. We now explain this extended analysis in more detail in the introduction (lines 86ff). We also now more clearly state the dichotomous nature of the model (line 488). A new results paragraph (lines 213ff) and an entirely new Fig. 2 (and Fig. S4) contain the model predictions and statistical comparison between the models.
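The two independence assumptions can be illustrated with a small simulation. This is a minimal sketch, not the statistical analysis used in the manuscript: it assumes exponential nuclear multiplication from a single nucleus, and the function names, log-normal distributions and all parameter values are hypothetical placeholders chosen only for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(model, cells=5000, seed=0):
    """Return (durations, final numbers, rates) for simulated parasites.

    Hypothetical distributions: every value below is an assumption.
    """
    rng = random.Random(seed)
    durations, numbers, rates = [], [], []
    for _ in range(cells):
        # Multiplication rate (per hour) varies between parasites in both models.
        rate = rng.lognormvariate(math.log(0.25), 0.10)
        if model == "timer":
            # Timer: target duration is drawn independently of the rate.
            duration = rng.lognormvariate(math.log(12.0), 0.10)
            number = math.exp(rate * duration)
        elif model == "counter":
            # Counter: target number is drawn independently of the rate.
            number = rng.lognormvariate(math.log(20.0), 0.15)
            duration = math.log(number) / rate
        else:
            raise ValueError("model must be 'timer' or 'counter'")
        durations.append(duration)
        numbers.append(number)
        rates.append(rate)
    return durations, numbers, rates


for model in ("timer", "counter"):
    t, n, lam = simulate(model)
    print(model,
          "corr(rate, duration) =", round(pearson(lam, t), 2),
          "corr(rate, number) =", round(pearson(lam, n), 2))
```

Under these assumptions, a timer leaves duration uncorrelated with rate while faster parasites end with more nuclei, whereas a counter leaves final number uncorrelated with rate while faster parasites finish sooner. The contrast between these two correlation patterns is what allows the models to be discriminated.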

      This more rigorous treatment showed that including the variance of the multiplication rate was critical to allow a clean discrimination between the models. Also, with the sole exception of P. knowlesi H2B, where no model was clearly favored (Fig. 2G-H,K), the timer model was found to be inconsistent with the data, while the counter was clearly favored. Our new goodness-of-fit analysis also showed that although the counter is strongly simplified, it produced adequate fits, demonstrating that potential model refinements would need to be justified by new, more extensive data.

      It is also important to consider that the degree of variation in merozoite number could rather be an expression of varying growth conditions and does not directly predict which of the proposed models is true. For instance, a counter where the target merozoite number varies strongly depending on growth conditions would be consistent with all available data. It is an interesting question for future work whether a counter would indeed describe growth across different isolates.

      The biological reality of growth regulation is certainly complex, and the counter model will likely need to be refined in the future, which we acknowledge in a corresponding statement in the discussion (lines 491ff). Nevertheless, we find it encouraging that a simple model can explain the vast majority of our data very well.

      Additionally, the only parasite parameter measured in this study, size at time of first nuclear division, explained only a small proportion of the variance observed in merozoite number.

      It is indeed the case that amongst the measured parasite parameters, i.e. schizont stage duration, nuclear volume, and cell size, we only found the latter to correlate with the final progeny number. We did not aim to imply that all variation in progeny number is explained by cell size. It is likely that a putative counter relies on a set of factors, which are somehow linked to cell size. In addition, intrinsic stochasticity in nuclear growth is likely to contribute to final merozoite number variability, which is included in our models via a variable growth rate. Defining the actual limiting factor or combination of factors will be an exciting challenge for future studies building on this one.

      • For modelling of a timer-based mechanism, the designation of t0 is subjective. The authors chose the time of first nuclear division as their t0. It is possible that a timer-based mechanism could not be supported based on this model if the chosen t0 differs from when the "parasite's timer" starts. For example, t could also have been designated as the time from merozoite invasion (t0) to egress (tend). It would be unreasonable to suggest the authors repeat experiments with a longer time-frame to address this, but this possibility should be discussed as a limitation of the model. It may also be possible to develop a different model where t0 = merozoite invasion and tend = egress, and test this model against the data already collected in this study.

      This is a valid point. We indeed considered the time point of invasion as the other relevant time point in the IDC for a possible timer. Due to necessary compromises in imaging protocols between acquisition length, temporal, and spatial resolution, we have not yet been able to combine full-length IDC measurements with quantification of progeny number. Given the choice, however, between the time point of invasion and the onset of nuclear division as the starting point for a potential timer, we would still favor the latter: An argument can be made that a timer that regulates offspring number would be more accurate when activated at the moment of the relevant cellular events rather than “running” for a very prolonged growth phase before any “decision” concerning parasite replication. We are still convinced that the entry into the schizont stage, which we analyze here, marks an important cell cycle transition point that has been highlighted in many different studies. As suggested, we now discuss the limitations of our selection of t0 in the text (lines 146ff).

      • The calculation of the multiplication rate is confusingly defined. In Figure 1 it is stated that it is "...based on t and n", which would imply that the multiplication rate is the number of merozoites formed per hour of schizogony, which would give an average value of ~2 for P. falciparum and ~1.5 for P. knowlesi. The average rate values shown, however, are in the range of 0.15-3. The authors should clarify how these values were determined.

      Thank you for pointing out the need for more clarity. Since the nuclear multiplication, similar to e.g. cell population growth, follows an exponential law, the multiplication rate used (lambda) is in fact a logarithmic growth rate. Therefore, it occurs in the exponent (not as a coefficient) in the exponential growth function ( ), which explains the range. We now mention this more explicitly in the results (lines 163ff).
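The difference between the two readings of "rate" can be made concrete with a short calculation. This is a minimal sketch that assumes the standard exponential form n = n_start · exp(λ·t) with a natural logarithm and a single starting nucleus; the exact expression used in the manuscript is not reproduced here. The function names and the example values (20 merozoites, 10 hours) are hypothetical.

```python
import math


def multiplication_rate(n_final, duration_h, n_start=1):
    """Logarithmic growth rate lambda (per hour).

    Assumes n_final = n_start * exp(lambda * duration_h).
    """
    if n_final <= 0 or n_start <= 0 or duration_h <= 0:
        raise ValueError("counts and duration must be positive")
    return math.log(n_final / n_start) / duration_h


def merozoites_per_hour(n_final, duration_h):
    """Linear reading of 'rate': progeny formed per hour of schizogony."""
    return n_final / duration_h


# Hypothetical example values, chosen only for illustration.
n_final, duration_h = 20, 10.0
print("linear rate:", merozoites_per_hour(n_final, duration_h))
print("logarithmic rate:", round(multiplication_rate(n_final, duration_h), 3))
```

With these illustrative numbers the linear reading gives 2 merozoites per hour, while the logarithmic rate is ln(20)/10, a value well below 1. The two quantities therefore sit on different scales even though both are computed from t and n.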

      • In Figure 2, the time from tend until egress is calculated, and this is interpreted as the time required for segmentation. In the Rudlaff et al., 2020 study cited in this paper, it is shown that segmentation starts before the final round of nuclear divisions are complete. Considering this, the time from tend until egress is not an appropriate proxy for segmentation time. The authors should consider rewording to something akin to "time from final nuclear division until egress" to more accurately reflect these data.

      Thank you for indicating our imprecise use of the nomenclature. Indeed, some essential segmentation-associated structures such as rhoptries and subpellicular microtubules are clearly forming before the last division. We were referring to “segmentation” as the time window where actual ingression of the plasma membrane occurs between nuclei with the concurrent formation of more prominent IMC-associated sub-pellicular microtubules between nuclei (as in Fig. 1A last panel). We can, however, agree that consistently using the term “merozoite formation” is more adequate here. We have now corrected the terminology according to the suggestions of the reviewer (lines 271ff).

      • There is a significant discrepancy between the data in Figure 5 and Supplementary Figure 8. In Supplementary Figure 8, the authors establish that culturing parasites in media diluted 0.5x has a marginal effect on parasite growth, with no discernible change in parasitaemia over 96 hours. By contrast, in Figure 5a the parasitaemia of parasites cultured in 0.5x diluted media is approximately 5-fold lower than those in 1x media. The authors should explain the significant discrepancy between these results.

      The reviewer correctly points out a difference in parasitaemia between two parasite culture experiments, shown in Figs 5a (now 6A) and S8 (now S11), respectively. There were several differences in the experimental setup used in the two experiments that could explain this discrepancy. In Fig. 5a the parasites were synchronized to early ring stages while in Fig. S8 we used asynchronous cultures (maybe with a slight majority of late stages). One could speculate that by the time the synchronized ring stage culture reached egress, the effect of nutrient depletion, which started at t = 0 h, is more pronounced. This effect could have been exacerbated by the more frequent media change of 24 h in Fig. 5a vs 48 h in Fig. S8. Lastly, the starting parasitemia was set differently, being higher at around 0.5% in Fig. 5a but only 0.2% in Fig. S8. Possibly a lack of nutrients is “felt less” by the culture at lower parasitemias. Generally, in Fig. S8 we were more focused on highlighting the difference between 1x/0.5x and the more diluted conditions on the long-term culture and to show that continuous culture is actually possible in 0.5x medium. We have now expanded the legends to highlight those differences more clearly.

      • In Supplementary Figure 4, the mask on the cell at t0 shows two distinct objects, but it seems very unlikely that they are two distinct nuclei as they vary approximately 5-fold in diameter. The authors should provide more detail on how their masking was performed for their volumetric analysis. Specifically, whether size thresholds were also applied during object detection.

      Thank you for requesting clarification here. Fig S4 (now S7) shows only one z-slice (not a projection) of the entire image stack, to illustrate how the thresholding approach was performed on every single image slice. The two objects in the shown cell are indeed two nuclei, but because they are not in the same z-plane they appear to be of different size. In particular, only a slice of the upper part of the nucleus on the lower right is visible in the shown slice. Throughout the study, volume determination was realized by adding up the individual slices, as is explained in detail in the Materials and Methods sections. We have now added more explanation in the figure legend to clarify the procedure.
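The idea of determining volume by adding up thresholded slices can be sketched as follows. This is an illustrative outline only, not the procedure from the Materials and Methods: the function names, the fixed intensity threshold, the toy image stack and the voxel dimensions are all hypothetical.

```python
def slice_areas(stack, threshold):
    """Number of above-threshold pixels in each z-slice of a 3D stack."""
    return [
        sum(1 for row in image_slice for pixel in row if pixel > threshold)
        for image_slice in stack
    ]


def object_volume(stack, threshold, pixel_size_um, z_step_um):
    """Volume in cubic micrometres: segmented pixels summed over all slices."""
    voxel_volume = pixel_size_um ** 2 * z_step_um
    return sum(slice_areas(stack, threshold)) * voxel_volume


# Toy stack of three 3x3 slices through one roughly spherical object.
# The top and bottom slices cut only the cap of the object, so it looks
# much smaller there than in the central slice.
stack = [
    [[0, 0, 0], [0, 9, 0], [0, 0, 0]],
    [[0, 9, 0], [9, 9, 9], [0, 9, 0]],
    [[0, 0, 0], [0, 9, 0], [0, 0, 0]],
]

print(slice_areas(stack, threshold=5))
print(object_volume(stack, threshold=5, pixel_size_um=0.1, z_step_um=0.5))
```

The per-slice areas show why a single z-slice can be misleading: the same object occupies one pixel in an outer slice and five in the central one, so apparent size in one plane says little about total volume.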

      Minor Comments

      • Line 45-48 mentions that merozoite number influences growth rate and virulence, but the corresponding reference (Mancio-Silva et al., 2013) only discusses the relationship between merozoite number and growth rate, not virulence.

      We thank the reviewer for requesting this distinction. Merozoite number and virulence have not been correlated in vivo so far. This is certainly because one can’t retrieve late-stage P. falciparum parasites from patients, but maybe partly because merozoite number has not gotten significant attention as a metric in the previous decades. Even if merozoite number is intuitively connected to growth rate, which might cause higher parasitemia, which is in turn linked to more severe disease outcome, it is important to emphasize that those are certainly not equivalent. We have therefore removed the statement about virulence (line 48).

      • Line 59 states that a 48 hour lifecycle is a baseline from which in vitro cultured parasites deviate. Clinical isolates also show variation in lifecycle length and so it is more accurate to just say that 48 hours is an average, rather than a baseline.

      The word “baseline” has been changed to “average” (line 61).

      • Line 63 cites a study for the lifecycle length of P. knowlesi (Lee et al., 2022), but there seems to be no mention of lifecycle length in this reference

      This reference was meant to serve as an introductory review article to research in P. knowlesi. Actually, to the knowledge of the authors, there is no study presenting quantitative data showing that the in vitro cycle of P. knowlesi is actually around 27 h. Our lab experience is, however, consistent with a 27 h cycle, which was confirmed by personal communication by the Moon lab. We now also cite in the next sentence the inaugural P. knowlesi adaptation publication (Moon et al. 2013) showing some time course data indicating the duration of the IDC to be ~27 h (lines 67ff).

      • If I am interpreting Figure 3B correctly, this is essentially a paired analysis where the same erythrocytes are measured twice, once at t0 and once at tend. If this is the case, this data may be better represented with lines that connect the t0 and tend values.

      Yes, these are the same erythrocytes measured twice. We have modified Figure 3 (now Fig. 4) accordingly.

      • Figure 3A seems to imply that to calculate diameter of the erythrocytes, three measurements were made and averaged for each cell. I think this is a nice way to get a more accurate erythrocyte diameter, but if this is the case, it should be specified in the figure legend or methods.

      This is already described in the figure legend (line 305).

      • In Figure 4I it is shown that in P. falciparum merozoite number doesn't correlate with nucleus size, but for P. knowlesi in Supplementary Figure 7c, a significant anticorrelation is observed. The authors should state this in the text and discuss this discrepancy.

      Contrary to all other graphs, visual inspection of the distribution of data points in Fig. S10C shows that it contains two outlier data points at the bottom right. Those two specific points are also responsible for the significant anticorrelation. We did not filter or remove any quantification results but also didn’t have sufficient confidence in this data distribution (which is further based on the segmentation of the Histone2B not on an NLS mCherry signal) to make substantial claims about anticorrelation. Because we considered it informative we still decided to show it in the supplements. We now briefly mention the issues with the data set and its interpretation in the text (lines 350ff).

      • The authors show that merozoite number roughly correlates with cell size at t0 but it would be interesting to see whether cell size at tend also corresponds with cell size at t0. This might help answer whether the cell is larger because it has more merozoites, or whether it has more merozoites because it is larger.

      Plotting parasite cell volume at t0 against cell volume at tend (as well as between t-2 and tend) indeed shows a positive correlation (see below). While it is an interesting thought, we concluded after some discussion that no convincing causal relationship between cell size and merozoite number can be inferred based on this analysis. Since we consider the possible statement that cells that are bigger in the beginning are also bigger in the end uninformative, we decided not to include the data.

      • I don't feel that "nearly identical" is an appropriate summary of erythrocyte indices in Supplementary Figure 9, considering there is a statistically significant increase in mean cell volume. I think it is unlikely that this change is consequential, and performing these haematology analyses is a nice quality control step, but this change should be stated in the text.

      In the modified text we now express the significant change in MCV in terms of percentage, which is around 1.2% (line 381).

      • In Supplementary Figure 8, parasitaemia only increases ~2-fold compared to >5-fold the previous two cycles. It seems likely that at the final timepoint on this graph the parasites are starting to crash, and therefore it may be best to end the graph with the 96 hour timepoint.

      The reviewer suggests that cultures at those parasitemias might not be in perfect health. Our Giemsa stains did not show signs of an unhealthy culture, and the cultures kept growing. It was, however, important for us to show that parasites can be maintained in culture over a prolonged period of time in 0.5x medium, even when this results in reduced growth, while this was not possible with lower dilutions. Therefore, we would like to keep the data point. We have added a cautionary comment in the legend.

      • The error bars in Figure 5C aren't easily visible, moving them in front of the datapoints may help their visibility.

      Error bars were moved in front of the data points.

      • In Figure 6D & E, the y-axis labels should be changed to whole integers as all the values in the graph are whole numbers.

      We have changed the y-axis labels accordingly.

      • My interpretation of Figure 6 C-E, is that these are the same cells measured at three time points (t-2, t0 and tend). If this is the case, 6C is missing the cell that has a merozoite number of 8, which is presumably why the y-axes are not equalised for the three graphs.

      It is correct that the same cells are displayed in all three plots, with the exception of three cells in 6C (for the timepoint t-2), which are missing for the following reasons: 1) it was not possible to determine the volume at this respective timepoint due to technical issues or 2) the cell was already just before t0 at the start of the movie so that t-2 had already passed. We now note this in the figure legend and have also equalized the y-axes (now Fig. 7C-E).

      Reviewer #1 (Significance):

      In the asexual blood-stage of their lifecycle, malaria parasites replicate through a process called schizogony. During schizogony an initially mononucleated parasite undergoes multiple asynchronous rounds of mitosis followed by nuclear division without cytokinesis, producing a variable number of daughter nuclei. Parasites then undergo a specialised cytokinesis, termed segmentation, where nuclei are packaged into merozoites that go on to invade new host cells. While nucleus, and therefore merozoite, number are known to be varied between cells, across isolates, and across species, little is known about the mechanisms regulating merozoite number. In this study, the authors use live-cell microscopy to understand how parasites determine their progeny number. They suggest that parasites regulate their progeny number using a 'counter' mechanism, which would respond to the size or concentration of a cellular parameter, as opposed to a 'timer' mechanism. Long-term live-cell microscopy experiments using malaria parasites are extremely technically challenging, and the authors should be commended for their efforts in this regard. While I agree that the data generated from these experiments are technically sound, I have some reservations expressed above about the interpretation of some of these results. I would strongly encourage the authors to consider rewording some of their interpretations taking into account some of the caveats listed above. I would also consider fitting/testing an additional mathematical model where the time-frame proposed for the 'timer' mechanism begins following merozoite invasion.

      We thank the reviewer for the appreciation of our work and hope we have sufficiently reworked the manuscript based on the comments listed above. Furthermore, we think the improved model statement and analysis improves the clarity of our conclusions. Indeed, we would like to test additional models including the full IDC once, as mentioned above, we are technically able to generate these data.

      This work is of specific interest to anybody who grows malaria parasites, as the dynamics of their growth is obviously important to understand. Further, this work is of interest more generally to cell biologists who study the regulation of progeny number or cell size. I have no experience with the application of mathematical modelling to understand biological systems, and so I cannot comment on the interest of this work to that field.

      Reviewer #2 (Evidence, reproducibility and clarity):

      This is a solid study that further characterises the dynamics of nuclear division in Plasmodium falciparum and P. knowlesi. Of two, among potentially several, models for how the number of daughter nuclei, and thus parasites (called merozoites in this genus), is determined, one posits that nuclei divide until a fixed timer ends, and the other posits that nuclei divide to reach a fixed number that is defined by a cellular counter. I find some practical difficulties in definitive measurement of either model. One issue with the former is that experimental definition of the start of the timer is problematic: we may define the starter's gun (e.g. by the first nuclear division), but it isn't necessary that the cell is using that same start time.

      We are pleased that the Reviewer found our study ‘solid’. Concerning the timer model, we agree that the selection of the starting point is a critical aspect of this study, as Reviewer 1 also pointed out. We selected this particular “t0” because the entry into the mitotic phase marks an important cell cycle transition. Several studies have suggested that a “schizogony entry checkpoint” might be active just before (Matthews et al, 2018; Voß et al, 2023; van Biljon et al, 2018; McLean & Jacobs-Lorena, 2020). Once cells are committed to the schizont stage they are less responsive to stimuli. Alternatively, the timepoint of erythrocyte invasion could be a legitimate starting point. Due to necessary compromises in our imaging protocol between acquisition length, temporal, and spatial resolution, we have not yet been able to combine full-length IDC measurements with quantification of progeny number, and therefore we leave exploration of an earlier timer start for future work. Within the confines of the model comparison in the current study, we think the selected t0 is already highly informative. We now explain the selection and limitations more explicitly in the text (line 144ff).

      Additionally, as the authors confirm here, being sure when that first nuclear division has occurred is particularly tricky with Plasmodium parasites, in part because the first few nuclei seem to clump together, preventing one from unambiguously calibrating the first division.

      The Reviewer is concerned about difficulties with precise reporting of the time point of first nuclear division. We suspect there was a misunderstanding here. In the text (line 137) we had written the following:

      “Although separating individual nuclei after the first two rounds of division was challenging due to their spatial proximity, the improvements in resolution and 3D image analysis allowed us to count the final number of nuclei routinely and reliably at the transition into the segmenter stage.”

      To clarify, when analyzing 3D image stacks produced by the LSM900 Airyscan the first nuclear division can consistently and unambiguously be detected. In anaphase the nuclei are pushed apart quite substantially before getting a bit closer together afterwards (see e.g. Fig. 1B and C). Hence the precision of the detection is only limited by the 30 min interval of the time lapse. Later, at the four nuclei stage, crowding makes distinction more difficult. In the final segmenter stage, the reorganization and condensation of nuclei makes reliable counting possible again. We have now reformulated the quoted sentence for more clarity (lines 137ff).

      Furthermore, getting decent replicate numbers is hard because of the difficulties of time lapse microscopy, and most Plasmodium studies (including this one) suffer from low enough numbers that it isn't always clear whether the numbers support one model over another.

      The reviewer points out the difficulty of obtaining enough replicates in Plasmodium time-lapse studies. We agree that, depending on technology, obtaining sufficient replicates can be challenging. In the present study we obtained Ns between 25 and 35 for all conditions in P. falciparum and P. knowlesi from three independent replicas. To gain confidence in the conclusions from limited, but not sparse, data, it is essential to 1) reduce model complexity to a minimum and 2) perform stringent statistical analysis including accounting for small-sample variation. Motivated by this concern of the Reviewer and a similar point raised by Reviewer 1, we have revisited our modeling approach in the revised manuscript. This led us to a corrected, more rigorous definition of what precisely we mean by ‘counter’ and ‘timer’ models: The timer posits that between individual parasites the target duration and the nuclear multiplication rate vary in a statistically independent way, while in a counter target number and nuclear multiplication rate are statistically independent. With no further adjustable parameters, the two models are thus both mutually exclusive and minimal. Although biological reality is likely to be more complex, we feel that these minimal models are adequate for the amount and resolution of our current, state-of-the-art data. The general result remained the same: The counter model is strongly preferred in almost all our experimental data (new Fig. 2), with the sole exception of P. knowlesi H2B, where indeed more data may be needed to come to a clear conclusion. Furthermore, we have taken care to scrutinize these conclusions accounting for goodness-of-fit for the respective sample size N. This analysis showed, surprisingly, that the counter model was sufficient to account for the data: the real dataset was as similar to the counter prediction as synthetic, counter-generated data. We hope that this improved statistical analysis can help the reader judge the robustness of our conclusions.
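
      The distinction between the two minimal models can be illustrated with a small simulation. The sketch below is not the analysis from the manuscript; it assumes, purely for illustration, that nuclei multiply exponentially (so that log(final number) = rate × duration), and all distribution parameters and function names are hypothetical. It shows the correlation signature that separates the models: under a timer the rate is uncorrelated with duration but correlated with final number, whereas under a counter the rate is uncorrelated with final number but anticorrelated with duration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(model, n_cells, seed=0):
    """Draw synthetic parasites under a minimal 'timer' or 'counter' model.

    Assumption (illustrative, not from the manuscript): exponential nuclear
    multiplication, i.e. log(final number) = rate * duration.
    """
    rng = random.Random(seed)
    rates, durations, log_numbers = [], [], []
    for _ in range(n_cells):
        # Hypothetical multiplication rate per hour, varying between cells.
        rate = rng.lognormvariate(math.log(0.3), 0.2)
        if model == "timer":
            # Timer: target duration is drawn independently of the rate.
            duration = rng.lognormvariate(math.log(10.0), 0.2)
            log_number = rate * duration
        elif model == "counter":
            # Counter: target number is drawn independently of the rate.
            log_number = rng.lognormvariate(math.log(3.0), 0.2)
            duration = log_number / rate
        else:
            raise ValueError(f"unknown model: {model!r}")
        rates.append(rate)
        durations.append(duration)
        log_numbers.append(log_number)
    return rates, durations, log_numbers


def correlation_structure(model, n_cells=5000, seed=0):
    """Correlations of the rate with duration and with log(final number)."""
    rates, durations, log_numbers = simulate(model, n_cells, seed)
    return {
        "rate_vs_duration": pearson(rates, durations),
        "rate_vs_log_number": pearson(rates, log_numbers),
    }


if __name__ == "__main__":
    for model in ("timer", "counter"):
        print(model, correlation_structure(model))
```

      With a large synthetic sample the two signatures separate cleanly; at N of about 30, as in the experiments, the same correlations are much noisier, which is why a comparison against model-generated synthetic data of matching sample size is informative.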

      Nonetheless, several recent studies, particularly a study from the same institute (Klaus et al., 2022) employing timelapse imaging of nuclei, and timing the nuclear division of parasites, finds poor correlation between the duration of "schizogeny" (although perhaps using a different definition to the one used by the parasite) and the final number of merozoites. They therefore argue that there is poor evidence for a timer, and conclude by elimination that a counter must exist instead. A review by some of the authors of that study and some of this current study (Voß et al 2023), also concludes that the data from Klaus and colleagues "strongly support" a counter model. This current study also concludes that a counter model controls final nuclear/merozoite number in P. falciparum and P. knowlesi. This much at least is not particularly novel given the recent work on this topic, although the addition of the P. knowlesi data is interesting and consistent with the prior P. falciparum work.

      Our present work does indeed confirm the previous report of a counter over a timer, through a more targeted approach. While Klaus et al. used timing data of the first nuclear cycle vs. the full duration, we now provide, thanks to an improved microscopy setup and protocol, simultaneous measurements of timing and final progeny number, i.e. counting of merozoites/nuclei. While the preference for a counter model is not fundamentally novel, the additional information that the counter model holds in different strains, conditions and species is, in our opinion, not trivial and points to some degree of evolutionary conservation. We also demonstrate here that the counter model is not only preferred over the timer, it also fits the data adequately, so that it can be considered ‘correct’ at this level of complexity. Another, possibly more important, value of this study lies in the quantitative and time-resolved assessment of multiple important parasite metrics such as cell volume and nuclear volume together with merozoite number at the single cell level. Although descriptive, this has not been achieved in Plasmodium until now.

      As above, the authors concede that it is difficult to determine with strong confidence when the first nuclear division has occurred, so it may well be that there is substantial noisiness in the time that they define schizogeny to commence. If that were the case, this would contribute to the poor correlation observed between schizogeny duration and number of merozoites produced, so this could be an important confounding experimental factor. This deserves some more discussion by the authors.

      Concerning the confidence with which we identify the first nuclear division, we hope to have clarified in the section above that our precision is only limited by the time resolution of the acquired time-lapse. Therefore, the uncertainty about the start time is not particularly high and, moreover, can be expected to affect timer and counter (via the growth rate) to a similar degree. We see no unfair advantage for the counter for this reason.

      Alternative methods to count absolute DNA content (rather than trying to count individual nuclei) might be useful ways of independently confirming this phenomenon. Alternative possibilities for what constitutes the "start" of a possible timer are also warranted - it could be for example, the first division of one of the other organelles.

      This is an interesting suggestion. Next generation fluorogenic DNA dyes have been used by us and the Ganter group (Simon et al. 2021, Klaus et al. 2022, Wenz et al. 2023) to assess DNA content of single cells over time. Our experience shows that there are some caveats to using these Hoechst-based dyes, some of which we discussed in the aforementioned publications. While they allow some reasonable absolute quantification of DNA content for the very first S-Phase (and subsequent nuclear division), in later stages only relative quantification can be achieved. One underlying reason is the apparent increase of dye permeability, and therefore higher intensity, at late schizont stages. This issue is exacerbated by the asynchronous DNA replication of multiple nuclei. Further, nuclear division itself can be delayed or even inhibited when increasing the concentration of the dye, which suggests an impact on cell physiology (well documented for Hoechst-based dyes in other organisms). When reaching the segmenter stage, the resulting variance in fluorescent intensity would make it challenging to assign a reliable number of nuclei required for analysis, a problem that does not occur when counting individual nuclei. Taken together, unfortunately, all these confounding factors make DNA content analysis in live single cells for the entire schizont stage unachievable at this point.

      These and previous authors in any case conclude that a counter model must exist through exclusion of a timer model. I am less convinced that the evidence discounting the timer is conclusive, and that a straight counter model is the only alternative. Indeed I am unconvinced by the suitability of this strictly dichotomous two-model system to categorise the division of unicellular eukaryotes, and these theories are not universally held to be sufficient to describe division.

      We thank the Reviewer for this insightful comment. As already detailed above, we have clarified and corrected our model definitions in the revised manuscript. Further, we want to make the important distinction between organisms, including unicellular ones, that undergo binary fission and those like Plasmodium that use schizogony. Our model, although inspired by model organisms, is tailored to a multinucleated division mechanism, and clearly defined within those boundaries. The timer and counter models we consider are defined by their correlation structures. They are at two extremes of a continuum of models which could be characterized, for instance, by the ratio of correlations (growth rate - nuclear number) vs. (growth rate - duration) as an additional parameter. As the reviewer points out, excluding the timer model is not equivalent to proving the counter model, and indeed a partially correlated model, or a more complex model entirely, could yield a better fit. However, within the realm of models without additional parameters, and which are testable with the available data, only timer and counter remain, as different timer start points are not experimentally accessible. Importantly and somewhat surprisingly, the counter model also gave a fit that is as good as can be reasonably expected for the experimental sample size (new Fig. 2). So, we maintain that within the current experimental constraints, the counter model is the only viable option for almost all our tested conditions. The observation that in H2B-GFP expressing P. knowlesi parasites no clear distinction can be made between the models does indeed suggest that the reality of multiplication rate regulation is more complex and may be limited by different constraints in different growth regimes. We now state these limitations and the room for further model adjustments with more data in the Discussion section.

      Nonetheless, if a counter exists, what is being counted that determines the final number? The authors consider that this might be a physical object or resource inside the parasite, or an extrinsic/extracellular resource. They investigate this by comparing the final cell number to a number of factors. First, the authors investigate the size of the RBC (by using the diameter as an indicator). Little information is given about the source of the blood used, but it appears to be from a single donor of unknown age, who has approximately typical variance in RBC diameter (at least, after manipulation and storage). The authors observe little correlation between these variables.

      We share the curiosity of the reviewer about what might be “counted” by the parasite. This shall be the subject of future studies, and our present study provides the necessary basis for asking this question and defines a framework to investigate it. Concerning the size of the host cell, the blood used was from a different donor for each of the replicas, which we now specify in the figure legend (line 302). No significant difference between the RBC diameters between the donors was observed. A correlation between RBC diameter and progeny number was indeed not observed.

      Second, the authors measure parasite size at the onset of schizogeny, and find that bigger parasites result in more daughter merozoites early in schizogeny (perhaps not surprising, given the earlier mentioned technical problems with measuring the first few steps of schizogeny), but that this different initial cell size doesn't result in a different final merozoite number, or as they describe it "not quite significant anymore". Previous p values were taken as cause for rejecting the timer hypothesis and the timer model. In this case the authors instead interpret the data as suggesting "that the setting of the counter might correlate with parasite cell size". This is inconsistent statistical and analytical handling, and highlights the earlier potential pitfall of rejecting timer-based models based on not gathering data that statistically show a correlation. This needs reworking to highlight that these data are inherently noisy, difficult to measure accurately, and aren't necessarily going to strongly reveal a trend even where one biologically exists, and that this ought not be used as grounds for confident rejection of a model.

      The Reviewer raises concerns about the consistency of the statistical interpretation of our data. We care deeply about the well-foundedness of our conclusions and hope to eliminate these concerns in the following. First, we hope that the issue about the “technical problems” in measuring the first division has been solved in our response to previous comments. Next, to clarify an apparent misunderstanding: As stated in the text (lines 329ff) and shown in now Fig. 5D-E, cell size at onset of nuclear division or 2 hours prior does significantly correlate with final merozoite number. The lack of significant p-value (0.08) only pertains to the correlation of cell size at the end of the schizont stage (tend) with merozoite number (now Fig. 5F). We have removed the unfortunate wording “not quite significant anymore” in that context. Finally, regarding potential mechanisms, a potential counter must be set before the first nuclear division is completed because only that way it can be set independent of the speed of nuclear multiplication. This observation gives the statistically significant correlation of volume at the onset of division and progeny number its relevance. We have reformulated the marked sentence for more clarity (lines 331ff). Furthermore, we point out that our rejection of the timer is now based on a revisited statistical analysis (Fig. 2), which is no longer based on a simple correlation between final number and duration, as detailed above.

      Finally, the authors grow the parasites in dilute media, and find that they produce fewer daughter parasites. This is anecdotally unsurprising, as most Plasmodium laboratories are aware that sub-optimal growth conditions result in less healthy schizonts with fewer viable merozoites (and lower magnitudes of single-cycle expansion), but is nonetheless an important result that highlights explicitly how much this occurs in the specific conditions of dilute media. Given the lack of investigation of exactly which nutrient, carbon source, or combination thereof leads to the reduced merozoite number, it is unclear if or how much this is relevant to the scenario of a natural infection and realistic levels of that nutrient in a human or primate parasite environment.

      As rightly pointed out by the reviewer, suboptimal growth conditions have been shown in many instances to affect parasite growth and multiplication rate. The number of studies that actually quantify a reduction in merozoite number under different growth conditions is certainly much lower (Brancucci et al. 2017 (lipids), Mancio-Silva et al. 2017 (calorie-restriction in mice), Tinto-Font et al. 2022 (temperature) come to mind). What our study adds to this body of literature is the extent to which duration of the schizont stage and cell volume are affected in relation to progeny number at the single cell level. Importantly, we wanted to test whether the counter model still holds under these more adverse conditions, which we found to be the case. Along the lines of the work on calorie restriction and the likely implication of isoleucine in the process investigated in the laboratory of Maria Mota, it will be exciting to identify a “limiting factor” in future studies. Indeed, any study done in complete RPMI culture medium can be questioned regarding its physiological relevance, and we added a sentence addressing this aspect in the discussion (lines 514ff). Yet, our medium dilution experiments suggest that at least to some degree an extracellular resource is implicated, which makes sense from a biological function point-of-view.

      Minor issues

      The manuscript confuses the terms "less" and "fewer". Fewer should be used for countable nouns (fewer daughter cells, fewer nuclei, fewer merozoites), less for uncountable nouns (e.g. less speed, less volume).

      Thank you for pointing this out. The words have been replaced accordingly.

      I didn't understand lines 93-95; "This excluded a timer and thereby confirmed a counter as the mechanism regulating termination of nuclear multiplication (Klaus et al., 2022). A direct correlation between duration of schizont stage and merozoite number is, however, still missing." If I understand the first sentence concludes that there ought not be a direct correlation between schizont duration and merozoite number, but the second sentence, says that that correlation is "however" missing. Isn't this expected? Perhaps reword for clarity?

      Thank you for requesting clarification here. The exclusion of the timer by Klaus et al. 2022 was based on the correlation between duration of the first nuclear division cycle and the total duration of all nuclear replication phases. At no point did Klaus et al. count merozoites in live single cells, which was mainly due to lower spatial resolution of their images (M. Ganter, personal communication). Therefore, they could not directly assess the relation between progeny number and schizont stage duration, which we now report for the first time. The sentence was supposed to convey that this type of data was missing and was now reformulated for more clarity (line 114).

      Lines 104

      "We further uncover that throughout schizogony P. falciparum infringes on the otherwise ubiquitously constant N/C-ratio (Cantwell and Nurse, 2019)" This seems obvious to me, and not something uncovered by this study. In most of the numerous apicomplexans that divide by endoschizogeny, the cells achieve a near final size considerably before the final rounds of nuclear division so the N/C ratio must not remain constant - this is a direct corollary of many previous descriptions and not a novel finding of this study, and this claim here should be made more modest.

      We understand the point raised by the reviewer but still think that our claim is justified for several reasons. There are examples of eukaryotic cells that undergo multinucleated stages during division where the N/C-ratio is constant (Dundon et al. 2016, Cantwell and Nurse, 2019), while we are not aware of any counter-example in the literature. Studies have also shown that, for example, certain mutant yeast that fail to undergo cytokinesis will increase their volume by a factor of up to 16 alongside the still replicating and growing nucleus, thereby maintaining the N/C-ratio (Neumann et al. 2007, Jorgensen et al. 2007). This demonstrates the tremendous plasticity that cells can reveal with respect to nucleus and cell size regulation. Until the contrary was shown, it was conceivable that nuclear compaction, which does occur (Fig. 5H), compensates for the increase in nuclear number while the cell volume is only increasing slightly. Importantly, we are not aware of any literature where nuclear volume has been quantified for blood stage Plasmodium. Cell volume quantifications remain limited to modelling and the study by Waldecker et al., which provides a few datapoints throughout the IDC. Whether this finding is expected or not, formally speaking, our claim is justified, but for more clarity we replace “uncover” with “demonstrate”. We also introduce the N/C-ratio as a cellular parameter in P. falciparum, which points to another divergent aspect of its biology and might in the future help to understand the functional implication of this usually constant ratio, which is still unclear.

      Dundon SE, Chang SS, Kumar A, Occhipinti P, Shroff H, Roper M, Gladfelter AS. Clustered nuclei maintain autonomy and nucleocytoplasmic ratio control in a syncytium. Mol Biol Cell. 2016 Jul 1;27(13):2000-7.

      Neumann FR, and Nurse P. Nuclear size control in fission yeast. J. Cell Biol. 2007; 179: 593–600. pmid:17998401

      Jorgensen P, Edgington NP, Schneider BL, Rupeš I, Tyers M, Futcher B. The size of the nucleus increases as yeast cells grow. Mol Biol Cell. 2007; 18.

      Cantwell H, Nurse P. A homeostatic mechanism rapidly corrects aberrant nucleocytoplasmic ratios maintaining nuclear size in fission yeast. J Cell Sci. 2019; 132(22).

      I lack specialist statistical knowledge to comment on the statistical analyses performed on the correlation data, and in particular, whether the high p values for t-Tests for correlation are sufficient to support the argument that there is not a correlation, and whether these observations are sufficiently powered to robustly test that hypothesis.

      We are confident that our reworked model analysis, as explained above, now sufficiently supports our hypotheses.
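
      The power question raised in this comment can be made concrete with a small simulation. The sketch below is illustrative only and is not part of the reworked model analysis: it estimates how often a simple correlation test detects a true correlation at sample sizes comparable to those reported above (N between 25 and 35). The effect sizes, the bivariate normal data, the function names, and the use of the Fisher z approximation instead of an exact t-test are all assumptions made for the example.

```python
import math
import random
from statistics import NormalDist


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_test_power(rho, n, alpha=0.05, n_sim=2000, seed=0):
    """Estimate the power of a two-sided correlation test by simulation.

    rho is the assumed true correlation (strictly between -1 and 1) and
    n the sample size. Significance is judged with the Fisher z
    approximation, which is an approximation to the exact t-test.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    noise_scale = math.sqrt(1 - rho ** 2)
    hits = 0
    for _ in range(n_sim):
        # Bivariate normal sample with true correlation rho.
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rho * x + noise_scale * rng.gauss(0, 1) for x in xs]
        z = math.atanh(pearson(xs, ys)) * math.sqrt(n - 3)
        if abs(z) > z_crit:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.6):
        print(rho, correlation_test_power(rho, n=30))
```

      Under these assumptions, a moderate true correlation is missed in a large fraction of samples of size 30, which is why a non-significant p value alone is weak evidence against a correlation, and why a model comparison that accounts for sample size is more informative.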

      Reviewer #2 (Significance):

      The manuscript purports to find a counting mechanism that determines parasite merozoite numbers, and that this counter is set by an externally provided and diffusible resource. Many nutrients are in excess in normal culture media, but not all. If that counted nutrient(s) were normally in excess in the bloodstream, it could hardly be said to be the factor that is counted and that therefore defines merozoite number. Conversely, if the amount of that nutrient were increased in normal media, would parasites make even more merozoites? Further, if the "counted" item is a freely diffusible compound in the media, it should be equally accessible to each parasite in a culture condition, and isn't a reasonable explanation for the variable merozoite numbers in the normal media conditions. To me, it is unsurprising that parasites that are healthy and well fed are able to produce more merozoites, but I don't see this as being the same as support for a counter model where the parasite senses and counts a set number of merozoites to produce in response to a specific external counter. I think the shoehorning of this phenomenon into a paradigm used to describe some other eukaryotes may not be appropriate, and that the rejection of one overly simplistic timer model should not automatically lead to us dichotomously accepting a simple counter method as the alternative. The authors need to do more to either identify a countable input whose gradual increase leads to a predictable and gradual increase in merozoite number, to show that they do use a counter, or provide substantially more caveats to their argument that the parasites are using a counter based on an externally provided resource to determine merozoite number.

      The reviewer comments on the feasibility of a counter mechanism based on an externally provided and diffusible resource. In fact this is a limited view of how a counter may arise and not the one we subscribe to. Rather, while a resource may be diffusible in the medium, it would need to be consumed during schizogony, and insufficiently replenished, in order to enable counting by dilution in the host cell. Furthermore, the reviewer has doubts that the fact that “healthy and well fed […] produce more merozoites” implies “support for a counter model”. We fully agree, and we argue in the manuscript that it is the correlations between schizogony durations and merozoite counts that support a counter model.

      As we have argued above, the two alternative models we consider are inspired by paradigms from other eukaryotes, but their definitions in the present context are simple enough for them to be considered natural minimal models of schizogony. As the simplest imaginable phenomenological models of multiplication control, we find it natural to compare them, and we hope our new introductory section now introduces them appropriately. Naturally, we hope to expand on this simple model in the future and identify more precisely the limiting resources and describe a more direct response.
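
      The distinction between the two minimal models can be illustrated with a toy simulation. This is a sketch under assumptions that are not taken from the manuscript: each cell gets its own exponential multiplication rate, a "timer" cell draws its duration independently of that rate, and a "counter" cell draws its target number independently of that rate. All parameter values and function names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(model, cells=2000, seed=0):
    """Return (rates, durations, numbers) for a toy 'timer' or 'counter' model."""
    rng = random.Random(seed)
    rates, durations, numbers = [], [], []
    for _ in range(cells):
        rate = rng.uniform(0.2, 0.4)      # per hour, hypothetical range
        if model == "timer":
            t = rng.gauss(9.0, 0.5)       # duration set first (hours, hypothetical)
            n = math.exp(rate * t)        # number follows from rate and duration
        elif model == "counter":
            n = rng.uniform(12.0, 28.0)   # target number set first (hypothetical)
            t = math.log(n) / rate        # duration follows from rate and target
        else:
            raise ValueError("model must be 'timer' or 'counter'")
        rates.append(rate)
        durations.append(t)
        numbers.append(n)
    return rates, durations, numbers


for model in ("timer", "counter"):
    rates, durations, numbers = simulate(model)
    print(model,
          "rate vs duration:", round(pearson(rates, durations), 2),
          "rate vs number:", round(pearson(rates, numbers), 2))
```

      In this toy setting the rate correlates with the final number under the timer rule and anticorrelates with the duration under the counter rule. Real data are noisier, and intermediate or mixed mechanisms would blur both signatures.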

      Audience - relatively specialised - likely interested audience would combine apicomplexan cell biologists, as well as theorists of cell division mechanism

      Advance - limited - confirms phenomenon also described by other researchers in their institute, and extends to another related organism.

      We would like to add that the present data are the first quantitative joint measurements of schizogony dynamics and outcome in P. falciparum and P. knowlesi. They allowed, for the first time, a direct correlation of duration and merozoite number, thereby addressing the question of growth control head on. Further, they provide a quantitative reference of several key cellular parameters for anybody studying asexual blood stage parasites.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      Stürmer and colleagues used super-resolution time-lapse microscopy to probe the mechanism regulating the number of merozoites produced by a single cell in Plasmodium falciparum and P. knowlesi. The authors conclude the following:

      1. P. knowlesi has a similar duration of schizont stage to P. falciparum, despite having a 24 h intraerythrocytic developmental cycle (IDC) compared with 48 h in P. falciparum.
      2. Nuclear multiplication dynamics suggest a counter mechanism of division, which is further suggested by a significant relation of merozoite numbers with schizont size at the onset of division.
      3. Nutritional deprivation caused an increase in nuclear volume and a decrease in merozoite number.

      For the most part, the experiments that are presented in this manuscript support the conclusion of the authors. The data are presented in a concise and clear manner. However, some clarification and a couple of experiments (listed below) would improve this manuscript.

      Major comments:

      1. The authors generated at least 3 transgenic lines for this study, but they did not present any genetic validation of the lines in the manuscript. For completeness, I recommend providing genetic validation (either PCR genotyping or whole genome sequencing) of the lines that were generated and used in this study in the supplement.

      Our study exclusively used episomal expression of the respective fluorescent reporter (H2B-GFP, NLS-mCherry, and cytoplasmic GFP). As is customary in the field, resistance to selection drugs and distinct fluorescent signals are assumed to sufficiently validate the presence of the plasmids. We have now added the schematic maps of the plasmids in a new Fig. S1 to make our approach visually clearer.

      1. In the H2B-GFP lines, the authors episomally GFP-tagged histone 2B to label the nuclear chromatin for both P. falciparum and P. knowlesi. This provides a very useful parasite line which enables the live time-lapse microscopy. Using these parasite lines, the authors first show that despite having a 24 h IDC in P. knowlesi vs 48 h in P. falciparum, both these parasites have a similar duration of the schizont stage (8.s vs 9.4 h). My concern here is whether this GFP-tagging is influencing the growth dynamics as in slowing down the P. knowlesi parasites. However, if that was the case authors should have seen that for P. falciparum too. Also, for the P. falciparum parasites that episomally express cytosolic GFP and Nuclear mCherry have a higher number of merozoites compared to the H2B-GFP P. falciparum and the authors speculate this is probably because of not tagging Histone 2B. Given this, it is important to show that none of the H2B-GFP parasites show any significant fitness cost due to GFP tagging of histone. I recommend a simple experiment to compare the multiplication rate of H2B-GFP lines to the parental lines in identical growth conditions. This suggested experiment was described in PMID: 35164549 to determine fitness cost of knockout lines. This experiment is vital for validation of the H2B-GFP lines and subsequent interpretation of the data that were presented in this manuscript.

      We thank the reviewer for this excellent suggestion. To validate our lines further, we have now carried out multiplication rate measurements similar to the one described in the designated publication for all the lines used, alongside their parental strains (Fig. S2). We found no significant differences between the wild type and the episomally expressing parasite lines (lines 131ff), which gives us confidence that episomal expression of tagged proteins does not significantly alter growth dynamics in these cases.

      1. The authors used the microtubule live cell dye SPY555-Tubulin in P. falciparum to validate the findings presented in 1D and 1E. They did not do that for P. knowlesi. If there is no unsurmountable technical difficulty, I suggest doing the same with P. knowlesi. This will also address the concern that I have pointed out in #1.

      Thank you for this suggestion. We have now generated the requested data with P. knowlesi, added it to what is now Supplemental Figure 3 and included it in our new analysis (Fig. 2I-J). The numerical values align well with the observations made when measuring schizont stage dynamics with the H2B-GFP expressing P. knowlesi line (line 158). A notable difference is that the Tubulin data strongly support the (refined) counter model, while the H2B data alone allow no distinction.

      1. The data in Figure 3 show that merozoite number does not depend on host cell diameter. My question here is, were these data collected using different donor blood? Or were these measured from different biological replicates? This is not clear from the writing. I am not sure what effect blood from various donors would have on the data; however, different preparations of the cells across biological replicates will have some effect on host cell diameter and hence on the data. Please state whether these were collected from independent biological replicates and describe the donor blood.

      The data were indeed collected from three independent biological replicates using different donor blood batches. This is now stated in the figure legend. The batches displayed no difference in RBC diameter.

      1. It is interesting to see that nutrient-limited conditions increase average nuclear volume but decrease merozoite numbers. In this experiment, as I understand, complete media was diluted 0.5x, which basically diluted every component of the media by half. From this experiment I can see that nutritional deprivation as a whole has an effect and supports the counter mechanism. It would be intriguing to see if any particular nutrient has an effect on progeny division. For example, parasites can be grown in amino acid deprived media (except isoleucine), which makes the parasites fully dependent on host cell amino acids. This sort of specific nutrient deprivation will probably allow the authors to probe for specific nutrients that play a role as a counter mechanism factor.

      This is indeed a very exciting direction we would like to investigate in more detail in follow-up studies. Our aim for this study was to confirm that nutrient deprivation actually affects “counting” and to provide a workflow to investigate individual nutrients. In the meantime the Mota group, in a study we now cite in the discussion (lines 507ff), actually reported that isoleucine (and possibly methionine) levels are linked to progeny number. A follow-up on this topic using our strains and methodology is certainly worthwhile but requires more detailed analysis in the future.

      Minor comments:

      1. P. knowlesi is sometimes just written as knowlesi. Please write P. knowlesi.

      Has been corrected.

      1. Supplemental figure 1D, missing x-axis label.

      We added the x-axis label.

      1. In line 105, define N/C.

      Done.

      1. In line 205, I assume the authors mean episomally, not ectopically.

      Thank you for pointing this out. We have replaced “ectopically” with “episomally” throughout the text.

      1. In line 275, Duration of Schizont stage was slightly....

      Has been corrected.

      1. All 'ml' or 'µl' should be 'mL' or 'µL'.

      Changes have been made.

      1. Define iRPMI.

      We added a definition (line 610).

      1. In line 475, replace 'as' with 'and'.

      Done.

      Reviewer #3 (Significance):

      The factors that regulate the number of progeny in malaria parasites remain unknown. While there are a few previous studies attempting to answer the question, those studies were done on fixed stained cells. In this study, the authors used genetically modified fluorescent P. falciparum and P. knowlesi parasites that enable live microscopy. Using these parasites coupled with super-resolution time-lapse microscopy, the authors attempt to investigate the mechanism(s) at play in regulating progeny division. This manuscript provides data to suggest that external resources might have some role in progeny division and supports the counter mechanism. Validation of the transgenic lines that were used to collect the data presented needs to be more systematic and rigorous.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      This is a solid study that further characterises the dynamics of nuclear division in Plasmodium falciparum and P. knowlesi. Of the potentially several models for how the number of daughter nuclei, and thus parasites (called merozoites in this genus), is determined, two are considered: one that posits that nuclei divide until a fixed timer ends, and one that posits that nuclei divide to reach a fixed number that is defined by a cellular counter. I find some practical difficulties in definitive measurement of either model. One issue with the former is that experimental definition of the start of the timer is problematic: we may define the starter's gun (e.g. by the first nuclear division) but it isn't necessary that the cell is using that same start time. Additionally, as the authors confirm here, being sure when that first nuclear division has occurred is particularly tricky with Plasmodium parasites, in part because the first few nuclei seem to clump together, preventing one from unambiguously calibrating the first division. Furthermore, getting decent replicate numbers is hard because of the difficulties of time lapse microscopy, and most Plasmodium studies (including this one) suffer from low enough numbers that it isn't always clear whether the numbers support one model over another.

      Nonetheless, several recent studies, particularly a study from the same institute (Klaus et al., 2022) employing timelapse imaging of nuclei, and timing the nuclear division of parasites, find poor correlation between the duration of "schizogony" (although perhaps using a different definition to the one used by the parasite) and the final number of merozoites. They therefore argue that there is poor evidence for a timer, and conclude by elimination that a counter must exist instead. A review by some of the authors of that study and some of this current study (Voß et al 2023), also concludes that the data from Klaus and colleagues "strongly support" a counter model. This current study also concludes that a counter model controls final nuclear/merozoite number in P. falciparum and P. knowlesi. This much at least is not particularly novel given the recent work on this topic, although the addition of the P. knowlesi data is interesting and consistent with the prior P. falciparum work. As above, the authors concede that it is difficult to determine with strong confidence when the first nuclear division has occurred, so it may well be that there is substantial noisiness in the time that they define schizogony to commence. If that were the case, this would contribute to the poor correlation observed between schizogony duration and number of merozoites produced, so this could be an important confounding experimental factor. This deserves some more discussion by the authors. Alternative methods to count absolute DNA content (rather than trying to count individual nuclei) might be useful ways of independently confirming this phenomenon. Alternative possibilities for what constitutes the "start" of a possible timer are also warranted - it could be for example, the first division of one of the other organelles.

      These and previous authors in any case conclude that a counter model must exist through exclusion of a timer model. I am less convinced that the evidence discounting the timer is conclusive, and that a straight counter model is the only alternative. Indeed I am unconvinced by the suitability of this strictly dichotomous two-model system to categorise the division of unicellular eukaryotes, and these theories are not universally held to be sufficient to describe division. Nonetheless, if a counter exists, what is being counted that determines the final number? The authors consider that this might be a physical object or resource inside the parasite, or an extrinsic/extracellular resource. They investigate this by comparing the final cell number to a number of factors. First, the authors investigate the size of the RBC (by using the diameter as an indicator) - little information is given about the source of the blood used, but it appears to be from a single donor of unknown age, who has approximately typical variance in RBC diameter (at least, after manipulation and storage). The authors observe little correlation between these variables. Second, the authors measure parasite size at the onset of schizogony, and find that bigger parasites result in more daughter merozoites early in schizogony (perhaps not surprising, given the earlier mentioned technical problems with measuring the first few steps of schizogony), but that this different initial cell size doesn't result in a different final merozoite number, or as they describe it "not quite significant anymore". Previous p values were taken as cause for rejecting the timer hypothesis and the timer model. In this case the authors instead interpret the data as suggesting "that the setting of the counter might correlate with parasite cell size". This is inconsistent statistical and analytical handling, and highlights the earlier potential pitfall of rejecting timer-based models based on not gathering data that statistically show a correlation. This needs reworking to highlight that these data are inherently noisy, difficult to measure accurately, and aren't necessarily going to strongly reveal a trend even where one biologically exists, and that this ought not be used as grounds for confident rejection of a model.

      Finally, the authors grow the parasites in dilute media, and find that they produce fewer daughter parasites. This is anecdotally unsurprising, as most Plasmodium laboratories are aware that sub-optimal growth conditions result in less healthy schizonts with fewer viable merozoites (and lower magnitudes of single-cycle expansion), but is nonetheless an important result that highlights explicitly how much this occurs in the specific conditions of dilute media. Given the lack of investigation of exactly which nutrient, carbon source, or combination thereof leads to the reduced merozoite number, it is unclear if or how much this is relevant to the scenario of a natural infection and realistic levels of that nutrient in a human or primate parasite environment.

      Minor issues

      The manuscript confuses the terms "less" and "fewer". Fewer should be used for countable nouns (fewer daughter cells, fewer nuclei, fewer merozoites), less for uncountable nouns (e.g. less speed, less volume).

      I didn't understand lines 93-95:
      "This excluded a timer and thereby confirmed a counter as the mechanism regulating termination of nuclear multiplication (Klaus et al., 2022). A direct correlation between duration of schizont stage and merozoite number is, however, still missing."
      If I understand correctly, the first sentence concludes that there ought not be a direct correlation between schizont duration and merozoite number, but the second sentence says that that correlation is "however" missing. Isn't this expected? Perhaps reword for clarity?

      Line 104:
      "We further uncover that throughout schizogony P. falciparum infringes on the otherwise ubiquitously constant N/C-ratio (Cantwell and Nurse, 2019)"
      This seems obvious to me, and not something uncovered by this study. In most of the numerous apicomplexans that divide by endoschizogeny, the cells achieve a near final size considerably before the final rounds of nuclear division so the N/C ratio must not remain constant - this is a direct corollary of many previous descriptions and not a novel finding of this study, and this claim here should be made more modest.

      I lack specialist statistical knowledge to comment on the statistical analyses performed on the correlation data, and in particular, whether the high p values for t-Tests for correlation are sufficient to support the argument that there is not a correlation, and whether these observations are sufficiently powered to robustly test that hypothesis.
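
      The power question raised here can be approached with a standard approximation. The following is a minimal sketch, assuming the Fisher z-transformation approximation for a two-sided test of a Pearson correlation; it is not the analysis used in the manuscript, and the function names and example values are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def correlation_power(r, n, alpha=0.05):
    """Approximate power of a two-sided test to detect a true correlation r
    from n paired observations, using the Fisher z approximation."""
    if n <= 3:
        raise ValueError("the approximation needs more than 3 observations")
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = math.atanh(r) * math.sqrt(n - 3)
    # Probability that the test statistic falls in either rejection region.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def pairs_needed(r, power=0.8, alpha=0.05):
    """Approximate number of paired observations needed to detect a true
    correlation r with the requested power (two-sided test)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)


# Hypothetical example: a modest true correlation and a small sample.
print(correlation_power(r=0.3, n=20))
print(pairs_needed(r=0.3))
```

      A low power for the sample sizes actually imaged would mean that a high p value says little about whether a correlation is absent, which is the concern expressed above.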

      Significance

      The manuscript purports to find a counting mechanism that determines parasite merozoite numbers, and that this counter is set by an externally provided and diffusible resource. Many nutrients are in excess in normal culture media, but not all. If that counted nutrient(s) were normally in excess in the bloodstream, it could hardly be said to be the factor that is counted and that therefore defines merozoite number. Conversely, if the amount of that nutrient were increased in normal media, would parasites make even more merozoites? Further, if the "counted" item is a freely diffusible compound in the media, it should be equally accessible to each parasite in a culture condition, and isn't a reasonable explanation for the variable merozoite numbers in the normal media conditions. To me, it is unsurprising that parasites that are healthy and well fed are able to produce more merozoites, but I don't see this as being the same as support for a counter model where the parasite senses and counts a set number of merozoites to produce in response to a specific external counter. I think the shoehorning of this phenomenon into a paradigm used to describe some other eukaryotes may not be appropriate, and that the rejection of one overly simplistic timer model should not automatically lead to us dichotomously accepting a simple counter method as the alternative. The authors need to do more to either identify a countable input whose gradual increase leads to a predictable and gradual increase in merozoite number, to show that they do use a counter, or provide substantially more caveats to their argument that the parasites are using a counter based on an externally provided resource to determine merozoite number.

      Audience - relatively specialised - likely interested audience would combine apicomplexan cell biologists, as well as theorists of cell division mechanism

      Advance - limited - confirms phenomenon also described by other researchers in their institute, and extends to another related organism.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Summary

      Malaria parasites replicating in human red blood cells show a striking diversity in the number of progeny per replication cycle. Variation in progeny number can be seen between different species of malaria parasites, between parasite isolates, even between different cells from the same isolate. To date, we have little understanding of what factors influence progeny number, or how mechanistically it is controlled. In this study, the authors try to define how the mechanism that determines progeny number works. They propose two mechanisms, a 'counter' where progeny number is determined by the measurement of some kind of parasite parameter, and a 'timer' where parasite lifecycle length would be proportional to progeny number. Using a combination of long-term live-cell microscopy and mathematical modelling, the authors find consistent support for a 'counter' mechanism. Support for this mechanism was found using both Plasmodium falciparum, the most prominent human malaria parasite, and P. knowlesi, a zoonotic malaria parasite. Of the parameters measured in this study, the only thing that seemed to predict progeny number was parasite size around the onset of mitosis. The authors also found that during their replication inside red blood cells, malaria parasites drastically increase their nuclear to cytoplasmic ratio, a cellular parameter that remains consistent in the vast majority of cell-types studied to date.

      Major Comments

      • It is stated a few times in this study that P. knowlesi has an ~24 hour lifecycle, and while this is the case for in vivo P. knowlesi, it was established in the study when P. knowlesi A1-H1 was adapted to human RBCs (Moon et al., 2013) that this significantly extended the lifecycle to ~27 hours, which should be made clear in the text. As much of this study revolves around lifecycle length and timing, the authors should consider some of their findings with the context that in vitro adaption can significantly alter lifecycle length.
      • The dichotomous distinction between 'timer' and 'counter' as mutually exclusive mechanisms seems to be a drastic oversimplification. Considering the drastic variation we see in merozoite number across species, between isolates, and between cells, it seems much more likely that there are factors controlled by both time-sensed and counter-sensed mechanisms that both influence progeny number. Additionally, the only parasite parameter measured in this study, size at time of first nuclear division, explained only a small proportion of the variance observed in merozoite number.
      • For modelling of a timer-based mechanism, the designation of t0 is subjective. The authors chose the time of first nuclear division as their t0. It is possible that a timer-based mechanism could not be supported based on this model if the chosen t0 differs from when the "parasite's timer" starts. For example, t could also have been designated as the time from merozoite invasion (t0) to egress (tend). It would be unreasonable to suggest the authors repeat experiments with a longer time-frame to address this, but this possibility should be discussed as a limitation of the model. It may also be possible to develop a different model where t0 = merozoite invasion and tend = egress, and test this model against the data already collected in this study.
      • The calculation of the multiplication rate is confusingly defined. In Figure 1 it is stated that it is "...based on t and n", which would imply that the multiplication rate is the number of merozoites formed per hour of schizogony, which would give an average value of ~2 for P. falciparum and ~1.5 for P. knowlesi. The average rate values shown, however, are in the range of 0.15-3. The authors should clarify how these values were determined.
      • In Figure 2, the time from tend until egress is calculated, and this is interpreted as the time required for segmentation. In the Rudlaff et al., 2020 study cited in this paper, it is shown that segmentation starts before the final round of nuclear divisions are complete. Considering this, the time from tend until egress is not an appropriate proxy for segmentation time. The authors should consider rewording to something akin to "time from final nuclear division until egress" to more accurately reflect these data.
      • There is a significant discrepancy between the data in Figure 5 and Supplementary Figure 8. In Supplementary Figure 8, the authors establish that culturing parasites in media diluted 0.5x has a marginal effect on parasite growth, with no discernible change in parasitaemia over 96 hours. By contrast, in Figure 5a the parasitaemia of parasites cultured in 0.5x diluted media is approximately 5-fold lower than those in 1x media. The authors should explain the significant discrepancy between these results.
      • In Supplementary Figure 4, the mask on the cell at t0 shows two distinct objects, but it seems very unlikely that they are two distinct nuclei as they vary approximately 5-fold in diameter. The authors should provide more detail on how their masking was performed for their volumetric analysis. Specifically, whether size thresholds were also applied during object detection.
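
      The ambiguity in the multiplication rate noted in the comments above can be made concrete with two candidate definitions "based on t and n". This is a sketch only: the manuscript's actual definition is not given here, the function names are hypothetical, and the example values of n and t are illustrative rather than measured.

```python
import math


def merozoites_per_hour(n, t_hours):
    """Linear reading: merozoites formed per hour of schizogony."""
    return n / t_hours


def exponential_rate(n, t_hours):
    """Exponential reading: rate r (per hour) such that one nucleus
    becomes n nuclei after t hours, i.e. n = exp(r * t)."""
    return math.log(n) / t_hours


# Hypothetical cell: 20 merozoites after a 9.4 h schizont stage.
n_final, duration_h = 20, 9.4
print("per hour:   ", merozoites_per_hour(n_final, duration_h))
print("exponential:", exponential_rate(n_final, duration_h))
```

      The two readings differ by roughly an order of magnitude for values of this size, so stating the definition explicitly in the figure legend would resolve the confusion.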

      Minor Comments

      • Line 45-48 mentions that merozoite number influences growth rate and virulence, but the corresponding reference (Mancio-Silva et al., 2013) only discusses the relationship between merozoite number and growth rate, not virulence.
      • Line 59 states that a 48 hour lifecycle is a baseline from which in vitro cultured parasites deviate. Clinical isolates also show variation in lifecycle length and so it is more accurate to just say that 48 hours is an average, rather than a baseline.
      • Line 63 cites a study for the lifecycle length of P. knowlesi (Lee et al., 2022), but there seems to be no mention of lifecycle length in this reference
      • If I am interpreting Figure 3B correctly, this is essentially a paired analysis where the same erythrocytes are measured twice, once at t0 and once at tend. If this is the case, this data may be better represented with lines that connect the t0 and tend values.
      • Figure 3A seems to imply that to calculate diameter of the erythrocytes, three measurements were made and averaged for each cell. I think this is a nice way to get a more accurate erythrocyte diameter, but if this is the case, it should be specified in the figure legend or methods.
      • In Figure 4I it is shown that in P. falciparum merozoite number doesn't correlate with nucleus size, but for P. knowlesi in Supplementary Figure 7c, a significant anticorrelation is observed. The authors should state this in the text and discuss this discrepancy.
      • The authors show that merozoite number roughly correlates with cell size at t0 but it would be interesting to see whether cell size at tend also corresponds with cell size at t0. This might help answer whether the cell is larger because it has more merozoites, or whether it has more merozoites because it is larger.
      • I don't feel that "nearly identical" is an appropriate summary of erythrocyte indices in Supplementary Figure 9, considering there is a statistically significant increase in mean cell volume. I think it is unlikely that this change is consequential, and performing these haematology analyses is a nice quality control step, but this change should be stated in the text.
      • In Supplementary Figure 8, parasitaemia only increases ~2-fold compared to >5-fold the previous two cycles. It seems likely that at the final timepoint on this graph the parasites are starting to crash, and therefore it may be best to end the graph with the 96 hour timepoint.
      • The error bars in Figure 5C aren't easily visible, moving them in front of the datapoints may help their visibility.
      • In Figure 6D & E, the y-axis labels should be changed to whole integers as all the values in the graph are whole numbers.
      • My interpretation of Figure 6 C-E, is that these are the same cells measured at three time points (t-2, t0 and tend). If this is the case, 6C is missing the cell that has a merozoite number of 8, which is presumably why the y-axes are not equalised for the three graphs.

      Significance

      In the asexual blood-stage of their lifecycle, malaria parasites replicate through a process called schizogony. During schizogony an initially mononucleated parasite undergoes multiple asynchronous rounds of mitosis followed by nuclear division without cytokinesis, producing a variable number of daughter nuclei. Parasites then undergo a specialised cytokinesis, termed segmentation, in which nuclei are packaged into merozoites that go on to invade new host cells. While nucleus, and therefore merozoite, number is known to vary between cells, across isolates, and across species, little is known about the mechanisms regulating merozoite number. In this study, the authors use live-cell microscopy to understand how parasites determine their progeny number. They suggest that parasites regulate their progeny number using a 'counter' mechanism, which would respond to the size or concentration of a cellular parameter, as opposed to a 'timer' mechanism. Long-term live-cell microscopy experiments using malaria parasites are extremely technically challenging, and the authors should be commended for their efforts in this regard. While I agree that the data generated from these experiments are technically sound, I have some reservations expressed above about the interpretation of some of these results. I would strongly encourage the authors to consider rewording some of their interpretations taking into account some of the caveats listed above. I would also consider fitting/testing an additional mathematical model where the time-frame proposed for the 'timer' mechanism begins following merozoite invasion.

      This work is of specific interest to anybody who grows malaria parasites, as the dynamics of their growth is obviously important to understand. Further, this work is of interest more generally to cell biologists who study the regulation of progeny number or cell size. I have no experience with the application of mathematical modelling to understand biological systems, and so I cannot comment on the interest of this work to that field.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      In this manuscript the author presents a deep-learning model used to predict the developmental stage of zebrafish embryos. A robust method that can accurately classify a zebrafish embryo into different developmental stages is highly relevant for many researchers working with zebrafish, and hence the importance of developing methods like this is high.

      The manuscript is overall ok. However, more data is needed to convince the reader that the method is robust enough to work with other samples in other labs. This would greatly improve the impact of the publication.

      We agree with the reviewer and have included in our revised manuscript additional test data that were acquired at a different laboratory to the training data (Figures 5 - 7).

      Page 6.
      - How is the data acquired?

      Images used to do initial model training are the same as those used in a previous study - the details of image acquisition are contained in the relevant publication (doi: 10.12688/wellcomeopenres.18313.1). However, we have now added “Zebrafish Husbandry” and “Live Imaging” for newly-acquired images. We have added a table (Table 1) listing details of all image data used in the study.

      Page 8.<br /> "This indicates that while KimmelNet can be used successfully with noisier test data than that on which it was trained, there is an upper limit to how noisy the data can be."<br /> - This is an obvious statement; there will always be an upper limit on noise.

      We agree with the reviewer that this statement is not terribly informative. This section (“KimmelNet’s prediction accuracy is not significantly impacted by moderate levels of additive noise”) has been removed from the revised manuscript in favour of incorporating a section detailing testing of the model on newly-acquired images (“KimmelNet can generalise to previously unseen data”).

      Page 9.<br /> - Are only wildtype embryos used? How would this work on different mutants? To evaluate the robustness of the method, it would be valuable to test on a mutant line with a known developmental difference from the wild type.

      We agree with the reviewer that testing on a mutant line would lend more weight to our findings. For example, the p53-/- zebrafish has a reported, published developmental delay, but we did not have access to that line. However, the developmental delay reported for the p53-/- mutant is virtually indistinguishable from that effected by a temperature change. We therefore focussed our efforts on developmental delay effected by altering incubation temperature only.

      Image data.<br /> - I would strongly suggest that the author should include a description of the data in the manuscript. A description of how the data is acquired, microscope, different batches, type of embryos used.

      The image data used in the first draft of the manuscript is the same as that used in a previous publication (Jones et al. 2022), which contains all the relevant details the reviewer seeks. However, we have now added the relevant information for the newly-acquired image data.

      "Random translation in the y-direction was avoided due to the aspect ratio of the images (width > height) - any artifacts introduced by translation in the x-direction would be removed by the centre crop layer, but this would not be the case for translation in the y-direction."<br /> - Could this be solved by using some border method: reflection, repetition or fixed value?

      The reviewer is correct that some form of image reflection or repetition could be utilised. However, given the nature of our images, if an embryo is located close to the image boundary, reflection/repetition can result in some odd artefacts, so we minimised the use of such fill methods (also used by the random zoom augmentation layer). We could instead use an arbitrary fixed value, as the reviewer suggested, but finding a value suitable for all images is difficult.

      Page 10.<br /> Addition of Noise to Image Data<br /> - This should be added in the training phase. This would probably improve the robustness of the network and also improve the results on the test data.

      We agree with the reviewer and have now added a random Gaussian noise layer for data augmentation purposes during model training (see Figure 1).
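
      To make the kind of augmentation discussed here concrete, the following is a minimal sketch of additive Gaussian noise applied to an image. It is not the authors' training code: the function name `add_gaussian_noise`, the noise level `sigma=0.05`, the toy 4x6 image and the assumption that pixel values lie in [0, 1] are all hypothetical choices made for illustration.

```python
import random


def add_gaussian_noise(image, sigma, rng):
    """Return a new image with zero-mean Gaussian noise added to every pixel.

    `image` is a list of rows of floats assumed to lie in [0, 1];
    noisy values are clipped back into that range.
    """
    return [
        [min(1.0, max(0.0, pixel + rng.gauss(0.0, sigma))) for pixel in row]
        for row in image
    ]


# Hypothetical example: a flat mid-grey 4x6 "image" and a small noise level.
rng = random.Random(0)
image = [[0.5] * 6 for _ in range(4)]
noisy = add_gaussian_noise(image, sigma=0.05, rng=rng)
```

      In a training pipeline, noise of this kind would typically be drawn afresh for every image in every epoch, so the network never sees exactly the same noisy input twice.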

      • Supplementary 3 images with high noise. It is worrying that the network is not able to handle the noise in this figure. It looks like the features that are used to distinguish the developmental stage of the embryo are still clearly seen at this high noise level. Retrain the model with noise as an augmentation to improve this.

      As the reviewer suggested, addition of random noise is now incorporated into model training. The new version of the manuscript does not include the same supplemental figures, but instead includes additional testing on newly-acquired data.

      Reviewer #1 (Significance):

      The development of methods like this is highly relevant in the zebrafish community. Staging and evaluating the developmental stage for zebrafish is common and is of interest to the broad community. A lot of this work today is done manually, limiting the throughput, and adding human bias.

      The limit of this study is the dataset used for training and evaluation. Firstly, it is not clear about the structure of the data and how it is acquired, different types of fish or imaging setup etc. For a method to be useful to the community it needs to be robust enough to handle different types of fish (transgenic lines). The manuscript would be greatly improved by adding this to the training and evaluation.

      We have now added additional datasets for the purposes of evaluating the model.

      My expertise is image analysis and machine learning for quantification of biological samples, with focus on zebrafish screening.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary<br /> The paper "Automated staging of zebrafish embryos with KimmelNet" by Barry et al. presents a method to automatically stage developmental timepoints of zebrafish embryos based on convolutional neural networks (CNN). The authors show that a CNN trained on ~20k images can determine time post fertilization on test-image sets with an accuracy in the range of a few hours. This technique undoubtedly has the potential to become very useful for any zebrafish researchers interested in developmental timing as it eases analysis and removes potential subjective bias.

      Major comments<br /> In its current form the paper lacks sufficient graph annotations and method descriptions. This makes it hard in places to judge the validity of the claims. I do believe that the presented method can be useful and is likely valid but to be convincing, the authors need to spend more time expanding the methods, justifying their choices of analysis and clarifying figure annotations.

      We believe that we have addressed the reviewer’s concerns in this revised manuscript, as detailed in response to the specific points below.

      Specific points:<br /> 1) The annotation of the training data is not described and specifically it is unclear how valid the staging of the training data itself is. The authors state in the introduction "the hours post fertilization (hpf) [...] provides only an approximation of the actual developmental stage". It is therefore critical to know how this was accounted for in the annotation of the training data, since the quality of the training data will ultimately limit the best-case quality of KimmelNet. The authors need to go into some detail here even though the training data appears to be from another published dataset.

      The reviewer raises a valid point – two individual zebrafish embryos that are x hours post-fertilisation are not necessarily at the same developmental stage. However, we believe it is reasonable to assume that two populations of embryos x hours post-fertilisation are, on average, at the same developmental stage and it is this assumption that forms the basis for our approach. Given the inherent variability the reviewer refers to, we are not suggesting that our model would be particularly accurate for staging individual embryos. However, we are very confident (and we believe the data in the manuscript supports this) that given a population of embryos, our model will provide an accurate rate of development. We have added a paragraph (lines 131-141) to address this point.
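
      The distinction drawn here between staging an individual embryo and staging a population can be illustrated with a short calculation. The sketch below assumes that per-embryo prediction errors are independent, have zero mean and share a common standard deviation; the value of 4 hpf is purely hypothetical and is not taken from the manuscript. The sample sizes 1, 20 and 100 are chosen only to echo the test-sample sizes mentioned elsewhere in these reviews.

```python
import math


def standard_error_of_mean(std_individual, n):
    """Standard deviation of the mean of n independent, equally noisy predictions."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return std_individual / math.sqrt(n)


# Hypothetical per-embryo error spread, in hours post fertilisation.
STD_INDIVIDUAL_HPF = 4.0

# Spread of the population-average prediction for different population sizes.
spread = {n: standard_error_of_mean(STD_INDIVIDUAL_HPF, n) for n in (1, 20, 100)}
```

      Under these assumptions the uncertainty of the population average shrinks with the square root of the number of embryos, which is why a model with large single-image errors can still estimate a population-level rate of development well.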

      2) Why were "test predictions fit to a straight line through the origin"? On the one hand this makes sense (since the slope would indicate the correspondence), but it should be clarified why an intercept was omitted in the fit. After all, it is unclear if KimmelNet correctly identifies 0 hpf embryos.

      The reviewer makes a valid point – we do not know what predictions KimmelNet would produce for images of embryos closer to 0 hpf. However, we felt an equation of the form y=mx was a reasonable choice for two reasons. First of all, it matches the form of the Kimmel equation, which, despite its flaws, we are using as a benchmark of sorts – the absence of a y intercept makes comparisons with the Kimmel equation straightforward. Secondly, a “perfect” model would produce a straight line fit with y=x, so the lack of a y intercept seemed a reasonable constraint to impose. We have added some brief text (lines 103-105) to clarify this choice.
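
      The trade-off between the two forms of fit can be seen in a small worked example. The sketch below compares an ordinary least-squares fit constrained through the origin (y = mx) with an unconstrained fit (y = mx + c). The data are synthetic: predictions are generated with a slope of 0.8 and a constant 2 h offset, numbers chosen only for illustration and not taken from the manuscript.

```python
def fit_through_origin(x, y):
    """Least-squares slope m for the model y = m * x (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def fit_with_intercept(x, y):
    """Least-squares slope m and intercept c for the model y = m * x + c."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    m = sxy / sxx
    return m, mean_y - m * mean_x


# Synthetic data: true hpf and hypothetical predictions with a 2 h offset.
true_hpf = [10.0, 20.0, 30.0, 40.0]
pred_hpf = [0.8 * t + 2.0 for t in true_hpf]

m_origin = fit_through_origin(true_hpf, pred_hpf)
m_free, c_free = fit_with_intercept(true_hpf, pred_hpf)
```

      With these synthetic values the unconstrained fit recovers the slope and offset used to generate the data, while the constrained fit absorbs the offset into a steeper slope. A through-origin fit is therefore simple to compare against a reference rate, but it can misstate the rate when a genuine offset is present.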

      3) The methods do not list how the mean of the absolute error was calculated from 3B/C. I think this should be the mean of the absolute error (not the mean of the error) but in that case the numbers listed in the text appear rather small given the histograms in 3 B/C. A clear statement in the methods would clarify this issue.

      We have now added a “Statistical Analysis” section under Materials & Methods to detail exactly what was used to calculate the values given for error analysis. We have calculated the mean of the error, not the mean of the absolute error, as we wish to illustrate that the mean is close to zero. We have used the standard deviation of the errors to illustrate that there is a significant spread in the error values, as depicted in Figure 3C and D.
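
      The difference between the statistics discussed in this exchange can be made explicit with a small example. The sketch below computes the mean signed error, the mean absolute error and the standard deviation of the errors for a set of made-up predictions; the function name `error_summary` and all numbers are hypothetical.

```python
import statistics


def error_summary(true_hpf, pred_hpf):
    """Summarise prediction errors (predicted minus true), in hpf."""
    errors = [p - t for t, p in zip(true_hpf, pred_hpf)]
    return {
        "mean_error": statistics.fmean(errors),
        "mean_abs_error": statistics.fmean([abs(e) for e in errors]),
        "std_error": statistics.pstdev(errors),
    }


# Made-up example in which over- and under-predictions cancel out.
summary = error_summary([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0])
```

      Here the signed errors are +2, -2, +3 and -3, so the mean error is zero even though the mean absolute error is 2.5 hpf. A mean error near zero indicates a lack of systematic bias, whereas the standard deviation or the mean absolute error indicates how far individual predictions typically fall from the truth.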

      Minor comments<br /> 1) The Y-axis in Figure 2B is simply labeled "Loss" - what is the unit of this loss? HPF? Or HPF^2? This is important for judging the quality of the fit

      We thank the reviewer for drawing our attention to this omission. The loss is hpf² (mean squared error) and we have updated the plot to reflect this.

      2) Figure 3 B. I would suggest changing the labels of the confidence intervals in the legend. "Inner and outer" is already clear from the figure itself, so labels that state that these are derived from n=100 vs. n=20 test image sized samples would be more useful to the reader

      We thank the reviewer for this suggestion – we have updated the figure legend accordingly.

      Referees cross-commenting

      I concur with comments issued by the other reviewers. I think it will be especially important to address the comments related to testing the method on mutants (Reviewer #1) and training the model in the presence of noise to increase its robustness (Reviewers #1 and #3) as well as addressing the overall annotation/generation of the training data (Reviewers #1 and #2).

      We believe we have now addressed all of these concerns. The model has been retrained with additional data augmentation incorporating random noise, tested on newly-acquired data and we have added tables summarising the details of all image data used in this study.

      I think these points will be critical to make the paper useful by increasing transparency and ensuring reproducibility in other labs with different imaging conditions, strains, mutants, etc.

      Reviewer #2 (Significance):

      Developmental delay is a common occurrence that can be caused by genetic and environmental background effects. It is therefore highly desirable to properly quantify this variable. The work presented here makes an important step in this direction, by allowing developmental timepoints to be quantified independently of subjective staging. This speeds up analysis, increases reproducibility and enhances rigor. However, as my comments above indicate, the significance also depends on the ability of potential users to judge the quality of the work. Once those issues have been addressed, I think the work will be of broad interest to the developmental biology community, first and foremost labs utilizing the zebrafish model. However, as the authors state, the presented model architecture could be trained with the data from other species as well.

      Expertise: Zebrafish, quantitative analysis, behavior, neuroscience

      We thank the reviewer for their positive feedback.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      Properly staging zebrafish embryos is important, yet proves challenging since it can depend on many factors, such as temperature, water quality, fish population density, etc. Here, the authors provide a deep-learning-based model called KimmelNet that allows the prediction of the age of zebrafish embryos, using 2D brightfield images. The technique is robust to weak measurement noise and can also be used to identify developmental delays from a very small number of experimental data.

      The code is accessible to the reader, open-source and should be useable by experimentalists without huge effort.

      Major comments:

      I suggest retraining the model and applying it to additional data for the following reasons:<br /> • Why did the authors not train for (high) measurement noise and heterogeneous background illumination? Would that not make the model more robust? In principle, creating the training data should not be considerably harder than before, since the manipulation of the images has already been shown in the manuscript and the authors just need to run it again on the HPC cluster. If there are no technical or administrative constraints (access to the cluster, computational effort, high costs, etc.), the authors should retrain their model.

      We thank the reviewer for this suggestion. As detailed in Figure 1, with a view to making the model more robust, we have now added several more layers of data augmentation, including the addition of random noise, and retrained our model.

      • For Fig. S2 and S3 it is not clear if there is such a strong deviation from the Kimmel equation due to measurement noise or due to the background illumination. The saliency maps appear as if they are mainly affected by the illumination, and maybe less by the noise. Would it be possible to apply the model to a case without artificial noise, but with heterogeneous background illumination to identify what has a bigger impact?

      We thank the reviewer for this suggestion. We have now replaced the “artificial” examples used in the previous version of the manuscript with newly-acquired data (Figure 5), which exhibits different characteristics to that used for training.

      Additionally, the authors need to clarify what exactly they are comparing in this manuscript and rework their interpretation of their findings:<br /> • When comparing the predictions between KimmelNet and the Kimmel equation, the authors use an equation of the form y=mx. Could the authors please elaborate on why they introduce the constraint of y(0)=0? It might be naturally given by the so-called Kimmel equation, but by looking at Fig 3a, it seems like an equation of the form y=mx+a would be more appropriate and it appears like KimmelNet introduces an offset of around a=2h for 25 Celsius. The authors need to discuss this.

      The main rationale for using an equation of the form y=mx is to be consistent with the Kimmel equation (see lines 103-105). The reviewer is correct that an equation of the form y=mx+c may well produce a better fit to the data, but omitting a y intercept makes comparison with the Kimmel Equation trivial.

      • In lines 5-8 the authors say that developmental stage progression does not only depend on temperature, but also on population density, water quality etc. and they explain that usually not only hpf, but also staging guides based on morphological criteria are used! If that is true, how good is their training data set that only uses hpf and not the other important guides? How did the authors test that these factors have no impact on their training data?

      We have now added a paragraph (lines 131-141) to address this point.

      Since this tool has the potential to have a big impact on zebrafish research, it would be nice to provide some examples of how exactly this could be achieved:<br /> • Could the authors discuss how exactly their tool is useful to experimentalists? Is it the idea that if an experimentalist wants to investigate an embryo of a particular stage, they apply KimmelNet to images of embryos to identify the stage of the embryo and only then undertake their planned experiment? Is that a realistic undertaking?

      As is evidenced by the errors presented in Figure 3C & D, testing KimmelNet on individual images of embryos may well result in a large error in the predicted hpf. As such, it is not appropriate to use the tool in such a manner. However, to modify the example provided by the reviewer, should an experimentalist have a population of embryos they wished to stage, then yes, KimmelNet would certainly be an appropriate tool for doing so.

      • Would it be possible to provide a tutorial (or even video tutorial if such skills are available in the group of authors) that provides real examples of how to apply the technique? This would make it easier for people without advanced Python/Deep-Learning skills to use the tool, hence improving the impact of KimmelNet.

      A lack of user-friendly interfaces for applying deep learning methods in biology is well-documented – basic knowledge of python and tools like jupyter notebooks are often necessary (https://doi.org/10.1038/s41592-023-01900-4). However, we have endeavoured to make the running of KimmelNet as easy as possible for new users. A jupyter notebook instance can be run on Binder with absolutely no set-up required. To run KimmelNet on their own data, biologists just need to download the Git repo and replace the test images with their own data. We have updated the landing page on the GitHub repo to provide more specific step-by-step instructions for each of these tasks. We will also endeavour to upload our model to the BioImage Model Zoo (https://bioimage.io/#/) to further increase accessibility.

      I am very critical towards equation 1. Please note that I don't think this has any impact on the quality of the technique provided in this manuscript and the significant flaws can already be found in Kimmel 1995 (which is not under review here). That is why I suggest rewriting of this manuscript to not support an over-interpretation of this equation.<br /> • I do not think that the Kimmel equation is an established term. At least a Google Scholar Search for "Kimmel equation" only gives one result: the preprint of this manuscript.<br /> • The equation has no mathematical meaning regarding its units (subtracting temperature and a unitless value). I also very rarely see equations with Degrees Celsius and not Kelvin.<br /> • Additionally, the equation provides a linear relationship between the development time and temperature h(T) and in Kimmel et al, it is shown that this is only true for 25-33 Celsius. Such a linearisation is not very surprising for a small temperature range, but I am not sure how true it is for higher temperature differences. Hence, I feel that it is very bold to give a specific name to such an equation, giving it an importance that it does not deserve.

      We appreciate the reviewer’s concerns and have removed explicit references to “The Kimmel Equation”, without substantively changing the content of the manuscript.

      Minor comments:

      • For the measurement noise cases it would be nice to have some example images of fish with the specific noise levels in Fig S1 and Fig S2.

      We have now removed the “synthetic” additive noise test data, previously depicted in Figures S1-3, in favour of newly-acquired images in Figures 5-7.

      • Could the authors hypothesize why they predict a slower dynamic for 25 Celsius than predicted by the Kimmel equation?

      Referring to Figure 2 in Kimmel et al (1995), it is apparent that the straight lines are by no means perfect fits to the datapoints. In Fig 2A in particular, some datapoints for the 25C data fall well below the line fit. While the published equation suggests a rate of development 80.5% of the rate at 28.5C, according to Fig 2A, an alternative line fit could give a developmental rate as low as 70-75%, which would be in agreement with our data.
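
      The 80.5% figure quoted above can be reproduced from the linear temperature relationship in Kimmel et al. (1995). The sketch below assumes that relationship has the form H_T = h / (0.055 T - 0.57), with T in degrees Celsius, h the hours of development at 28.5C and H_T the hours needed at temperature T. This form is given here from memory and should be checked against the original paper; the function names are hypothetical. As noted by the reviewer, the relationship was only shown to hold for roughly 25-33 Celsius.

```python
def relative_rate(temp_celsius):
    """Developmental rate relative to 28.5 C under the assumed linear relationship."""
    return 0.055 * temp_celsius - 0.57


def hours_at_temperature(hours_at_28_5, temp_celsius):
    """Hours needed at `temp_celsius` to reach the stage seen after `hours_at_28_5` at 28.5 C."""
    return hours_at_28_5 / relative_rate(temp_celsius)


rate_25 = relative_rate(25.0)      # relative rate at 25 C
rate_28_5 = relative_rate(28.5)    # close to 1 by construction
```

      Under this assumed form, the relative rate at 25C evaluates to 0.805, matching the 80.5% quoted in the response, and the rate at 28.5C evaluates to 0.9975.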

      Reviewer #3 (Significance):

      Strengths of the study:

      An easy-to-use method to automatically stage zebrafish embryos and identify differences in the developmental stage is very important for the zebrafish community and the technique in this manuscript is definitely novel. The tool can be used without large effort and the authors suggest that it can also find applications beyond zebrafish embryos. Hence, it is not only interesting to the zebrafish community, but to a broader developmental biology audience.

      Weakness of the study:<br /> The main weakness of the manuscript is in the training data used for the deep-learning model and the apparent large impact of heterogeneous background illumination. If that is not solved, it is unclear if this technique will find an application in the zebrafish community.

      We believe this weakness has now been addressed by incorporating additional data augmentation measures in the training process and testing the model on newly-acquired data.

      Field of expertise of the reviewer: Image Analysis, Mathematical Modelling, Biological Physics. Since I have limited experience in deep learning, I cannot evaluate the specific details of the network architecture. I also have no experience in zebrafish research.

    1. Reviewer #2 (Public Review):

      This study examines the construct of "cognitive spaces" as they relate to neural coding schemes present in response conflict tasks. The authors use a novel experimental design in which different types of response conflict (spatial Stroop, Simon) are parametrically manipulated. These conflict types are hypothesized to be encoded jointly, within an abstract "cognitive space", in which distances between task conditions depend only on the similarity of conflict types (i.e., where conditions with similar relative proportions of spatial-Stroop versus Simon conflicts are represented with similar activity patterns). Authors contrast such a representational scheme for conflict with several other conceptually distinct schemes, including a domain-general, domain-specific, and two task-specific schemes. The authors conduct a behavioral and fMRI study to test which of these coding schemes is used by prefrontal cortex. Replicating the authors' prior work, this study demonstrates that sequential behavioral adjustments (the congruency sequence effect) are modulated as a function of the similarity between conflict types. In fMRI data, univariate analyses identified activation in left prefrontal and dorsomedial frontal cortex that was modulated by the amount of Stroop or Simon conflict present, and representational similarity analyses (RSA) that identified coding of conflict similarity, as predicted under the cognitive space model, in right lateral prefrontal cortex.

      This study tackles an important question regarding how distinct types of conflict might be encoded in the brain within a computationally efficient representational format. The ideas postulated by the authors are interesting ones and the statistical methods are generally rigorous. The evidence supporting the authors' claims, however, is limited by confounds in the experimental design and by lack of clarity in reporting the testing of alternative hypotheses within the methods and results.

      (1) Model comparison

      The authors commendably performed a model comparison within their study, in which they formalized alternative hypotheses to their cognitive space hypothesis. We greatly appreciate the motivation for this idea and think that it strengthened the manuscript. Nevertheless, some details of this model comparison were difficult for us to understand, which in turn has limited our understanding of the strength of the findings.

      The text indicates the domain-general model was computed by taking the difference in congruency effects per conflict condition. Does this refer to the "absolute difference" between congruency effects? In the rest of this review, we assume that the absolute difference was indeed used, as using a signed difference would not make sense in this setting. Nevertheless, it may help readers to add this information to the text.

      Regarding the Stroop-Only and Simon-Only models, the motivation for using the Jaccard metric was unclear. From our reading, it seems that all of the other models --- the cognitive space model, the domain-general model, and the domain-specific model --- effectively use a Euclidean distance metric. (Although the cognitive space model is parameterized with cosine similarities, these similarity values are monotonically related to Euclidean distances because the points all lie on a circle. And, although the domain-general model is parameterized with absolute differences, the absolute difference is equivalent to Euclidean distance in 1D.) Given these considerations, the use of Jaccard seems to differ from the other models, in terms of parameterization, and thus potentially also in terms of underlying assumptions. Could authors help us understand why this distance metric was used instead of Euclidean distance? Additionally, if Jaccard must be used, then because this metric seems to be non-standard in RSA, it would likely be helpful for many readers to give a little more explanation about how it was calculated.
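
      The relationship between cosine similarity and Euclidean distance for points on a circle can be checked numerically. For unit vectors u and v, the squared Euclidean distance equals 2(1 - cos), where cos is their cosine similarity, so the two measures order pairs of conditions identically. The sketch below verifies this identity; the five angles are hypothetical placements of conditions on a quarter circle and are not taken from the manuscript.

```python
import math
from itertools import combinations


def unit_vector(angle_deg):
    """Point on the unit circle at the given angle."""
    a = math.radians(angle_deg)
    return (math.cos(a), math.sin(a))


def cosine_similarity(u, v):
    """Cosine of the angle between two 2D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


# Hypothetical condition angles, from "pure" one conflict type to the other.
ANGLES_DEG = [0.0, 22.5, 45.0, 67.5, 90.0]
points = [unit_vector(a) for a in ANGLES_DEG]

# Largest deviation from the identity d^2 = 2 * (1 - cos) over all pairs.
max_gap = max(
    abs(math.dist(u, v) ** 2 - 2.0 * (1.0 - cosine_similarity(u, v)))
    for u, v in combinations(points, 2)
)
```

      Strictly, it is the squared distance that is linear in the cosine similarity, so the two parameterizations agree on rank order but are not proportional to one another.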

      When considering parameterizing the Stroop-Only and Simon-Only models with Euclidean distances, one concern we had is that the joint inclusion of these models might render the cognitive space model unidentifiable due to collinearity (i.e., the sum of the Stroop-Only and Simon-Only models could be collinear with the cognitive space model). Could the authors determine whether this is the case? This issue seems to be important, as the presence of such collinearity would suggest to us that the design is incapable of discriminating those hypotheses as parameterized.
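
      One way to probe the collinearity concern is to build the three model distance vectors for a candidate design and correlate them. The sketch below is only an illustration under loudly stated assumptions: conditions are placed at five hypothetical angles on a quarter circle, the x coordinate is treated as the Stroop component and the y coordinate as the Simon component, and all three models use Euclidean distances. None of these choices is taken from the manuscript.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical design: five conditions on a quarter circle.
ANGLES_DEG = [0.0, 22.5, 45.0, 67.5, 90.0]
points = [(math.cos(math.radians(a)), math.sin(math.radians(a))) for a in ANGLES_DEG]

stroop_only, simon_only, cognitive_space = [], [], []
for (s1, m1), (s2, m2) in combinations(points, 2):
    stroop_only.append(abs(s1 - s2))                     # distance along assumed Stroop axis
    simon_only.append(abs(m1 - m2))                      # distance along assumed Simon axis
    cognitive_space.append(math.hypot(s1 - s2, m1 - m2))  # distance in the 2D space

# Correlation between the summed single-axis models and the 2D model.
summed = [a + b for a, b in zip(stroop_only, simon_only)]
r = pearson(summed, cognitive_space)
```

      A correlation close to 1 in a check of this kind would indicate that the design cannot readily separate the cognitive space model from the combination of the two single-axis models. A fuller check would regress the cognitive space vector on both single-axis vectors and inspect the variance inflation factor.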

      (2) Issue of uniquely identifying conflict coding

      We certainly appreciate the efforts that authors have taken to address potential confounders for encoding of conflict in their original submission. We broach this question not because we wish authors to conduct additional control analyses, but because this issue seems to be central to the thesis of the manuscript and we would value reading the authors' thoughts on this issue in the discussion.

      To summarize our concerns, conflict seems to be a difficult variable to isolate within aggregate neural activity, at least relative to other variables typically studied in cognitive control, such as task-set or rule coding. This is because it seems reasonable to expect that many more nuisance factors covary with conflict --- such as univariate activation, level of cortical recruitment, performance measures, arousal --- than in comparison with, for example, a well-designed rule manipulation. Controlling for some of these factors post-hoc through regression is commendable (as authors have done here), but such a method will likely be incomplete and can provide no guarantees on the false positive rate.

      Relatedly, the neural correlates of conflict coding in fMRI and other aggregate measures of neural activity are likely of heterogeneous provenance, potentially including rate coding (Fu et al., 2022), temporal coding (Smith et al., 2019), modulation of coding of other more concrete variables (Ebitz et al., 2020, 10.1101/2020.03.14.991745; see also discussion and reviews of Tang et al., 2016, 10.7554/eLife.12352), or neuromodulatory effects (e.g., Aston-Jones & Cohen, 2005). Some of these origins would seem to be consistent with "explicit" coding of conflict (conflict as a representation), but others would seem to be more consistent with epiphenomenal coding of conflict (i.e., conflict as an emergent process). Again, these concerns could apply to many variables as measured via fMRI, but at the same time, they seem to be more pernicious in the case of conflict. So, if authors consider these issues to be germane, perhaps they could explicitly state in the discussion whether adopting their cognitive space perspective implies a particular stance on these issues, how they interpret their results with respect to these issues, and if relevant, qualify their conclusions with uncertainty on these issues.

      (3) Interpretation of measured geometry in 8C

      We appreciate the inclusion of the measured similarity matrices of area 8C, the key area the results focus on, to the supplemental, as this allows for a relatively model-agnostic look at a portion of the data. Interestingly, the measured similarity matrix seems to mismatch the cognitive space model in a potentially substantive way. Although the model predicts that the "pure" Stroop and Simon conditions will have maximal self-similarity (i.e., the Stroop-Stroop and Simon-Simon cells on the diagonal), these correlations actually seem to be the lowest, by what appears to be a substantial margin (particularly the Stroop-Stroop similarities). What should readers make of this apparent mismatch? Perhaps authors could offer their interpretation on how this mismatch could fit with their conclusions.

    1. Reviewer #2 (Public Review):

      Summary: The current draft by Deischel et al., entitled "Inhibition of Notch activity by phosphorylation of CSL in response to parasitization in Drosophila", describes the role of Pkc53E in the phosphorylation of Su(H) to downregulate its transcriptional activity to mount a successful immune response upon parasitic wasp-infection. Overall, I find the study interesting and relevant; in particular, the identification of Pkc53E in phosphorylation of Su(H) is very nice. However, I have a number of concerns with the manuscript which are central to the idea that links the phosphorylation of Su(H) via Pkc53E to its modulation of Notch activity. I list them one by one below.

      Strengths: I find the study interesting and relevant especially because of the following:<br /> 1. The identification of Pkc53E in phosphorylation of Su(H) is very interesting.<br /> 2. The role of this interaction in modulating Notch signaling and thereafter its requirement in mounting a strong immune response to wasp infection is also another strong highlight of this study.

      Weaknesses:<br /> 1. Epistatic interaction with Notch is needed: In the entire draft, the authors claim that the role of Pkc53E in the phosphorylation of Su(H) is downstream of Notch activity. Given the paper title also invokes Notch, I would suggest authors show this in a direct epistatic interaction using a Notch condition. If loss of Notch function makes many more lamellocytes and GOF makes less, then would modulating Pkc53E (and Su(H)) in this background manifest any change? In homeostasis as well, given gain of Notch function leads to increased crystal cells, the same genetic combinations in homeostasis will be nice to see.<br /> While I understand that Su(H) functions downstream of Notch, it is now increasingly evident that Su(H) also functions independently of Notch. An epistatic relationship between Notch and Pkc will clarify if this phosphorylation event of Su(H) via Pkc is part of the canonical interaction being proposed in the manuscript and not a non-canonical/Notch pathway independent role of Su(H).

      This is important, as I worry that in the current state, while the data are all discussed in light of Notch activity, any direct data to show this affirmatively is missing. In our hands we do find Notch independent Su(H) function in immune cells, hence this is a suggestion that stems from our own personal experience.

      2. Temporal regulation of Notch activity in response to wasp-infection and its overlapping dynamics of Su(H) phosphorylation via Pkc is needed: First, I suggest the authors show how Notch activity is altered post infection in a time course dependent manner. An RT-PCR profile of Notch target genes in hemocytes from infected animals at 6, 12, 24, 48 HPI, to gauge an understanding of dynamics in Notch activity, will set the tone for when and how it is being modulated. In parallel, this response in the phospho mutant of Su(H) will be good to see and will support the requirement for phosphorylation of Su(H) to manifest a strong immune response. Second, the dynamics of phosphorylation in a time course experiment are missing. While the increased phosphorylation of Su(H) in response to wasp-infestation shown in Fig. 2B is using whole animal, this implies a global down-regulation of Su(H)/Notch activity. The authors need to show this response specifically in immune cells. The reader is left to the assumption that this is also true in immune cells. Given the authors have a good antibody, characterizing the same in circulating immune cells in response to infection will be needed. A time course of the phosphorylation state at 6, 12, 24, 48 HPI, to gauge an understanding of this dynamics, is needed. The authors suggest this mechanism may be a quick way to down-regulate Notch, hence a side by side comparison of the dynamics of Notch down-regulation (such as by doing RT-PCR of Notch target genes at different time points post infection) alongside the levels of pS269 will strengthen the central point being proposed. Last, in Fig. 7 the authors show Co-immuno-precipitation of Pkc53EHA with Su(H)gwt-mCh 994 protein from Hml-gal4 hemocytes. I understand this is in homeostasis, but since this interaction is proposed to be sensitive to infection, a Co-IP of the two in immune cells upon infection should be incorporated to strengthen their point.

      3. In Fig 5B, the authors show the change in crystal cell numbers as a readout of PMA-induced activation of Pkc53E and subsequent inhibition of Su(H) transcriptional activity. I would suggest the authors use more direct measures of this readout. RT-PCR of Su(H) target genes in circulating immune cells will strengthen this point. Formation of crystal cells is not limited to Notch, and I am not convinced that this treatment or these conditions have no other effect on immune cells; for example, any impact on Hif expression may also lead to lowering of CC numbers. Hence, the authors need to strengthen this point by showing that the effects are direct to Notch and Su(H) and not non-specific to any other pathway also shown to be important for CC development.

      4. In addition to the above mentioned points, the data needs to be strengthened to further support the main conclusions of the manuscript. I would suggest the authors present the infection response with details on the timing of the immune response. Characterization of the immune responses at respective time points (as above or at least 24 and 48 HPI, as norms in the field) will be important. Also, any change in overall cell numbers, other immune cells, plasmatocytes or CC post infection is missing and is needed to present the specificity of the impact. The addition of these will present the data with more rigor in their analysis.

      5. Finally, what is the authors' view on what leads to activation of Pkc53E? No upstream input is presented. It would be good to see whether wasp infection leads to increased Pkc53E kinase activity.

      Overall, I think the findings in the current state are interesting and fill an important gap, but the authors will need to strengthen the point with more detailed analysis that includes generating new data and also presenting the current data with more rigor in their approach. The data have to showcase the relationship with Notch pathway modulation upon phosphorylation of CSL in a much more comprehensive way, both in homeostasis and in response to infection which is entirely missing in the current draft.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the Editors for the opportunity to submit a revised manuscript, and the Reviewers for their positive evaluations and constructive comments. We feel that the comments and suggestions significantly improved the quality of our manuscript. We addressed all questions and suggestions in a point-by-point fashion below.

      Reviewer #1 (Public Review):

      This paper proposes and evaluates a new approach for the registration of human hippocampal anatomy between individuals. Such registration is an essential step in group analysis of hippocampal structure and function, and in most studies to date, volumetric registration of MRI scans has been employed. However, it is known that volumetric deformable registration, due to its formulation as an optimization problem that minimizes the combination of an image similarity term and relatively simple geometric regularization terms, fails to preserve the topology of complex structures. In the cerebral cortex, surface-based registration of inflated cortical surfaces is broadly preferred over volumetric registration, which often causes voxels of different tissue types to be matched (e.g., voxels belonging to a sulcus in one individual mapping onto voxels belonging to a gyrus in another). The authors recognize that hippocampal anatomy is similarly complex, and, with proper tools, can benefit from surface-based registration. They propose to first unfold the hippocampus to a two-dimensional rectangle domain using their prior HippUnfold technique, and then to perform deformable registration in this rectangle domain, matching geometric features (curvature, thickness, gyrification) between individuals. This registration approach is evaluated by comparing how well hippocampal subfields traced by experts using cytoarchitectural information align between individuals after registration. The authors indeed show that surface-based registration aligns subfields better than volumetric registration applied to binary segmentations of the hippocampal gray matter.

      Overall, I find the methods and results in this paper to be convincing. The authors framed the comparison between surface-based and volumetric registration in a fair way, and the results convincingly show the advantage of surface-based registration. One slight limitation of the current study is that it is uncertain whether the benefits demonstrated here translate to in vivo MRI data for which the authors' HippUnfold algorithm is tailored. The current study utilized the unfolding technique used in HippUnfold on manual segmentations of high-resolution ex vivo MRI and blockface 3D volumes, which are likely closer to anatomical ground truth than automated segmentations of in vivo MRI. However, it is reasonable to assume that given that the volumetric registration to which the proposed approach was compared also used this high-detail data, the advantages of surface-based over volumetric registration would extend to in vivo MRI as well. However, I would encourage the authors to perform future evaluations on datasets with available in vivo and ex vivo MRI from the same individuals.

      We thank the Reviewer for the positive evaluation and the thoughtful feedback. We address each comment in the red text below.

      We have considered the Reviewer suggestion for a demonstration of the gains from our proposed method in MRI, and decided to include a new analysis of 7T in-vivo MRI data from 10 healthy participants (Supplementary Materials 1: in-vivo MRI demonstration).

      It is difficult to assess whether changes to the registration methods are indeed an improvement without same-subject “ground-truth” subfield definitions typically obtained from histology. In this new Supplementary Materials section, we demonstrate an overall sharpening of MRI-mapped features as an indirect indication of better inter-subject alignment (similar to the paper referenced in the comment, below). This is an important proof of concept that demonstrates that the gains made in the current project can be translated to in-vivo MRI. We did not perform a demonstration of these gains in ex-vivo data, since this also comes with a host of challenges including access to such data and deformations and artifacts associated with ex-vivo scanning. However, we believe that the gains provided by our methods are limited mainly by image resolution and so, while we note some concern about the gains from this method at 3T MRI, we expect that the gains provided by our method in higher resolution ex-vivo images should be consistent or better.

      We have added the following in-text Discussion of this new analysis (p. 13):

      “Ravikumar et al. (2021) recently performed flat mapping of the medial temporal lobe neocortex using a Laplace coordinate system as employed here, and showed sharpening of group-averaged images following deformable registration in unfolded space. This indirectly suggests better intersubject alignment. We perform a similar group-averaged sharpening analysis in Supplementary Materials 1: in-vivo demonstration. Though the gains in image sharpness observed here were modest, we note that current MRI resolution and automated segmentation methods allow for only imperfect hippocampal feature measures. We thus expect unfolded registration to grow in importance as MRI and segmentation methods continue to advance.”

      I would also like to point out the relevance of the 2021 paper "Unfolding the Medial Temporal Lobe Cortex to Characterize Neurodegeneration Due to Alzheimer's Disease Pathology Using Ex vivo Imaging" by Ravikumar et al. (https://link.springer.com/chapter/10.1007/978-3-030-87586-2_1) to the current work. This paper applied an earlier version of the unfolding method in HippUnfold to ex vivo extrahippocampal cortex and performed registration using curvature features in the rectangular unfolded space, also finding slight improvement with surface-based vs. volumetric registration, so its findings support the current paper.

      Thank you, we agree this is a highly relevant paper and have added a summary of it in the newly added Discussion paragraph which also outlines the new Supplementary Materials section (see previous comment).

      Overall, the paper has the potential to significantly influence future research on hippocampal involvement in cognition and disease. Outside of simple volumetry studies, most hippocampal morphometry studies rely on volumetric deformable registration of some kind, typically applied to whole-brain T1-weighted MRI scans. With HippUnfold available for anyone to use and not requiring manual registration, the paper provides a strong impetus for using this approach in future studies, particularly where one is interested in localizing effects of interest to specific areas of the hippocampus. Additional evaluation of in vivo HippUnfold using in vivo / ex vivo datasets, would make the use of this approach even more appealing.

      We would like to thank the Reviewer for their enthusiasm for the translatability of this work. We hope they are satisfied with our newly added in-vivo evaluation, and we appreciate the thoughtful suggestions.

      Reviewer #1 (Recommendations For The Authors):

      No additional recommendations.

      Reviewer #2 (Public Review):

      DeKraker et al. propose a new method for hippocampal registration using a surface-based approach that preserves the topology of the curvature of the hippocampus and boundaries of hippocampal subfields. The surface-based registration method proved to be more precise and resulted in better alignment compared to traditional volumetric-based registration. Moreover, the authors demonstrated that this method can be performed across image modalities by testing the method with seven different histological samples. While the conclusions of this paper are mostly well supported by data, some aspects of the method need to be clarified. This work has the potential to be a powerful new registration technique that can enable precise hippocampal registration and alignment across subjects, datasets, and image modalities.

      We thank the Reviewer for their thoughtful evaluation of our paper and helpful comments. We address them in the red text below each comment.

      Regarding the methodological clarification of the surface-based registration method, the last step of the process needs further clarification. Specifically, after creating the averaged 2D template, it is unclear how each individual sample is registered to sample1's space. If I understand correctly, after creating the averaged 2D template, each individual sample is then registered to sample1's space via the transform from each sample to the averaged template and then the inverse transform from the template to sample1's space. Samples included both left and right hemispheres, so were all samples being propagated to left hemisphere sample 1 space? The authors also note that a measure of the subfield labels overlap with that sample's ground-truth subfield definitions was calculated. Is this a measure of overlap, for example, between sample 3 (registered to sample 1 space) and the ground-truth (unfolded, not registered) sample 3 labels? It would be beneficial to provide a full walkthrough of one example sample to clarify the steps. Clarification of this aspect of the method is critical for understanding the evaluation of the method.

      We would like to thank the Reviewer for the suggestion, and have clarified the passage with the following walkthrough example as suggested by the Reviewer (p. 8):

      “For example, sample3 was unfolded and then registered to the unfolded average, making up two transformations. These were then concatenated with the inverse transformation of unfolded sample1 to the same unfolded average, and the inverse transformation of native sample1 to unfolded space. This concatenated transformation was used to project labels from sample3 native space directly to sample1 native space, which should ideally lead to near-perfect subfield alignment in sample1 native space. Dice overlap between sample1 and sample3 registered to sample1 was then calculated in sample1 native space.”
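      The chain of transformations in the walkthrough above can be sketched in a few lines. This is a minimal illustration only: it assumes each step can be stood in for by a 2D affine matrix in homogeneous coordinates, whereas the actual unfolding and registration steps are nonlinear deformations. All transform names and numeric values below are hypothetical.

```python
import numpy as np

def translation(dx, dy):
    """Homogeneous 2D translation matrix."""
    m = np.eye(3)
    m[:2, 2] = [dx, dy]
    return m

def scaling(s):
    """Homogeneous 2D isotropic scaling matrix."""
    return np.diag([s, s, 1.0])

# Hypothetical stand-ins for the four transforms named in the walkthrough.
s3_native_to_unfold = scaling(0.5)          # sample3 native -> sample3 unfolded
s3_unfold_to_avg = translation(1.0, 0.0)    # sample3 unfolded -> unfolded average
s1_native_to_unfold = scaling(0.25)         # sample1 native -> sample1 unfolded
s1_unfold_to_avg = translation(0.0, 2.0)    # sample1 unfolded -> unfolded average

# Concatenate: two forward transforms for sample3, then two inverses for sample1.
# Matrices apply right to left.
s3_native_to_s1_native = (
    np.linalg.inv(s1_native_to_unfold)
    @ np.linalg.inv(s1_unfold_to_avg)
    @ s3_unfold_to_avg
    @ s3_native_to_unfold
)

# Map one point from sample3 native space into sample1 native space.
point_s3 = np.array([4.0, 8.0, 1.0])
point_s1 = s3_native_to_s1_native @ point_s3
```

      Composing the four steps into one transformation means labels are resampled once rather than four times, which is one plausible reason to concatenate before projecting labels.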

      Reviewer #2 (Recommendations For The Authors):

      Materials and Methods:

      In the Data section, it would be helpful for the authors to clarify whether each hippocampal histology sample is from a different individual or not. Additionally, for the 3D-PLI sample, the authors mention that the anterior/posterior parts of the hippocampus were cut off and the labels were extrapolated over the missing regions. It would be useful to know whether the extrapolation was done manually.

      Thank you, we have added separate labels (donors 1-4) for each individual from each dataset. We have also added that the 3D-PLI dataset was extrapolated manually. See the revised Materials and Methods: Data section.

      A small clarification, but for the morphological features calculated by HippUnfold, is thickness a measure of how much space each subfield takes up in the 2D unfolded space?

      Thickness is measured via HippUnfold, and we have clarified in-text that it is done in each subject’s native space (p. 6):

      Results:

      In the Results section, a brief summary or description of the Dice overlap metric would be helpful. The authors should also clarify if the Dice metric measures the overlap between an individual sample (e.g., sample3) that has been unfolded and registered/propagated to sample1 compared to the sample1 ground-truth subfields.

      We thank the Reviewer, and hope this is now clarified alongside the Reviewer’s Public comment with the addition of the example as quoted in our response to that comment.

      We also added to our description of Dice overlap as a measurement (p. 8):

      “The Dice overlap metric (Dice, 1945), which can also be considered an overlap fraction ranging from 0-1, was calculated for all subjects’ subfields registered to sample1.”
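      For readers unfamiliar with the metric, a minimal sketch of the Dice computation for one label follows. It uses the standard definition, 2|A∩B| / (|A| + |B|). The function name, the toy label arrays, and the subfield codes 1 and 2 are illustrative assumptions, not taken from the manuscript.

```python
import numpy as np

def dice_overlap(labels_a, labels_b, label):
    """Dice coefficient for one label between two label maps of equal shape."""
    a = np.asarray(labels_a) == label
    b = np.asarray(labels_b) == label
    denom = a.sum() + b.sum()
    if denom == 0:
        return float("nan")  # label is absent from both maps
    return 2.0 * np.logical_and(a, b).sum() / denom

# Toy 1D label maps with hypothetical subfield codes 1 and 2.
reference = np.array([1, 1, 1, 2, 2, 2])    # e.g. sample1 ground truth
registered = np.array([1, 1, 2, 2, 2, 2])   # e.g. another sample after registration

dice_1 = dice_overlap(reference, registered, 1)
dice_2 = dice_overlap(reference, registered, 2)
```

      A value of 1 indicates identical label extents and 0 indicates no overlap; the same function applies unchanged to 2D or 3D label volumes.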

      Figure 3:

      In Figure 3A, it is unclear what "moving (sample 3)" refers to. Clarification is needed, and it would be helpful to know if this is sample 3 in native space before it has been unfolded/registered. In Figure 3B, there is a missing "native" before "folded" and "(right)" at the end of the sentence. With these edits, the sentence in the caption would read: "Each measure was calculated in unfolded space (left) and again in the first sample's (BigBrain left hemisphere) native folded space (right)."

      We thank the Reviewer, and have now changed “moving” to “sample3 before registration”, and added the suggested caption changes. See the revised Figure 3.

      Discussion:

      In the introduction, the authors provide a detailed description of the traditional 3D volumetric registration technique that utilizes gyral and sulcal patterning as the primary feature for registration, along with other features such as thickness and intracortical myelin. Using their surface-based registration, the authors highlight an interesting finding that hippocampal curvature is the most informative individual feature, and thickness and curvature combined are the most informative features for registration and boundary alignment. In the discussion, it would be beneficial for the authors to discuss the relationship between curvature, thickness, and gyrification (e.g., is there overlapping information across these features) and comment on the reliability of these features observed in the current study compared to past work using traditional methods.

      This is an interesting point of discussion, thank you for raising it. We’ve added the following paragraph to the Discussion section (p. 13):

      “The feature most strongly driving surface-based registration in the present study was curvature. Many neocortical surface-based registration methods focus on gyral and sulcal patterning at various levels (e.g. strong alignment of primary sulci, with weaker weighting on secondary and tertiary sulci) (Miller et al., 2021). In the present study, hippocampal gyri are variable between samples and so could perhaps be thought of as similar to tertiary neocortical gyri, and this may help explain why gyrification was not the primary driving feature in aligning hippocampal subfields. The methods used to quantify gyrification are often related to curvature, but differ across studies. In the hippocampus, unlike in the neocortex, the mouth of sulci are wide and so sulcal depth, which is often used in defining neocortical gyrification index, is not straightforward to measure. Using HippUnfold, gyrification is defined by the extent of tissue distortion between folded and unfolded space, and individual gyri/sulci are hard to resolve in unfolded gyrification maps, but are readily visible in curvature maps. Thus, hippocampal curvature may be more informative simply due to higher measurement precision. Future work could also employ measures like quantitative T1 relaxometry or other proxies of intracortical myelin content (e.g. Tardif et al., 2015; Glasser et al., 2016; Paquola et al. 2018) for hippocampal alignment, but this is not possible in cross-modal work as in the various histology stains examined here.”

      Miscellaneous:

      There is a typo on page 11, line 318, with extra parentheses: "(e.g., (Borne et al., 2023;..."

      Thank you, we have corrected this error.

      Reviewer #3 (Public Review):

      Dekraker and colleagues previously developed a new computational tool that creates a "surface representation" of the hippocampal subfields. This surface representation was previously constructed using histology from a single case. However, it was previously unclear how to best register and compare these surface-based representations to other cases with different morphology.

      In the current manuscript, Dekraker and colleagues have demonstrated the ability to align hippocampal subfield parcellations across disparate 3D histology samples that differ in contrast, resolution, and processing/staining methods. In doing so, they validated the previously generated Big-Brain atlas by comparing seven different ground-truth subfield definitions. This is an impressive and valuable effort that provides important groundwork for future in vivo multi-atlas methods.

      We thank the Reviewer for their positive evaluations, and helpful suggestions. We provide responses to the recommendations in the red text below.

      Reviewer #3 (Recommendations For The Authors):

      There are a few points I think the authors should address, listed below.

      1) As the authors are well aware, subfield definitions vary considerably across laboratories. The current paper states that JD labeled the samples using three different atlas references: Ding & Van Hoesen, 2015; Duvernoy et al., 2013; and Palomero-Gallagher et al., 2020. This is unclear, however, since these three references differ in their subfield definitions. For example, Ding & Van Hoesen and Palomero-Gallagher define a region called the prosubiculum (area between subiculum and CA1) but Duvernoy does not. Please clarify which boundary rules from which particular references were used here. How were discrepancies across these references resolved when applying labels to the current histological samples?

      We thank the Reviewer, and have added the following elaboration (p. 5):

      “Since these sources differ slightly in their boundary criteria, and no prior reference perfectly matches the present samples, subjective judgment was used to draw boundaries after considering all three prior works. The “prosubiculum” label used by Ding & Van Hoesen and Palomero-Gallagher et al. was included as part of the subicular complex. See Supplementary Materials 2: ground-truth segmentation for more details.”

      2) Another comment has to do more with the "style" of how this paper is written, especially given that this paper was submitted to eLife (i.e. not a specialty journal). For example, the motivation for the unfolded with and without registration methods was not well described. Similarly, there was almost no justification for the different methods applied in Figure 4 and I fear that the impact of these results will be lost on a non-expert reader.

      We added the following elaboration to the last paragraph of the Introduction section to motivate our benchmark against unfolding without registration (p. 3):

      “We benchmark this new method against unfolding alone, which provides some intrinsic alignment between subjects (DeKraker et al., 2018) but which we believe can be further improved with the present methods, and against more conventional 3D volumetric registration approaches.”

      We also added a Discussion paragraph on the results shown in Figure 4 which we hope helps to make these results more informative and impactful (p. 13):

      “The feature most strongly driving surface-based registration in the present study was curvature. Many neocortical surface-based registration methods focus on gyral and sulcal patterning at various levels (e.g. strong alignment of primary sulci, with weaker weighting on secondary and tertiary sulci) (Miller et al., 2021). In the present study, hippocampal gyri are variable between samples and so could perhaps be thought of as similar to tertiary neocortical gyri, and this may help explain why gyrification was not the primary driving feature in aligning hippocampal subfields. The methods used to quantify gyrification are often related to curvature, but differ across studies. In the hippocampus, unlike in the neocortex, the mouth of sulci are wide and so sulcal depth, which is often used in defining neocortical gyrification index, is not straightforward to measure. Using HippUnfold, gyrification is defined by the extent of tissue distortion between folded and unfolded space, and individual gyri/sulci are hard to resolve in unfolded gyrification maps, but are readily visible in curvature maps. Thus, hippocampal curvature may be more informative simply due to higher measurement precision. Future work could also employ measures like quantitative T1 relaxometry or other proxies of intracortical myelin content (e.g. Tardif et al., 2015; Glasser et al., 2016; Paquola et al. 2018) for hippocampal alignment, but this is not possible in cross-modal work as in the various histology stains examined here.”

      3) Finally, the application of the current work beyond the current dataset needs to be made more clear. From what I understand, the discussion says that using a multi-atlas approach with HippUnfold is unfeasible at this point. What kind of computational or technical developments need to take place in order for these labeled datasets to be used for this purpose? How can the current labeled datasets be used in other contexts?

      The question of translation to other contexts, namely, in-vivo MRI, was also raised by Reviewer 1, and as such we decided to include an additional analysis to explore this question (Supplementary Materials 1: in-vivo MRI demonstration). Validation using ground-truth subfields is not plausible in MRI, and so we show only an indirect validation of intersubject alignment based on the sharpening of group-averaged features following better alignment using the present methods. We believe this new analysis significantly clarifies the applications we have in mind for this work. See the new Supplementary Section for details, and also a summary of this analysis in the Discussion section (p. 13):

      “Ravikumar et al. (2021) recently performed flat mapping of the medial temporal lobe neocortex using a Laplace coordinate system as employed here, and showed sharpening of group-averaged images following deformable registration in unfolded space. This indirectly suggests better intersubject alignment. We perform a similar group-averaged sharpening analysis in Supplementary Materials 1: in-vivo demonstration. Though the gains in image sharpness observed here were modest, we note that current MRI resolution and automated segmentation methods allow for only imperfect hippocampal feature measures. We thus expect unfolded registration to grow in importance as MRI and segmentation methods continue to advance.”

      Multi-atlas approaches are also presently possible, but we believe HippUnfold can apply unfolding and subfield definition with even higher validity. Unfolding of the hippocampus was previously possible in-vivo but still showed limited intersubject alignment. The present work validates a novel alignment method ex-vivo, and now additionally shows that this can be translated to better alignment even at the resolution of in-vivo imaging. We hope the above new Discussion paragraph also helps to clarify this.

      4) A minor comment is that there are three panels (a,b,c) in Figure 4 but the figure legend does not describe them separately.

      We thank the Reviewer, and added a Figure legend for parts B and C.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for these helpful and thoughtful comments.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      • What was the nature of the 0.1 increase in pH caused by illumination in CheRiff-negative cells? Is this thought to be a temperature effect?

      The increase in pHoran4 fluorescence in CheRiff-negative cells is most likely not from a pH change; rather, it most likely reflects blue light-mediated photoactivation of the mOrange-derived chromophore in pHoran4. Similar photoartifacts have been reported in other fluorescent protein reporters (see e.g. Farhi, Samouil L., et al. "Wide-area all-optical neurophysiology in acute brain slices." Journal of Neuroscience 39.25 (2019): 4889-4908.).

      The baseline measurement in CheRiff-negative cells is to control for this type of artifact. We subtract the mean signal from the CheRiff-negative cells to correct the signals from the CheRiff-positive cells, as described in the Main Text.
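      The correction described above can be sketched as follows. This is a minimal illustration that assumes the subtraction is done time point by time point on fluorescence traces; the function name and the toy trace values are hypothetical, and the exact procedure is the one given in the Main Text.

```python
import numpy as np

def correct_photoartifact(positive_traces, negative_traces):
    """Subtract the mean trace of opsin-negative cells from each opsin-positive trace.

    Both inputs are arrays of shape (cells, time points).
    """
    positive = np.asarray(positive_traces, dtype=float)
    negative = np.asarray(negative_traces, dtype=float)
    artifact = negative.mean(axis=0)  # mean light-induced artifact at each time point
    return positive - artifact

# Hypothetical relative fluorescence traces (rows: cells, columns: time points).
negative = np.array([[0.00, 0.02, 0.04],
                     [0.00, 0.04, 0.06]])
positive = np.array([[0.00, -0.10, -0.20]])

corrected = correct_photoartifact(positive, negative)
```

      In this toy case the opsin-negative cells brighten slightly under illumination, so the corrected trace shows a somewhat larger decrease than the raw trace.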

      • Does Kir2.1 have a proton conductance? Was the resting pH of HEK cells changed by Kir2.1 expression? Fig 2D suggests basal pH is equivalent +/- Kir2.1 but it would be good to show that data.

      This is an interesting question which our data do not answer conclusively. Since we used an intensiometric (as opposed to ratiometric) pH indicator, our measurements only provide relative pH changes. We assumed a constant initial pH. We have revised the text to make clear that this is an assumption.

      Prior studies of pH-dependent Kir2.1 activity did not find evidence of a proton current (i.e. no change in current upon extracellular acidification), though the channel is closed by intracellular acidification. See: Ye, Wenlei, et al. "The K+ channel KIR2.1 functions in tandem with proton influx to mediate sour taste transduction." Proceedings of the National Academy of Sciences 113.2 (2016): E229-E238. We added this information to the text.

      The pKa of pHoran4 is 7.5, so a decrease in initial pH would decrease the slope of F vs pH. We observed higher (absolute value) ΔF/F in the Kir2.1-expressing cells than in the non-expressing cells, confirming that the Kir2.1-expressing cells had larger CheRiff-mediated acidification than the Kir2.1-negative cells (Figure 2D). Thus this conclusion remains true regardless of whether Kir2.1 has a proton conductance.
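      The statement about the slope of F vs pH can be checked numerically under a simple model. The sketch below assumes a single-site titration curve with a Hill coefficient of 1 and fluorescence normalized to its maximum; these are modeling assumptions for illustration, not measured properties reported in this response. Only the pKa of 7.5 comes from the text.

```python
import math

def fluorescence(ph, pka=7.5):
    """Normalized fluorescence for an assumed single-site titration (Hill coefficient 1)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))

def slope(ph, pka=7.5):
    """Analytic derivative dF/dpH of the model above."""
    x = 10.0 ** (pka - ph)
    return math.log(10.0) * x / (1.0 + x) ** 2

# Under this model the absolute slope peaks at pH = pKa and falls as pH drops below it.
slope_at_pka = slope(7.5)
slope_below = slope(7.2)
slope_lower = slope(6.9)
```

      Under these assumptions, a cell starting at a lower pH (below the pKa) sits on a shallower part of the curve, so the same pH change gives a smaller absolute change in F. The argument concerns the absolute slope only and applies for starting pH values at or below the pKa.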

      • What channels/transporters mediate proton flux in CheRiff + Kir2.1 experiments? Is the increased proton flux simply due to more H+ ions passing through CheRiff when cells are hyperpolarized, or may other voltage-dependent processes affect pH?

      Fig. 2G-M address this question, specifically. We targeted the blue light in a “zebra” pattern to only activate CheRiff in a subset of cells. We then used voltage imaging to show that the induced voltage spread over a much wider area than the blue-illuminated region, due to gap junction coupling between the cells. If protons flowed through some voltage-dependent channel other than CheRiff, then we would expect the acidification to follow the voltage profile. If protons primarily flowed through the CheRiff, then we would expect the acidification to follow the illumination profile. Fig. 2K and the following quantification show clearly that the acidification followed the illumination profile, and hence the proton current was primarily through CheRiff.

      • Is Kir2.1 included in the spatial illumination experiments (Fig. 2G-M)? If so, it would be helpful to note it. The color scheme suggests it is but it would be good to note it explicitly.

      Yes. Clarified in text.

      • Why is the acidification caused by 10 second of illumination smaller in Fig 2L, as compared to the equivalent experiment in 2D? Is this due to the spatial nature of the illumination? It seems that the pH change at the site of illumination should be equivalent between these 2 experiments.

      The illumination protocol between the two experiments has different duty cycles (compare Fig. 2C and 2J), so the time-averaged intensity is different. There can also be batch-to-batch variation in CheRiff expression which would alter the proton flux and thus pH change. To control for this, comparisons were always made between batches of cells prepared together.

      • The authors used 150 second illumination to examine pH changes but only 13.5 seconds to differentiate between pH changes caused by the light-activated conductance and those secondary to depolarization. Would pH changes lose their spatial limitations if a similar 150 second illumination was used? This is important because the pH change seen in the "Blue On" region was quite small.

      Yes, protons can diffuse between cells via gap junctions, smoothing out the spatial structure of the pH over long times. See e.g. Wu, Ling, et al. "PARIS, an optogenetic method for functionally mapping gap junctions." Elife 8 (2019): e43366.

      We used a short (13.5 s) protocol specifically to distinguish CheRiff-mediated acidification from acidification via other conductances in electrically coupled neighboring cells. If we had waited for longer, lateral proton diffusion could have muddied the interpretation of these experiments.

      • How long do action potentials shown in between illuminations in Fig 4H (ChR2 3M) last following cessation of illumination?

      The closing times, τoff, of the channelrhodopsins are shown in Table 1. ChR2-3M has an off-time of almost 2 seconds. The duration of post-stimulus persistent firing is expected to depend on the expression level of the ChR2-3M, the strength of the optogenetic stimulus and the excitation threshold of the neurons, i.e. on how far above threshold the neuron is at the moment the blue light turns off. Thus we expect the post-stimulus firing time to be highly variable between cells and also to depend on optogenetic stimulus strength. In our experiments action potentials were observed throughout the 0.5 s dark interval between stimuli.

      • While the ChR2-3M construct may have promise for therapeutic applications, those strengths limit its use for basic science applications like circuit mapping. This should be noted in the discussion.

      Ok. We now mention this in the discussion.

      • Please define EPD50 within the text of the results section.

      Ok. Fixed.

      Reviewer #2 (Recommendations For The Authors):

      This is an interesting manuscript investigating a potential limitation of optogenetic manipulation of cell excitability and its solution. The work is conducted rigorously and explained clearly. I only have minor concerns:

      I think the impact of the study could be broadened by examining additional proton permeable opsins for their effects on intracellular pH. A single assay could be used to compare different opsins to CheRiff and show that the problem of intracellular acidification is not limited to CheRiff.

      Yes, this is interesting. There are so many opsins and illumination protocols in use that we could not do an exhaustive characterization; we encourage people to test their own opsin under their conditions if doing chronic stimulation. The plasmid constructs used for this work are available on Addgene.

      I am not clear on what Figure S3A is showing because I cannot see a patterning like the one shown in Fig. 2H. Perhaps a higher magnification could solve the problem.

      Figure S3A does not have the zebra-striped pattern of Figure 2H. In Fig S3A, we used just one column of illumination. The point was to test the ability of each opsin to depolarize the HEK cells. We added images of the illumination pattern and adjusted the caption to make this clear.

      When discussing the sustained photocurrent of PsCatCh2.0, a reference to Govorunova et al. J. Biol. Chem. 2013 should be added as the low extent of light induced inactivation appears to be, at least in part, a characteristic of the particular type of opsin from P. subcordiformis.

      Added reference.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Meta-cognition, and difficulty judgments specifically, is an important part of daily decision-making. When facing two competing tasks, individuals often need to make quick judgments on which task they should approach (whether their goal is to complete an easy or a difficult task).

      In the study, subjects face two perceptual tasks on the same screen. Each task is a cloud of dots with a dominating color (yellow or blue), with a varying degree of domination - so each cloud (as a representation of a task where the subject has to judge which color is dominant) can be seen as an easy or a difficult task. Observing both, the subject has to decide which one is easier.

      It is well-known that choices and response times in each separate task can be described by a drift-diffusion model, where the decision maker accumulates evidence toward one of the decisions (“blue” or “yellow”) over time, making a choice when the accumulated evidence reaches a predetermined bound. However, we do not know what happens when an individual has to make two such judgments at the same time, without actually making a choice, but simply deciding which task would have stronger evidence toward one of the options (so would be easier to solve).

      It is clear that the degree of color dominance (“color strength” in the study’s terms) of both clouds should affect the decision on which task is easier, as well as the total decision time. Experiment 1 clearly shows that color strength has a simple cumulative effect on choice: cloud 1 is more likely to be chosen if it is easier and cloud 2 is harder. Response times, however, show a more complex interactive pattern: when cloud 2 is hard, easier cloud 1 produces faster decisions. When cloud 2 is easy, easier cloud 1 produces slower decisions.

      The study explores several models that explain this effect. The best-fitting model (the Difference model in the paper’s terminology) assumes that the decision-maker accumulates evidence in both clouds simultaneously and makes a difficulty judgment as soon as the difference between the values of these decision variables reaches a certain threshold. Another potential model that provides a slightly worse fit to the data is a two-step model. First, the decision maker evaluates the dominant color of each cloud, then judges the difficulty based on this information.

      Thank you for a very good summary of our work.

      Importantly, the study explores an optimal model based on the Markov decision processes approach. This model shows a very similar qualitative pattern in RT predictions but is too complex to fit to the real data. It is hard to judge from the results of the study how the models identified above are specifically related to the optimal model - possibly, the fact that simple approaches such as the Difference model fit the data best could suggest the existence of some cognitive constraints that play a role in difficulty judgments.

      The reviewer asks “how the models identified above are specifically related to the optimal model”. We did fit the four models to simulations of the optimal model and found that the Difference model was the closest. However, we did not fit the parameters of the optimal model to the data (no easy feat given the complexity of the model) as the experiment was not designed to incentivize maximization of the reward rate and fitting would have been computationally laborious. We therefore focused on the qualitative features of the optimal model and how they compare to our models. We now also include the optimal model for the known color dominance RT experiment (line 420). We have also added a new paragraph in the Discussion on the optimal model at line 503 comparing it qualitatively to the Difference model.

      The Difference model produces a well-defined qualitative prediction: if the dominant color of both clouds is known to the decision maker, the overall RT effect (hard-hard trials are slower than easy-easy trials) should disappear. Essentially, that turns the model into the second stage of the two-stage model, where the decision maker learns the dominant colors first. The data from Experiment 2 impressively confirms that prediction and provides a good demonstration of how the model can explain the data out-of-sample with a predicted change in context.

      Overall, the study provides a very coherent and clean set of predictions and analyses that advance our understanding of meta-cognition. The field would benefit from further exploration of differences between the models presented and new competing predictions (for instance, exploring how the sequential presentation of stimuli or attentional behavior can impact such judgments). Finally, the study provides a solid foundation for future neuroimaging investigations.

      Thank you for your positive comments and suggestions.

      Reviewer #2 (Public Review):

      Starting from the observation that difficulty estimation lies at the core of human cognition, the authors acknowledge that despite extensive work focusing on the computational mechanisms of decision-making, little is known about how subjective judgments of task difficulty are made. Instantiating the question with a perceptual decision-making task, the authors found that how humans pick the easiest of two stimuli, and how quickly these difficulty judgments are made, are best described by a simple evidence accumulation model. In this model, perceptual evidence of concurrent stimuli is accumulated and difficulty is determined by the difference between the absolute values of decision variables corresponding to each stimulus, combined with a threshold crossing mechanism. Altogether, these results strengthen the success of evidence accumulation models, and more broadly sequential sampling models, in describing human decision-making, now extending it to judgments of difficulty.

      The manuscript addresses a timely question and is very well written, with its goals, methods and findings clearly explained and directly relating to each other. The authors are specialists in evidence accumulation tasks and models. Their modelling of human behaviour within this framework is state-of-the-art. In particular, their model comparison is guided by qualitative signatures which are diagnostic to tease apart the different models (e.g., the RT criss-cross pattern). Human behaviour is then inspected for these signatures, instead of relying exclusively on quantitative comparison of goodness-of-fit metrics. This work will likely have a wide impact in the field of decision-making, and this across species. It will resonate in particular with many other studies relying on a similar theoretical account of behaviour (evidence accumulation).

      Thank you for these generous comments.

      A few points nevertheless came to my attention while reading the manuscript, which the authors might find useful to answer or address in a new version of their manuscript.

      1) The authors acknowledge that difficulty estimation occurs notably before exploration (e.g., attempting a new recipe) or learning (e.g., learning a new musical piece) situations. Motivated by the fact that naturalistic tasks make it difficult to identify the inference process underlying difficulty judgments, the authors instead chose a simple perceptual decision-making task to address their question. While I generally agree with the authors’ general diagnosis, I am nevertheless concerned as to whether the task really captures the cognitive process of interest as described in the introduction. As noted by the authors themselves, the main function of prospective difficulty judgment is to select a task which will then ultimately be performed, or reject one which won’t. However, in the task presented here, participants are asked to produce difficulty judgments without those judgments actually impacting the future in the task. A feature key to difficulty judgments thus seems to be lacking from the task. Furthermore, the trial-by-trial feedback provided to participants also likely differs from difficulty judgments made in the real world. This comment is probably difficult to address but it might generally be useful to discuss the limitations of the task, in particular in probing the desired cognitive process as described in the introduction. Currently, no limitations are discussed.

      We have added a Limitations paragraph to the Discussion and one item we deal with is the generalization of the model to more complex tasks (line 539).

      2) The authors take their findings as the general indication that humans rely on accumulation evidence mechanisms to probe the difficulty of perceptual decisions. I would probably have been slightly more cautious in excluding alternative explanations. First, only accumulation models are compared. It is thus simply not possible to reach a different conclusion. Second, even though it is particularly compelling to see untested predictions from the winning model in experiment #1 to be directly tested, and validated in a second experiment, that second experiment presents data from only 3 participants (1 of which has slightly different behaviour than the 2 others), thereby limiting the generality of the findings. Third, the winning model in experiment #1 (difference model) is the preferred model on 12 participants, out of the 20 tested ones. Fourth, the raw BIC values are compared against each other in absolute terms without relying on significance testing of the differences in model frequency within the sample of participants (e.g., using exceedance probabilities; see Stephan et al., 2009 and Rigoux et al., 2014). Based on these different observations, I would thus have interpreted the results of the study with a bit more caution and avoided concluding too widely about the generality of the findings.

      Thank you for these suggestions.

      i) We have now made it clear in the Results (line 126) that all four models we examine are accumulation models. In addition, we have added a paragraph on Limitations (line 530) in the Discussion where we explain why we only consider accumulation models and acknowledge that there are other non-accumulation models.

      ii) Each of the three participants in Experiment 2 performed 18 sessions, making it a large and valuable dataset necessary to test our hypothesis. We have now included a mention of the small number of participants in Experiment 2 in a Limitations paragraph in the Discussion (line 539).

      iii) As suggested, we have now calculated exceedance probabilities for the 4 models, which gives [0, 0.97, 0.03, 0]. This shows that there is a 0.97 probability of the Difference model being the most frequent and only a 0.03 probability of the two-step model. We have included this in the results on line 237.
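
For readers unfamiliar with the quantity, the following is a minimal sketch of how an exceedance probability can be estimated once a Dirichlet posterior over model frequencies is available (as in the random-effects approach of Stephan et al., 2009). The Dirichlet parameters below are hypothetical placeholders chosen only for illustration, and the function name is an assumption; neither is taken from the paper, and the sketch does not reproduce the authors' analysis.

```python
import random

def exceedance_probabilities(alpha, n_samples=100_000, seed=0):
    """Monte Carlo estimate of P(model k is the most frequent model)
    under a Dirichlet(alpha) posterior over model frequencies."""
    rng = random.Random(seed)
    wins = [0] * len(alpha)
    for _ in range(n_samples):
        # A Dirichlet draw is a normalised vector of independent
        # Gamma(alpha_k, 1) draws; normalising does not change which
        # entry is largest, so the raw gamma draws are compared directly.
        draws = [rng.gammavariate(a, 1.0) for a in alpha]
        wins[draws.index(max(draws))] += 1
    return [w / n_samples for w in wins]

# Hypothetical Dirichlet parameters for four models (NOT the paper's values).
# Index 1 stands in for the Difference model, index 2 for the two-step model.
alpha = [1.5, 13.0, 6.0, 3.5]
xp = exceedance_probabilities(alpha)
print([round(p, 3) for p in xp])
```

In a full analysis the Dirichlet parameters would themselves be estimated from the per-participant model evidences (for example, approximated from BIC), a step this sketch leaves out.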

      3) Deriving and describing the optimal model of the task was particularly appreciated. It was however a bit disappointing not to see how well the optimal model explains participants behaviour and whether it does so better than the other considered models. Also, it would have been helpful to see how close each of the 4 models compared in Figures 2 & 3 get to the optimal solution. Note however that neither of these comments are needed to support the authors’ claims.

      The reviewer asks how close each of the four models is to the optimal solution. We did fit the four models to simulations of the optimal model and found that the Difference model was the closest. However, we did not fit the parameters of the optimal model to the data (no easy feat given the complexity of the model) as the experiment was not designed to incentivize maximization of the reward rate and fitting would have been computationally laborious. We therefore focused on the qualitative features of the optimal model and how they compare to our models. We now also include the optimal model for the known color dominance RT experiment (line 420). We have also added a new paragraph in the Discussion on the optimal model at line 503 comparing it qualitatively to the Difference model.

      4) The authors compared the difficulty vs. color judgment conditions to conclude that the accumulation process subtending difficulty judgments is partly distinct from the accumulation process leading to perceptual decisions themselves. To do so, they directly compared reaction times obtained in these two conditions (e.g. “in other cases, the two perceptual decisions are almost certainly completed before the difficulty decision”). However, I find it difficult to directly compare the ‘color’ and ‘difficulty’ conditions as the former entails a single stimulus while the latter comprises two stimuli. Any reaction-time difference between conditions could thus, I believe, only follow from asymmetric perceptual/cognitive load between conditions (at least in the sense RT-color < RT-difficulty). One alternative could have been to present two stimuli in the ‘color’ condition as well, and ask participants to judge both (or probe which to judge later in the trial). Implementing this now would however require running a whole new experiment, which is likely too demanding. Perhaps the authors could instead acknowledge that this is a critical difference between their conditions, which makes direct comparison difficult.

      We feel we can rule out that participants make color decisions (as in the color task) to make difficulty decisions. For example, making a color choice for 0% color strength takes longer than a difficulty choice for 0:52% color strengths. Thus, the difficulty judgment does not require completion of the color decisions. Therefore, average reaction time for a single color patch (CS1) can be longer than the reaction time for the difficulty task which contains the same coherence (CS1) for one of the patches. This is true despite the difficulty decision requiring monitoring of two patches (which might be expected to be slower than monitoring one patch). We have added this to the Discussion at line 449.

      Reviewer #3 (Public Review):

      The manuscript presents novel findings regarding the metacognitive judgment of difficulty of perceptual decisions. In the main task, subjects accumulated evidence over time about two patches of random dot motion, and were asked to report for which patch it would be easier to make a decision about its dominant color, while not explicitly making such decision(s). Using 4 models of difficulty decisions, the authors demonstrate that the reaction time of these decisions are not solely governed by the difference in difficulties between patches (i.e., difference in stimulus strength), but (also) by the difference in absolute accumulated evidence for color judgment of the two stimuli. In an additional experiment, the authors eliminated part of the uncertainty by informing participants about the dominant color of the two stimuli. In this case, reaction times were faster compared to the original task, and only depended on the difference between stimulus strength.

      Overall, the paper is very well written, the figures and illustrations clearly and adequately accompany the text, and the methods and modeling are rigorous.

      The weakness of the paper is that it does not provide sufficient evidence to rule out the possibility that judging the difficulty of a decision may actually be comparing between levels of confidence about the dominant color of each stimulus. One may claim that an observer makes an implicit color decision about each stimulus, and then compares the confidence levels about the correctness of the decisions. This concern is reflected in the paper in several ways:

      We tested a Difference in confidence model (line 315) in the original paper and showed it was inferior to the Difference model. We did this for the Experiment 2 RT task so that we could fit the unknown color condition and try to predict the known color condition. To emphasize this model (which we think the reviewer may have missed) we have moved the supplementary figure to the main results (now Fig. 6) as we think it is very cool that we were able to discard the confidence model.

      When comparing the confidence model to the Difference model, we found the Difference model was preferred with ΔBIC of 38, 56, 47. We are unsure why the reviewer feels this “does not provide sufficient evidence to rule out the possibility that judging the difficulty of a decision may actually be comparing between levels of confidence about the dominant color of each stimulus.” We regard this as strong evidence.

      1) It is not clear what the actual instructions to the participants were, as two different phrasings appear in the methods: one instructs participants to indicate which stimulus is the easier one and the other instructs them to indicate the patch with the stronger color dominance. If both instructions are the same, it can be assumed that knowing the dominant color of each patch is in fact solving the task, and no judgment of difficulty needs to be made (perhaps a confidence estimation). Since this is not a classical perceptual task where subjects need to address a certain feature of the stimuli, but rather to judge their difficulties, it is important to make it clear.

      We now include the precise words used to instruct the participant (line 604): “Your task is to judge which patch has a stronger majority of yellow or blue dots. In other words: For which patch do you find it easier to decide what the dominant color is? It does not matter what the dominant color of the easier patch is (i.e., whether it is yellow or blue). All that matters is whether the left or right patch is easier to decide”.

      Knowing both colors or the dominant color is not sufficient to solve the task. Knowing both are yellow does not tell you which has more yellow which is what you need to estimate to solve the task. Again, we tested a confidence model in the original version of the paper and showed it was a poor model compared to the Difference model.

      2) Two-step model: two issues are a bit puzzling in this model. First, if an observer reaches a decision about the dominant color of each patch, does it mean one has made a color decision about the patches? If so, why should more evidence be accumulated? This may also support the possibility that this is a “post decision” confidence judgment rather than a “pre decision” difficulty judgment. Second, the authors assume the times it takes to reach a decision about the dominant color for both patches are equal, i.e., the boundaries for the “mini decision” are symmetrical. However, it would make sense to assume that patches with lower strength would require a longer time to reach the boundaries.

      In the Two-step model we assume a mini decision is made for the color of each stimulus. However, the assumption is that this is made with a low bound, so it is not a full decision as in a typical color decision. Again, estimating the colors from the mini decision does not tell you which is easier, so you need to accumulate more evidence to make this judgment. In fact the Race model is a version of the Two-step model in which no further accumulation is made after the initial decision, and this model fits poorly (we now explain this on line 185). We assume for simplicity that the first stimulus to cross a bound triggers both mini color decisions. So although the bounds are equal, the one with stronger color dominance is more likely to hit the bound first.

      We have already addressed this concern about the comparison with confidence above.

      3) Experiment 2: the modification of the Difference model to fit the known condition (Figure 5b) can also be conceptualized as the two-step model, excluding the “mini” color decision time. These two models (Difference model with known color; two-step model) only differ from each other in that in the former the color is known in advance, and in the latter the subject has to infer it. One may wonder if the difference in patterns between the two (Figure 3C vs. Figure 6B) is only due to the inaccuracies of inferring the dominant color in the two-step model.

      In Experiment 2 the participant is explicitly informed as to the color dominance of both stimuli. Therefore, assuming the Two-step model skips the first step and uses this explicit information in the second step, the Difference and Two-step models are identical for modeling Experiment 2. We explain this now on line 277.

      As the reviewer suggests, differences in predictions between the Difference and Two-step models arise from trials in which there is a mismatch between the inferred dominant colors from the Two-step model and the color associated with the final DVs in the Difference model. We now explain this on line 187. We do not see this as a problem of any sort; it simply defines the difference between the models. Note that the new exceedance analysis now strongly supports the Difference model as the most common model among the participants.

      An additional concern is about the controlled duration task: Why were these specific durations chosen (0.1-1.65 sec; only a single duration was larger than 1sec), given the much longer reaction times in the main task (Experiment 1), which were all larger on average than 1sec? This seems a bit like an odd choice. Additionally, difficulty decision accuracies in this version of the task differ between known and unknown conditions (Figure 7), while in the reaction time version of the same task there were no detectable differences in performance between known and unknown conditions (Figure 6C), just in the reaction times. This discrepancy is not sufficiently explained in the manuscript. Could this be explained by the short trial durations?

      The reviewer asks about the choice of stimulus durations in Experiment 2. First, RTs in Experiment 1 do not only reflect the time needed to make decisions but also contain non-decision times (0.23-0.47 s). So to compare decision time in RT and controlled duration experiment one must subtract the non-decision time from the RTs (the non-decision time is not relevant to the controlled duration experiment). Second, the model specifically predicts that differences in performance between the known and unknown color dominance conditions are largest for short duration stimulus presentation trials (see Fig. 7). We explain this on line 346. For long durations, performance pretty much plateaus, and many decisions have already terminated (Kiani 2008). We sample stimulus durations from a discrete truncated exponential distribution to get roughly equal changes in accuracy between consecutive durations (which we now explain at line 345).
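
As a rough illustration of what sampling from a discrete truncated exponential distribution involves, here is a minimal sketch. The duration grid and the time constant `tau` are hypothetical placeholders (only the 0.1 to 1.65 s range is taken from the text above), and the function names are assumptions; the actual values and procedure are those described in the manuscript.

```python
import math
import random

def truncated_exponential_pmf(durations, tau):
    """Probability of each candidate duration, proportional to exp(-t / tau)
    and normalised over the finite (truncated) set of durations."""
    weights = [math.exp(-t / tau) for t in durations]
    total = sum(weights)
    return [w / total for w in weights]

def sample_durations(durations, tau, n, seed=0):
    """Draw n stimulus durations from the discrete truncated exponential."""
    rng = random.Random(seed)
    pmf = truncated_exponential_pmf(durations, tau)
    return rng.choices(durations, weights=pmf, k=n)

# Hypothetical duration grid in seconds (NOT the grid used in the experiment).
durations = [0.10, 0.20, 0.35, 0.55, 0.80, 1.65]
tau = 0.5  # hypothetical time constant in seconds
print(truncated_exponential_pmf(durations, tau))
print(sample_durations(durations, tau, n=10))
```

Under such a distribution short durations are presented most often, which is where the model predicts the largest difference between the known and unknown color dominance conditions.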

      Group consensus review

      The reviewers have discussed with each other, and they have discussed a series of revisions which, if carried out, would make their evaluation of your paper even more positive. I outline them below in case you would be interested in revising your paper based on these reviews. You will see below that the reviewers share overall a quite positive evaluation of your study. All three limitations described in the Public Reviews could be addressed explicitly in the discussion which for the moment is limited to description and generalization of findings.

      1) The model selection procedure should be amended and strengthened to provide clearer results. As noted by one of the reviewers during the consultation session, “the Difference model just barely wins over the two-step model, and the two-step model might produce the same prediction for the next experiment.” You will also see below that Reviewer #2 provides guidance to improve the model selection process: “[...] the second experiment presents data from only 3 participants (1 of which has slightly different behaviour than the 2 others), thereby limiting the generality of the findings. Third, the winning model in experiment #1 (difference model) is the preferred model on 12 participants, out of the 20 tested ones. Fourth, the raw BIC values are compared against each other in absolute terms without relying on significance testing of the differences in model frequency within the sample of participants (e.g., using exceedance probabilities; see Stephan et al., 2009 and Rigoux et al., 2014).” Altogether, model selection appears currently to be the ‘weakest’ part of the paper (Difference model vs. Two-step model, model comparison, how to better incorporate the optimal model with the other parts). It would be great if you would improve this section of the Results.

      Thank you for these suggestions.

      i) We have now made it clear in the Results (line 126) that all four models we examine are accumulation models. In addition, we have added a paragraph on Limitations (line 530) in the Discussion where we explain why we only consider accumulation models and acknowledge that there are other non-accumulation models.

      ii) Each of the three participants in Experiment 2 performed 18 sessions, making it a large and valuable dataset necessary to test our hypothesis. We have now included a mention of the small number of participants in Experiment 2 in a Limitations paragraph in the Discussion (line 539).

      iii) We have now calculated exceedance probabilities for the 4 models which gave [0,0.97,0.03,0]. This shows that there is a 0.97 probability of the Difference model being the most frequent and only a 0.03 probability of the two-step model. We have included this in the results on line 237.

      2) All reviewers have noted that the relation of the optimal model with the human data and the other models should be clarified and discussed in a revised version of the manuscript. You will find their specific comments in their individual reviews, appended below.

      We now include the optimal model for the known color dominance RT experiment (line 420). We have also added a new paragraph in the Discussion on the optimal model at line 503 comparing it to the Difference model.

      3) Finally, the exclusion strategy is also unclear at the moment and should be clarified and discussed explicitly somewhere in a revised version of the manuscript. Reviewers were wondering why so many participants were excluded from Experiment 1, and only 3 participants were included in Experiment 2. This should also be clarified better in the manuscript.

      We have clarified the exclusion criteria in the Methods at line 651 as a new subsection.

      The data quality problem with MTurk is well documented (Chmielewski, M & Kucker SC. 2020. An MTurk Crisis? Shifts in Data Quality and the Impact on Study Results. Social Psychological and Personality Science, 11, 464-473). Given that this was an online experiment on MTurk, it is hard to know exactly why some participants showed low accuracy, but it’s likely that some may have misunderstood the instructions in the difficulty task or they may have been unmotivated to do well in this highly repetitive task. Either reason would be problematic for our model comparisons that are based on choice-RT patterns. Note that the cut-offs we chose for inclusion were purely based on accuracy, whereas the modeling approach considered RTs, which importantly were not used as an inclusion criterion (see revised methods). Moreover, accuracy cut-offs were fairly lenient and mainly aimed to exclude participants who appeared to be guessing or to have misunderstood instructions (for reference: mean sensitivity of participants who were included was 2x higher than the cut-offs we used).

      Each of the three participants in Experiment 2 performed 18 sessions, making it a large and valuable dataset necessary to test our hypothesis. We have now included a mention of the small number of participants in Experiment 2 in a Limitations paragraph in the Discussion (line 539).

      Reviewer #1 (Recommendations For The Authors):

      Thank you for an excellent paper, I enjoyed reading it a lot. I have a few questions that could potentially clarify some aspects for the reader.

      (1) It seems from the model fit plots (Figure 3) that the RT predictions of the model tend to overshoot in cases where one of the clouds is very easy. Could you include potential interpretations of this effect?

      We assume the reviewer is examining the Difference model (i.e. the preferred model) panel when commenting on the overshoot. It is true that the RT prediction for the highest coherence (bottom purple line) is above the data, but it is barely outside the data error bars of 1 s.e. To be honest, we regard this as a pretty good fit and would not want to over-interpret this small mismatch.

      (2) On page 4, around line 121, the study discusses the “criss-crossing” effect in the RT data. You mention that the fact that RTs are long in hard-hard trials compared to easy-easy trials could be important here: “These tendencies lead to a criss-cross pattern..”. It is confusing since, for instance, the race model does not have a criss-cross, but still exhibits the overall effect. I was intrigued by the criss-crossing, and after some quick simulations, I found that the equation $RT = 2 - 2\,Cs_1^2 - 2\,Cs_2^2 + 6\,(Cs_1\,Cs_2)^2$ can (very roughly) replicate Figure 1d (bottom panel), so it seems that the criss-crossing effect must be produced by some interactive effect of color strengths on RTs. I wonder if you could provide a better explanation of how this interactive effect is generated by the model, given that it is the main interesting finding in the data. I believe at this point the intuition is not well-outlined.

      The criss-cross arises through an interaction of the coherences, as the reviewer suspects. That is, for the Difference model the RT is related to $\big|\,|coh_1| - |coh_2|\,\big|$. If we replace the outer absolute value with a square we get

      $(|coh_1| - |coh_2|)^2 = |coh_1|^2 + |coh_2|^2 - 2\,|coh_1|\,|coh_2|$

      The larger this is, the smaller the RT, so

      $RT = \text{constant} - coh_1^2 - coh_2^2 + 2\,|coh_1|\,|coh_2|$

      which is very similar to the formula the reviewer mentions.
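
The crossing implied by this expression can be checked numerically. The following is a minimal sketch that treats the expression purely as an RT proxy; the constant, the coherence values and the function name `rt_proxy` are hypothetical and unitless, and the sketch is not a fit or a simulation of the Difference model itself.

```python
def rt_proxy(coh1, coh2, constant=2.0):
    """RT proxy from the expression above:
    constant - coh1^2 - coh2^2 + 2*|coh1|*|coh2|,
    which equals constant - (|coh1| - |coh2|)^2."""
    return constant - coh1 ** 2 - coh2 ** 2 + 2 * abs(coh1) * abs(coh2)

# Hypothetical coherence levels: 0.0 is the hardest stimulus, 0.5 the easiest.
levels = [0.0, 0.25, 0.5]
for coh2 in (0.0, 0.5):
    curve = [rt_proxy(coh1, coh2) for coh1 in levels]
    print(f"coh2 = {coh2}: {curve}")

# When stimulus 2 is hard, making stimulus 1 easier lowers the proxy;
# when stimulus 2 is easy, making stimulus 1 easier raises it.
# The two curves therefore cross.
```

The sign of the slope with respect to the first coherence flips depending on the second coherence, which is the interaction the reviewer's formula captures through its product term.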

      We now supply an intuition as to why the criss-cross arises in the Difference model (line 167). We do not get a criss-cross in the Race model, because there the RT is determined by the race that reaches a bound first. Because the races are independent, RTs will be fastest when coherence is high for either stimulus.

      (3) Am I wrong in my intuition that the two-step model would produce very similar predictions as the Difference model for Experiment 2? It would be great to discuss that either way since the two-step model seems to produce very close quantitative and pretty much the same qualitative fit to the data of Experiment 1.

      In Experiment 2 the participant is explicitly informed about the color dominance of both stimuli. Therefore, assuming the Two-step model skips the first step and uses this explicit information in the second step, the Difference and Two-step models are identical for modeling Experiment 2. We explain this now on line 277.

      (4) The inclusion of the optimal model is great. It would be beneficial to provide some more connections to the rest of the paper here. Would this model produce similar predictions for Experiment 2, for instance?

      We now include the optimal model for the known color dominance RT experiment (line 420). We have also added a new paragraph in the Discussion on the optimal model at line 503 comparing it to the Difference model.

      (5) In the Methods, it is quite striking that out of 51 original participants, most were excluded and only 20 were studied. It is not easy to trace through this section why and how and who was excluded, so it would be great if this information was organized and presented more clearly.

      We have clarified this in a new subsection of the Methods (line 651). We also explain that exclusions were not based on the RT data, which are the main focus of our models.

      Reviewer #2 (Recommendations For The Authors):

      • As detailed in the ‘public review’, a more cautious discussion, notably delineating the limitations of the study, would be appreciated.

      • In their models, the authors assume that participants sequentially allocate attention between the two stimuli, alternating between them. Did the authors test this assumption and did they consider the possibility that participants could sample from both stimuli in parallel? In particular, does the conclusion of the model comparison also hold under this parallel processing assumption?

      Our results are not affected by whether participants sample the stimulus sequentially through alternation or in a parallel manner (Kang et al., 2021). What does change is the parameters of the model (but not their predictions/fits). In the parallel model, information is acquired at twice the rate of the serial model. We can, therefore, obtain the parameters of the parallel model (that has identical predictions to the serial model) directly from the parameters of the current sequential models simply by adjusting the parameters that depend on the time scale (subscripts s and p for serial and parallel models): $\kappa_p = \kappa_s/\sqrt{2}$, $u_p = u_s\sqrt{2}$, $a_p = a_s/2$ and $d_p = 2d_s$ (Eq. 2). We now explain this on line 518.
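
      The rescaling stated in this reply (κp = κs/√2, up = us√2, ap = as/2, dp = 2ds) can be sketched as a small conversion helper. The class and function names are hypothetical, and the fields simply mirror the four symbols in the reply; this is a minimal sketch, not the authors' code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelParams:
    """The four time-scale-dependent parameters named in the reply (hypothetical container)."""
    kappa: float
    u: float
    a: float
    d: float


def serial_to_parallel(serial: ModelParams) -> ModelParams:
    """Apply the rescaling quoted in the reply to get parallel-model parameters."""
    return ModelParams(
        kappa=serial.kappa / math.sqrt(2),  # kappa_p = kappa_s / sqrt(2)
        u=serial.u * math.sqrt(2),          # u_p = u_s * sqrt(2)
        a=serial.a / 2,                     # a_p = a_s / 2
        d=2 * serial.d,                     # d_p = 2 * d_s
    )


# Illustrative values only; they are not fitted parameters from the paper.
print(serial_to_parallel(ModelParams(kappa=10.0, u=1.0, a=0.4, d=0.1)))
```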

      • I found the small paragraph corresponding to lines 193-196 particularly difficult to understand. If the authors could think of a better way to phrase their claim, it would probably help.

      We have rewritten this paragraph at line 211.

      • I found a typo on line 122: “wheres” instead of “whereas”.

      Corrected

      • I found a typo on line 181: “or” instead of “of”.

      Yes corrected

      • Figure #2 is extremely useful in understanding the models and their differences, make sure it remains after addressing the reviews!

      Thank you, this figure is retained.

      Reviewer #3 (Recommendations For The Authors):

      All comments are detailed in the public review, with some clarifications here:

      1) The confusing instructions to the participants are detailed here: under “overview of experimental tasks” in the methods it says: “They were instructed... to indicate whether the left or right stimulus was the easier one” (line 520), and below it “they were required to indicate which patch had the stronger color dominance...” (line 524).

      We have clarified the instructions by providing the actual text displayed to participants in the methods and have ensured consistency in the method to talk about judging the easier stimulus (line 604).

      The instructions were “Your task is to judge which patch has a stronger majority of yellow or blue dots. In other words: For which patch do you find it easier to decide what the dominant color is? It does not matter what the dominant color of the easier patch is (i.e., whether it is yellow or blue). All that matters is whether the left or right patch is easier to decide”.

      2) Minor comments: Line 76: “that” should be “than”.

      Thanks, corrected

      Line 574: “variable duration task” means “controlled duration task”?

      Yes, corrected

      Line 151: “or” should be “of”.

      Corrected

    1. On the basic level of perception and categorization, finally, the influence exerted by culture remains rather subtle, but may be pervasive nonetheless. Here, our cognitive processor is trained by the constant confrontation with information input—either directly from an environment shaped by cultural activities, or indirectly through the spectacles of a linguistic taxonomy. While the latter has been subject to intense research and continuous debate (reviewed in Enfield, 2015), the former has been largely neglected, which is why even the hypotheses on how exactly culture affects perception have remained speculative.

      Our cognitive processing is constantly shaped by exposure to cultural information, either directly from our environment or indirectly through the lens of language. People have talked a lot about how the language we use affects the way we think. But not many people have paid as much attention to how the culture we live in and the things around us might also change the way we see and understand things.

    1. Author Response

      Reviewer #1 (Public Review):

      Mano et al. use a combination of behavioral, genetic silencing, and functional imaging experiments to explore the temporal properties of the optomotor response in Drosophila. They find a previously unreported inversion of the behavior under high contrast and luminance conditions and identify potential pathways mediating the effect.

      Strengths:

      Quantifications of optomotor behavior have been performed for many decades. Despite a large number of previous studies, the authors still find something fundamentally novel: under high contrast conditions and extended stimulation periods, the behavior becomes dynamic over time. The turning response shows an initial transient positive following response. The amplitude of the behavior then decreases and even inverts such that animals show an anti-directional rotation response. The authors systematically explore the stimulation feature space, including large ranges of spatial and temporal frequencies and conditions with high and low contrast. They also test two wild-type fly species and even compare experiments across two different labs and setups. From these data, it seems clear that the behavior is robust and largely depends on the brightness of the stimulation, rearing conditions, and genetic background. The authors discuss that these effects have not clearly been reported elsewhere beforehand, and convincingly argue why this may be the case.

      In general, the presented behavioral quantifications illustrate the importance of further experimental studies of the temporal dynamics of behavior in response to dynamically varying stimulus features, across different stimulus types, genetic backgrounds, and model animal systems. It also illustrates the importance of relating the conditions that animals experience in the laboratory to the ones they would experience in the wild. As the authors mention, the brightness during a sunny day can reach values as high as 4000 cd/m², while experimental stimulation in the lab has so far often been orders of magnitude below that.

      The study then systematically explores potential neural elements involved in the behavior. Through a set of silencing experiments, they find that T4 and T5 neurons, as expected, are required for motion behaviors. Silencing HS cells largely abolishes the 'classical' syn-directional response but leaves anti-directional turning intact. On the other hand, silencing CH cells abolishes the anti-directional response but leaves the syn-directional behavior intact. Through functional imaging in T4, T5, HS, and CH neurons, the authors could show that none of these neurons shows a response inversion depending on contrast level. Together, these experiments nicely illustrate that the dynamics do not seem to be computed within the early parts of visual processing, but they must happen on the level of the lobula plate or further downstream.

      Weaknesses:

      While the authors have already explored various parameters of the experiment, it would have been nice to see additional experiments regarding the initial adaptation phase. The experiments in Figure 2e, where the authors show front-to-back or back-to-front gratings before the rotation phase, are a good start. What would the behavioral dynamics look like if they had exposed animals to long periods of static high or low contrast gratings, whole field brightness, or darkness? Such experiments would surely help to better understand the stimulus features on which the adaptation elements operate. It would be interesting to explore to what degree such static stimuli impact the subsequent behavioral dynamics.

      To address this question, we have added a new adaptation condition, in which a high contrast, stationary sinusoidal grating is presented for 5 seconds before the high contrast rotational stimulus is presented (new Figure 2 – Supp. Fig. 1). We find that the turning looks identical to the case of a gray adapter. These results drive home the point that the direction of motion of the adapter is what matters most.

      Given the dynamics of the behavior, it would probably also be worth looking at the turning dynamics after the stimulus has stopped. If direction-selective adaptation mechanisms are regulating the turning response, one may find long-lasting biases even in the absence of stimulation. If the authors have more data after the stimulus end, it would be good to further expand the time range by a few seconds to show if this is the case or not (for example, in Figure 1b).

      We now show these dynamics in Figure 1. See Essential Revision #1.

      Another important experiment could be to initially perform experiments in a closed-loop configuration, and then quickly switch to open-loop. The closed-loop configuration should allow the motion computing circuitry to adapt to the chosen environmental conditions. Explorations of the changes in turning response dynamics after such treatments should then enable further dissections of the mechanisms of adaptation. Closed-loop experiments under different contrast conditions have already been performed (for example, Leonhardt et al. 2016), which also showed complex response dynamics after stimulus on- and offset. It would be great to discuss the current open-loop experiments, and maybe some new closed-loop results, in relation to the previous work.

      We have performed these suggested experiments; please see Essential Revision #2.

      The authors mention the different rearing conditions, and there is one experiment in Figure S2 which mentions running experiments at 25 deg C. But it is not clear from the Methods at which temperature all other experiments have been performed. It is also not clear at which temperature the shibire block experiments were performed. As such experiments require elevated temperatures, I assume that all behavioral experiments have been performed at such levels? How high were those?

      Our apologies for leaving out this important information. In DAC’s lab, behavioral experiments are run at 34-36ºC in a room maintaining ~50% relative humidity (this yields ~25% RH in the box with the experiment, as we now note in the methods). These conditions yield high quality, reproducible behavior, especially since this temperature elicits strong walking behavior. In TRC’s lab, behavioral experiments are similarly run at 34ºC in a room maintaining ~50% relative humidity (similarly with ~25% RH in the experimental box), for similar reasons. We have now added these details to the methods sections for each lab’s behavioral experiments.

      What does the fly see before and after the stimulus (i.e. the gray boxes in all figures)? Are these periods of homogenous gray levels or are these non-moving gratings with the luminance and contrast of the subsequent stimulus? It would be important to add this information to the methods and to the figure illustrations or legends.

      In the figures, gray is a uniform luminance screen that appears before and after the stimuli, with luminance matched to the mean stimulus luminance. We have now included this in the methods section where we describe how stimuli were generated in each lab.

      It would be nice to discuss the potential location where the motion adaptation may be implemented in the brain. A small model scheme as an additional figure could further help to discuss how such computations may be mechanistically implemented, helping readers to think about future experimental dissections of the behavior.

      Following this suggestion, we have created a diagram that shows a potential mechanistic implementation of the behavior observed, and summarizes our results (new Figure 6 – Supp. Fig. 2). There are many other possible alternatives that we do not show, including exactly how an opposing signal could ramp up under the conditions of these experiments. In the figure caption, we remind readers what locations have been excluded for this sort of computation. We reference this diagram where we discuss subtraction in the Discussion.

      For setting up similar experiments in other labs, the authors need to better describe how they measured the luminance of the arena. Do they simply report the brightness delivered by the Lightcrafter system, or did they measure this with a lux-meter? If so, at which distance was the measurement performed and with which device? Given that the behavior is sensitive to the specific properties of the stimulus, it will be important to report these numbers carefully to enable other groups to reproduce effects.

      In brief, since these are rear projection screens, we can easily measure light intensity by placing a power meter in front of the screen. This gives us the photon flux in watts, which can be converted to lumens by a standard conversion and then into candelas by making the approximation that our screen scatters into 2π steradians. Dividing by the sensor area gives us our desired candelas per square meter. We have now added this methodology to the methods section.
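
      The conversion chain described in this reply (watts to lumens, lumens to candelas over 2π steradians, then division by sensor area) can be sketched as follows. This is a minimal sketch under stated assumptions: 683 lm/W is the defined luminous efficacy of 555 nm light, while the `luminous_efficiency` factor for the projector spectrum, the function name, and the example readings are hypothetical and not taken from the authors' methods.

```python
import math

# Luminous efficacy of monochromatic 555 nm light (lm/W), fixed by the SI definition of the candela.
MAX_LUMINOUS_EFFICACY_LM_PER_W = 683.0


def luminance_cd_per_m2(power_w, sensor_area_m2, luminous_efficiency=1.0):
    """Estimate screen luminance from a power-meter reading.

    luminous_efficiency is the dimensionless photopic weighting (0 to 1) of the
    light source's spectrum; 1.0 assumes pure 555 nm light (an assumption).
    """
    lumens = power_w * MAX_LUMINOUS_EFFICACY_LM_PER_W * luminous_efficiency
    candelas = lumens / (2 * math.pi)  # approximation from the reply: scatter into 2*pi sr
    return candelas / sensor_area_m2


# Hypothetical reading: 10 microwatts collected on a 1 cm^2 sensor.
print(luminance_cd_per_m2(power_w=10e-6, sensor_area_m2=1e-4))
```

      A real measurement would need the spectrum-weighted efficiency of the projector's light source in place of the default of 1.0, so the value above is an upper bound for a given power reading.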

    1. we think that even newborn infants may have innate intuitive theories and those theories are subject to revision even in infancy itself

      I support the statement that newborn infants may have innate intuitive theories that are subject to revision even in infancy itself. Infants are born with certain cognitive mechanisms and innate knowledge that help them make sense of the world from a very early age. These intuitive theories, such as those related to object permanence or basic physics, serve as the foundation for their understanding of the environment. However, as infants interact with the world, they continually refine and revise these theories based on their experiences and observations. This process of theory revision is a fundamental aspect of cognitive development, demonstrating the remarkable adaptability and learning capabilities of even the youngest humans.

      References Moore, M. K., & Meltzoff, A. N. (1999, November). New findings on Object permanence: A developmental difference between two types of occlusion. The British journal of developmental psychology.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important manuscript reveals signatures of co-evolution of two nucleosome remodeling factors, Lsh/HELLS and CDCA7, which are involved in the regulation of eukaryotic DNA methylation. The results suggest that the roles for the two factors in DNA methylation maintenance pathways can be traced back to the last eukaryotic common ancestor and that the CDCA7-HELLS-DNMT axis shaped the evolutionary retention of DNA methylation in eukaryotes. The evolutionary analyses are solid, although more refined phylogenetic approaches could have strengthened some of the claims. Overall, this study should be useful for researchers studying DNA methylation pathways in different organisms, and it should be of general interest to colleagues in the fields of evolutionary biology, chromatin biology and genome biology.

      We sincerely appreciate the constructive comments and suggestions from the reviewers and the fair and accurate summary by the monitoring editor. Below, we provide point-by-point responses to the reviewers’ comments.

      Reviewer #1 (Public Review):

      Overall, I find the work performed by the authors very interesting. However, the authors have not always included literature that seems relevant to their study. For instance, I do not understand why two papers Dunican et al 2013 and Dunican et al 2015, which provide important insight into Lsh/HELLS function in mouse, frog and fish were not cited. It is also important that the authors are specific about what is known and in particular about what is not known about CDCA7 function in DNA methylation regulation. Unless I am mistaken, there is currently only one study (Velasco et al 2018) investigating the effect of CDCA7 disruption on DNA methylation levels (in ICF3 patient lymphoblastoid cell lines) on a genome-wide scale (Illumina 450K arrays). Unoki et al 2019 report that CDCA7 and HELLS gene knockout in human HEK293T cells moderately and extremely reduces DNA methylation levels at pericentromeric satellite-2 and centromeric alpha-satellite repeats, respectively. No other loci were investigated, and it is therefore not known whether a CDCA7-associated maintenance methylation phenotype extends beyond (peri)centromeric satellites. Thijssen et al performed siRNA-mediated knockdown experiments in mouse embryonic fibroblasts (differentiated cells) and showed that lower levels of Zbtb24, Cdca7 and Hells protein correlate with reduced minor satellite repeat methylation, thereby implicating these factors in mouse minor satellite repeat DNA methylation maintenance. Furthermore, studies that demonstrate a HELLS-CDCA7 interaction are currently limited to Xenopus egg extract (Jenness et al 2018) and the human HEK293 cell line (Unoki et al 2019). Whether such an interaction exists in any other organism and is of relevance to DNA methylation mechanisms remains to be determined.
Therefore, in my opinion, the conclusion that "Our co-evolution analysis suggests that DNA methylation-related functionalities of CDCA7 and HELLS are inherited from LECA" should be softened, as the evidence for this scenario is not very compelling and seems premature in the absence of molecular data from more species.

      We appreciate this reviewer’s thorough reading of our manuscript.

      Regarding the citation issues, we will cite Dunican 2013 and Dunican 2015. In addition, we went through the manuscript to update the citations.

      As pointed out by the reviewer, the role of CDCA7 in genome DNA methylation was extensively studied in Velasco et al 2018. The result, together with Thijssen et al (2015), and Unoki et al. (2018), supports the idea that ZBTB24, CDCA7 and HELLS act within the same pathway to promote DNA methylation, the pattern of which is overlapping but distinct from DNMT3B-mediated methylation. This observation suggests that a ZBTB24-CDCA7-HELLS mechanism for DNA methylation may involve an alternative DNMT. Interestingly, our analysis of the gene presence-absence pattern revealed that the presence of CDCA7 coincides with DNMT1 more than DNMT3 genes. Indeed, while CDCA7 is lost from diverse branches of eukaryote species, genomes encoding CDCA7 always encode HELLS, and almost always encode DNMT1. Based on this observation, we speculate that the role of CDCA7 is tightly linked to HELLS and DNA methylation throughout evolution.

      As pointed out by Reviewer 1, the link between CDCA7, HELLS and DNA methylation has not been determined experimentally across these species. However, based on our previously published and unpublished data, we are confident about the functional interaction between CDCA7 and HELLS in Xenopus laevis and Homo sapiens.

      Furthermore, the importance of HELLS homologs in DNA methylation has been extensively studied in humans, mice and plants. We hope our current study will motivate the field to experimentally test the evolutionary conservation of HELLS-CDCA7 interaction, as well as their importance in DNA methylation, in other species.

      The authors used BLAST searches to characterize the evolutionary conservation of CDCA7 family proteins in vertebrates. From Figure 2A, it seems that they identify a LEDGF binding motif in CDCA7/JPO1. Is this correct and if yes, could you please elaborate and show this result? This is interesting and important to clarify because previous literature (Tesina et al 2015) reports a LEDGF binding motif only in CDCA7L/JPO2.

      We searched for a LEDGF binding motif ({E/D}-X-E-X-F-X-G-F, also known as IBM described in Tesina et al 2015) in vertebrate CDCA7 proteins, and reported their positions in Figure 2A. Examples of identified LEDGF-binding motifs are now presented in Fig. 2C.
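
      A motif search of this kind can be sketched with a regular expression. The pattern below encodes the consensus quoted in the reply, {E/D}-X-E-X-F-X-G-F; the function name and the toy sequence are made up for illustration and are not real CDCA7 sequences, and the authors' actual search procedure may differ.

```python
import re

# Consensus from the reply: {E/D}-X-E-X-F-X-G-F, where X is any residue.
# The lookahead lets overlapping occurrences be reported as well.
IBM_PATTERN = re.compile(r"(?=([ED].E.F.GF))")


def find_ibm_motifs(protein_seq):
    """Return (1-based start position, matched residues) for every motif occurrence."""
    seq = protein_seq.upper()
    return [(m.start() + 1, m.group(1)) for m in IBM_PATTERN.finditer(seq)]


# Made-up toy sequence containing two motif instances (not a real protein).
toy_sequence = "MKAAEAEAFAGFSSDTELFRGFK"
for position, motif in find_ibm_motifs(toy_sequence):
    print(position, motif)
```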

      To provide evidence for a potential evolutionary co-selection of CDCA7, HELLS and the DNA methyltransferases (DNMTs) the authors performed CoPAP analysis. Throughout the manuscript, it is unclear to me what the authors mean when referring to "DNMT3". In the Material and Methods section, the authors mention that human DNMT3A was used in BLAST searches to identify proteins with DNA methyltransferase domains. Does this mean that "DNMT3" should be DNMT3A? And if yes, should "DNMT3" be corrected to "DNMT3A"? Is there a reason that "DNMT3A" was chosen for the BLAST searches?

      As described in the Methods section, both human DNMT1 and DNMT3A were used to initially identify any proteins containing a domain homologous to the DNA methyltransferase catalytic domain. Within Metazoa, if their orthologs exist, the top hits from BLAST searches using human DNMT1 and DNMT3A show an E-value of 0.0, and thus their orthology is robust. This is even true for DNMT1 and DNMT3 homologs in the sponge Amphimedon queenslandica, which is one of the earliest-branching metazoan species. For other DNMTs, such as DNMT2, DNMT4, DNMT5 and DNMT6, we conducted separate BLAST searches using those proteins as baits as described in Methods. The methyltransferase domain was then isolated using the NCBI conserved domains search. The selected DNMT domain sequences were aligned with CLUSTALW to generate a phylogenetic tree to further classify DNMTs. In response to reviewer #2’s comments, we also generated another multi-sequence alignment of DNMTs using MUSCLE v5 and conducted maximum-likelihood-based phylogenetic tree assembly using IQ-TREE (new Fig. S6). The overall topology of these trees is consistent except for orphan DNMTs. It has been suggested that vertebrate DNMT3A and DNMT3B are derived from duplication of a DNMT3 gene of the chordate ancestor (e.g., Liu et al 2020, PMID 31969623). As such, many invertebrates encode only one DNMT3. As previously shown (Yaari et al., 2019, PMID 30962443), plants have two distinct DNMT3-like protein families, the ‘true DNMT3’ and DRM, the plant-specific de novo DNMT that is often considered to be a DNMT3 homolog (see Reviewer 2’s comment). Our phylogenetic analysis successfully separated the clade of DNMT3 and DRM from the rest of the DNMTs (Figure S6). Yaari et al noted that PpDNMT3a and PpDNMT3b, the two DNMT3 orthologs encoded by the basal plant Physcomitrella patens, are not orthologs of mammalian DNMT3A and DNMT3B, respectively.
Therefore, to minimize such nomenclature confusion, any DNMTs that belong to either the DNMT3 or DRM clades indicated in Figure S6 are collectively referred to as ‘DNMT3’ throughout the paper (see Figure S2 for overview).

      CoPAP analysis revealed that CDCA7 and HELLS are dynamically lost in the Hymenoptera clade and either co-occurs with DNMT3 or DNMT1/UHRF1 loss, which seems important. Unfortunately, the authors do not provide sufficient information in their figures or supplementary data about what is already known regarding DNA methylation levels in the different Hymenoptera species to further consider a potential impact of this observation. What is "the DNA methylation status" of all these organisms? This information cannot be easily retrieved from Table S2. A clearer presentation of what is actually known already would improve this paragraph.

      As the DNA methylation status of the species in the Hymenoptera clade has not been comprehensively tested, we initially did not include this information in Figure 7. However, during the course of the revision, we realized that Bewick et al. 2017 (PMID 28025279) reported that DNA methylation is absent from the braconid wasp Aphidius ervi. We originally conducted synteny analysis on Aphidius gifuensis, which has a chromosome-level genome assembly with annotated proteins available in NCBI, whereas annotated proteins for Aphidius ervi are not available in NCBI. By conducting a tBLASTn search against the Aphidius ervi genome, we have now found that the presence/absence pattern of CDCA7, HELLS, DNMT1, DNMT3 and UHRF1 in Aphidius ervi is identical to that of Aphidius gifuensis, with the caveat that the genome assembly of Aphidius ervi is at scaffold level. In other words, DNMT1 and CDCA7 are absent in Aphidius ervi, in which 5mC is undetectable. Additionally, we also realized that the DNA methylation status reported for some species in Bewick et al. 2017 was inferred from the CpG frequency instead of the direct experimental detection of methylated cytosines. Therefore, we have amended Table S3 to indicate the presence of 5mC only for those species where this was experimentally tested. As such, we now consider the DNA methylation status of Fopius arisanus, which lacks DNMT1 and CDCA7, to be unknown.

      Altogether, among the 17 Hymenoptera species that we analyzed (listed in the amended Table S3), the 8 species that have detectable DNA methylation all encode CDCA7, whereas the 2 species that do not have detectable DNA methylation lack CDCA7. We will note this finding in the revised text, and include the known 5mC status in the new Figure 7.

      Furthermore, A. thaliana DDM1, and mouse and human Lsh/Hells are known to preferably promote DNA methylation at satellite repeats, transposable elements and repetitive regions of the genome. On the other hand, DNA methylation in insects and other invertebrates occurs in genic rather than intergenic regions and transposable elements (e.g. Bewick et al 2017; Werren JH PlosGenetics 2013). It would be helpful to elaborate on these differences.

      We were aware of this interesting point, which was discussed in the third paragraph of the Discussion. To better illustrate this point, we have now expanded the Discussion (page 14) to speculate about the role of DNA methylation in insects, where emerging evidence indicates the importance of DNMT1 in meiosis. It should be noted that, in the Arabidopsis ddm1 mutant, reduction of CG methylation of gene bodies is common (50% of all methylated euchromatic genes) (Zemach et al, 2013). In addition, hypomethylation is not limited to satellite repeats and transposable elements in ICF patients defective in HELLS or CDCA7 (Velasco et al., 2018).

      Reviewer #2 (Public Review):

      In this manuscript, Funabiki and colleagues investigated the co-evolution of DNA methylation and nucleosome remodeling in eukaryotes. This study is motivated by several observations: (1) despite being ancestrally derived, many eukaryotes lost DNA methylation and/or DNA methyltransferases; (2) over many genomic loci, the establishment and maintenance of DNA methylation relies on a conserved nucleosome remodeling complex composed of CDCA7 and HELLS; (3) it remains unknown if/how this functional link influenced the evolution of DNA methylation. The authors hypothesize that if CDCA7-HELLS function was required for DNA methylation in the last eukaryote common ancestor, this should be accompanied by signatures of co-evolution during eukaryote radiation.

      To test this hypothesis, they first set out to investigate the presence/absence of putative functional orthologs of CDCA7, HELLS and DNMTs across major eukaryotic clades. They succeed in identifying homologs of these genes in all clades spanning 180 species. To annotate putative functional orthologs, they use similarity over key functional domains and residues such as ICF related mutations for CDCA7 and SNF2 domains for HELLS. Using established eukaryote phylogenies, the authors conclude that the CDCA7-HELLS-DNMT axis arose in the last common ancestor to all eukaryotes. Importantly, they found recurrent loss events of CDCA7-HELLS-DNMT in at least 40 eukaryotic species, most of them lacking DNA methylation.

      Having identified these factors, they successfully identify signatures of co-evolution between DNMTs, CDCA7 and HELLS using CoPAP analysis - a probabilistic model inferring the likelihood of interactions between genes given a set of presence/absence patterns. As a control, such interactions are not detected with other remodelers or chromatin modifying pathways also found across eukaryotes. Expanding on this analysis, the authors found that CDCA7 was more likely to be lost in species without DNA methylation.

      In conclusion, the authors suggest that the CDCA7-HELLS-DNMT axis is ancestral in eukaryotes and raise the hypothesis that CDCA7 becomes quickly dispensable upon the loss of DNA methylation and/or that CDCA7 might be the first step toward the switch from DNA methylation-based genome regulation to other modes.

      The data and analyses reported are significant and solid. However, using more refined phylogenetic approaches could have strengthened the orthologous relationships presented. Overall, this work is a conceptual advance in our understanding of the evolutionary coupling between nucleosome remodeling and DNA methylation. It also provides a useful resource to study the early origins of DNA methylation-related molecular processes. Finally, it brings forward the interesting hypothesis that since eukaryotes are faced with the challenge of performing DNA methylation in the context of nucleosome-packed DNA, losing factors such as CDCA7-HELLS likely led to recurrent innovations in chromatin-based genome regulation.

      Strengths:

      • The hypothesis linking nucleosome remodeling and the evolution of DNA methylation.

      • Deep mapping of DNA methylation-related processes in eukaryotes.

      • Identification and evolutionary trajectories of novel homologs/orthologs of CDCA7.

      • Identification of CDCA7-HELLS-DNMT co-evolution across eukaryotes.

      Weaknesses:

      • Orthology assignment based on protein similarity.

      • No statistical support for the topologies of gene/protein trees (figure S1, S3, S4, S6) which could have strengthened the hypothesis of shared ancestry.

      We appreciate the reviewer’s accurate summary, which nicely emphasizes the importance of our study. We agree that better phylogenetic analysis for orthology assignment will strengthen our conclusion. Having anticipated this weakness, however, we specifically conducted a CoPAP analysis exclusively for Ecdysozoa species, which supported our major conclusion, as orthology assignment is straightforward in these species. For example, if we conduct a BLAST search against the clonal raider ant Ooceraea biroi protein dataset using human HELLS as a query, the top hit is a protein sequence annotated as one of three isoforms of ‘lymphoid-specific helicase’ (i.e., HELLS), with an E-value of 0.0. Similarly, a BLAST search of the Ooceraea biroi dataset using human DNMT1 as a query returns isoforms of DNMT1 as the top hit, with an E-value of 0.0. As such, there is little dispute in orthology assignment in Ecdysozoa. Outside of Chordata, classification of DNMTs, particularly in Excavata and SAR, requires more extensive identification in these supergroups. Our current orthology assignment for the major targets in this study (HELLS, DNMT1, DNMT3, DNMT5) is largely consistent with published results (Ponger et al., 2005 PMID 15689527; Huff et al, 2014 PMID 24630728; Yaari et al., 2019 PMID 30962443; Bewick et al., 2019 PMID 30778188). However, while we were preparing this response and re-crosschecking our assignments with these references, we realized that we had erroneously missed DNMT5 orthologs in Leucosporidium creatinivorum, Postia placenta, Armillaria gallica and Saitoella complicata, and a DNMT6 ortholog in Fragilariopsis cylindrus. We also recognized that DNMT4 orthologs were identified in Fragilariopsis cylindrus and Thalassiosira pseudonana in Huff et al 2014 (PMID 24630728), but in our phylogenetic analysis, these proteins form a distinct clade between DNMT1/Dim-2 and DNMT4 (original Figure S6), although the confidence level of this classification by Huff et al was not strong.
To resolve this potential confusion in DNMT annotations, we generated new multiple sequence alignments with MUSCLE v5 and IQ-TREE 2 (maximum likelihood-based method, coupled with selection of optimal substitution model and bootstrapping). The tree topology was not significantly altered between the two methods, except for the unambiguous location of orphan DNMTs and DNMT4-related proteins. To avoid unnecessary confusion in the DNMT annotations, we decided to present MUSCLE-IQ-TREE for the DNMT phylogenetic tree and classification (new Fig. S6). The raw results of IQ-TREE analysis for CDCA7/zf-4CXXC_R1, HELLS SNF2 domain, and DNMTs are included as Dataset S1-S3. We then conducted CoPAP analysis using the corrected classification. As it is not clear a priori if fungal-specific CDCA7-like proteins (now referred to as CDCA7F with class II zf-4CXXC_R1) should be considered CDCA7 orthologs, we conducted CoPAP against two lists; the first list includes CDCA7F in the CDCA7 group, whereas the second list includes a separate category of class II zf-4CXXC_R1, which includes CDCA7F. The two results show slightly different topologies in the coevolutionary linkages, but both support our major conclusion that CDCA7 coevolved with DNMT1-UHRF1 and HELLS. These new CoPAP results are shown in Fig. S7.

      Reviewer #1 (Recommendations For The Authors):

      Summary

      Last sentence: "...a unique specialized role of CDCA7 in HELLS-dependent DNA methylation maintenance...". What do the authors mean?

      Our analysis strongly indicates that CDCA7 is dispensable in systems lacking HELLS and DNMT (particularly DNMT1). In other words, a species preserves CDCA7 only if it has both HELLS and DNMT1 (or in some cases DNMT5). The importance of HELLS homologs in DNA methylation has been extensively studied in human, mouse and plants. However, in these studies, substantial DNA methylation remains despite the defective HELLS/DDM1 (especially in euchromatic regions). Additionally, there are species (e.g., Bombyx mori) that have DNMT1 and detectable DNA methylation but lack HELLS and CDCA7. These observations suggest that the role of CDCA7 must be unique and specialized in a way that it is strongly coupled to HELLS-dependent DNA methylation (but not HELLS-independent DNA methylation), and that this function of CDCA7 seems to be inherited from the last eukaryotic common ancestor.

      Introduction

      • page 3: "DNMTs are largely subdivided into maintenance and de novo DNMTs" - Which species are the authors referring to?

      As described in the cited reference (Lyko 2018), maintenance DNA methylation and de novo DNA methylation are well-accepted functional classifications of DNA methylation. It is also currently accepted that distinct DNMTs execute maintenance DNA methylation or de novo DNA methylation, although crosstalk between these processes has been reported. Therefore, we stated, “DNMTs are largely subdivided into maintenance DNMTs and de novo DNMTs”, and this subdivision is species independent.

      • page 3: "Maintenance DNMTs recognize hemimethylated CpGs. " - Can the authors please define the species and/or literature they are referring to? This seems important to clarify. For instance, mammalian DNMT1 requires a co-factor, UHRF1, which recognizes hemimethylated DNA and H3K9me3 (Bostick et al 2007).

      We meant to describe, “Maintenance DNMTs directly or indirectly recognize hemimethylated CpGs…”. The specific requirement of UHRF1 for DNMT1-mediated maintenance DNA methylation is explained in the subsequent sentence “In animals…”. In the case of Cryptococcus neoformans, DNMT5 recognizes hemimethylated DNA independently of UHRF1 in vitro to execute maintenance methylation.

      • page 3: The authors may want to mention that A. thaliana also has a de novo DNA methyltransferase, DRM2, a homolog of the mammalian DNMT3 methyltransferases. This seems important, since they show in Figure 1 that a de novo methyltransferase is found in A. thaliana. Also, later in their manuscript they mention plant de novo DNA methylation.

      Thanks for pointing this out. As shown in Figure 5, we classified plant DRMs as DNMT3-like proteins, but we now note this in the Introduction.

      • page 3: Sentence starting "In about 50% of ICF patients,..." - Why is DNMT3B referred to as "de novo", is it not a de novo DNA methyltransferase?

      You are correct. Quotation marks are now removed to avoid unnecessary confusion.

      • page 4: Sentence starting "Indeed, the importance of HELLS/CDCA7 in DNA methylation maintenance...", - Which references (Han et al., 2020; Ming et al., 2021; Unoki, 2021; Unoki et al., 2020) provide experimental evidence for a role of CDCA7 in DNA methylation maintenance by DNMT1?

      Thanks for pointing out the typo. “/CDCA7” is now removed.

      • page 5: Sentence starting "Indeed, it has been shown that DNMT3A..." - Should DNMTB be DNMT3B?

      Yes. This is now corrected.

      Results

      • Page 5: Sentence starting "However, we identified a protein..." - No A. thaliana reference?

      We added Zemach et al 2010, and Chan et al 2005.

      • Figure 2B: "ICF4 mutations" should this be "ICF3 mutations"?

      • Figure 3: "ICF4 mutations" should this be "ICF3 mutations"?

      • Figure 4: "ICF4 mutations" should this be "ICF3 mutations"?

      • Figure S1: Orange colored "CDC7L (fish), CDC7e, CDC7, CDC7L" is there an "A" missing?

      • Figure S5: "ICF4 mutations" should this be "ICF3 mutations"?

      These typos are now corrected. Thank you.

      • Figure S7: What is "CDCA7(II)" referring to, "zf-4CXXC_R1 class II (plants)"?

      The original CDCA7 (II) included proteins with class II zf-4CXXC_R1, which are found in plants, fungi, Acanthamoeba castellanii and Amphimedon. Among those species, the prototypical CDCA7 orthologs are absent only in fungi. It has been a priori unclear if fungal proteins with class II zf-4CXXC_R1 (which we now term CDCA7F) should be included in CDCA7 for CoPAP analysis. Although we originally included CDCA7F in CDCA7, we now show the results of two analyses. In the first one (Fig. S7A), CDCA7F was included in CDCA7, whereas in the second one (Fig. S7B), CDCA7F was included in the separate category of class II zf-4CXXC_R1. The topologies of the two results are slightly different, but they both show coevolutionary linkage between CDCA7 and the DNMT1-UHRF1 cluster.

      • Figure 4 and 5: In the case of preliminary genome assemblies what is the difference between empty squares with dotted lines and filled squares without dotted lines?

      As it is difficult to be certain of a gene’s absence (did the species lose the gene or is it simply not annotated due to incomplete genome coverage?), we illustrated the absence of a gene in preliminary genome assemblies with an empty square with dotted outline. Since the presence of a gene is evident regardless of the level of genome assembly, the presence of a gene is represented with filled squares with solid lines, even for preliminary genome assemblies.

      • Figure 1: Why was Mus musculus - one of the main model organisms used for many DNA methylation studies not included? Also what are empty and filled squares?

      Filled and empty squares indicate the presence and absence of the indicated genes, respectively. A clarifying statement is now added to the figure legends. Mus musculus is now included in the figure.

      • Figure S2: Adding the existence of DNA methylation and DNMT3 in the bottom right part of the figure (overall no of species) would make this panel more informative

      We included this overview to summarize the co-retention of CDCA7, HELLS and maintenance DNMTs across the analyzed species. We decided not to include DNA methylation, since DNA methylation status is known for only a fraction of the listed species. Inclusion of DNMT3 will introduce too many possible gene presence-absence combinations to convey a clear message. However, we now mention in the revised text (page 11, second paragraph) that unlike the prevalent co-retention of DNMT1 in species with CDCA7, we identified several species that possess CDCA7, HELLS and DNMT1 but lack DNMT3. These examples include insects such as the bed bug Cimex lectularius and the red paper wasp Polistes canadensis.

      • Page 6: Sentence starting "This leucine zipper sequence is highly conserved..." - Figure/Reference missing?

      The sequence alignment of the leucine zipper is now shown in Fig. 2C.

      • page 6: Sentence starting "In contrast to zf-4CXXC_R1 motif-containing proteins..." - The authors may want to mention the role of the CXXC zf domain in KDM2A/B, DNMT1, MLL1/2 and TET1/3 and what the CDCA7 CXXC zf domain is/could be required for.

      The notion that zf-CXXC binds to nonmethylated CpG is now included. Due to the substantial difference between zf-CXXC and zf-4CXXC_R1, we hesitated to relate the function of zf-4CXXC_R1 with zf-CXXC, but we now discuss a potential role of zf-4CXXC_R1 in sensing DNA methylation status in Discussion (Page 13).

      • page 7: Sentence starting "Second, the fifth cysteine is replaced..."- Zoopagomycota" - Figure 4A does not have this labeling, one has to deduce this from Figure 4B.

      We fixed this by including the list of Zoopagomycota species in the main text.

      • page 7: Sentence containing "Neurospora crassa DMM-1 does not directly regulate DNA methylation or demethylation but rather..." - How does the information about DMM-1 relate to what is shown in Figure 4B, to CDCA7, HELLS and DNMTs? Please clarify.

      Both Neurospora DMM-1 and Arabidopsis IBM1 contain the JmjC domain and are implicated in an indirect control mechanism of DNA methylation. Since it has never been pointed out that they have a divergent zf-4CXXC_R1 domain, which clearly shares the origin with CDCA7 proteins, we thought that this is important to note. We realized that we did not clearly mark Neurospora XP-956257 as DMM-1 in Fig. 4B. This is now fixed.

      • Heading "Systematic identification of CDCA7, HELLS and DNMT homologs in eukaryotes". When mentioning CDCA7 the authors may want to decide on the use of one consistent definition of "prototypical (Class I) CDCA7-like proteins (i.e. CDCA7 orthologs)" "Class I CDCA7 proteins". Constantly changing the way how they refer to these proteins is very confusing.

      We now make it clear that we call proteins with the class I zf-4CXXC_R1 motif CDCA7 orthologs. We also define class II zf-4CXXC_R1 (as those with a substitution at the ICF-associated glycine residue). Since no clear CDCA7 orthologs can be found in fungi, we now call fungal proteins with class II zf-4CXXC_R1 “CDCA7F”, implying its ambiguous orthology assignment.

      Under this heading there is also no mention of DNMTs. Instead, the authors introduce DNMTs under the heading "Classification of DNMTs in eukaryotes" - Please clarify.

      This is now corrected.

      • page 9: Sentence containing "... presence of DNMT1, UHRF1 and CDCA7 outside of Viridiplantae and Opisthokonta is rare". What does "rare" mean? How is UHRF1 relevant here?

      Among the 32 species outside of Viridiplantae and Opisthokonta, only the Acanthamoeba castellanii genome encodes clear orthologs of DNMT1, UHRF1 and CDCA7. Although it is often difficult to deduce if the selected panel of species is a reasonable representation, we think that it is not unreasonable to state that Acanthamoeba is a rare case to encode this set of proteins outside of Viridiplantae and Opisthokonta. We include UHRF1 since it is a well-established activator of DNMT1, and indeed our CoPAP analysis showed a tight coevolution of UHRF1 with DNMT1. Outside of Viridiplantae and Opisthokonta, only Acanthamoeba castellanii and Naegleria gruberi encode UHRF1. Interestingly, these two species also encode CDCA7 and HELLS.

      Having said that, we rephrased this sentence, which now reads: “Species that encode a set of DNMT1, UHRF1, CDCA7 and HELLS are particularly enriched in Viridiplantae and Metazoa.”

      • page 11: Sentence containing "..., that the function of CDCA7-like proteins is strongly linked to HELLS and DNMT1,..." What do the authors mean with "the function of CDCA7-like proteins"? And what happened to DNMT3?

      Our observation that almost all species that contain CDCA7 (including fungal CDCA7F) also have DNMT1 and HELLS, despite the frequent loss of these genes in species that do not contain CDCA7, indicates “that the function of CDCA7-like proteins is strongly linked to HELLS and DNMT1”. We found only 2 species that possess CDCA7 (class I or class II) but not DNMT1 among the panel of 180 species. These 2 exceptional species, Naegleria gruberi and Taphrina deformans, do encode UHRF1-like proteins and a DNMT (an orphan DNMT in N. gruberi and DNMT4 in T. deformans). In contrast, we found 26 species that possess CDCA7 (or CDCA7F) but not DNMT3 (Table S1), so the linkage between CDCA7 and DNMT3 is weaker.

      • page 11: Sentence containing "..., CDCA7 is lost from this gene cluster in parasitoid wasps, including Ichneumonoidea wasps and chalcid wasps". This sentence is confusing because already in an earlier paragraph the authors say that "Microplitis demolitor lost CDCA7" and in the following sentence they say "...among Ichneumonoidea wasps, CDCA7 appears to be lost in the Braconidae clade, ...". It would greatly help this reader if the authors could streamline these sentences and also decide on whether CDCA7 is lost in M. demolitor or CDCA7 appears to be lost in M. demolitor.

      The confusion was in part due to the difficulty in differentiating between the true loss of a gene and its apparent absence in a species due to an incomplete genome assembly, including that of M. demolitor. To verify that the loss of CDCA7 was not due to gaps in the genome assembly, we executed the synteny analysis. In addition, we edited this section to improve the readability (Page 12-13).

      What could be the role for HELLS/CDCA7 in insect DNA methylation? In several cases, the authors' analyses reveal co-evolutionary links between DNMT3 (DNMT3A?) and CDCA7/HELLS. I do not understand why this finding is not really discussed by the authors. Instead there is a strong focus on replication-uncoupled DNA methylation maintenance. Could the authors elaborate why?

      The role of DNA methylation in insects is largely unclear, so discussion must be highly speculative. A recent finding in the clonal raider ant, showing that DNMT1 is not essential for development but is critical for oogenesis, pointed toward a possible more universal role of DNA methylation in meiosis. Stimulated by a finding in Neurospora, where DNA methylation is required for homolog pairing during meiosis, we discuss a speculative model that DNA methylation status acts as a hallmark to distinguish between healthy/young DNA and old/mutated (or competitive/pathogenic) DNA at homolog pairing during meiosis (page 14).

      Regarding the cases where CDCA7 and DNMT3 are co-lost, we discussed this phenomenon in the last section of Results, stating, “This co-loss of CDCA7 and DNA methylation (together with either DNMT1-UHRF1 or DNMT3) in braconid wasps suggests that evolutionary preservation of CDCA7 is more sensitive to DNA methylation status per se than to the presence or absence of a particular DNMT subtype.” Please note that we found several lineages that lack CDCA7 but have DNMT1 (and DNMT3), whereas almost all species that have CDCA7 also have DNMT1 (but not necessarily DNMT3). Supported by our CoPAP analyses, our results indicate a tight functional link between CDCA7 and DNMT1, but this does not necessarily mean that CDCA7 plays no role related to DNMT3 or de novo methylation. Clarification of this point and our speculation on how CDCA7 loss is linked to a reduced requirement for DNA methylation are discussed on pages 13 and 14 with additional text.

      Discussion

      • page 12: Where is the data supporting "... the red flour beetle Tribolium castaneum possesses DNMT1 and HELLS, but lost DNMT3 and CDCA7"?

      Figure 5, Figure S2 and Table S1. This is now noted in the text.

      • page 14: Based on which parts of their analyses or evidence from the literature can the authors speculate that "...the evolutionary arrival of HELLS-CDCA7 in eukaryotes might have been required to transmit the original immunity-related role of DNA methylation from prokaryotes to nucleosome-containing (eukaryotic) genomes"? Please clarify.

      This is inferred from the well-known role of DNA methylation in bacteria for defending against phage viruses. However, it was not correct to state that such a function was inherited from prokaryotes. It should be stated that it was inherited from the last universal common ancestor (LUCA). We also admit that it is not clear if such an immunity-related role was inherited from LUCA, or if it emerged through convergent evolution. Therefore, we amended this description to emphasize our hypothesis that the advent of CDCA7 was “a key step to transmit the DNA methylation system from the LUCA to the eukaryotic ancestor with nucleosome-containing genomes”.

      Supplementary Figures/Tables

      • page 26: Table S2 and Table S3, it seems that these tables show data that supports what is shown in Figure 7 and not Figure 5.

      You are correct. Thank you for pointing out the typos.

      Has the methylation status been assessed in C. glomerata, C. typhae, Chelonus insularis, Diachasma alloeum or Aphidius gifuensis? Please clarify in Table S2.

      Not to our knowledge. However, as we realized that the absence of DNA methylation in Aphidius ervi was previously reported (Bewick et al 2017), we now include these data together with the presence/absence analysis of DNMT1, UHRF1, DNMT3, CDCA7 and HELLS. Known presence/absence of DNA methylation is now shown in Fig. 7.

      Reviewer #2 (Recommendations For The Authors):

      Recommendation to strengthen the paper:

      1) Phylogenetics:

      • Test and report the appropriateness of the substitution model used in protein alignments/trees.

      • Use maximum likelihood methods and/or MCMC Bayesian inference to build and report trees with well supported topologies. This is required to properly assign orthology (shared ancestry). This will avoid false interpretation due to technical limitations of similarity-based phylogenies (without statistical support). Figure S1, S3, S4 and S6.

      To address these points, we made new multisequence alignments using MUSCLE v6 and generated phylogenetic trees using the maximum likelihood-based IQ-TREE 2, where multiple models were screened. A consensus tree was generated after 1000 bootstrap replicates from the best alignment and model. The topology and assignment of these new trees were largely consistent with the original trees, except for some corrections in DNMT assignment as discussed below.

      1. We realized that we erroneously missed DNMT5 orthologs of Leucosporidium creatinivorum, Postia placenta, Armillaria gallica and Saitoella complicata, and DNMT6 orthologs from Fragilariopsis cylindrus reported in Huff et al 2014 (PMID 24630728). They are now included in the new list and CoPAP analysis.

      2. DNMT4 orthologs were identified in Fragilariopsis cylindrus and Thalassiosira pseudonana by Huff et al 2014 (PMID 24630728), but in our original phylogenetic analysis, these proteins form a distinct clade between DNMT1/Dim-2 and DNMT4. The new tree and classification are more consistent with Huff et al, so we present the new tree in Fig. S6 and conducted the classification based on this tree.

      Beside Fig. S6, we decided to maintain original Fig. S1, S3 and S4 (with a few adjustments) for better visibility, but we included the results of IQ-TREE analysis as Dataset S1-S3.

      The CoPAP analysis based on the revised assignment slightly changed the topology of coevolutionary linkages. In addition, we obtained a slightly different result depending on whether fungal-specific CDCA7 with class II zf-4CXXC_R1 (now referred to as CDCA7F) is included as a CDCA7 ortholog or not. Despite this difference, we reproducibly observed the coevolutionary linkage between CDCA7 and DNMT1-UHRF1.

      • Be more careful with wording: RBH is not sufficient to call gene/proteins orthologs (e.g. Page 8). The above mentioned method will help you support this claim (+ synteny when you can).

      We were aware of this issue. This is why we conducted phylogenetic tree building based on sequence alignment of full-length HELLS (Fig. S3) and SNF2 domain only (Fig. S4), as explained in the text. We found that the RBH criterion is robust in Metazoa; orthologs are easily recognizable with very low E-value (0.0) and extensive homology over the full length of the protein, while synteny is not practical to employ in the diverse set of species.

      • Also, use "co-retention" or "co-evolution" but not "co-selection" when describing CoPAP results - as CoPAP does not test for signature of natural selection.

      This is a good point and is now corrected.

      • The statistics (p-val...) underlying the CoPAP analyses should be explained.

      The explanation is now added in Methods section.

      “A method to calculate p-value for CoPAP was described previously (Cohen et al., 2012, PMID 22962457). Briefly, for each pair of tested genes, Pearson's correlation coefficient was computed. Parametric bootstrapping was used to compute a p-value by comparing it with a simulated correlation coefficient calculated based on a null distribution of independently evolving pairs with a comparable exchangeability (a value reporting the likelihood of gene gain and loss events across the tree).”
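      The general shape of such a test can be sketched as follows. This is a simplified illustration and not the CoPAP procedure: it replaces the tree-based parametric simulation with random shuffling of one presence/absence profile, so it ignores phylogeny and exchangeability. The two profiles and the function names (`pearson`, `empirical_p_value`) are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def empirical_p_value(x, y, n_sim=2000, seed=0):
    """Compare the observed correlation with a simulated null distribution.

    Assumption: the null is built by shuffling y, a stand-in for simulating
    independently evolving gene pairs along a species tree.
    """
    rng = random.Random(seed)
    observed = pearson(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        # Small tolerance so that ties are not missed through rounding.
        if pearson(x, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (hits + 1) / (n_sim + 1)


# Hypothetical presence (1) / absence (0) profiles of two genes in 12 species.
gene_a = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
gene_b = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0]

r, p = empirical_p_value(gene_a, gene_b)
print(f"r = {r:.3f}, empirical p = {p:.4f}")
```

      In this toy case the two profiles disagree in only one species, so the observed correlation is rarely matched by the shuffled profiles.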

      2) Figure S2 and S3 could be improved for readability

      After consideration of this criticism, we decided to keep their original formats for the following reasons.

      Figure S2. The purpose of this list is to better visualize the comprehensive list shown in Table S2. A consolidated list is already shown in Figure 5. An alternative choice is to make a diagram where individual species names are unreadable. This kind of presentation is seen in many published papers, but we found that it is not helpful for checking the details. As this is a supplementary figure, we prefer to show the detailed data in a form that can be viewed without specialized software.

      Figure S3. This figure is included to show which SNF2 family proteins are more likely to be misassigned as HELLS/DDM1 orthologs. We believe that the figure serves this purpose.

      3) What is the meaning of the coloring patterns of ICF residues in znf?

      ICF residues are highlighted in light blue in the schematics to indicate their conservation. In the alignment, the coloring reflects the level of conservation within the shown set of proteins, and the choice of coloring was set by Jalview.

      4) To improve clarity: the introduction could be more focused on evolutionary considerations and functional link between CDCA7-HELLS and DNMTs.

      We revised the first paragraph of the introduction to illustrate this point.

      5) Could indicate the CDCA7 loss / DNA methylation hypothesis in the abstract.

      We now included this hypothesis in the Abstract.

    1. Selection devices of this sort will soon be speeded up from their present rate of reviewing data at a few hundred a minute.

      We see these selection devices being used today for more important purposes such as looking at resumes and selecting applicants for job interviews. I think it's important to note that although technology is being utilized for something that has such a big impact on people, it still isn't perfect. These algorithms will not always pick out the best candidates and will exclude those who may be best qualified for the job.

    1. Author Response

      Evaluation Summary:

      The manuscript shows that retinal ganglion cell light responses in awake mice differ substantially from those under two forms of anesthesia and from previously obtained ex vivo recordings. This difference is central to our understanding of how ganglion cell responses relate to behavior. There are a few technical issues and issues about how the work is presented that could be strengthened.

      We thank the reviewers for their constructive comments. We have addressed all the issues, and added substantially more data and analysis results in the revised manuscript, further supporting our findings that awake responses are larger, faster, and more linearly decodable in the mouse retina than those responses under anesthesia or ex vivo.

      Reviewer #1 (Public Review):

      This paper compares output signals from the mouse retina in three conditions: awake mice, anaesthetized mice, and isolated retinas. The paper reports substantial differences, particularly between awake and either of the other conditions. Retinal signaling has been well studied using ex vivo preparations, with an assumption that the findings from those studies can be carried over to how the retina operates in vivo. The results from this paper at a minimum indicate a need to be cautious about that assumption. There are several technical issues that need testing or further explanation, and several issues about the presentation that could be clarified.

      Spike sorting

      The paper does not describe any control analyses that test for contamination in spike sorting. These are needed to evaluate the work.

      We have reported the details of our spike sorting procedure in the revised manuscript (Data Analysis section in Methods and Figure 1). In short, single units were identified by clustering in principal component space, followed by manual inspection of spike waveform (triphasic as expected from axonal signals; e.g., revised Figure 1F-H; Barry, 2015) as well as auto- and cross-correlograms (minimal inter-spike interval above 1 ms for a refractory period; e.g., revised Figure 1I-K). A small fraction of visually responsive cells (20/282, awake; 21/325, isoflurane; 1/103, FMM) had a small fraction of interspike intervals below 2 ms, but including or excluding them from the analysis did not affect our main conclusions.
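      A refractory-period check of this kind can be sketched as below. This is a minimal illustration, assuming spike times in seconds and the 2 ms threshold mentioned above; the function name `isi_violation_fraction` and the example spike train are hypothetical and do not reproduce the actual sorting pipeline.

```python
def isi_violation_fraction(spike_times_s, threshold_s=0.002):
    """Fraction of inter-spike intervals (ISIs) shorter than threshold_s."""
    times = sorted(spike_times_s)
    isis = [later - earlier for earlier, later in zip(times, times[1:])]
    if not isis:
        return 0.0  # fewer than two spikes: no interval to evaluate
    return sum(isi < threshold_s for isi in isis) / len(isis)


# Hypothetical spike train (seconds): two of the four intervals are below 2 ms.
example_unit = [0.0, 0.001, 0.010, 0.020, 0.0205]
print(isi_violation_fraction(example_unit))
```

      A unit with a non-zero fraction is not necessarily contaminated, but the value gives a simple criterion for repeating an analysis with and without such units.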

      Light levels

      The paper argues that differences in light level cannot account for the results. According to the methods, light levels were about two-fold higher at the retina in array recordings as compared to the front of the eye for in vivo recordings. The main text indicates that they differ less, it's not clear why the numbers in the main text and methods are different. Aside from this issue, this comparison does not consider the loss of light between the front of the eye and the retina. It is crucial that the paper provide a more detailed description of light levels. This should include converting those light levels to units that include the spectral output of the light source used (e.g. to isomerizations per rod or cone per second).

      The maximum light intensity of our in vivo setup was 31.3 mW/m² (15.9 mW/m² for the UV LED and 15.4 mW/m² for the blue LED). Following the suggestion by the reviewer, we calculated the photon flux on the mouse retina in vivo by taking into account the loss of light by the eye optics. In short, assuming 50% and 68% transmittance at 365 nm and 454 nm, respectively (Jacobs & Williams 2007), a pupil size of 1 mm and a retinal diameter of 4 mm with the stimulus covering 73° in azimuth and 44° in elevation, we obtained the photon flux on the mouse retina in vivo as 3.81×10³ and 6.64×10³ photons/s/μm² for UV and blue light, respectively. Assuming a total photon collecting area of 0.2 μm² for cones and 0.5 μm² for rods (Nikonov et al. 2006), and a relative sensitivity of rods, S- and M-cones to be [UV, Blue]=[25, 60], [90, 0], [25, 60]%, respectively (Jacobs & Williams 2007), we then estimated the photoisomerization (R*) rate as: 2.5×10³ R*/rod/s, 0.7×10³ R*/S-cone/s, and 1.0×10³ R*/M-cone/s.
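      The last step of this estimate is simple arithmetic and can be sketched as follows. The numerical values are those stated above; the dictionary and function names are illustrative only, and the sketch assumes that the rate is the sensitivity-weighted photon flux multiplied by the collecting area.

```python
# Photon flux at the retina (photons/s/um^2) for each LED, as stated in the text.
PHOTON_FLUX = {"uv": 3.81e3, "blue": 6.64e3}

# Relative sensitivity of each photoreceptor type to each LED (fractions).
SENSITIVITY = {
    "rod": {"uv": 0.25, "blue": 0.60},
    "s_cone": {"uv": 0.90, "blue": 0.00},
    "m_cone": {"uv": 0.25, "blue": 0.60},
}

# Total photon collecting area (um^2).
COLLECTING_AREA = {"rod": 0.5, "s_cone": 0.2, "m_cone": 0.2}


def isomerization_rate(cell):
    """Estimated photoisomerizations per photoreceptor per second (R*/s)."""
    effective_flux = sum(
        SENSITIVITY[cell][led] * PHOTON_FLUX[led] for led in PHOTON_FLUX
    )
    return effective_flux * COLLECTING_AREA[cell]


for cell in SENSITIVITY:
    print(f"{cell}: {isomerization_rate(cell) / 1e3:.1f} x 10^3 R*/s")
```

      Under these assumptions the sketch gives about 2.5, 0.7 and 1.0 ×10³ R*/s for rods, S-cones and M-cones, matching the figures quoted above.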

      In contrast, the maximum light intensity of the in vitro setup was 36 mW/m² as reported in Vlasiuk and Asari (2021). The photon flux on the isolated retina was then estimated to be around 9×10⁴ photons/s/μm² (under the assumption that the white light from a CRT monitor is centered around 500 nm). Assuming the sensitivity of rods, S- and M-cones to be 40, 2 and 40%, respectively, we then obtained 4×10⁴ R*/rod/s, 2×10³ R*/S-cone/s, and 4×10⁴ R*/M-cone/s.

      Thus, the light intensity level was about ten times larger for the in vitro recordings than for the in vivo recordings. The amount of light reaching the retina in the awake condition should also be somewhat smaller than that under anesthesia due to pupillary reflexes. Past studies suggest that the darker the stimulus, the slower the kinetics and the smaller the response of RGCs in an isolated retina (Wang et al 2011). Thus, the light intensity difference cannot simply account for the higher firing and faster kinetics in the awake condition than ex vivo or in the anesthetized condition.
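      The conversion from irradiance to photon flux for the isolated retina follows from the photon energy E = hc/λ. The sketch below reproduces that step under the stated assumption of monochromatic light at 500 nm; the function name `photon_flux_per_um2` is illustrative.

```python
PLANCK_J_S = 6.62607015e-34  # Planck constant (J*s)
LIGHT_SPEED_M_S = 2.99792458e8  # speed of light (m/s)


def photon_flux_per_um2(irradiance_mw_per_m2, wavelength_nm):
    """Convert irradiance (mW/m^2) to photon flux (photons/s/um^2)."""
    photon_energy_j = PLANCK_J_S * LIGHT_SPEED_M_S / (wavelength_nm * 1e-9)
    photons_per_s_m2 = irradiance_mw_per_m2 * 1e-3 / photon_energy_j
    return photons_per_s_m2 * 1e-12  # 1 m^2 = 1e12 um^2


print(f"{photon_flux_per_um2(36, 500):.2e} photons/s/um^2")
```

      For 36 mW/m² at 500 nm this gives roughly 9×10⁴ photons/s/μm², in line with the value quoted above. No correction for eye optics is applied here because the light falls directly on the isolated retina.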

      We have revised the manuscript accordingly.

      Comparison with other work

      The authors accurately point out that there is not much prior work on retinal outputs in awake animals. The paper, however, minimally describes the work that does exist. The Hong et al. (2018) paper, in particular, should be discussed. There are several differences between the results of that paper and the present paper. These include the fraction of recorded cells that are DS cells, and the maintained firing rates (though this does not appear to be studied systematically in Hong et al.).

      In the discussion section of the revised manuscript, we clarified connections to the existing studies on the retinal activity in vivo. To our knowledge, none of the past studies provided descriptive statistics on the awake RGC response properties (Hong et al., 2018; Schroeder et al., 2020; Sibille et al., 2022). Nevertheless, consistent with our study, we can see high baseline activity in the reported examples from C57BL6 mice (Figure 3C, Schroeder et al. 2020; Figure S7h, Sibille et al. 2022).

      Hong et al (2018), in contrast, reported somewhat different results, as pointed out by the reviewer. Firstly, they found a relatively low baseline activity in RGCs of albino CD1 mice. We think that this is likely due to general impairments of the vision/retina associated with albinism. While equipped with normal electroretinogram signals, CD1 mice showed no optomotor response and a reduced number of rods (Abdeljalil et al 2005; Brown et al 2007). This suggests a certain level of retinal dysfunction in these mice. Secondly, Hong et al (2018) reported a higher fraction of direction-selective RGCs in their recordings (>50% at a DS index threshold of 0.3). This is even higher than one would expect from anatomical and physiological studies ex vivo on BL6 mice (about a third; Sanes and Masland, 2015; Baden et al., 2016; Jouty et al 2013). Besides the effect of albinism, we think that this overrepresentation of DS cells in Hong et al (2018) arose as a consequence of the low baseline activity. As discussed above, the higher the baseline activity, the lower the DS/OS index by definition (Eq.(3) in Methods). Indeed, we found many more cells with high DS/OS index values in our anesthetized data than in awake ones (42-54% vs 17% at an index value threshold of 0.15; Figure 2), even though these recordings were done in the same experimental setup.
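      The dependence of such an index on baseline firing can be illustrated with a small sketch. Eq. (3) is not reproduced in this response, so the code assumes a commonly used normalized vector-sum index, |Σ r_k·exp(iθ_k)| / Σ r_k, which may differ in detail from the definition in Methods. The tuning curve and the 20 Hz baseline are hypothetical.

```python
import cmath
import math


def ds_index(rates, directions_deg):
    """Normalized vector-sum direction-selectivity index (assumed form)."""
    total = sum(rates)
    if total == 0:
        return 0.0
    vector_sum = sum(
        rate * cmath.exp(1j * math.radians(direction))
        for rate, direction in zip(rates, directions_deg)
    )
    return abs(vector_sum) / total


directions = list(range(0, 360, 45))  # eight evenly spaced stimulus directions
evoked = [10, 5, 1, 0, 0, 0, 1, 5]  # hypothetical tuning curve (Hz), no baseline
with_baseline = [rate + 20 for rate in evoked]  # same tuning on a 20 Hz baseline

print(f"no baseline: {ds_index(evoked, directions):.2f}")
print(f"20 Hz baseline: {ds_index(with_baseline, directions):.2f}")
```

      With evenly spaced directions, a constant baseline adds nothing to the vector sum but increases the denominator, so the same tuning curve falls from well above to well below a threshold of 0.3.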

      A related issue is that there are a few comparisons of ex vivo RGC responses with behavioral sensitivity. Smeds et al. (2019) is one example. More generally, the long-standing observation that dark-adapted sensitivity approaches limits set by Poisson fluctuations in photon absorption, and that prior RGC measurements are consistent with this result, is hard to explain if the RGCs are firing at high spontaneous rates under these conditions. RGC responses will certainly change with light level, but this merits discussion in the paper.

      As the reviewer pointed out, the retina may employ different coding principles under different light levels. In a scotopic condition, ex vivo studies reported a high tonic firing rate for OFF RGC types (~50 Hz, OFF sustained alpha cells in mice; Smeds et al 2019; ~20 Hz, OFF parasol cells in primates; Ala-Laurila and Rieke, 2014), but a low tonic firing rate for ON cell types (<1 Hz for both ON sustained alpha cells in mice and ON parasol cells in primates). These ON cells were shown to be responsible for light detection by firing against a silent background, hence compatible with the sparse feature detection strategy. In contrast, our recordings were done in a high mesopic / low photopic range where both rods and cones are supposedly active. Unlike the scotopic condition with rod vision, we found high firing in awake recordings in general, indicating that no visual feature is readily detectable as a brief firing event against a silent background. To explore the implications of such firing patterns on visual coding, we took a modelling approach in the revised manuscript. We found that a latency-based temporal code was not preferable in the awake condition (Figure 7); and that a linear decoder worked significantly better with the population responses in the awake condition to capture the presented random fluctuation of the light intensity (Figure 8). While we have not tested any behavioural relevance in our study besides correlation to locomotion/pupil size, it is possible that the retina works in different modes under different light intensity regimes (Tikidji-Hamburyan et al 2015).

      We clarified these points in the revised discussion section.

      Sampling bias

      The paper argues that sampling bias is not likely to contribute substantially to the results because of the wide variety of cell types recorded (line 431). This does not seem like a particularly strong argument, especially given the large degree of overlap in the distributions of most quantities across preparations. The argument about many cell types could be made more strongly if the distributions were completely separated, but that is not the case.

      We cannot deny the presence of a sampling bias in our datasets, and as the reviewer pointed out, we made comparisons only at a population level, but not at the level of individual cells or cell-types. However, the anesthetized and awake recordings were done with the same recording setup and techniques, and thus subject to the same sampling bias. Hence, the difference in the RGC response properties between these conditions cannot be explained by the sampling bias per se.

      Sensitivity

      The firing rates in response to 10% contrast sinusoids are quite low, as are the maximal firing rates for high contrast sinusoids. Relatedly, the modulation produced by the noise stimuli, particularly for the array recordings, is weak. This raises concerns about the health of some of the preparations.

      To our knowledge, in vivo contrast responses reported here were comparable to ex vivo data in previous reports (mouse, Jouty et al 2018, Pearson and Kerschensteiner 2015; rat, Jensen 2017, 2019). Likewise, the static nonlinearity and its upper bound for ex vivo responses were comparable between this study and previous reports (Santina et al. 2013; Kerschensteiner et al 2008; Cantrell et al 2010; Trapani et al 2022).

      We also examined batch effects in the response to the noise stimuli. We found certain variabilities across preparations in each recording condition, but not to the extent to discard any particular data as an obvious outlier (Figure 6 – figure supplement 1). While it is difficult to tell the health status of preparations retrospectively, we thus believe that the effects were negligible.

      Efficient coding

      Sparse firing is not a universal property of retinal ganglion cell responses. Primate midget RGCs, for example, have pretty high maintained firing rates as shown in many past studies. Mouse RGCs have also been reported to operate in a mode similar to the high firing rate On cells reported here (Ke et al. 2014). A more balanced discussion of this past work is needed.

      As the reviewer pointed out, some retinal ganglion cells show high firing under certain conditions. In a scotopic condition, for example, OFF cells have high firing rates, while ON cells hardly fire unless a light stimulus is presented (Ke et al 2014; Smeds et al 2019). At the behavioural level, single-photon detection above chance level nevertheless relies on the information from the ON but not the OFF pathway (Smeds et al 2019). Thus, the sparse coding framework still works as a valid strategy here, if not universal.

      This is, however, very different from what we report here. In a high-mesopic/low-photopic light level, we found a general increase of firing across all cell categories in the awake condition, compared to the anesthetized or ex vivo recordings (Figures 3 and 6). While this lowers information transfer rate (bits/spike; Figure 7), we found that the awake responses were more linearly decodable than the responses in the other conditions (Figure 8). We also ran a simulation and showed that a latency-based temporal code is not preferable for the awake responses (Figure 7 – figure supplement 1). These results suggest that the retina in awake condition is in favor of a rate code, though we have not tested all light levels or any behavioural relevance here.

      We clarified these points in the revised manuscript.
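
      For readers unfamiliar with linear decoding, the following minimal sketch shows the general idea: stimulus intensity is reconstructed as a weighted sum of population firing rates, with weights fitted by least squares and evaluated on held-out data. It is not the analysis code behind Figure 8. The data are simulated, the cell count, gains and noise level are hypothetical, and the decoder is instantaneous (no temporal lags), which is a simplifying assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a full-field intensity trace and noisy linear
# responses of 20 simulated cells (time bins x cells).
n_bins, n_cells = 2000, 20
stimulus = rng.standard_normal(n_bins)
gains = rng.uniform(-1.0, 1.0, n_cells)
rates = np.outer(stimulus, gains) + 0.5 * rng.standard_normal((n_bins, n_cells))

# Fit the decoder on the first half and evaluate it on the second half.
half = n_bins // 2
X_train = np.column_stack([rates[:half], np.ones(half)])          # offset term
X_test = np.column_stack([rates[half:], np.ones(n_bins - half)])
weights, *_ = np.linalg.lstsq(X_train, stimulus[:half], rcond=None)

# Decoding performance: correlation between true and reconstructed stimulus.
reconstruction = X_test @ weights
r = np.corrcoef(reconstruction, stimulus[half:])[0, 1]
print(f"held-out correlation between stimulus and reconstruction: {r:.2f}")
```

      Comparing such a held-out score across recording conditions is one way to ask which population response is more linearly decodable.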

      Role of eye movements

      Could eye movements be at least partially responsible for the differences in response properties? Specifically, small fixational eye movements might produce a constantly varying input that could modulate firing.

      As described above (Essential Review item #2), eye movements were rarely observed during the head-fixed awake recordings. Eliminating those events from the analysis did not change our overall conclusions, and thus their contributions should be minimal in this study. It should also be noted that we mainly used full-field stimulation, and thus microsaccades should not substantially affect the amount of light impinging on the retina. We clarified these points in the revised manuscript.

      Reviewer #2 (Public Review):

      The technical achievements presented in the manuscript represent a tour de force, as optic tract recordings in awake mice have only rarely been done before. The substantial number of neurons recorded in both awake and anaesthetized conditions form a precious and worldwide unique dataset. However, since the recordings represent a non-standard approach, it would be, in my view, highly beneficial to show more details about the success of the method. How did the authors post-hoc identify electrode contacts located in the optic tract, what did the spike waveforms look like, what were the metrics of spike sorting quality, etc.

      We added more details about our recording and analysis methods in the revised manuscript. Below are answers to the reviewer’s specific questions:

      • The probe was coated with a fluorescent dye (DiI stain) and its location was verified histologically after the recordings (Figure 1E).

      • Spike waveforms typically had a triphasic shape (e.g., Figure 1F-H) as expected from axonal signals (Barry, 2015).

      • Single-units were identified by clustering in principal component space, followed by manual inspection of spike shape as well as auto- and cross-correlograms. Most units had a minimum interspike interval above 2 ms (93%, awake; 94%, isoflurane; 99%, FMM); and no units had the interspike intervals below 1 ms for a refractory period (e.g., Figure 1I-K), except for 1 (out of 103) for FMM-anesthetized recordings.

      We then selected visually responsive cells (SNR>0.15; see Eq.(1) in Methods) for the analyses.
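
      The interspike-interval criterion above can be sketched as follows. This is an illustrative check, not the spike-sorting pipeline itself: the spike times are hypothetical, and the 1 ms refractory threshold is taken from the description above.

```python
import numpy as np

def isi_summary(spike_times_s, refractory_s=0.001):
    """Return (minimum interspike interval in s, number of refractory violations)."""
    t = np.sort(np.asarray(spike_times_s, dtype=float))
    isi = np.diff(t)
    return float(isi.min()), int(np.sum(isi < refractory_s))

# Hypothetical spike times (seconds) for two sorted units.
clean_unit = [0.010, 0.025, 0.0310, 0.060, 0.0645]
contaminated_unit = [0.010, 0.0105, 0.030, 0.050]

for name, spikes in (("clean", clean_unit), ("contaminated", contaminated_unit)):
    min_isi, n_violations = isi_summary(spikes)
    print(f"{name}: min ISI = {min_isi * 1e3:.1f} ms, violations = {n_violations}")
```

      A unit whose minimum interval falls below the refractory period most likely contains spikes from more than one cell.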

      The authors go a long way in characterising the functional response properties of the recorded neurons and relating them to previous ex-vivo recordings. Based on the responses they find, the authors claim that they identified "... a new response type [which] likely emerged due to high baseline firing in awake mice". Regarding this claim, how do the authors rule out that it corresponds to any of the previously described cell types? For instance, the very sharp transient or brief modulations by the contrast part of the stimulus might have been missed in previous classifications based on calcium responses (e.g. Baden et al. 2016), where a number of cell types seem to respond equally strong to grey and white and have an elevated response throughout the sinusoidal modulation of contrast. I acknowledge that the authors touch upon the possibility that the newly described OFFsuppressive ON cells correspond to a known cell type in the discussion, but I would recommend changing the phrasing of the results to avoid potential misunderstandings.

      We agree with the reviewer and revised the manuscript accordingly. Here we have two possibilities. Firstly, as the reviewer pointed out, this kind of response dynamics could have been overlooked previously because of a difference in the recording modality (Ca imaging; Baden et al 2016) or clustering methods (Jouty et al 2019). Secondly, these cells may belong to one of the cell types described in the past ex vivo studies, but exhibit distinct response dynamics in vivo as an emergent property of the awake condition. This is an interesting topic to pursue in future studies.

      The manuscript makes the interesting suggestion that "the retinal output characteristics [...] observed in vivo, [...] provide a completely different view on the retinal code". Given that this conclusion would change the way we should think about and do retinal neuroscience, in my view, the authors should take a few more steps to quantitatively demonstrate the implications of their findings on retinal coding, e.g. how much lower is the information transmitted per spike, how much does a temporal code based on spike timing suffer with the latencies observed in vivo. If the authors could quantify through computational modelling approaches the consequences of the observed differences, they might also be able to revise their title / main message, i.e. that "Awake responses SUGGEST inefficient dense coding in the mouse retina".

      To explore functional implications of our findings, we performed three more analyses as suggested by the reviewer. Specifically,

      1) We showed that the information transmitted per spike was significantly lower in awake condition, while the total information rate was comparable (Figure 7).

      2) We tested the performance of a linear decoder applied on the firing rate in response to full-field noise, and showed that it worked significantly better for the awake population responses (Figure 8).

      3) We simulated RGC responses to a full-field contrast change at different intensities in different conditions, and showed that a latency code did not work well with awake responses, compared to ex vivo or anesthetized responses (Figure 7 – figure supplement 1).

      These results strengthened our conclusion that awake response dynamics were different from anesthetized or ex vivo responses, all arguing against the sparse efficient coding principles at least at the light level we examined. We nevertheless kept the title as is because we have not explored the retinal coding properties per se. Our main claim remains focused on the visual response characteristics of retinal outputs in awake mice.
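
      The relation between the two information measures in point 1 is simple arithmetic: information per spike is the information rate divided by the mean firing rate. The short sketch below uses hypothetical numbers (not values from Figure 7) to show how a comparable rate in bits per second can coexist with a lower value in bits per spike when firing is denser.

```python
def bits_per_spike(info_rate_bits_per_s, firing_rate_hz):
    """Information per spike = information rate / mean firing rate."""
    if firing_rate_hz <= 0:
        raise ValueError("firing rate must be positive")
    return info_rate_bits_per_s / firing_rate_hz

# Hypothetical cells with the same information rate but different firing rates.
info_rate = 10.0  # bits/s
for label, firing_rate in (("sparse firing", 5.0), ("dense firing", 25.0)):
    print(f"{label}: {bits_per_spike(info_rate, firing_rate):.1f} bits/spike")
```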

      Reviewer #3 (Public Review):

      The manuscript by Boissonnet, Tripodi, and Asari compares retinal ganglion cell (RGC) light responses in awake mice (recorded in the optic nerve) with those under two forms for anaesthesia and previously attained ex vivo recordings. This is a well motivated study looking at a question that is really critical to the field.

      The presentation is generally clear and compelling. My suggestions are relatively minor and aimed at improving an already very strong article.

      1) More cells in the awake condition would help strengthen the conclusions. Only 51 cells are reported, and mouse RGCs comprise more than 40 different types. The authors are well aware of the possible confound of sampling bias, and the best way to mitigate this issue in this experimental paradigm is simply to record more cells. The anesthesia conditions each have about 100 cells, which is better.

      We made substantially more recordings in the awake condition, reaching 282 cells (in 15 animals) in total in the revised manuscript. This does not yet allow for a full cell-type classification as in the past ex vivo studies. Nevertheless, we did our best to broadly classify visual responses, and showed that the overall conclusions remained the same: awake RGCs had higher baseline firing and faster response kinetics in general. For details, see above our response to the Essential Revision item #1.

      2) It took me longer than it should have (had to look up the previous paper cited) to figure out that the ex vivo comparison data were recorded at 37 °C. This is an important detail since most ex vivo recordings are at 32 °C. The authors should make this clear in the text and perhaps say something in the Discussion about comparisons to the larger body of literature of ex vivo studies at 32 °C.

      We are aware that most ex vivo studies on the retina were performed at 32 °C, which is lower than physiological body temperature (37 °C). However, the temperature of the ocular surface is around 37 °C (Vogel et al 2016), suggesting that the retina should operate at 37 °C in vivo. This is why we decided to perform ex vivo experiments at 37 °C in our previous study (Vlasiuk and Asari, 2021), allowing us to make a fair comparison between the ex vivo and in vivo recordings.

      We clarified the point in the revised manuscript.

      3) Direction and orientation selectivity should be separated in Fig. 2 and not combined into the confusing term "motion sensitive." Motion sensitivity has another meaning in the literature for RGCs that respond preferentially to moving over static stimuli without direction or orientation preference (Kuo et al., 2016; Manookin et al., 2018)

      We agree with the reviewer. In the revised manuscript, we separated the direction and orientation selective cells (Figure 2), and avoided the term “motion sensitive.”

      4) While I am certainly sympathetic to the argument that the RGC spike code is "inefficient" in the sense that it does not conform to efficient coding theory (ETC), I think it's oversimplified to claim that the present data is a key argument against ETC. Plenty of ex vivo data has already shown ETC to be incomplete at best, and misguided at worst, since it includes the implicit assumption that image reconstruction is the retina's objective function (or even that the experimenter has any idea what that objective function is). For example, OFF sustained alpha (OFF delta in guinea pig) RGCs are not quite sparse feature detectors even ex vivo, and they seem to be optimized to transmit contrast with high SNR (Homann and Freed, 2017). In general, the enormous coverage factor of the RGC population seems to make ETC untenable to begin with, as discussed in (Schwartz, 2021) and elsewhere. I realize that there are still people attached to simplistic forms of ETC as a key principle of retinal computation, so I am not asking for the authors to completely remove this angle. Rather, a more nuanced treatment of the issue both in the introduction and the discussion is warranted.

      We totally agree that we are not the first to argue against the efficient coding principles in the retina (Schwartz, 2021). The main argument in this study is that certain aspects of the RGC activity are distinct in an awake condition, such as the baseline firing and response kinetics, and thus we cannot simply translate our knowledge obtained from ex vivo studies into awake animals. To explore the implications on retinal computations, we showed in the revised manuscript that 1) awake responses have a comparable total information transfer rate (in bits per second; Figure 7A) but are less efficient (i.e., lower bits per spike; Figure 7B); 2) awake responses are not in favor of a latency-based temporal code (Figure 7 – figure supplement 1); and 3) a linear decoder worked significantly better with awake responses (Figure 8), even though an image reconstruction is not necessarily the objective function of the retina. These results point to a need to rethink retinal function in vivo, including the efficient coding theory.

      We thank the reviewer for the suggestion, and revised the manuscript accordingly.

      References

      Homann, J., and Freed, M.A. (2017). A mammalian retinal ganglion cell implements a neuronal computation that maximizes the SNR of its postsynaptic currents. Journal of Neuroscience 37, 1468-1478.

      Kuo, S.P., Schwartz, G.W., and Rieke, F. (2016). Nonlinear Spatiotemporal Integration by Electrical and Chemical Synapses in the Retina. Neuron 90, 320-332.

      Manookin, M.B., Patterson, S.S., and Linehan, C.M. (2018). Neural Mechanisms Mediating Motion Sensitivity in Parasol Ganglion Cells of the Primate Retina. Neuron 97, 1327-1340.e4.

      Schwartz, G.W. (2021). Retinal Computation (Academic Press).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary of the major findings -

      1) The authors used saturation mutagenesis and directed evolution to mutate the highly conserved fusion loop (98 DRGWGNGCGLFGK 110) of the Envelope (E) glycoprotein of Dengue virus (DENV). They created 2 libraries with parallel mutations at amino acids 101, 103, 105-107, and 101-105 respectively. The in vitro transcribed RNA from the two plasmid libraries was electroporated separately into Vero and C6/36 cells and passaged thrice in each of these cells. They successfully recovered a variant N103S/G106L from Library 1 in C6/36 cells, which represented 95% of the sequence population and contained another mutation in E outside the fusion loop (T171A). Library 2 was unsuccessful in either cell type.

      2) The fusion loop mutant virus called D2-FL (N103S/G106L) was created through reverse genetics. Another variant called D2-FLM was also created, which in addition to the fusion loop mutations, also contains a previously published, evolved, and optimized prM-furin cleavage sequence that results in a mature version of the virus (with lower prM content). Both D2-FL and D2-FLM viruses grew comparably to wild type virus in mosquito (C6/36) cells but their infectious titers were 2-2.5 log lower than wild type virus when grown in mammalian (Vero) cells. These viruses were not compromised in thermostability, and the mechanism for attenuation in Vero cells remains unknown.

      4) Next, the authors probed the neutralization of these viruses using a panel of monoclonal antibodies (mAbs) against fusion loop and domain I, II and III of E protein, and against prM protein. As intended, neutralization by fusion loop mAbs was reduced or impaired for both D2-FL and D2-FLM, compared to wild type DENV2. D2-FLM virus was equivalent to wild type with respect to neutralization by domain I, II, and III antibodies tested (except domain II-C10 mAb) suggesting an intact global antigenic landscape of the mutant virion. As expected, D2-FLM was also resistant to neutralization by prM mAbs (D2-FL was not tested in this batch of experiments).

      5) Finally, the authors evaluated neutralization in the context of polyclonal serum from convalescent humans (n=6) and experimentally infected non-human primates (n=9) at different time points (27 total samples). Homotypic sera (DENV2) neutralized D2-FL, D2-FLM, and wild type DENV similarly, suggesting that the contribution of fusion loop and prM epitopes is insignificant in a serotype-specific neutralization response. However, heterotypic sera (DENV4) neutralized D2-FL and D2-FLM less potently than wild type DENV2, especially at later time points, demonstrating the contribution of fusion loop- and prM-specific antibodies to heterotypic neutralization.

      Impact of the study-

      1) The engineered D2-FL and D2-FLM viruses are valuable reagents to probe antibodies targeting the fusion loop and prM in the overall polyclonal response to DENV.

      2) Though more work is needed, these viruses can facilitate the design of a new generation of DENV vaccine that does not elicit fusion loop- and prM-specific antibodies, which are often poorly neutralizing and lead to antibody-dependent enhancement effect (ADE).

      3) This work can be extended to other members of the flavivirus family.

      4) A broader impact of their work is a reminder that conserved amino acids may not always be critical for function and therefore should not be immediately dismissed in substitution/mutagenesis/protein design efforts.

      Evaluating this study in the context of prior literature -

      The authors write "Although the extreme conservation and critical role in entry have led to it being traditionally considered impossible to change the fusion loop, we successfully tested the hypothesis that massively parallel directed evolution could produce viable DENV fusion-loop mutants that were still capable of fusion and entry, while altering the antigenic footprint."

      ".....Previously, a single study on WNV successfully generated a viable virus with a single mutation at the fusion loop, although it severely attenuated neurovirulence. Otherwise, it has not been generated in DENV or other mosquito-borne flaviviruses"

      The above claims are a bit overstated. In the context of other flaviviruses:

      • A previous study applied a similar saturation mutagenesis approach to the full length E protein of Zika virus and found that while the conserved fusion loop was mutationally constrained, some mutations, including at amino acid residue 106 were tolerated (PMID 31511387).

      • The Japanese encephalitis virus (JEV) SA14-14-2 live vaccine strain contains a L107F mutation in the fusion loop (in addition to other changes elsewhere in the genome) relative to the parental JEV SA14 strain (PMID: 25855730).

      • For tickborne encephalitis virus (TBEV-DENV4 chimera), H104G/L107F double mutant has been described (PMID: 8331735)

      There have also been previous examples of functionally tolerated mutations within the DENV fusion loop:

      • Goncalvez et al., isolated an escape variant of DENV 2 using chimpanzee Fab 1A5, with a mutation in the fusion loop G106V (PMID: 15542644). G106 is also mutated in D2-FL clone (N103S/G106L) described in the current study.

      • In the context of single-round infectious DENV, mutation at site 102 within the fusion loop has been shown to retain infectivity (PMID 31820734).

      We thank the reviewer for these comments. We have adjusted the text above to better reflect and credit the prior literature. The text in the discussion section is modified as follows.

      “Previous reported mutations in the fusion loop are mainly derived from experimental evolution using FL-Ab to select for escape mutant or by deep mutational scanning (DMS) of the Env protein for Ab epitope mapping. Mutations in the FL epitope were observed in a DENV2-NGC-V2 (G106V)39, attenuated JEV vaccine strain SA14-14-2 (L107F)40, attenuated WNV-NY99 (L107F)41. While most of the mutations, including the double mutations reported here lead to attenuation of the virus. A recent DMS study showed that Zika-G106A has no observable impact on viral fitness42. Interestingly, we also recovered a mutation G106L, suggesting position 106 and 107 might be the most tolerable position for mutation in mosquito borne flavivirus FL. On the other hand, tick borne flavivirus as well as vector only flavivirus show a more diverse FL composition. The inflexibility of mosquito borne flavivirus might be due to the evolution constraint of the virus to switch between mosquito and vertebrate hosts.”

      Appraisal of the results -

      The data largely support the conclusions, but some improvements and extensions can benefit the work.

      1) Line 92-93: "This major variant comprised ~95% of the population, while the next most populous variant comprised only 0.25% (Figure 1C)".

      What is the sequence of the next most abundant variant?

      The sequence of the next most abundant variant has been added to the text.

      2) Lines 94-95: "Residues W101, C105, and L107 were preserved in our final sequence, supporting the structural importance of these residues." L107F is viable in other flaviviruses.

      We acknowledge that the L107F mutation has been described in other flaviviruses, including the tick-borne flaviviruses DTV and POWV. This mutation in JEV is associated with viral attenuation. This sentence refers to the fact that, in our libraries, we did not recover variants with mutations at these positions, in contrast to D2-FL with variants at N103 and G106, indicating less mutational tolerance. However, we want to keep the focus of this manuscript on engineering a viable DENV that is antigenically different in the FL epitope, rather than on which residue is more tolerant of mutation.

      3) Figure 2c: The FLM sample in the western blot shows hardly any E protein, making E/prM quantitation unreliable.

      The samples used in Figure 2C derive from the growth curve endpoint (Figure 2A), in which there is a 1-log difference in viral titer between D2 and D2-FLM. Equivalent volumes of viral supernatant were loaded in the gel, explaining the reduced intensity of the E band in D2-FLM. The higher exposure on the right shows the E band more clearly for D2-FLM. The Western blot assay comparing prM/E ratio as a measure of maturation state was described and validated in our previous study (Tse et al. 2022, mBio). The methods and figure legend have been updated to include greater detail. The polyclonal E antibody was specifically chosen for this study as our previously used monoclonal antibody targeted the fusion loop. The polyclonal antibody was raised against a fragment of E (AA 1-495) and should be minimally affected by the fusion loop mutations.

      4) Lines 149 -151: "Importantly, D2-FL and D2-FLM were resistant to antibodies targeting the fusion loop. While neutralization by 1M7 is reduced by ~2-logs, no neutralization was observed for 1N5, 1L6, and 4G2 for either variant (Figure 3 A)".

      a) Partial neutralization was observed for 1N5, for D2-FL.

      The text has been updated to more accurately describe the 1N5 neutralization data.

      b) Do these mAbs cover the full spectrum of fusion loop antibodies identified thus far in the field?

      We did not test every known fusion loop antibody that has been described, instead focusing on 1M7, 1N5, 1L6, and 4G2, which were previously described by Smith et al and Crill et al. We also modified the text in the discussion to reflect the possibility that other FL-Abs are not affected by our mutations.

      “We have tested a panel of FL-Ab; however, we cannot exclude the possibility that other FL-Abs may not be affected by N103S and G106L. However, we have shown that saturation mutagenesis could generate mutants with multiple amino acid changes, and we are currently using D2-FLM as backbone to iteratively evolve additional mutations in FL to further deviate the FL antigenic epitope.”

      c) Are the epitopes known for these mAbs? It would be useful to discuss how the epitope of 1M7 differs from the other mAbs? What are the critical residues?

      Critical residues for these antibodies have been described. They are as follows: 1M7: W101R, W101C, G111R; 1N5: W101R, L107P, L107R, G111R; 1L6: G100A, W101A, F108A; 4G2: G104H, G106Q, L107K. The critical residues for 1M7 are slightly different from the others, perhaps explaining the residual binding to D2-FL. Note that the critical residues identified previously for 1M7 and 1N5 do not overlap with the D2-FLM mutations, suggesting that the FL mutations have an extended effect on the antigenic FL epitope.

      d) Maybe the D2-FL mutant can be further evolved with selection pressure with fusion loop mAbs 1M7 +/-1N5 and/or other fusion loop mAbs.

      We agree that it may be possible to further evolve D2-FL using antibody selection, although we have not yet performed these experiments. We are currently performing iterative saturation mutagenesis and directed evolution to further evolve away from the natural FL.

      5) It would have been useful to include D2-M for comparison (with evolved furin cleavage sequence but no fusion loop mutations).

      Neutralization data for some of the mAbs against D2-M can be found in our previous study (Tse et al. 2022 mBio), in which no difference in neutralization was observed compared to DV2 wildtype. Given the limited resources of the anti-DENV NHP and human serum, we did not add D2-M for comparison. Although some insight can be deduced from the D2-FL vs D2-FLM comparison, we agree future studies that are designed to delineate CR-Ab population between prM, FL and other CR-epitopes should include D2-M for comparison.

      6) Data for polyclonal serum can be better discussed. Table 1 is not discussed much in the text. For the R1160-90dpi-DENV4 sample, D2-FL and D2-FLM are neutralized better than wild type DENV2? The authors' interpretation in lines 181-182 is inconsistent with the data presented in Figure 3C, which suggests that over time, there is INCREASED (not waning) dependence on FL- and prM-specific antibodies for heterotypic neutralization.

      We remade Table 1 to show dilution factors instead of dilution factor⁻¹ for FRNT50.

      In general, our human convalescent sera from heterotypic infection (DENV1, 3 and 4) showed no to low neutralization against our DENV2. FRNT50s were between 1:40 and 1:200. Given the weak potency of the antiserum, it is difficult to compare the FRNT50s between DV2-WT and D2-FLM.

      Similarly, in a different NHP cohort (the 2nd NHP cohort shown in Table 1), only one DENV4-infected NHP (R1160) showed a low heterotypic titer against DENV2. The detectable FRNT50s were between 1:50 and 1:90. These values were extrapolated from a single data point (1:40) that showed above 50% neutralization. Given that the Hill slopes of all the neutralization curves were below 0.5, the FRNT50 values should not be

      In conclusion, we do not think the sera in Table 1 are potent enough to show differences between the viruses. The intention of showing the negative data in Table 1 is to highlight the heterogeneity of sera from DENV-infected patients and experimentally infected NHPs.
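
      To make the extrapolation issue concrete, the sketch below assumes a standard Hill-type neutralization curve, neutralization(x) = 1 / (1 + (x / FRNT50)^h), where x is the reciprocal serum dilution (40 for 1:40) and h is the Hill slope. This model form and the numbers are illustrative assumptions, not the fitting procedure used for Table 1. Solving for FRNT50 from a single measured point shows that a shallow slope amplifies small measurement differences.

```python
def frnt50_from_point(dilution, neutralized_fraction, hill_slope):
    """Extrapolate FRNT50 from one (dilution, fraction neutralized) point.

    Assumes neutralization(x) = 1 / (1 + (x / FRNT50) ** hill_slope),
    with x the reciprocal serum dilution (e.g. 40 for 1:40).
    """
    if not 0.0 < neutralized_fraction < 1.0:
        raise ValueError("fraction neutralized must be strictly between 0 and 1")
    odds = neutralized_fraction / (1.0 - neutralized_fraction)
    return dilution * odds ** (1.0 / hill_slope)

# Hypothetical single data points at 1:40 with slightly different readouts.
for fraction in (0.55, 0.60):
    for slope in (1.0, 0.5):
        estimate = frnt50_from_point(40, fraction, slope)
        print(f"{fraction:.0%} at 1:40, Hill slope {slope}: FRNT50 ~ 1:{estimate:.0f}")
```

      Under this assumed model, the exponent 1/h grows as the slope becomes shallower, so a few percentage points of difference in the single measured value translate into a large shift in the extrapolated titer.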

      As the reviewer pointed out, the dependence on FL-Ab increased at later time points (the difference between DV2 and D2-FL at 20dpi vs 60dpi vs 90dpi), suggesting that non-FL CR-Abs are waning but prM- and FL-Abs are not. We rewrote the sentence as follows:

      “These data suggest that after a single infection, many of the CR Ab responses target prM and the FL and the reliance on these Abs for heterotypic neutralization increase overtime (Figure 3C).”

      Suggestions for further experiments-

      1) It would be interesting to see the phenotype of single mutants N103S and G106L, relative to double mutant N103S/G106L (D2-FL).

      2) The fusion capability of these viruses can be gauged using liposome fusion assay under different pH conditions and different lipids.

      3) Correlative antibody binding vs neutralization data would be useful.

      We thank the reviewer for the suggestions; we agree these would be of interest and, indeed, these studies are currently underway. Regarding the single mutants, these were present in the initial plasmid library but did not enrich after viral production and passage. Two explanations are possible: 1) the stochastic nature of directed evolution prevented a single mutant with similar fitness from being enriched; or 2) the two mutations compensate for each other to make a functional mutant. The second hypothesis highlights the difference between saturation mutagenesis (this study) and DMS (previous studies).

      Fusion capability is indeed very interesting, however, the mechanistic difference or not between wildtype FL and the mutated FL in supporting fusion is not the focus of this study. Instead, we are currently working on adapting the D2-FLM in mammalian cells. If successful, the difference in fusion mechanism between the Vero adapted and D2-FLM in different lipid, insect vs mammalian would be of interest.

      We are currently developing whole virus ELISA; we avoid using rE monomer for the study as it might neglect the conformation Ab.

      Reviewer #2 (Public Review):

      Antibody-dependent enhancement (ADE) of Dengue is largely driven by cross-reactive antibodies that target the DENV fusion loop or pre-membrane protein. Screening polyclonal sera for antibodies that bind to these cross-reactive epitopes could increase the successful implementation of a safe DENV vaccine that does not lead to ADE. However, there are few reliable tools to rapidly assess the polyclonal sera for epitope targets and ADE potential. Here the authors develop a live viral tool to rapidly screen polyclonal sera for binding to fusion loop and pre-membrane epitopes. The authors performed a deep mutational scan for viable viruses with mutations in the fusion loop (FL). The authors identified two mutations functionally tolerable in insect C6/36 cells, but lead to defective replication in mammalian Vero cells. These mutant viruses, D2-FL and D2-FLM, were tested for epitope presentation with a panel of monoclonal antibodies and polyclonal sera. The D2-FL and D2-FLM viruses were not neutralized by FL-specific monoclonal antibodies demonstrating that the FL epitope has been ablated. However, neutralization data with polyclonal sera is contradictory to the claim that cross-reactive antibody responses targeting the pre-membrane and the FL epitopes wane over time.

      Overall, the central conclusion that the engineered viruses can predict epitopes targeted by antibodies is supported by the data and the D2-FL and D2-FLM viruses represent a valuable tool to the DENV research community.

      Reviewer #1 (Recommendations For The Authors):

      1) Line 51-52: "Currently, there is a single approved DENV vaccine, Dengvaxia." Line 56-57: "Other DENV vaccines have been tested or are currently undergoing clinical trial, but thus far none have been approved for use."

      It should be specified for the global audience that this applies to the United States. Takeda's DENV vaccine, QDENGA is approved in Indonesia, European Union, and Brazil.

      The text has been modified to include this information.

      2) Line 62-63: - "The core fusion loop-motif DRGWGNGCGLFGK is highly conserved..." Lines 78-80: - "We generated two different saturation mutagenesis libraries, each with 5 randomized amino acids: DRGXGXGXXXFGK (Library 1) and DRGXXXXXGLFGK (Library 2)."

      It may be useful for the readers if the amino acid numbers are stated. The core fusion loop motif DRGWGNGCGLFGK (Eaa98-110) is highly conserved. We generated two different saturation mutagenesis libraries, each with 5 randomized amino acids: DRGXGXGXXXFGK (Library 1; Xaa 101,103, 105-7) and DRGXXXXXGLFGK (Library 2; Xaa 101-105).

      This information has been added to the text.

      3) Line 91-92: "Bulk Sanger sequencing revealed an additional Env-T171A mutation outside of the fusion-loop region."

      It looks like the mutation T171A is in domain I of the E protein and does not seem to interface with the fusion loop. Is that why it wasn't pursued further?

      The E171A mutation was included in the infectious clone for D2-FL and D2-FLM. The text has been modified to clarify this inclusion.

      4) Lines 82-85: "Saturation mutagenesis plasmid libraries were used to produce viral libraries in either C6/36 (Aedes albopictus mosquito) or Vero 81 (African green monkey) cells and passaged three times in their respective cell types."

      a) What was the size of the libraries? How does one make sure that the experimental library actually has all the amino acid combinations that were intended?

      Each library has 5 randomized amino acids, so there are 20^5 = 3.2 million combinations. In these experiments, sequencing of the plasmid libraries revealed about 2 million unique amino acid sequences, or approximately 62.5% library coverage. The actual plasmid diversity is expected to be higher than 2 million as our deep sequencing has limited coverage.
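
      The arithmetic behind these figures can be checked with a minimal sketch. The function names below are illustrative only and are not taken from the authors' analysis; the calculation assumes 20 possible amino acids at each randomized position and uses the approximate count of 2 million unique sequences quoted above.

```python
# Minimal sketch (hypothetical helper names): size and coverage of a
# saturation mutagenesis library with fully randomized positions.

def library_size(n_positions: int, alphabet: int = 20) -> int:
    """Number of possible amino acid combinations."""
    return alphabet ** n_positions


def coverage(observed_unique: int, n_positions: int, alphabet: int = 20) -> float:
    """Fraction of the theoretical library seen in sequencing."""
    return observed_unique / library_size(n_positions, alphabet)


total = library_size(5)            # 5 randomized positions
frac = coverage(2_000_000, 5)      # ~2 million unique sequences observed
print(total, f"{frac:.1%}")        # prints: 3200000 62.5%
```

      Because the observed count comes from sequencing with limited depth, this fraction is a lower bound on the true plasmid diversity, as the response notes.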

      b) The wild type sequence was excluded from the libraries, correct?

      The wild-type sequence was not specifically excluded from the libraries, as there is no easy method to do so. Wild-type sequence was detected in the plasmid libraries but was not selected in the C6/36 library. However, in the Vero library, we recovered WT virus.

      5) Table 1: - Please include in the table description, what the colors indicate.

      We remade Table 1 to show dilution factors instead of dilution factor⁻¹ of FRNT50 and removed the unnecessary color code. We also added all relevant information in the table legend.

      6) Lines 246-248: "Previously, a single study on WNV successfully generated a viable virus with a single mutation at the fusion loop, although it severely attenuated neurovirulence."

      It may be worthwhile to mention the WNV mutation (L107F) as some readers may be curious about where this mutation is relative to the ones described in this study.

      This information has been added to the text. We also included the previously described FL mutations in flaviviruses in the text.

      Reviewer #2 (Recommendations For The Authors):

      Major Critique:

      • There is a disconnect between Fig 2A and 2C. FL and FLM viruses have much lower levels of prM-E expression in the viral supernatants based on the western blot in 2C. Why isn't E being detected in the Western? Is the particle-to-pfu ratio skewed in the mutant viruses? Is it possible that the polyclonal is targeting the cross-reactive prM and FL epitopes, and if so would using a monoclonal antibody targeting a known DIII-epitope (2D22) yield a different western result? Also, the legend and methods for Fig 2C are not clear. What is actually being tested in the Western blot? Were equivalent volumes of the different viral preps used?

      The samples used in Figure 2C derive from the growth curve endpoint (Figure 2A), in which there is a 1-log difference in viral titer between D2 and D2-FLM. Equivalent volumes of viral supernatant were loaded in the gel, explaining the reduced intensity of the E band in D2-FLM. The higher exposure on the right shows the E band more clearly for D2-FLM. The Western blot assay comparing prM/E ratio as a measure of maturation state was described and validated in our previous study (Tse et al. 2022, mBio) and the methods have been updated to include greater detail. The polyclonal E antibody was specifically chosen for this study as our previously used monoclonal antibody targeted the fusion loop. The polyclonal antibody was raised against a fragment of E (AA 1-495) and should not be affected by the fusion loop mutations. 2D22 is a conformational antibody and does not work in western blot.

      • Table 1: The data within Table 1 is ignored in the text, and some of this data contradicts the central conclusions of the manuscript.

      o A.) Some of the convalescent data contradicts the hypothesis. DS0275 had an equivalent neut between DV2 and D2-FLM, DS1660, and R1160 (90) had better neut against the D2-FLM than DV2. Discussion of these samples is warranted.

      o C.) The description in the legend does not adequately describe the table. What do the colors represent? What are the numerical values being displayed? What is in parentheses, (I assume the challenge strain)? The limit of detection is reported as 1:40; 0.25. 1:40 is 0.025 which matches most of the data? There is inadequate description of these experiments in the materials and methods.

      We remade Table 1 to show dilution factors instead of dilution factor⁻¹ of FRNT50 and removed the unnecessary color code. We also added discussion for Table 1 and clarified the difference between the three cohorts of serum in the text with the corresponding references.

      In general, our human convalescent sera from heterotypic infection (DENV1, 3 and 4) showed no to low neutralization against our DENV2. FRNT50s were between 1:40 – 1:200. Given the weak potency of the antiserum, it is difficult to compare the FRNT50s between DV2-WT and D2-FLM.

      Similarly, in a different NHP cohort (2nd NHP cohort shown in Table 1), only one DENV4 infected NHP (R1160) showed a low heterotypic titer against DENV2. The detectable FRNT50s were between 1:50 – 1:90. The value was extrapolated based on a single data point (1:40) which was above 50% neutralization. Given that the Hill slopes of all the neutralization curves were below 0.5, the FRNT50 values are not reliable.

      In conclusion, we do not think the sera from Table 1 are potent enough to show a difference between the viruses. The intention in showing the negative data in Table 1 is to highlight the difference in serum heterogeneity between DENV infected patients and experimentally infected NHPs.
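
      The sensitivity described here can be illustrated with a small sketch. It assumes a standard two-parameter Hill model, N(x) = 1 / (1 + (x / FRNT50)^h), where x is the dilution factor; this model, the function name, and the example value of 55% neutralization are assumptions for illustration, not the authors' fitting procedure or data.

```python
# Minimal sketch (assumed Hill model, hypothetical values): FRNT50 implied
# by a single data point for a given Hill slope.

def extrapolated_frnt50(dilution_factor: float,
                        fraction_neutralized: float,
                        hill_slope: float) -> float:
    """Solve N = 1 / (1 + (x / FRNT50)**h) for FRNT50."""
    if not 0.0 < fraction_neutralized < 1.0:
        raise ValueError("fraction_neutralized must be between 0 and 1")
    if hill_slope <= 0:
        raise ValueError("hill_slope must be positive")
    # Ratio of un-neutralized to neutralized virus at this dilution.
    odds = (1.0 - fraction_neutralized) / fraction_neutralized
    return dilution_factor / odds ** (1.0 / hill_slope)


# One point at 1:40 with a hypothetical 55% neutralization:
for h in (1.0, 0.5, 0.3):
    print(h, round(extrapolated_frnt50(40, 0.55, h), 1))
```

      Under this model, the shallower the slope, the further the estimate moves from the measured 1:40 point for the same single observation, which is one way to see why titers extrapolated from curves with slopes below 0.5 are hard to compare.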

      Minor critique:

      Figure 1C: Legend is not clear for this panel. What is on the x-axis of the bubble plots? Are these mutations across the entire viral genome or is this just the prM-E sequence?

      The X-axis is a scatter of all of the sequences contained in the library, similar to graphs used for plotting CRISPR screen results. These represent individual sequences from the saturation mutagenesis libraries in the fusion loop of E as described in Figure 1B.

      The wording in Lines 92-94 is not clear. It looks like the T171A mutation was present in 95% of the sequences (Line 92). Yet this sequence was not incorporated into the variant virus. What is the rationale for omitting this mutation in downstream variant virus generation?

      The 95% in Line 92 refers to the variant containing N103S/G106L mutations as seen in Figure 1C. The high-throughput sequencing approach did not include residue 171, so the presence of the T171A mutation in combination with fusion loop mutations cannot be determined. However, the E171A mutation was included in the infectious clone for D2-FL and D2-FLM. The text has been modified to clarify this inclusion.

      The authors discuss the potential of the D2-FL or D2-FLM virus as a potential vaccine platform in the abstract, introduction, and conclusion. This is a good idea, but the authors provide no evidence of feasibility in this manuscript.

      The ultimate goal of engineering a viable DENV with a distinct FL antigenic epitope is its use as a live attenuated vaccine. As this is the rationale for the study, we introduce the concept throughout the manuscript. The current study demonstrated the possibility of mutating a novel fusion loop motif in DENV and provided evidence of the favorable antigenic properties of D2-FLM. We agree with the reviewer that definitive work in animals to show vaccine efficacy needs to be done, and this work is currently underway. To avoid misleading our audience, we have toned down the emphasis on vaccine use in the text.

      Line 150-153: Figure 3A demonstrates that the FL-specific antibodies broadly do not neutralize the mutant viruses. However, the conclusions are overstated in the text. 1N5 neutralizes the D2-FL variant.

      The text has been updated to more accurately describe the 1N5 neutralization data.

      Lines 175-182: The authors make a lot of assumptions about the target of the polyclonal target without any evidence.

      These lines reference studies that showed greater enhancement by antibodies targeting the fusion loop and prM as compared to other cross-reacting antibodies. The assumption that both our manuscript and others have drawn is that Abs that are cross-reactive and weakly neutralizing are more prone to cause ADE. As discussed, other groups have attempted to mutate the FL in recombinant E protein to achieve a similar goal: removing the fusion loop epitope to reduce ADE. We have rewritten the sentence as follows:

      “As FL and prM targeting Abs are the major species demonstrated to cause ADE in vitro, we and others hypothesized these Abs are responsible for ADE-driven negative outcomes after primary infection and vaccination,10–12,32 we propose that genetic ablation of the FL and prM epitopes in vaccine strains will minimize the production of these subclasses of Abs responsible for undesirable vaccine responses. Indeed, covalently locked E-dimers and E-dimers with FL mutations have been engineered as potential subunit vaccines that reduce the availability of the FL, thereby reducing the production of FL Abs.33–36”

    1. Author Response

      Reviewer #2 (Public Review):

      Please note that I am not a structural biologist and cannot critically evaluate the details of figures 1 to 3; my review focuses on the cell biology experiments in figures 4 and 5.

      Paine and colleagues investigated structural requirements for the interaction between the ESCRT-III subunit IST1 and the protease CAPN7. This is a continuation of previous work by the same group (Wenzel et al., eLife 2022), which showed that Capn7 is recruited to the midbody by Ist1 and that Capn7 promotes both normal abscission and NoCut abscission checkpoint function. In this article, the structural determinants of the Ist1-Capn7 interaction are characterised in more detail, focusing on the structure of Capn7 MIT domains and their binding to Ist1. Notably, point mutations in Capn7 MIT domains known to mediate binding to Ist1 and midbody recruitment are shown here to be required for abscission functions, as expected from the authors' previous paper. Furthermore, the report shows that a Capn7 point mutant lacking proteolytic activity behaves as a loss-of-function in abscission assays, despite showing normal midbody localisation. These are important results that will help in future studies to understand how the Capn7 protease regulates abscission mechanistically.

      The report is clearly written and the results support the main conclusions. Some technical limitations and alternative interpretations of the data should be discussed in the text, as outlined below.

      1) It is not always clearly stated how the results presented in this report relate to those in the Wenzel paper. For example, the finding that Ist1 recruits Capn7 to midbodies (p. 6 and figure 4) was first shown in the Wenzel paper. The novelty here is not that Capn7 MIT mutants fail to localise to midbodies, but that they phenocopy the previously described knockdown of Capn7, failing to support normal abscission and NoCut function (fig. 5). This supports and extends the findings of Wenzel et al. It is important to make this explicit and explain the conceptual advances shown here more clearly.

      We take the reviewer’s point and we have now clarified this issue in the text (e.g., page 7, lines 4-5).

      2) The NoCut checkpoint can be triggered by chromatin bridges, DNA replication stress, and nuclear basket defects, but only basket defects are tested here. Therefore, it is not clear if NoCut is still functional in Capn7-defective cells after replication stress and/or with chromatin bridges. Ideally, this should be tested experimentally, or alternatively discussed in the text, especially since the molecular details of how NoCut is engaged under different conditions remain unclear. For example, "abscission checkpoint bodies" proposed to control abscission timing form in response to nuclear basket defects and aphidicolin treatment, but not in the presence of chromatin bridges (Strohacker et al., eLife 2021).

      We appreciate the reviewer’s excellent suggestion. We have now performed the requested experiments and added a new figure showing that CAPN7 is also required to maintain the NoCut checkpoint when it is triggered by DNA bridges (new Figure 6A) or by replication stress (new Figure 6B).

      3) The current data suggest that Capn7 is a regulator of abscission timing, but in my opinion do not quite establish this, for two main reasons. First, abscission timing is not directly measured in this study. Time-lapse imaging would be required to rule out alternative interpretations of the data in figure 5. For example, a delay in an earlier cell cycle stage could in principle lead to a decrease in the overall fraction of midbody-stage cells. Second, the absence of the midbody is not necessarily a marker of complete abscission. Indeed, midbody disassembly is associated with the completion of abscission in unchallenged HeLa cells, but not in cells with chromatin bridges (Steigemann et al, Cell 2009). Midbodies remain a useful marker for pre-abscission cells, but the absence of midbodies should not be immediately interpreted as completion of abscission without further assays. Formally, a direct measurement of abscission timing would require imaging of the plasma membrane, for example using time-lapse phase-contrast microscopy (Fremont et al., 2016 Nat Comm). These limitations should be mentioned in the text.

      We note that midbody numbers are not our only measure of abscission delay/failure - we also measure the numbers of multinucleate cells and sum the two. Nevertheless, we understand the reviewer’s point and have therefore noted that we are using increased frequencies of cells with midbody connections and multiple nuclei as surrogate markers for abscission defects and NoCut-induced abscission delays (page 7, lines 13-14 and line 17).

      4) IST1 plays a role in nuclear envelope sealing by recruiting the co-factor Spastin (Vietri et al., Nature 2015), a known IST1 co-factor also confirmed in the previous interactome screen (Wenzel et al. 2022). CAPN7 could have a role in maintaining nuclear integrity upon the KD of Nup153 and Nup50 (Mackay et al. 2010) instead of/in addition to its proposed role in delaying abscission as part of the NoCut checkpoint at the midbody. I don't think the authors can differentiate between these two possibilities, and it would be interesting to consider their possible implications on how the "NoCut" checkpoint is triggered.

      The reviewer again makes good points, and we agree that in addition to participating in abscission, CAPN7 may be involved in closure of the nuclear envelope and that nuclear envelope closure may, in turn, be linked to satisfaction of the NoCut checkpoint. This involvement would nicely explain our observations that both SPAST and CAPN7 participate in both NoCut and abscission. We are in an unusual situation, however, because other colleagues in our field have told us in private communications that they observe that CAPN7 does, in fact, participate in nuclear envelope closure. We find that observation interesting and exciting but it is their discovery, not ours, and we have therefore refrained from doing analogous experiments ourselves. As a compromise, we have added the following text to the penultimate section of our paper (page 8, lines 34-35 through page 9, lines 1-11):

      “Our discovery that both CAPN7 and SPAST participate in the competing processes of cytokinetic abscission and NoCut delay of abscission may appear counterintuitive, but we envision that the MIT proteins could participate in both processes if they change substrate specificities or activities when participating in NoCut vs. abscission; for example, via different sites of action, post-translational modifications, and/or binding partners. We note that, in addition to its well documented function in clearing spindle microtubules to allow efficient abscission (Yang et al., 2008), SPAST is also required for ESCRT-dependent closure of the nuclear envelope (NE) (Vietri et al., 2015). The relationship between NE closure and NoCut signaling is not yet well understood, and it is therefore conceivable that nuclear membrane integrity is required to allow mitotic errors to sustain NoCut signaling. It will therefore be of interest to determine whether or not CAPN7, in addition to its midbody abscission functions, also participates in nuclear envelope closure and, if so, whether that activity is connected to its NoCut functions.”

      We think that this additional text explains what we (and the reviewer) consider to be an attractive model, but leaves open the question of CAPN7 involvement in nuclear envelope closure to be resolved by our colleagues.

      5) Figure 5 should include images of representative cells, highlighting midbody-positive and multinucleated cells. Without images, it is not possible to evaluate the quality of these data.

      We appreciate this suggestion and have now added images showing midbody-positive and multinucleated cells from the quantified datasets to allow assessment of our data quality (new Figures 5B and 5D).

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      Unfortunately, this paper adds only a little to our understanding of uptake into the flagellar pocket of trypanosomes. It tends to add only detail to information that has been well characterised elsewhere and indeed, as the authors themselves point out, (lines 92-98) it is rather incremental.

      We were disappointed that the reviewer was so unsupportive of the work presented here. It seems possible that the reviewer is partly objecting that the title - which emphasised the main finding of the paper - does not fully capture the content of the paper. We have therefore modified the title to emphasise that the paper is principally a characterisation of TbSmee1 rather than an investigation of the flagellar pocket, with the insight into cargo entry being the most notable finding.

      Not only has Tbsmee1 been studied before but this data in bloodstream forms is not particularly novel since it gives much the same information as the canonical hook protein TbMORN. This work follows the pattern of conclusions made previously with the protein TbMORN. It focusses on the protein TbSmee where RNAi mutants are interpreted to show flagellar pocket enlargement and impaired access by surface bound cargo. Unfortunately, there is little mechanistic or functional conclusion to the study in terms of how TbSmee operates naturally in the cell.

      This is deliberately downplaying the value of the work. TbSmee1 has not previously been characterised in bloodstream form cells, and neither TbMORN1 nor the hook complex are as well-characterised as other cytoskeletal components such as the flagellum and basal body. To criticise the paper for not providing a molecular mechanism of TbSmee1's function is unreasonable given the volume of work provided and the fact that this is a first characterisation of the protein in this life cycle stage. Expectation of a complete molecular mechanism is setting a very high bar for a first characterisation.

      It is also possible that the reviewer has not grasped the main thrust of the argument - when TbMORN1 was characterised it was the first protein shown to have this cargo entry defect. We show here that not only does TbSmee1 share this defect, but that it is in fact a previously-unacknowledged feature of all phenotypes of this type, exemplified by clathrin. We have modified the text to make this finding more clearly emphasised (see for example lines 654-661 in the tracked-changes version of the manuscript).

      There are other possible explanations for the phenotype. That would need to be studied. This large flagellar pocket phenotype is seen with RNAi mutants of many different types of proteins in the trypanosome and so pleiotropic effects are highly likely. Also, there are a good number of alternative possibilities to account for reduced access to the pocket in these mutants and this data could be usefully added.

      This is another statement that seems intended primarily to disparage the paper rather than attempt to improve it. It would have been extremely helpful if the reviewer highlighted what these other possible explanations are instead of making vague allusions. The widespread prevalence of this kind of phenotype means that our insight into restricted cargo access to the flagellar pocket is of general relevance in the trypanosome field.

      Specific points

      1. The transient location for the TbSmee at the FAZ tip - or in this case the groove region - was seen in procyclics (Perry, 2018) so this bloodstream indication merely confirms that concept.

      The reviewer is again downplaying the value of the work rather than providing constructive criticism. While FLAM3 has been shown to be at the tip of the new flagellum in bloodstream form cells (Sunter et al., 2015), at the time of the preprint being published Smee1 was actually the first protein (besides the DOT1 antigen) shown to localise to the groove region in bloodstream form cells. It is also worth noting that procyclic form cells and bloodstream form cells are fairly different in this regard - in procyclic cells, there is an entire flagellar connector structure that is not present in bloodstream form cells, and so demonstrating that Smee1 was present in the groove region was an important experiment. Since this preprint was published, Smithson et al. have identified 13 additional proteins localising to the groove (Smithson et al., 2022) - we have modified the text to include these points (see lines 542-545 of the tracked-changes manuscript).

      1. The C terminal region required for targeting is a reasonable deletion analysis of regions of the protein. But can this data (line 228) be said to "mediate targeting" - or is it just required. For instance, targeting might be OK but it might be needed for stable association, etc etc.

      We have changed the text to say "required" for targeting instead of "mediating" targeting (line 312 of tracked-changes manuscript).

      1. This protein has already been shown to be phosphorylated and the sites and cell cycle possibilities have been mapped by Urbaniak. So that section adds little. https://doi.org/10.1371/journal.ppat.1008129

      The reviewer is again disparaging the significance of the work rather than critiquing it. This is after all only a single panel of a figure and ~15 lines of text, and therefore a minor but still noteworthy element of the manuscript. This also misunderstands what the Urbaniak study does and does not show - while that work showed that Smee1 is phosphorylated, it remained possible that other post-translational modifications were occurring. This experiment shows that the "fuzzy" appearance (variable electrophoretic migration) of TbSmee1 in gels can be solely attributed to phosphorylation as opposed to other post-translational modification. We contacted Dr. Urbaniak to confirm this - his answer is below.

      "I think your approach to look at the fuzzy banding is actually rather elegant; our data shows that phosphorylation occurs but we did not look for any other PTMs that could influence migration on a gel and probably wouldn't see them without a different enrichment and analysis method. We often see a fuzzy pattern with glycosylation due to the heterogeneity, and I suspect other modifications will also results in a smear. Given that the band collapses to a single band after phosphatase treatment and not with an inhibitor present it is fair to conclude that phosphorylation is responsible for the fuzzy band, not other undefined PTMs like glycosylation."

      1. Essentiality in BS forms and pocket enlargement. This is not surprising. A very large number of cytoskeletal proteins show this in RNAi knockdown. Flagella mutants (extensive publications from many groups (Hill, Bastin, Gull, etc.) over the last 15 years) show this very well, and so this protein is just one more example.

      This appears to be another comment aimed at downplaying the value of the manuscript rather than providing constructive feedback. The fact that we have demonstrated something previously unobserved in a common phenotype makes the data of general interest to the community, we feel.

      1. I didn't find that the explanations for flagella pocket enlargement are soundly based. The experiments focus on endocytosis and uptake and ignore other plausible reasons and some evidence in literature.

      Again, the reviewer's feedback would be considerably more constructive if they had taken the time to specifically cite the evidence in the literature that they are alluding to, and present some of the "other plausible reasons" they are aware of. We have consulted widely in the community and have not been able to find anybody who knew what work the reviewer is referring to here.

      Lines 84/85. Enlarged pockets may be indicative of endocytosis failure. Presumably the rationale is that endocytosis fails, but exocytosis still occurs and the pocket membrane enlarges. What evidence is there that exocytosis of membrane still occurs? This simple concept might indeed operate in a clathrin mutant, but is surface membrane/content exocytosis maintained in these cytoskeleton mutants? There is good evidence for glycoconjugates within the flagellar pocket. Are these depleted or present still?

      The reviewer is correct that we have not specifically assayed for exocytosis, but the fact that we are able to make the same observations in both the clathrin RNAi (where exocytosis has been assayed - Allen et al., 2003) and the Smee1 RNAi means that this is not a problematic omission. The effect of the enlarged flagellar pocket phenotype on the glycoconjugates in the flagellar pocket is an interesting question but far outside the current focus of the paper.

      1. There are also a number of other publications indicating that clathrin pits are still present on the enlarged pockets of various mutants when viewed by EM. The authors have looked at the flagellar pockets by EM but the EM methods described have extensive washings and centrifugations before fixation. This is a very poor approach and will mean that endo and exocytic traffic is disturbed (extensive references in literature in other systems). This is not a useful approach for exo/endocytosis studies where flux of traffic demands fast chemical or freezing fix in media.

      The reviewer has misunderstood the aim of the experiments described in Figure 5D, which was to observe the morphological changes caused by depletion of TbSmee1. As the reviewer is no doubt aware, high-pressure freezing of trypanosomes gives much better morphological preservation than chemical fixing in media, so the choice of method is not "very poor" but tailored to the experimental aims. We have modified the text to make this point more clearly (lines 355-358 of tracked-changes version). Once again, the referee offers no citation to back up their assertion that endo- and exocytic traffic is disturbed by wash steps, either in trypanosomes or elsewhere.

      1. The EMs and light microscopy do show that the mutant pockets are substantially abnormal in their cytoskeletal arrangement. They have multiple flagella profiles, flagella structures have not connected with the membrane and are sometimes in the cytoplasm (see a glance of the paraflagellar rod in the cytoplasm in FigS5C and internalised FAZ attachment plaques in Fig 4 D bottom right cell). Given these extensive (and expected) cytoskeletal abnormalities it is highly likely that these pocket abnormalities are a result of motility, cell division/developmental issues and the differential uptake phenotypes merely consequential.

      This is another misinformed argument that is seeking to disparage the data. The reviewer has apparently overlooked the fact that the same phenotype is seen in clathrin RNAi, when flagellar pocket enlargement precedes any downstream effects on cell division cycle progression. We have gone to great lengths (Fig 6) to demonstrate that the enlargement of the flagellar pocket almost certainly precedes the onset of the growth defect in the TbSmee1 RNAi, and it is therefore likely to precede the cytoskeletal abnormalities that the reviewer has highlighted. An effect on cellular motility is possible and would be interesting to investigate in future work.

      1. The authors speak about early phenotypes, but these are often at 15-24 hours. That is probably a couple of cell cycles and so not early.

      To be informative, the analyses of RNAi phenotypes have to be done as soon as possible after the onset of the growth defect, and we have gone to great lengths (Figure 5) to define this point as being at 21 hours. This is already difficult as the number of phenotypic cells at the onset of the growth defect will not be high. We have clarified the text to emphasise that "early" refers to soon after the onset of the phenotype (lines 388-389 of tracked-changes version).

      In relation to the above question of comparison to the same morphology produced by flagella mutants it would be good to know if these hook mutants produce motility phenotypes and whether these are manifest before the uptake phenotypes. There is evidence (cited here) that forward motility of the trypanosome directs material on surface into the pocket. If these cells have motility defects (primary or via failed division) then surely that would provide an alternative simple explanation for uptake differences.

      The reviewer is overlooking the observation that the surface-bound endocytic cargoes (ConA, BSA) are still being sorted/directed as far as the entrance to the flagellar pocket - what is interesting is that the cargo is apparently unable to enter the flagellar pocket. As noted above, it would certainly be interesting to look at motility effects in follow-up work.

      1. There is a general point that if studies are to have real relevance to uptake in the trypanosome then they need to deal with uptake of natural ligands rather than artificial surrogates such as dextran. Such tracers were used historically, but in the last decade a series of receptors and ligands for fluid phase and particularly membrane mediated endocytosis have been discovered. With the investment of a little time these important ligand / receptors such as haptoglobin, transferrin, etc would be much more relevant.

      Dextran is still state-of-the-art as it is an inert fluid phase marker. We are not aware - and have asked widely - of any readily-available alternative to dextran as a fluid phase marker, especially seeing as we have demonstrated in this study that BSA does not behave as a fluid phase marker in the experimental conditions used. The reviewer is also being disingenuous in suggesting that there is a panel of validated physiological reporters for trypanosomes that are readily available commercially - this is not the case. Transferrin is probably the only example, but the transferrin receptor is confined to the flagellar pocket and therefore not relevant to the question of how surface-bound material enters the flagellar pocket in the first place. As suggested by Reviewer 3 and endorsed by Reviewer 2, we have looked at the uptake of anti-VSG antibodies (which are a physiological cargo) in additional experiments and obtained evidence that the same effects are seen (Figure 9).

      **Referees cross-commenting**

      *This session includes comments from Reviewer 1 and Reviewer 2.*

      Reviewer 2<br /> <br /> Dear Reviewers 1 and 3:<br /> I agree with many of the points with Reviewer 1 and our divergence is partly a matter of degree. While it is true that this manuscript is incremental in its contribution to our understanding of TbSmee1, it nonetheless adds to our understanding of the role of this protein in the bloodstream life stage and because of that I find value in the work. The fact that it mirrors what was seen in other protein knockdown studies (e.g. TbMORN) doesn't negate its contribution for me. Reviewer 1 makes an important point, however, when stating that this work does not add a mechanistic or functional conclusion as to how TbSmee1 operates and for me that is the biggest shortcoming of the work. Offering mechanistic insight is a high bar and while it would make for a much more exciting story it does not discount the value of the work as presented. What I do appreciate is the speculation about this observation that endocytosis is required for entrance of surface bound material into the pocket and although they are unable to show that this is not a side effect of other processes being disrupted it is an intriguing point. These observations have the potential of stimulating further investigations into crosstalk between the entrance to the pocket and endocytosis. I also agree that the use of ligands for known receptors like transferrin would be far more informative. While I assumed the transferrin receptor was in the pocket itself it would be interesting to see if the ESAG6/7 is also located outside the pocket and transiently binds cargo before being brought inside for endocytosis.<br /> I think that Reviewer 3 brings up a great point with the focus on VSGs. I think that examining VSG turnover in these mutants can add value to the analysis and inform our view of how affecting the hook complex alters VSG endocytosis.

      We appreciate Reviewer 2 taking the time to defend the value of the work, and we concur with Reviewer 2's assessment. Reviewer 2 is also correct that the transferrin receptor appears to be primarily or wholly confined to the flagellar pocket interior, making this likely less informative in this context. Concerning the uptake of anti-VSG antibodies highlighted by Reviewer 3 and endorsed by Reviewer 2, we have carried out these experiments and obtained similar results to those published in the first version of the preprint (Figure 9).

      Reviewer 1<br /> <br /> Some fair comment and agreement. This is being sent to general cell biology journals.<br /> When one looks at this area in the round it is nearly 50 years (1975) since Langreth and Balber published their seminal work on protein uptake and digestion in bloodstream and culture forms of T. brucei. There has been 50 years of intense study and the genome has been around for nearly 20 years as well. So, put simply - for both a general science audience and the wider parasite community - if this is a paper about one protein, TbSmee1, then it surely has to say something functional about that protein. If it is a paper about uptake in trypanosomes (where mutants are one means of interrogation) then it surely has to say something about mechanisms of uptake of physiologically relevant ligands. The days of dextran etc are past.

      Hence, my comment that this does neither and so is very incremental to what is known already. It is 2022 not 1975. Langreth and Balber published their seminal work in J Protozoology for very good reasons no doubt.

      It is striking that Reviewer 1 here extends their aggressive and uncivil approach to attack Reviewer 2's assessment, again substituting forceful wording for informed argument. Reviewer 1 again inexplicably and mistakenly criticises the use of dextran when no state-of-the-art alternative exists. They then go on to needlessly disparage the work done by Langreth & Balber when this work was produced in a totally different publishing landscape. They also appear to fundamentally misunderstand the Review Commons concept, which is to provide journal-independent preprint peer review; it is also worth noting that there are specialist journals such as PLoS Pathogens in the RevComm affiliates as well as general cell biology journals. Given that the mechanism of variant surface glycoprotein (VSG) switching has not yet been fully articulated despite the efforts of multiple labs and many projects over a decades-long time period, it seems extremely unreasonable to be making such demands of this paper.

      Reviewer 2<br /> Thank you for replying and I agree with the spirit of your critique. My only comment, which could result from my own naivete, is to say that despite the incredible work that has been done in dissecting endocytosis in T. brucei over these past 50 years, it appears that we still do not understand how many fundamental aspects of this activity work in this parasite. Even basic questions regarding how cargo, e.g. transferrin, binding to surface receptors is sensed by the parasite remain unanswered, and the identity of the specific signaling components which transmit this information internally to initiate endocytosis has not been characterized. In many ways it seems that we don't even understand how the parasite partitions the endo/exocytic pathways in the pocket and maintains membrane homeostasis. While we know that some kinases and traditional signaling components must be involved, a high resolution understanding of this process in T. brucei seems lacking. I only say all this to suggest that the field maybe isn't yet advanced enough to reject work of this type as so many mechanistic unknowns still remain to be uncovered and maybe incremental advances and phenomenology still can add value to the field. However, I respect your opinion on the matter and my perspective could be due to a lack of a full appreciation of the literature on the subject.

      We completely agree with Reviewer 2's assessment here, which neatly summarises our rationale for the present work. Reviewer 2 is, if anything, being overly accommodating by suggesting that their perspective may be due to a lack of a full appreciation of the literature - on the contrary, Reviewer 2 appears to have a very sound grasp of the topic.

      Reviewer #1 (Significance):

      Unfortunately, I did not find this to be very significant. It covers old ground in terms of the phenotype described. Many groups have shown the differences between procyclic and bloodstream phenotypes in this enlarged pocket phenomenon. The work is rather incremental from these and other authors' work on these hook proteins.<br /> There are alternative explanations for understanding the effect of flagellar pocket structure and uptake of ligands into the pocket and trypanosome cell. These would need to be studied before one could see a functional, mechanistic link established.<br /> Other parts of this are nicely done but do not move on our understanding (eg targeting/phosphorylation) from what has been done previously.

      As noted repeatedly, it appears that Reviewer 1's priority is disparaging the value of the work here and downplaying its significance rather than providing constructive feedback. The reviewer repeatedly makes unrealistic demands (a mechanistic model, use of non-standard reagents), misunderstands the aim of experiments (use of high-pressure freezing), makes vague allusions to other work in the literature but without citing anything specific to support their case, and makes strong and assertive statements that are factually incorrect (design of RNAi experiments, use of dextran). We find this approach unhelpful, uncivil, and unprofessional. It is desperately disappointing that we should have to spend the majority of our response rebutting Reviewer 1's comments rather than implementing constructive criticisms that would strengthen the manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary:<br /> In this manuscript the authors have advanced our understanding of the hook complex component TbSmee1 through a detailed analysis of this protein's role in the endocytosis of surface bound proteins via the flagellar pocket in bloodstream form Trypanosoma brucei. The TbSmee1 protein, previously identified using proximity labeling using TbMORN1 and TbPLK, and characterized in procyclic T. brucei, was confirmed to target to both the shank portion of the hook complex as well as the growing end of the new FAZ in replicating cells. The protein was also shown to likely be phosphorylated as had been suggested previously due to its association with the kinase TbPLK. A domain deletion analysis demonstrated that domains 2 and 3 are important for TbSmee1's proper localization to the hook complex. Loss of TbSmee1 using RNAi based knockdown resulted in a quick cessation of growth in the bloodstream form within 24 hours in contrast to what was seen previously in procyclic cells which had only a decreased growth rate. Loss of TbSmee1 also resulted in an enlargement of the flagellar pocket and in many ways mirrored the phenotype observed with knockdown of TbMORN1. Although prior work on TbSmee1 in procyclic T. brucei demonstrated that loss of this protein altered the morphology of TbMORN1, no such change was seen in bloodstream form cells and only an alteration in the morphology of TbLRRP1 was observed. In characterizing the effect of TbSmee1 depletion on endocytosis the authors showed that the fluid phase marker Dextran could enter into the flagellar pocket of TbSmee1 depleted parasites while the surface bound ConA and BSA remained outside of the flagellar pocket suggesting that TbSmee1 may play a role in allowing larger protein components into the pocket regions. Similar observations were also previously seen with TbMORN1 depletion. 
Importantly, a knockdown of clathrin recapitulated the TbSmee1 knockdown phenotype suggesting that endocytosis itself was required to allow material bound at the surface to enter into the flagellar pocket. In addition to adding to our understanding of hook complex components, this work raises some interesting questions regarding the role of the hook complex in facilitating endocytosis in this important human pathogen.

      Thank you for the positive assessment.

      Major Critiques:<br /> This is a superbly written manuscript with robust high-quality data that strongly support the major conclusions made by the authors. The flow of the article is logical and easy to follow, making it accessible to a wide array of readers.

      We are glad that the Reviewer appreciated the effort that went into writing the paper.

      Although I appreciate the brevity of the introduction and how the article gets straight to the point, additional background information on the components and function of the flagellar pocket collar protein could help contextualize the goals of the project. The way in which the flagellar collar structures are introduced to the reader is quite abrupt (beginning on line 75) and simply states the names of TbBILBO1, the centrin arm and hook complex as simple facts without much discussion about the background of these components/regions. A graphical representation of the centrin arm or hook complexes relative to other components like the pocket itself, FAZ or axoneme could make following the story much easier. An expansion of this background could also go a long way to convince readers of the importance of this region in the basic biology and virulence of T. brucei.

      Implemented. We have added more background details on the hook complex, flagellar pocket collar, and centrin arm and added a new schematic image to Figure 1 showing these structures as well as the FAZ (Figure 1A).

      On lines 84-86 the authors cite the way in which 'small' vs 'large' macromolecules enter into the pocket without defining what exactly is meant by these terms as they are relative in nature. Setting some boundaries of size could provide some context to the reader.

      Implemented. We have provided more detail on the approximate sizes in nm (lines 110-113 of tracked-changes manuscript).

      In the domain localization analysis beginning in Figure 4 there is a missed opportunity to also assess which portions of the TbSmee1 protein are important for overall function as well. By either an examination of dominant negative phenotypes resulting from overexpression of the truncated mutant or the expression of the truncated forms designed to be RNAi resistant in the TbSmee1 knockdown cell line, one could also assess which portions of this protein are essential for endocytic function in addition to targeting. Is there a reason this was not performed?

      This is a good point; we did actually investigate overexpression of the TbSmee1(161-766) construct which can target correctly but is missing the first folded domain, but did not observe any phenotypic effects. We have added this point to the results (lines 301-302 of tracked-changes version). We agree that it would be interesting to express the truncations in a TbSmee1 RNAi background in order to simultaneously assay for targeting and function, but this was (unfortunately, perhaps) not part of the original experimental design. To do so now would require generating a completely new panel of truncation constructs with recoded DNA (in order to make them RNAi-resistant) and then generating a new panel of cell lines. While this would be informative, we feel that it would be impractical at present.

      In the analysis of viability changes due to TbSmee1 depletion (line 237) the authors state that at "72 h post-induction showed widespread lysis, ..." This phenotype seems inconsistent with other related endocytic defect mutants. There is no further mention of this lysis phenomenon here or in the discussion and considering how unique this seems it deserves either additional data to demonstrate or further discussion as to the basis of the phenotype. It seems, at least from this study of TbStarkey1 and prior studies which result in the enlarged flagellar pocket phenotype, that having an enlarged pocket is not the cause of lysis and doesn't even naturally lead to a growth defect.

      Widespread lysis is the usual outcome for bloodstream form cells with strong endocytic defects - we have observed this directly for the clathrin, TbMORN1, and TbSmee1 RNAi cell lines, and it has been documented in a number of other publications (see for example Natesan et al., 2010, Manna et al., 2017). We have clarified this point in the text (see for example lines 359-361, 474-478 of tracked-changes manuscript).

      The authors do not comment on what is the source for the cessation in growth following TbSmee1 knockdown. Is it nutrient deprivation like in other endocytic defect mutants?

      Implemented (see for example lines 359-361, 605-610 of the tracked-changes manuscript). The growth defect is likely due to impaired cell division cycle progression caused by the gross enlargement of the flagellar pocket and the subsequent steric hindrance and imbalance of membrane homeostasis.

      In the end, one of the most interesting observations made by the authors is that loss of TbSmee1 inhibits endocytosis and this has the appearance of not allowing large molecule substrates like ConA and BSA to enter into the flagellar pocket. This appeared to have nothing to do with a gatekeeping type function of the hook complex/flagellar collar and instead, as shown through clathrin knockdown, was related to the ability of the parasite to endocytose. There are a lot of potential interpretations of this phenomenon with one being a simple perturbation of the normal membrane trafficking to and from the flagellar pocket being involved. An analysis of knockdown of exocytic components might reveal whether or not this inability to enter into the pocket is also seen when exocyst proteins are depleted. It may be impossible to tease apart these two interrelated activities but it might eliminate one side of the equation if these proteins can still enter the flagellar pocket when exocytosis is perturbed, although this reviewer understands that that dimension of T. brucei membrane trafficking is poorly understood relative to endocytosis.

      This is an interesting point, and the reviewer is also correct in highlighting that exocytosis is far less characterised than endocytosis in Trypanosoma brucei. The exocyst has been characterised in bloodstream form T. brucei (Boehm et al., 2017) and shown to also have a role in endocytosis, so teasing out the relative contributions of these pathways would undoubtedly be challenging. We would prefer not to go in this direction in this present study, but it is an obvious avenue for future work.

      An intriguing possibility that the authors allude to and which if answered would make this manuscript have a far broader appeal is to determine if loss of TbSmee1 alters the lipid kinase distribution and if this is the source of the negative impact on endocytosis. One important dimension of endocytosis in T. brucei which remains poorly understood is the role of signaling machinery in triggering endocytic events. It is possible that the hook complex serves as the gatekeeping or signaling platform that recruits signaling components (like lipid kinases) that identify and/or modify the membrane lipid phosphatidylinositols harboring cargo laden receptors thus marking them for endocytosis within the pocket. It still seems unclear when in the process of endocytosis is the decision made to pull things into the pocket but it seems that the assumption is that this occurs deep within the pocket. This data suggests that there is possibly another decision point prior to being allowed entrance into the pocket. It may be that this isn't a gatekeeping decision but rather a stop vs. go activity where once cargo laden membrane reaches the collar a choice is made to pull this material in or not there and not after material is already in the pocket.

      These are all really interesting ideas and would be fascinating topics for future work.

      This obvious enigma, based on the observation that loss of hook complex components affects the spatially separated site of endocytosis, supports the idea that the actual endocytic signaling platforms are located at the hook complex and that this area may make the membrane modifications that mark membrane as being ready to be endocytosed via clathrin coated vesicles at the bottom of the pocket. This would still allow for fluid phase small molecule entrance which does not require binding to surface proteins. The obvious problems of having both endo/exocytosis occurring in the same close proximity make the dissection of this phenomenon difficult but it is worth potentially expounding on further in the discussion as this idea is very appealing and adds an important dimension to our understanding of endocytosis in this organism.

      Implemented (lines 722-727 of the tracked-changes manuscript). We have added some more detail to these points in the Discussion. We agree with the reviewer that there are some profoundly interesting questions concerning membrane identity and membrane protein uptake here.

      Minor Critiques:<br /> The authors commit significant time to the analysis of the phosphorylation of TbSmee1, but there is little stated about the role of TbPLK in this activity or the potential connection of TbSmee1 phosphorylation to the cell cycle. Would a knockdown of TbPLK using RNAi potentially demonstrate an altered migration of TbSmee1 due to a lack of phosphorylation? An analysis of radiolabeled TbSmee1 using p32 in vivo would likely support this claim as well. Has mass spectrometry identified potential phosphorylation sites to examine? Additionally, the loss of TbSmee1 has been shown to disrupt localization of TbPLK in procyclic cells and so why this was not also assessed in bloodstream form cells subjected to RNAi was not clear.

      Partly implemented. We have added some discussion of the possible role of TbSmee1 phosphorylation in the cell cycle to the Discussion (lines 562-565 of tracked-changes manuscript), and emphasised the identification of phosphorylation sites in previous phosphoproteomics work (citations of Nett et al., 2009, Urbaniak et al., 2013). Given that the strongest and earliest effect of TbSmee1 depletion was on endocytosis and cargo uptake, we chose to focus on this angle rather than exploring its contribution to the biogenesis of cytoskeleton-associated structures and its interaction with TbPLK. For that reason we would prefer not to carry out the experiments looking at the effects of TbSmee1 depletion on TbPLK or vice versa.

      In the results section (lines 104-108) a model of the protein structure as predicted for example by AlphaFold might be informative and complement the domain analysis work depending on the quality of the prediction.

      Implemented. The AlphaFold prediction is consistent with the predictions made by the other structural analyses, and we have noted this in the text (lines 145-148 and 551 of the tracked-changes version).

      There is an arrow in the Figure 1B Western blot but I can find no mention of what it is trying to highlight in the text.

      Corrected.

      For Figure 1D there is no loading control or control for the distribution of the soluble fraction to validate the separation of the two compartments.

      Implemented. We have carried out additional experiments to show the partitioning of a soluble protein (the endoplasmic reticulum chaperone BiP) into the detergent-soluble fraction. These results are now displayed in the updated Figure 1.

      The authors fail to comment on how the lack of changes in hook complex components they see compares to that observed by Perry et al. 2018. This difference merits some minor comment or speculation.

      Implemented. We have added this commentary to the Discussion (lines 592-600 of the tracked-changes version).

      Line 228: domain should be capitalized.

      Implemented.

      Line 230: FigS5C should have a space and period after Fig. and S5C.

      Implemented.

      Line 244: "on" should be inserted in the sentence "...TbSmee1 protein depletion ON either side of the onset..."

      Implemented.

      Line 400: the '...20/21 h post-induction...' is slightly confusing and may read better as 20-21 h.

      Implemented.

      Line 463: a space is needed between '...2009).The...'.

      Implemented.

      Reviewer #2 (Significance):

      This manuscript advances our current conception of endocytosis in T. brucei. Although this model kinetoplastid parasite has been extensively studied with respect to endocytosis there is still a great deal we do not yet understand regarding how this process is regulated at a mechanistic level. This work has begun to connect previously unappreciated aspects of endocytosis in T. brucei by highlighting a potentially novel connection between the flagellar collar/hook complex and the physically separated endocytic events within the flagellar pocket itself. It may be that what appears as regulated entrance into the pocket is in fact the source of signaling that triggers the endocytic events carried out by clathrin. This is an interesting notion that no doubt requires further investigation which lies outside of the scope of this report. While this work appeals primarily to those studying kinetoplastid parasites it has the potential to provide insight into basic protozoan biology as well. Due to my related interest in kinetoplastid endocytosis, I find this work to be of high quality and conceptually interesting, and it employs many of the cutting-edge techniques currently available in the study of T. brucei.

      We are very happy that the Reviewer formed a favourable impression of the work.

      Reviewer #3 (Evidence, reproducibility and clarity):

      This manuscript begins to dissect the function of the hook complex protein SMEE1 in the mammalian infective form of T. brucei. The hook complex is a cytoskeletal structure associated with the flagellar pocket, the only site of endo/exocytosis in these cells. The authors demonstrate that SMEE1 is required for endocytosis in these cells and that this can occur with minimal change to the molecular make-up of the hook complex. The authors show that endocytosis is important for the access of large molecules e.g. ConA into the flagellar pocket.

      Major comments

      The key conclusions of this study are convincing and the data is generally well presented and clear. The interpretation of the figures matches well with the data presented - there are a few minor issues though that I have highlighted below in minor comments. The authors use a range of molecular cell biology approaches to define the role of SMEE1 and these are appropriate and are well controlled.

      Thank you.

      My major comment focuses on the use of different tracers to study endocytosis but the elephant in the room is what is happening to VSG as this is the surface protein that needs to be rapidly removed from the cell surface and cleaned. Given the importance of removal of antibodies bound to the VSG - have the authors looked at this in the SMEE1 depleted cells? Do VSG-antibody complexes accumulate in this region? This is an important experiment as this would give key physiologically relevant data to this study. All the material should be readily available for this as there are a number of VSG antibodies.

      We agree with the Reviewer that the behaviour of these VSG-bound antibodies is a key test of the physiological relevance of the observations we have made using ConA and BSA, and have implemented this request - the results are in the new Figure 9. Although they sound simple, these assays turned out to be far from trivial and much more technically challenging than the other uptake assays, owing to the extremely fast kinetics (seconds) of anti-VSG uptake (Engstler et al., 2007) and the unexpectedly and incredibly high losses of bound antibodies during the assay. This might be due to shedding, as noted in the Discussion.

      Minor comments<br /> Perhaps I have been overthinking this but is surface-bound the right way to describe the cargo, as it clearly goes in both directions onto and off the surface and in fact the experiments in this manuscript are focussing on the removal of this material from the surface so is not surface-bound.

      We have clarified that "surface-bound" refers to material that binds to the surface glycoprotein coat of the trypanosomes and which is subsequently internalised, not material that is bound for (i.e. being directed to) the cell surface (lines 77-78 of tracked-changes version). We hope this addresses the Reviewer's point.

      Have the authors investigated the structure of the protein using AlphaFold and if so how does that compare to the domain structure that was presented in this manuscript?

      Implemented (lines 145-148, 551 of tracked-changes version). We have checked the AlphaFold prediction of the three-dimensional structure of TbSmee1 and noted it in the Results; the prediction is consistent with the earlier bioinformatic analyses.

      The authors raised a number of antibodies to TbSMEE1 and TbSTARKEY1 but it was not clear in the figures which antibody was ultimately used for analysis by western and IF - could the authors clarify, as some looked to have a higher background than others. Line 150 states the same localisation was seen for all three antibodies and references S3C but I couldn't see that data presented.

      Implemented - the 304 antiserum was used for most subsequent experiments and we have noted this in the M&M (lines 793-798 of tracked-changes version). Figure S3C shows that the Ty1-TbSmee1 recapitulates the localisation of the antibodies against the endogenous protein - we have clarified this point as well (lines 206-207 of tracked-changes version).

      Line 169 - can the authors provide more detail about the global correlation methodology as I was unable to follow the details in the methods? Is this a pixel per pixel correlation over the image or on a selected region over the area of potential signal overlap? In figure 2E it appears that BILBO1 signal correlates more closely with the SMEE1 signal than MORN and LRRP1 and from the images that would not seem to be the case. Have I interpreted this figure incorrectly?

      Implemented. The original analysis was a global correlation analysis that was determining whether the signals were correlated with each other regardless of spatial overlap, and we agree with the reviewer that these outputs were non-intuitive to interpret. In the revision, we have carried out a new analysis (and updated the accompanying text and M&M section), measuring the degree of spatial correlation between each pair of signals on a pixel-by-pixel basis over the area of each cell, with a total of 30 cells analysed in each pairing. We believe that this addresses the reviewer's point (see lines 223-243, 963-974 of the tracked-changes version).
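
For readers unfamiliar with this kind of measurement, the following is a minimal sketch of what a pixel-by-pixel spatial correlation restricted to the area of one cell could look like. It is illustrative only: it assumes Pearson's coefficient as the correlation measure, represents each channel as a plain 2D grid of intensities, and uses a boolean mask to mark the cell area. The function names `masked_pearson` and `mean_correlation` are hypothetical, and neither the coefficient nor the software used for the analysis in the manuscript is specified in this response.

```python
from math import sqrt


def masked_pearson(channel_a, channel_b, mask):
    """Pearson correlation of two same-sized 2D intensity grids,
    computed only over pixels where the mask is True (the cell area)."""
    a_vals, b_vals = [], []
    for row_a, row_b, row_m in zip(channel_a, channel_b, mask):
        for a, b, inside in zip(row_a, row_b, row_m):
            if inside:
                a_vals.append(float(a))
                b_vals.append(float(b))

    n = len(a_vals)
    if n < 2:
        raise ValueError("mask must select at least two pixels")

    mean_a = sum(a_vals) / n
    mean_b = sum(b_vals) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(a_vals, b_vals))
    var_a = sum((a - mean_a) ** 2 for a in a_vals)
    var_b = sum((b - mean_b) ** 2 for b in b_vals)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant channel")
    return cov / sqrt(var_a * var_b)


def mean_correlation(cells):
    """Average the per-cell coefficient over an iterable of
    (channel_a, channel_b, mask) triples, one triple per cell."""
    coeffs = [masked_pearson(a, b, m) for a, b, m in cells]
    return sum(coeffs) / len(coeffs)
```

Restricting the calculation to the masked cell area keeps background pixels, which are dark in both channels, from inflating the coefficient; averaging over cells then gives one summary value per pair of signals.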

      The authors have generated a number of different clones and performed experiments on these clones generally more than twice, which is clearly explained in the figure legends but in places the data is then put together and it is difficult to know which experiments/clones it comes from - for example 7C/7F what do those percentages represent? Is this the sum of all experiments? A representative experiment? How many cells per experiment were analysed?

      Implemented. We have double-checked all the figure legends and clarified this point where necessary. Quantifications were always made by compiling data from multiple independent experiments using multiple separate clones - see in particular lines 1323-1324, 1363-1365, 1380-1382 of the tracked-changes version.

      Line 200 - From the image it is not convincing that SMEE1 is slightly behind DOT1 - I agree it looks enveloped but would appear level with the distal end of the DOT1 signal.

      Implemented. We have adopted the Reviewer's wording for this text (line 271 of tracked-changes version).

      For the truncation experiments the authors should explain that these are performed with cells in which the endogenous SMEE1 will be expressed and this may influence the localisation of the truncations, especially as there is no information about whether SMEE1 forms complexes with itself or other proteins.

      Implemented (lines 296-298 of tracked-changes version).

      Figure 4D - should be 1 not T-

      We have relabelled this as "TbSmee1". The values in this column are the immunoblot signal intensities obtained for the endogenous TbSmee1 protein in the -Tet condition. We have also clarified this in the figure legend.

      Line 223 - given the low expression of constructs 2 and 9 I'm not sure it is possible to infer anything from the lack of localisation of these constructs as they appear unstable and would be unlikely to localise to a specific location.

      We have added this caveat to the text (lines 558-562 of tracked-changes version).

      Figure S7 - The images presented were not convincing that there was a reduction in the localisation of LRRP1 to the hook complex on depletion of either SMEE1 or MORN1. The difference looks particularly minor if present at all.

      Agreed, there was some debate in the group about these results. We have changed the text to fit the Reviewer's interpretation (lines 347-348 of the tracked-changes version).

      Line 264 - "implied that the lethal phenotype might be due to a loss of function" - this seems an odd thing to say as it doesn't provide any insight as of course the phenotype is due to a loss of function.

      We have clarified this point (lines 350-353 of the tracked-changes version). We would however disagree with the reviewer that RNAi phenotypes are exclusively due to a loss of an individual protein's function(s) - when proteins are present in multiprotein complexes (as is often the case with cytoskeleton-associated proteins), then destabilisation of the complex due to loss of the entire protein can cause the observed phenotype, rather than the loss of the function performed by the individual protein within the complex (this may be a semantic point, however). A very good example of this is the outer arm dynein complex component LC1 (Ralston et al., 2011) - RNAi against LC1 is lethal because the entire outer arm dynein complex is destabilised, whereas expression of non-functional mutants of LC1 produces viable cells with motility defects due to the specific loss of LC1 function.

      Line 412 - can the authors clarify what they mean by geometric problems?

      Implemented (lines 605-610 of tracked-changes version). We were referring to the fact that enlargement of the flagellar pocket will probably create difficulties for the progression of the cell division cycle.

      Throughout the manuscript can you use log scale for the growth curves.

      Implemented.

      Line 756 - add citation

      Whoops! Implemented (line 1058 of tracked-changes version).

      Line 465/66 - the authors state that the ability of fluid phase cargo to still enter the pocket is evidence that the channel lumen is still open; however, I would think that despite the close apposition of the cell membrane to the flagellar membrane in the flagellar pocket neck region this would be unlikely to impede fluid/soluble material from entering the pocket, as presumably VSG protein can move through this region. This does not alter the ultimate conclusion the authors are drawing but without microscopy evidence for the state of the channel lumen it is difficult to be sure of its status.

      Fair point. We have modified this statement (line 701 in tracked-changes version).

      Reviewer #3 (Significance):

      The flagellar pocket is the key portal into and out of the trypanosome cell and as such has a vital role to play in host-parasite interactions. The flagellar pocket is supported by a number of cytoskeletal structures including the hook complex, and the role of these structures in flagellar pocket function is poorly understood. The flagellar pocket is particularly important in the bloodstream form of the trypanosome parasite which infects the mammalian host as it is the route for the surface protein VSG to get onto and off the surface. The VSG is required for antigenic variation and the removal of VSG-antibody complexes helps 'clean' the surface of the parasite. SMEE1 is a component of the hook complex and the manuscript here dissects its role in the mammalian infective parasite and shows that it is vital for the endocytosis of material off the surface. Intriguingly, a block in endocytosis causes a blockage of material outside of the pocket, suggesting a multi-step process in the regulation of uptake of material from the parasite's surface.

      This manuscript will be of specific interest to those researchers investigating the long-term persistence of these parasites in the mammalian host. There are potentially some insights into the control of membrane domains for endocytosis that are of interest to more general cell biologists as well.

      We are very grateful to the reviewer for the supportive comments and the constructive evaluation. Many thanks!

      Expert in molecular cell biology of trypanosomes and Leishmania.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Unfortunately, this paper adds only a little to our understanding of uptake into the flagellar pocket of trypanosomes. It tends to add only detail to information that has been well characterised elsewhere and indeed, as the authors themselves point out (lines 92-98), it is rather incremental. Not only has TbSmee1 been studied before but this data in bloodstream forms is not particularly novel since it gives much the same information as the canonical hook protein TbMORN.

      This work follows the pattern of conclusions made previously with the protein TbMORN. It focusses on the protein TbSmee where RNAi mutants are interpreted to show flagellar pocket enlargement and impaired access by surface bound cargo. Unfortunately, there is little mechanistic or functional conclusion to the study in terms of how TbSmee operates naturally in the cell. There are other possible explanations for the phenotype. That would need to be studied. This large flagellar pocket phenotype is seen with RNAi mutants of many different types of proteins in the trypanosome and so pleiotropic effects are highly likely.

      Also, there are a good number of alternative possibilities to account for reduced access to the pocket in these mutants and this data could be usefully added.

      Specific points

      1. The transient location for the TbSmee at the FAZ tip - or in this case the groove region - was seen in procyclics (Perry, 2018) so this bloodstream indication merely confirms that concept.
      2. The C terminal region required for targeting is a reasonable deletion analysis of regions of the protein. But can this data (line 228) be said to "mediate targeting" - or is it just required. For instance, targeting might be OK but it might be needed for stable association, etc etc.
      3. This protein has already been shown to be phosphorylated and the sites and cell cycle possibilities have been mapped by Urbaniak. So that section adds little. https://doi.org/10.1371/journal.ppat.1008129
      4. Essentiality in BS forms and pocket enlargement. This is not surprising. A very large number of cytoskeletal proteins show this in RNAi knockdown. Flagella mutants (extensive publications from many groups (Hill, Bastin, Gull, etc.) over the last 15 years) show this very well and so this protein is just one more example.
      5. I didn't find that the explanations for flagella pocket enlargement are soundly based. The experiments focus on endocytosis and uptake and ignore other plausible reasons and some evidence in the literature. Lines 84/85. Enlarged pockets may be indicative of endocytosis failure. Presumably the rationale is that endocytosis fails, but exocytosis still occurs and the pocket membrane enlarges. What evidence is there that exocytosis of membrane still occurs? This simple concept might indeed operate in a clathrin mutant but is surface membrane/content exocytosis maintained in these cytoskeleton mutants? There is good evidence for glycoconjugates within the flagellar pocket. Are these depleted or present still?
      6. There are also a number of other publications indicating that clathrin pits are still present on the enlarged pockets of various mutants when viewed by EM. The authors have looked at the flagellar pockets by EM but the EM methods described have extensive washings and centrifugations before fixation. This is a very poor approach and will mean that endo and exocytic traffic is disturbed (extensive references in literature in other systems). This is not a useful approach for exo/endocytosis studies where flux of traffic demands fast chemical or freezing fix in media.
      7. The EMs and light microscopy do show that the mutant pockets are substantially abnormal in their cytoskeletal arrangement. They have multiple flagella profiles, flagella structures have not connected with the membrane and are sometimes in the cytoplasm (see a glimpse of the paraflagellar rod in the cytoplasm in FigS5C and internalised FAZ attachment plaques in Fig 4 D bottom right cell). Given these extensive (and expected) cytoskeletal abnormalities it is highly likely that these pocket abnormalities are a result of motility, cell division/developmental issues and the differential uptake phenotypes merely consequential.
      8. The authors speak about early phenotypes, but these are often at 15-24 hours. That is probably a couple of cell cycles and so not early. In relation to the above question of comparison to the same morphology produced by flagella mutants it would be good to know if these hook mutants produce motility phenotypes and whether these are manifest before the uptake phenotypes. There is evidence (cited here) that forward motility of the trypanosome directs material on surface into the pocket. If these cells have motility defects (primary or via failed division) then surely that would provide an alternative simple explanation for uptake differences.
      9. There is a general point that if studies are to have real relevance to uptake in the trypanosome then they need to deal with uptake of natural ligands rather than artificial surrogates such as dextran. Such tracers were used historically, but in the last decade a series of receptors and ligands for fluid phase and particularly membrane mediated endocytosis have been discovered. With the investment of a little time these important ligand / receptors such as haptoglobin, transferrin, etc would be much more relevant.

      Referees cross-commenting

      This session includes comments from Reviewer 1 and Reviewer 2.

      Reviewer 2

      Dear Reviewers 1 and 3:

      I agree with many of Reviewer 1's points and our divergence is partly a matter of degree. While it is true that this manuscript is incremental in its contribution to our understanding of TbSmee1, it nonetheless adds to our understanding of the role of this protein in the bloodstream life stage and because of that I find value in the work. The fact that it mirrors what was seen in other protein knockdown studies (e.g. TbMORN) doesn't negate its contribution for me. Reviewer 1 makes an important point, however, when stating that this work does not add a mechanistic or functional conclusion as to how TbSmee1 operates and for me that is the biggest shortcoming of the work. Offering mechanistic insight is a high bar and while it would make for a much more exciting story it does not discount the value of the work as presented. What I do appreciate is the speculation about this observation that endocytosis is required for entrance of surface bound material into the pocket and although they are unable to show that this is not a side effect of other processes being disrupted it is an intriguing point. These observations have the potential to stimulate further investigations into crosstalk between the entrance to the pocket and endocytosis. I also agree that the use of ligands for known receptors like transferrin would be far more informative. While I assumed the transferrin receptor was in the pocket itself it would be interesting to see if the ESAG6/7 is also located outside the pocket and transiently binds cargo before being brought inside for endocytosis.

      I think that Reviewer 3 brings up a great point with the focus on VSGs. I think that examining VSG turnover in these mutants can add value to the analysis and inform our view of how affecting the hook complex alters VSG endocytosis.

      Reviewer 1

      Some fair comment and agreement. This is being sent to general cell biology journals.

      When one looks at this area in the round it is nearly 50 years (1975) since Langreth and Balber published their seminal work on protein uptake and digestion in bloodstream and culture forms of T. brucei. There have been 50 years of intense study and the genome has been around for nearly 20 years as well. So, put simply - for both a general science audience and the wider parasite community - if this is a paper about one protein, TbSmee1, then it surely has to say something functional about that protein. If it is a paper about uptake in trypanosomes (where mutants are one means of interrogation) then it surely has to say something about mechanisms of uptake of physiologically relevant ligands. The days of dextran etc are past. Hence, my comment that this does neither and so is very incremental to what is known already. It is 2022 not 1975. Langreth and Balber published their seminal work in J Protozoology for very good reasons no doubt.

      Reviewer 2

      Thank you for replying and I agree with the spirit of your critique. My only comment, which could result from my own naivete, is to say that despite the incredible work that has been done in dissecting endocytosis in T. brucei over these past 50 years, it appears that we still do not understand how many fundamental aspects of this activity work in this parasite. Even basic questions regarding how cargo (e.g. transferrin) binding to surface receptors is sensed by the parasite remain unanswered and the identity of the specific signaling components which transmit this information internally to initiate endocytosis has not been characterized. In many ways it seems that we don't even understand how the parasite partitions the endo/exocytic pathways in the pocket and maintains membrane homeostasis. While we know that some kinases and traditional signaling components must be involved, a high resolution understanding of this process in T. brucei seems lacking. I only say all this to suggest that the field maybe isn't yet advanced enough to reject work of this type as so many mechanistic unknowns still remain to be uncovered and maybe incremental advances and phenomenology still can add value to the field. However, I respect your opinion on the matter and my perspective could be due to a lack of a full appreciation of the literature on the subject.

      Significance

      Unfortunately, I did not find this to be very significant. It covers old ground in terms of the phenotype described. Many groups have shown the differences between procyclic and bloodstream phenotypes in this enlarged pocket phenomenon. The work is rather incremental from these and other authors' work on these hook proteins.

      There are alternative explanations for understanding the effect of flagella pocket structure and uptake of ligands into the pocket and trypanosome cell. These would need to be studied before one could see a functional, mechanistic link established.

      Other parts of this are nicely done but do not move on our understanding (e.g. targeting/phosphorylation) from what has been done previously.

    1. Abstract. Recent advances in genome-wide association study (GWAS) and sequencing studies have shown that the genetic architecture of complex diseases and traits involves a combination of rare and common genetic variants, distributed throughout the genome. One way to better understand this architecture is to visualize genetic associations across a wide range of allele frequencies. However, there is currently no standardized or consistent graphical representation for effectively illustrating these results. Here we propose a standardized approach for visualizing the effect size of risk variants across the allele frequency spectrum. The proposed plots have a distinctive trumpet shape, with the majority of variants having low frequency and small effects, while a small number of variants have higher frequency and larger effects. These plots, which we call ‘trumpet plots’, can help to provide new and valuable insights into the genetic basis of traits and diseases, and can help prioritize efforts to discover new risk variants. To demonstrate the utility of trumpet plots in illustrating the relationship between the number of variants, their frequency, and the magnitude of their effects in shaping the genetic architecture of complex diseases and traits, we generated trumpet plots for more than one hundred traits in the UK Biobank. To facilitate their broader use, we have developed an R package ‘TrumpetPlots’ and R Shiny application, available at https://juditgg.shinyapps.io/shinytrumpets/, that allows users to explore these results and submit their own data.

      This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.89) and has published the reviews under the same license. These are as follows.

      **Reviewer 1. Clara Albiñana**

      As Open Source Software are there guidelines on how to contribute, report issues or seek support on the code?

      No. Although there are no explicit guidelines for contribution in the manuscript or website, it is true that by placing the project on gitlab it is possible to contribute to the project / open issues.

      Is the code executable?

      No. Unfortunately, I wasn't able to install the R package. I have now opened an issue on the gitlab page so that it can hopefully get solved.

      Is installation/deployment sufficiently outlined in the paper and documentation, and does it proceed as outlined?

      Yes. It is very common for new R packages to just use devtools for installation.

      Is the documentation provided clear and user friendly?

      Yes. The requirements for generating a trumpet plot just involve providing a set of GWAS summary statistics with column-specific names, together with the GWAS sample size. This is very common for GWAS summary statistics-based tools. I think it is fine for the R package to require re-naming the columns to fit the format, as one already needs to upload the file into R. However, I find it inconvenient to have to re-save the summary statistics file with different name-columns for the shinyapp tool. Providing e.g. column indexes alone would be much more user-friendly.

      Is there enough clear information in the documentation to install, run and test this tool, including information on where to seek help if required?

      No. I cannot answer this question until I can install the tool.

      Have any claims of performance been sufficiently tested and compared to other commonly-used packages?

      Not applicable. There are no existing comparable tools.

      Is automated testing used or are there manual steps described so that the functionality of the software can be verified?

      Yes. I can see there is a toy dataset included with the R package.

      Additional Comments:

      I think the manuscript is very clear and good at making the point of the utility of the software. The proposed trumpet plots are very visually appealing and can be useful to characterise the genetic variation of diverse phenotypes. The novelty of the trumpet plots, as compared to previously proposed effect size vs. allele frequency plots, is the use of positive and negative effect sizes, making it look like a trumpet. I also appreciate the style decisions in the standard generated plots, with a nice visually-appealing color scheme and design.

      On the use of the software, I have focused my testing on the R package, which I was not able to install. The shinyapp is very useful for visualising the existing, pre-computed trumpet plots, but I do not find it very useful for generating user-uploaded summary statistics for the reasons I mentioned above. Another comment on the ShinyApp is that I appreciate the possibility to download the plots but it would be very useful to include the name of the visualized phenotype as the plot title, for example, to avoid confusion when downloading multiple plots.

      I also found an incorrect sentence in the abstract, which I think should be reversed: "The proposed plots have a distinctive trumpet shape, with the majority of variants having low frequency and small effects, while a small number of variants have higher frequency and larger effects".

      **Reviewer 2. Wentian Li**

      Is the documentation provided clear and user friendly?

      No. Many aspects of Fig.1 are not explained.

      Overall Comments: Plots with allele frequency as the x axis and effect size (e.g. odds ratio) as the y axis are a very common display of the contribution from both common and rare alleles to genetic association. A schematic form of this plot is on almost everybody's presentation slides when introducing this topic (for an example, see Science (23 Nov 2012), vol 338(6110), pp.1016-1017). Considering how many people are already familiar with this type of plot, I feel that very little new is added in this paper: maybe only a new name ("trumpet"), and/or the power lines. The other methods contributions (log-x, one variant per LD, avoiding gene-level statistics) are rather straightforward. People without experience with "shiny" (R package) can still use ggplot2 or plot in R to get the same result. Generally speaking, I think the paper is weak, though OK as a program/package announcement.

      Major comments:

      • I think the trumpet shape (increase of "effect size" for rare variants) is probably a direct consequence of using odds-ratio as a measure of effect size. If the allele frequency in the normal population is p0 and that in the disease population is p1, [p1/(1-p1)]/[p0/(1-p0)] ~ p1/p0 tends to be large for small p0's, simply because the denominator is small. On the other hand, if population attributable risk (p0(RR-1)/(1+p0(RR-1))) is used as the y-axis, I am uncertain what the shape of the plot would be.
      • A risk allele has these pieces of information:
        1. allele frequency,
        2. effect size (e.g. odds ratio),
        3. type-I error/p-value,
        4. type-II error/power.

        The plot in this paper shows #1 vs #2, with #4 added as an extra. In another publication with a proposal to plot genetic association results (Comp Biol. and Chem. (2014), 48:77-83 doi: 10.1016/j.compbiolchem.2013.02.003), #2 is plotted against #3, with #1 added as an extra. I'm sure using other combinations could lead to other types of plots. The authors should discuss/compare these possibilities.
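
      The first major comment above can be checked numerically. The sketch below is illustrative only: the function names and the example frequencies are assumptions, not values from the paper. It shows that the odds ratio approaches p1/p0 when both frequencies are small, and that, under the population attributable risk formula quoted above, a rare allele contributes far less than a common one with the same relative risk.

```python
def odds_ratio(p_case, p_control):
    """Odds ratio from allele frequencies in cases (p1) and controls (p0)."""
    return (p_case / (1 - p_case)) / (p_control / (1 - p_control))


def population_attributable_risk(p_control, relative_risk):
    """PAR = p0(RR-1) / (1 + p0(RR-1)), as written in the comment above."""
    excess = p_control * (relative_risk - 1)
    return excess / (1 + excess)


# Hypothetical frequencies; in both cases p1/p0 = 2.
rare_or = odds_ratio(0.002, 0.001)    # rare allele: OR is close to p1/p0
common_or = odds_ratio(0.4, 0.2)      # common allele: OR departs from p1/p0

# Same relative risk (RR = 2) at two very different allele frequencies.
rare_par = population_attributable_risk(0.001, 2.0)
common_par = population_attributable_risk(0.2, 2.0)

print(f"rare:   OR = {rare_or:.4f}, PAR = {rare_par:.5f}")
print(f"common: OR = {common_or:.4f}, PAR = {common_par:.5f}")
```

      Plotting PAR rather than the odds ratio on the y-axis would therefore weight rare variants very differently, which is the comparison the comment asks for.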

      Minor comments: In Fig.1, the size of the dots, the brown vs cyan color, the discontinuity of scatter dots around 0.01, are not explained.

      Re-review:

      I have read the authors' response and I'm mostly satisfied. Only two minor comments:

      • Witte 2014 Nature Rev. Genet. article summarizes the point I tried to make well. I understand that rare variants should have a relatively higher effect from an evolutionary perspective, but since these are rare, their individual or even collective contribution to a disease in the population is still small. A casual reader may not realize this point and I think it would be helpful to cite Witte's article.
      • My minor comment on Fig.1 is still not addressed: there seem to be more points on the right side of the p=0.01 line than on the left side. Why this discontinuity? (The added text in the revision is about the color and size of the dots, not about this discontinuity.)

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This study presents a useful inventory of the joint effects of genetic and environmental factors on psychotic-like experiences, and identifies cognitive ability as a potential underlying mediating pathway. The data were analyzed using solid and validated methodology based on a large, multi-center dataset. However, the claim that these findings are of relevance to psychosis risk and have implications for policy changes are only partially supported by the results.

      We appreciate the feedback and insightful suggestions from the editor and reviewers, which helped us improve the manuscript. We believe the concerns initially raised were mostly due to areas that needed further clarification, which we have now clarified in this revised version. Our primary contribution lies in our meticulous analytical approach aimed at minimizing confounding effects and providing more precise estimates of the genetic and environmental impact on children's cognition and psychology. This method differs from the widely used general linear modeling in the field, which, in our opinion, may not be the optimal strategy for large-scale data analysis. Our comprehensive, tutorial-style description of the methods might serve as a valuable resource for the community.

      Regarding the critique that our findings 'partially support the relevance to psychosis risk,' we have updated our manuscript to more accurately reflect this feedback. We have altered the narrative to indicate that psychotic-like experiences (PLE) are associated with the risk for psychosis, a connection substantiated by prior studies cited in our manuscript.

      Similarly, in response to the comment that our findings 'partially support implications for policy changes,' we have nuanced our conclusion. However, we would like to emphasize our discovery that a negative genetic predisposition impacting cognitive development (i.e., low polygenic scores for cognitive phenotypes) can be counteracted by a positive school and familial environment. We believe that this finding could have meaningful implications for policy making and is robustly supported by our analyses.

      We hope this revised manuscript more accurately reflects our research findings and their significance. Lastly, we would like to express our gratitude for your fair and detailed review process. Our experience working with eLife has been incredibly rewarding, and we commend your dedication to an encouraging and progressive publishing culture.

      Public Reviews:

      Reviewer #1

      This study by Park et al. describes an interesting approach to disentangle gene-environment pathways to cognitive development and psychotic-like experiences in children. They have used data from the ABCD study and have included PGS of EA and cognition, environmental exposure data, cognitive performance data and self-reported PLEs. Although the study has several strengths, including its large sample size, interesting approach and comprehensive statistical model, I have several concerns:

      • The authors have included follow-up data from the ABCD Study. However, it is not very clear from the beginning that longitudinal paths are being explored. It would be very helpful if the authors would make their (analysis) approach clearer from the introduction. Now, they describe many different things, which makes the paper more difficult to read. It would be of great help to see the proposed path model in a Figure and refer to that in the Method.

      We clarified the longitudinal paths tested in this study in Intro [line 149~159]. We also added a figure of the proposed path model (Figure 1) [Methods: line 231~238].

      • There is quite a lot of causal language in the paper, particularly in the Discussion. My advice would be to tone this down.

      We adjusted and moderated the use of causal language throughout the manuscript.

      • I feel that the limitation section is a bit brief, and can be developed further.

      We clearly specified the limitations of our study. These included concerns about the representativeness of the ABCD samples, the limited scope of longitudinal data, and the use of non-randomized, observational data [line 524~544].

      • I like that the assessment of CP and self-reported PEs is of good quality. However, I was wondering which 4 items from the parent-reported CBCL were used and how did they correlate with the child-reported PEs? And how was distress taken into account in the child self-reported PEs measurement? Which PEs measures were used?

      Thanks for the clarification question. We report the Pearson correlation coefficients between the PLEs [line 198~200]. (Reviewer #1 may have been referring to a prior version of our manuscript submitted elsewhere, as this point had already been addressed in our initial submission to eLife.)

      • What was the correlation between CP and EA PGSs?

      The Pearson correlation between CP and EA PGS was 0.4331 (p<0.0001). We added this statistic to the manuscript [line 214].

      • Regarding the PGS: why focus on cognitive performance and EA? It should be made clearer from the introduction that EA is not only measuring cognitive ability, but is also a (genetic) marker of social factors/inequalities. I'm guessing this is one of the reasons why the EA PGS was so much more strongly correlated with PEs than the CP PGS. See the work by Abdellaoui and the work by Nivard.

      We appreciate the reviewer’s insightful feedback. Acknowledging the role of both CP and EA PGSs in our study, we agree with the observation that EA PGS goes beyond gauging cognitive aptitude: it also serves as an indicator of societal influences and inequalities. The multifaceted nature of EA PGS could be the reason underlying the stronger correlation with PLEs compared to CP PGS. In response to this feedback, we revised our introduction to articulate the multifaceted role of EA PGS in more precise terms. To support our assertions, we have included references to prior studies (Abdellaoui et al., 2022) [line 131~142].

      Abdellaoui, A., Dolan, C. V., Verweij, K. J. H., & Nivard, M. G. (2022). Gene–environment correlations across geographic regions affect genome-wide association studies. Nature Genetics. doi:10.1038/s41588-022-01158-0

      • Considering previous work on this topic, including analyses in the ABCD Study, I'm not surprised that the correlation was not very high. Therefore, I don't think it makes a whole lot of sense to adjust for the schizophrenia PGS in the sensitivity analyses; in other words, it's not really 'a more direct genetic predictor of PLEs'.

      We thank the reviewer for the thoughtful comments. We acknowledge that the correlation between schizophrenia PGS and PLEs may not be very high, as evidenced by previous work, including analyses from the ABCD Study. However, we would like to emphasize our rationale for adjusting for schizophrenia PGS in the sensitivity analyses. Our study design stemmed from the established associations between PLEs and increased risk for schizophrenia. Existing studies have reported significant associations between schizophrenia PGS and cognitive deficits in both psychosis patients (Shafee et al., 2018) and people at risk for psychosis (He et al., 2021). Notably, the PGS for schizophrenia has shown significant associations with PLEs, arguably more so than the PGS for PLEs itself (Karcher et al., 2018). Our updated manuscript has incorporated these references to improve clarity [line 307~309]. By adding this layer of adjustment, we believe that our linear mixed model more precisely examines the relationship between the cognitive phenotype PGS and PLEs, in terms of both sensitivity and specificity.

      He, Q., Jantac Mam-Lam-Fook, C., Chaignaud, J., Danset-Alexandre, C., Iftimovici, A., Gradels Hauguel, J., . . . Chaumette, B. (2021). Influence of polygenic risk scores for schizophrenia and resilience on the cognition of individuals at-risk for psychosis. Translational Psychiatry, 11(1). doi:10.1038/s41398-021-01624-z

      Karcher, N. R., Paul, S. E., Johnson, E. C., Hatoum, A. S., Baranger, D. A. A., Agrawal, A., . . . Bogdan, R. (2021). Psychotic-like Experiences and Polygenic Liability in the Adolescent Brain Cognitive Development Study. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging. doi:10.1016/j.bpsc.2021.06.012

      Shafee, R., Nanda, P., Padmanabhan, J. L., Tandon, N., Alliey-Rodriguez, N., Kalapurakkel, S., . . . Robinson, E. B. (2018). Polygenic risk for schizophrenia and measured domains of cognition in individuals with psychosis and controls. Translational Psychiatry, 8(1). doi:10.1038/s41398-018-0124-8

      • How did the FDR correction for multiple testing affect the results?

      Please note that we have clarified our FDR correction in the Methods section.

      As detailed in the method section [line 254~255], we applied False Discovery Rate (FDR) correction for multiple testing across nine key variables in the study: PGS (CP or EA), family income, parental education, family’s financial adversity, Area Deprivation Index, years of residence, proportion of population below -125% of the poverty line, positive parenting behavior, and positive school environment. An exception was made in our additional sensitivity analysis, where we included schizophrenia PGS in the linear mixed model for adjustment, thus the FDR correction was applied across ten key variables instead. Overall, the application of FDR correction had minimal impact on our findings. Most associations between the key variables and the outcomes that were originally marked as highly significant sustained their significance after the FDR correction.
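
      To illustrate what an FDR adjustment across nine tests looks like in practice, here is a minimal sketch. It assumes the Benjamini-Hochberg procedure, which is the most common FDR method but is an assumption on our part, as the specific procedure is not named above. The function name `benjamini_hochberg` and the nine p-values are hypothetical and are not results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases (monotonicity step).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Nine made-up p-values, one per hypothetical key variable
# (NOT results from the study).
raw_p = [0.0001, 0.0004, 0.002, 0.008, 0.012, 0.03, 0.04, 0.20, 0.65]
adj_p = benjamini_hochberg(raw_p)
for p, q in zip(raw_p, adj_p):
    print(f"p = {p:.4f} -> adjusted p = {q:.4f}, significant at 0.05: {q < 0.05}")
```

      Because each adjusted value is at least as large as its raw p-value, associations that are highly significant before correction tend to remain significant afterwards, whereas borderline ones may not.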

      Overall, I feel that this paper has the potential to present some very interesting findings. However, at the moment the paper misses direction and a clear focus. It would be a great improvement if the readers would be guided through the steps and approach, as I think the authors have undertaken important work and conducted relevant analyses.

      We express our appreciation to the reviewer for the positive feedback and constructive suggestions, which only serve to improve and strengthen our manuscript. We have incorporated the suggested corrections and clarifications in response to the reviewer's suggestions. We believe that these changes will not only enhance the overall readability but also more effectively emphasize the significance and implication of our work.

      Reviewer #2 (Public Review):

      This paper tried to assess the link between genetic and environmental factors on psychotic-like experiences, and the potential mediation through cognitive ability. This study was based on data from the ABCD cohort, including 6,602 children aged 9-10y. The authors report a mediating effect, suggesting that cognitive ability is a key mediating pathway in the link between several genetic and environmental (risk and protective) factors on psychotic-like experiences.

      While these findings could be potentially significant, a range of methodological unclarities and ambiguities make it difficult to assess the strength of evidence provided.

      Strengths of the methods:

      The authors use a wide range of validated (genetic, self- and parent-reported, as well as cognitive) measures in a large dataset with a 2-year follow-up period. The statistical methods have the potential to address key limitations of previous research.

      Weaknesses of the methods:

      The rationale for the study is not completely clear. Cognitive ability is probably a more likely mediator of traits related to negative symptoms in schizophrenia, rather than positive symptoms (e.g., psychosis, psychotic-like symptom). The suggestion that cognitive ability might lead to psychotic-like symptoms in the general population needs further justification.

      We appreciate the reviewer’s concern regarding the role of cognitive ability in relation to schizophrenia symptoms. We are aware that cognitive ability often serves as a mediator of psychotic-like experiences. However, to the best of our knowledge, a growing body of research has proposed that cognitive ability can mediate positive symptoms in schizophrenia, including psychotic-like experiences. The studies by Howes & Murray (2014) and Garety et al. (2001) suggested that deficits in cognitive ability can potentially contribute to the manifestation of positive symptoms such as psychotic-like experiences. We have elaborated on this aspect in the Introduction section [line 104~115].

      Howes, O. D., & Murray, R. M. (2014). Schizophrenia: an integrated sociodevelopmental-cognitive model. The Lancet, 383(9929), 1677-1687. doi:10.1016/S0140-6736(13)62036-X

      Garety, P. A., Kuipers, E., Fowler, D., Freeman, D., & Bebbington, P. E. (2001). A cognitive model of the positive symptoms of psychosis. Psychological Medicine, 31(2), 189-195. doi:10.1017/S0033291701003312

      Terms are used inconsistently throughout (e.g., cognitive development, cognitive capacity, cognitive intelligence, intelligence, educational attainment...). It is overall not clear what construct exactly the authors investigated.

      We thank the reviewer for the feedback regarding the consistency of terminology in our manuscript. Per the suggestion, we replaced ‘cognitive capacity’ with ‘cognitive phenotypes’ and now use that term consistently throughout our manuscript. Furthermore, we explicitly stated in the Introduction section that our two PGSs of focus will be termed ‘cognitive phenotypes PGSs’, aligning with terminology used in prior studies (Joo et al., 2022; Okbay et al., 2022; Selzam et al., 2019) [line 140~142].

      Joo, Y. Y., Cha, J., Freese, J., & Hayes, M. G. (2022). Cognitive Capacity Genome-Wide Polygenic Scores Identify Individuals with Slower Cognitive Decline in Aging. Genes, 13(8), 1320. doi:10.3390/genes13081320

      Okbay, A., Wu, Y., Wang, N., Jayashankar, H., Bennett, M., Nehzati, S. M., . . . Young, A. I. (2022). Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nature Genetics, 54(4), 437-449. doi:10.1038/s41588-022-01016-z

      Selzam, S., Ritchie, S. J., Pingault, J.-B., Reynolds, C. A., O’Reilly, P. F., & Plomin, R. (2019). Comparing Within- and Between-Family Polygenic Score Prediction. The American Journal of Human Genetics, 105(2), 351-363. doi:10.1016/j.ajhg.2019.06.006

      Not the largest or most recent GWASes were used to generate PGSes.

      We appreciate the reviewer’s observation. Indeed, we were unable to utilize the most recent or the largest GWAS for cognitive performance, educational attainment, and schizophrenia due to the timeline of our study. Regrettably, our study commenced before the publication of what are currently the largest and most recent GWASs, by Okbay et al. (2022) and Trubetskoy et al. (2022). Our research was conducted with the best available data at that time, which was the GWAS of European-descent individuals for educational attainment and cognitive performance (Lee et al., 2018). To eliminate any potential confusion, we adjusted the text to specify that our study used 'a GWAS of European-descent individuals for educational attainment and cognitive performance' rather than the largest GWAS [line 206~208].

      It is not fully clear how neighbourhood SES was coded (higher or lower values = risk?). The rationale, strengths, and assumptions of the applied methods are not fully clear. It is also not clear how/if variables were combined into latent factors or summed (weighted by what). It is not always clear when genetic and when self-reported ethnicity was used. Some statements might be overly optimistic (e.g., providing unbiased estimates, free even of unmeasured confounding; use of representative data).

      Thank you for pointing this out. Consistent with the description of neighborhood SES in the Methods, higher values of neighborhood SES indicate risk [line 217~228]. In the original Figure 2, higher values of neighborhood SES are linked to lower intelligence (direct effects: β=-0.1121) and higher PLEs (indirect effects: β=-0.0126~-0.0162). We think the confusion might have been caused by the difference in coding between family SES (higher values = lower risk) and neighborhood SES (higher values = higher risk). Thus, we changed the terms to ‘High Family SES’ and ‘Low Neighborhood SES’ in the corrected figure (Figure 3) for clarification.

      Considering that shorter duration of residence may be associated with instability of residency, it may indicate neighborhood adversity (i.e., higher risk). This definition of the ‘years of residence’ variable is in line with the previous study by Karcher et al. (2021).

      During estimation, the IGSCA determines weights of each observed variable in such a way as to maximize the variances of all endogenous indicators and components. We added this explanation in the description about the IGSCA method [line 266~268].

      We deleted overly optimistic statements like ‘unbiased estimates’ and used expressions such as ‘adjustment for observed/unobserved confounding’ instead, throughout our manuscript.

      Karcher, N. R., Schiffman, J., & Barch, D. M. (2021). Environmental Risk Factors and Psychotic-like Experiences in Children Aged 9–10. Journal of the American Academy of Child & Adolescent Psychiatry, 60(4), 490-500. doi:10.1016/j.jaac.2020.07.003

      It appears that citations and references are not always used correctly.

      We thoroughly checked all citations and specified the references for each statement. We deleted Plomin & von Stumm (2018) and Harden & Koellinger (2020) and cited relevant primary studies (e.g., Lee et al., 2018; Okbay et al., 2022; Abdellaoui et al., 2022) instead. We also specified the references supporting the statement that educational attainment PGS links to brain morphometry (Judd et al., 2020; Karcher et al., 2021). Because Okbay et al. (2022) use a PGS of cognitive intelligence (with the analysis results reported in their supplementary materials) as well as of educational attainment, we decided to continue citing this reference [line 131~141].

      Strengths of the results:

      The authors included a comprehensive array of analyses.

      We thank the reviewer for the positive comment.

      Weaknesses of the results:

      Many results, which are presented in the supplemental materials, are not referenced in the main text and are so comprehensive that it can be difficult to match tables to results. Some of the methodological questions make it challenging to assess the strength of the evidence provided in the results.

      As you rightly identified, we inadvertently failed to reference Table S2 in the main text. We have since corrected this omission in the Results section for the IGSCA (SEM) analysis [line 376]. The remainder of the supplementary tables (Table S1, S3~S7) have been appropriately cited in the main manuscript. We recognize that the quantity of tables provided in the supplementary materials is substantial. However, given the comprehensiveness and complexity of our analyses, which encompass a wide array of study variables, these tables offer intricate results from each analysis. We deem these results, which include valuable findings from sensitivity analyses and confound testing, too significant to exclude from the supplementary materials. That said, we are open to, and would greatly welcome, any further suggestions on how to present our supplementary results in a more clear and digestible format. Your guidance in this matter is highly valued.

      Appraisal:

      The authors suggest that their findings provide evidence for policy reforms (e.g., targeting residential environment, family SES, parenting, and schooling). While this is probably correct, a range of methodological unclarities and ambiguities make it difficult to assess whether the current study provides evidence for that claim.

      We believe that with the improvement we made in this revised manuscript, this concern may have been successfully mitigated.

      Impact:

      The immediate impact is limited given the short follow-up period (2y), possible concerns for selection bias and attrition in the data, and some methodological concerns.

      We appreciate the feedback provided in the reviewer's impact statement. We added as study limitations [line 524~544] that the impact of our findings may be limited due to the relatively short follow-up period, the possibility of sample selection bias, and the problems of interpreting results from an observational study as causality (despite the novel causal inference methods, designed for non-randomized, observational data, that we used).

      As noted above (and in more detail in Reviewer #2’s Recommendations For The Authors section below), we made the necessary corrections and clarifications for the points raised by the reviewer. We are willing to make additional revisions, so please feel free to comment if you feel that our corrections are insufficient or inappropriate.

      Nevertheless, we would like to discuss some points. We sincerely hope this following response does not come across as argumentative to the reviewer and the editor. We fully understand the reviewer's perspective on this matter, and we agree that the issues raised about the ABCD study are absolutely valid. However, when evaluating the overall impact of a study, other factors, such as how the field has been assessing the impact of similar studies, should also be considered.

      Firstly, the potential selection bias and attrition in the ABCD data may not necessarily limit the conclusions of this study. While recognizing the potential issues with the ABCD data is important, we feel that judging the impact of our findings as "limited" based on these issues may not be entirely fair. This is because no study, particularly those of a nationwide scale such as the UK Biobank, IMAGEN, HEAL, HBCD, etc., is completely free of limitations. Typically, the potential limitations of the data don't undermine the impact of individual studies' findings. Numerous studies using ABCD data have been published in top-tier journals—despite the limitations of the ABCD study—underscoring the scientific merit of the findings. For example, the study by Tomasi, D., & Volkow, N. D. (2021), entitled "Associations of family income with cognition and brain structure in USA children: prevention implications," published in Molecular Psychiatry, might be highly relevant to the limitations of the ABCD study raised by the reviewer. The scientific community, including editors, reviewers, and readers, may have appreciated the impact of this study despite the acknowledged limitations of the ABCD data.

      Secondly, the two-year time window of our longitudinal analysis might not impact the aim of this study—an iterative assessment of the associations between genetic and environmental variables with cognitive intelligence and mental health, with a focus on PLE, in preadolescents. Had we aimed to test the developmental trajectory from childhood to adolescence, perhaps a longer timeframe would have made more sense. So, we do not agree with the reviewer’s assessment that the short time window limits the impact of our study.

      Suggested revisions based on the combined reviewer feedback:

      1) The terminology used should be carefully reviewed and revised

      • Please use the correct terminology for the key concepts assessed in this study. For example, authors sometimes conflate PLEs and psychosis, two related but separate constructs. Furthermore, the terms 'good parenting' and 'good schooling' are vague and subjective.

      • The authors use multiple terms to refer to cognitive ability (cognitive capacity, intelligence, cognitive intelligence, etc). The term 'cognitive development' in the title and manuscript does not seem to be justified given the focus on different measures of cognitive ability at a single time point (i.e. baseline).

      • Please avoid causal language and using statements that cannot be entirely substantiated (e.g. unbiased estimates, free from unmeasured confounding)

      Thank you for raising this point. We revised the key terminology used throughout our manuscript.

      Per your suggestion, we specified that PLEs indicate the risk of psychosis and often precede schizophrenia. We checked all misused cases of the term ‘psychosis’ and corrected them as ‘PLEs’. We also changed the terms 'good parenting' and 'good schooling' to ‘positive parenting behavior’ and ‘positive school environment’.

      We changed the term ‘cognitive development’ to ‘cognitive ability’ throughout our manuscript. We also changed the title to ‘Gene-Environment Pathways to Cognitive Intelligence and Psychotic-Like Experiences in Children’ because we used ‘cognitive intelligence’ for NIH toolbox variable in the text.

      We corrected and toned down all causal language used in our manuscript. As mentioned by the reviewers, we deleted statements like ‘unbiased estimates’ and used expressions such as ‘adjustment for observed/unobserved confounding’ instead.

      2) A stronger rationale for the focus on PLEs, and the potential mediating role of cognitive ability in genetic and environmental effects on PLES, should be provided

      We appreciate the concerns raised about whether cognitive ability may serve as a mediator of psychotic-like experiences. To the best of our knowledge, it has been proposed that cognitive ability can be a mediator of positive symptoms in schizophrenia (including psychotic-like experiences), as well as negative symptoms. This mediating role of cognitive ability was proposed in several prior studies on cognitive models of schizophrenia/psychosis. Per your suggestion, we included an additional justification in the Introduction [line 104~115], where we highlighted that cognitive ability has been proposed as a potential mediator of genetic and environmental influences on positive symptoms of schizophrenia such as psychotic-like experiences. We refer to the studies conducted by Howes & Murray (2014) and Garety et al. (2001).

      Howes, O. D., & Murray, R. M. (2014). Schizophrenia: an integrated sociodevelopmental-cognitive model. The Lancet, 383(9929), 1677-1687. doi:10.1016/S0140-6736(13)62036-X

      Garety, P. A., Kuipers, E., Fowler, D., Freeman, D., & Bebbington, P. E. (2001). A cognitive model of the positive symptoms of psychosis. Psychological Medicine, 31(2), 189-195. doi:10.1017/S0033291701003312

      3) As described in more detail by the reviewers, more information should be provided about the measures used in the study and how they relate to one another (e.g. correlations between PQ-BC and CBCL; PGS-CA and PGS-EA).

      Thank you for your suggestion. Although this information was already provided in our initial submission, it appears that Reviewer #1 might have referred to a prior version of our manuscript submitted elsewhere before eLife.

      To clarify, our findings reveal significant Pearson’s correlation coefficients between PLEs across all time-points (baseline year: r=0.095~0.0989, p<0.0001; 1-year follow-up: r=0.1322~0.1327, p<0.0001; 2-year follow-up: r= 0.1569~0.1632, p<0.0001) and we added this information in the Method section [line 198~200]. We also added the Pearson’s correlation between the two PGSs (r=0.4331, p<0.0001) in the Methods for PGS [line 214].

      4) More details are needed regarding the analytical strategies used (e.g. how imputation was performed, why PGS were not based on the largest and most recent GWASes, whether latent or observed variables were examined, what exactly the supplementary materials show and how they relate to information provided in the main text).

      We appreciate your feedback. We acknowledge the concerns about the GWAS sources utilized for the study. Unfortunately, our study commenced before the publication of what are currently the most recent and largest GWASs, by Okbay et al. (2022) and Trubetskoy et al. (2022). Our research was conducted with the best available data at that time, which was the largest GWAS of European-descent individuals for educational attainment and cognitive performance (Lee et al., 2018). We have now clarified this point in the manuscript [line 206~208].

      Also, we specified the use of composite indicators for the PGS, family SES, neighborhood SES, positive family and school environment, and PLEs, while latent factors were used for cognitive intelligence [line 269~285].

      We highly appreciate the reviewer’s comments regarding the supplementary materials. We regret overlooking the citation of Table S2 in the main manuscript, and this has now been rectified in the Results section for the IGSCA (SEM) analysis [line 376]. The remaining supplementary tables (Table S1, S3~S7) have been correctly referenced within the manuscript. We acknowledge that the supplementary materials are extensive. However, because our analyses encompass a wide array of study variables, these tables offer intricate results from each analysis. We deem these results, which include valuable findings from sensitivity analyses and confound testing, too crucial to exclude from the supplementary materials. That said, we are fully committed to making any changes needed to improve the accessibility and clarity of our presentation, and we welcome any further recommendations.

      5) The limitation section should be expanded and statements regarding the implications of the study findings should be qualified accordingly (e.g. short follow-up period, potential for attrition and selection bias, reverse causation, etc)

      We specified additional potential constraints of our study, including limited representativeness, limited periods of follow-up data (baseline year, 1-year, and 2-year follow-up), possible sample selection bias, and the use of non-randomized, observational data [line 524~544].

      6) Please ensure that the references provided support the statements in the text to which they are linked to.

      Thank you for pointing this out. We thoroughly went over all citations and corrected the inaccurately or vaguely cited references for each statement.

      Reviewer #2 (Recommendations For The Authors):

      1) Please use terms consistently and correctly. E.g., 'cognitive capacity' is not the same as 'educational attainment'.

      We thank the reviewer for the feedback regarding the consistency of terminology in our manuscript. Per the suggestion, we replaced ‘cognitive capacity’ with ‘cognitive phenotypes’ and now use that term consistently throughout our manuscript. Furthermore, we explicitly stated in the Introduction section that our two PGSs of focus will be termed ‘cognitive phenotypes PGSs’, aligning with terminology used in prior studies (Joo et al., 2022; Okbay et al., 2022; Selzam et al., 2019) [line 140~142].

      Joo, Y. Y., Cha, J., Freese, J., & Hayes, M. G. (2022). Cognitive Capacity Genome-Wide Polygenic Scores Identify Individuals with Slower Cognitive Decline in Aging. Genes, 13(8), 1320. doi:10.3390/genes13081320

      Okbay, A., Wu, Y., Wang, N., Jayashankar, H., Bennett, M., Nehzati, S. M., . . . Young, A. I. (2022). Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nature Genetics, 54(4), 437-449. doi:10.1038/s41588-022-01016-z

      Selzam, S., Ritchie, S. J., Pingault, J.-B., Reynolds, C. A., O’Reilly, P. F., & Plomin, R. (2019). Comparing Within- and Between-Family Polygenic Score Prediction. The American Journal of Human Genetics, 105(2), 351-363. doi:10.1016/j.ajhg.2019.06.006

      2) The authors study 'cognitive performance using seven instruments', but it is not clear how fluid and crystallized intelligence was defined/operationalized.

      Thank you for pointing this out. We specified the NIH Toolbox tests used for composite scores of fluid and crystallized intelligence, respectively. “We utilized baseline observations of uncorrected composite scores of fluid intelligence (Dimensional Change Card Sort Task, Flanker Test, Picture Sequence Memory Test, List Sorting Working Memory Test), crystallized intelligence (Picture Vocabulary Task and Oral Reading Recognition Test), and total intelligence (all seven instruments) provided in the ABCD Study dataset” [line 180~187].

      3) I don't think Lee 2018 is the largest GWAS for educational attainment. That would be Okbay 2022. It needs to be described how cognitive performance was defined in Lee 2018. Why did the authors not use the Trubetskoy 2022 schizophrenia GWAS?

      Thank you for mentioning this point. We were unable to use the largest GWAS for CP, EA, and schizophrenia because, unfortunately, our study started before the GWASs by Okbay et al. (2022) and Trubetskoy et al. (2022) were published. We corrected the text to state that our study used ‘a GWAS of European-descent individuals for educational attainment and cognitive performance’ instead of the largest GWAS [line 206~208].

      4) It is unclear how neighbourhood SES was coded. The authors seem to suggest that higher values indicate risk, but Figure 2 suggests that higher values links to higher intelligence and lower PLE.

      Thank you very much for pointing this out. Consistent with the description of neighborhood SES in the Methods section, higher values of neighborhood SES indicate risk. In the original Figure 2, higher values of neighborhood SES are linked to lower intelligence (direct effects: β=-0.1121) and higher PLEs (indirect effects: β=-0.0126~-0.0162). We think the confusion might have been caused by the difference in coding between family SES (higher values = lower risk) and neighborhood SES (higher values = higher risk). Thus, we changed the terms to ‘High Family SES’ and ‘Low Neighborhood SES’ in the corrected figure (Figure 3) for clarification.

      5) Also, the 'year of residence' variable is unclearly defined. Does this mean that a shorter duration of residency (even in a good neighbourhood) indicate risk?

      Thank you for mentioning this point. Considering that shorter duration of residence may be associated with instability of residency, it may indicate neighborhood adversity (i.e., higher risk). This definition of the ‘years of residence’ variable is in line with the previous study by Karcher et al. (2021).

      Karcher, N. R., Schiffman, J., & Barch, D. M. (2021). Environmental Risk Factors and Psychotic-like Experiences in Children Aged 9–10. Journal of the American Academy of Child & Adolescent Psychiatry, 60(4), 490-500. doi:10.1016/j.jaac.2020.07.003

      6) Please provide information on how correlated the two PGSes were.

      Thank you for your suggestion. We added the Pearson’s correlation between the two PGSs (r=0.4331, p<0.0001) in the Methods section for PGS [line 214].

      7) Information on the outcome variable in the 'linear mixed models' section is missing. I assumed it was PLE.

      Thank you for notifying us of this point. We added the information on the outcome variables in the section for linear mixed models [line 242~244].

      8) In the 'Path Modeling' section, please explain what 'factors and components' concretely refer to. How is this different from a standard SEM with latent factors?

      Thank you for your comment on the need to elaborate on the IGSCA method. We added that, unlike standard SEM methods, which use only latent factors, the IGSCA method can use components as well as latent factors as constructs in model estimation. This allows the IGSCA method to control bias more effectively in estimation compared to standard SEM [line 261~268].

      9) The sentence starting line 229 is unclear. Does this mean variables were not used to generate latent factors. And if not, what weights were used to create a 'weighted sum'?

      Thank you for mentioning this point. The sentence means that we treated PGSs, family SES, neighborhood SES, positive family and school environment, and PLEs as composite indicators (derived from a weighted sum of relevant observed variables), while general intelligence was represented as a latent factor.

      Prior studies suggest that these variables (PGSs, family SES, neighborhood SES, positive family and school environment, and PLEs) are less likely to share a common factor, and they have typically been assessed as composite indices. For instance, Judd et al. (2020) and Martin et al. (2015) analyze the genetic influences on educational attainment and ADHD as composite indicators. Also, as mentioned in Judd et al. (2020), socioenvironmental influences are often analyzed as composite indicators. Studies on the psychosis continuum (e.g., van Os et al., 2009) suggest that psychotic disorders are likely to have multiple background factors rather than a single common factor, and note that numerous prior studies use composite indices to measure psychotic symptoms. Based on this literature, we used components for these variables.

      The IGSCA determines weights of each observed variable to maximize the variances of the endogenous indicators and components [added in line 265~268].

      On the other hand, we treated general intelligence as a latent factor/variable underlying fluid and crystallized intelligence. This is based on the extensive literature of classical g theory of intelligence [added in line 269~284].
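
      The idea of a composite indicator as a weighted sum of observed variables can be illustrated with the minimal sketch below. It is only an illustration and not the IGSCA estimation itself: the weights here are fixed, hypothetical values, whereas IGSCA estimates the weights from the data, and the variable names and numbers are made up.

```python
import statistics

def standardize(values):
    """Convert raw values to z-scores (mean 0, sample SD 1)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def composite_scores(indicators, weights):
    """Weighted sum of standardized indicators, one score per participant."""
    z = {name: standardize(vals) for name, vals in indicators.items()}
    n = len(next(iter(z.values())))
    return [sum(weights[name] * z[name][i] for name in z) for i in range(n)]

# Made-up observations for five hypothetical participants
# (NOT data from the ABCD Study).
family_ses = {
    "family_income": [3.0, 5.0, 7.0, 8.0, 10.0],
    "parental_education": [12.0, 14.0, 16.0, 16.0, 20.0],
}
# Hypothetical fixed weights; IGSCA would estimate these from the data.
weights = {"family_income": 0.6, "parental_education": 0.4}
scores = composite_scores(family_ses, weights)
print([round(s, 3) for s in scores])
```

      A latent factor, by contrast, is not computed directly from its indicators in this way; it is modeled as an unobserved common cause of them, which is how general intelligence is treated here.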

      Judd, N., Sauce, B., Wiedenhoeft, J., Tromp, J., Chaarani, B., Schliep, A., ... & Klingberg, T. (2020). Cognitive and brain development is independently influenced by socioeconomic status and polygenic scores for educational attainment. Proceedings of the National Academy of Sciences, 117(22), 12411-12418.

      Martin, J., Hamshere, M. L., Stergiakouli, E., O'Donovan, M. C., & Thapar, A. (2015). Neurocognitive abilities in the general population and composite genetic risk scores for attention‐deficit hyperactivity disorder. Journal of Child Psychology and Psychiatry, 56(6), 648-656.

      van Os, J., Linscott, R., Myin-Germeys, I., Delespaul, P., & Krabbendam, L. (2009). A systematic review and meta-analysis of the psychosis continuum: Evidence for a psychosis proneness–persistence–impairment model of psychotic disorder. Psychological Medicine, 39(2), 179-195. doi:10.1017/S0033291708003814

      10) It is overall not clear when genetically and when self-reported information of ethnicity was used. This needs to be clearer throughout.

      Thank you for mentioning this point. We only used genetically defined ethnicity, and we have not mentioned that we used self-reported ethnicity. Per your suggestion, we clarified that we used ‘genetic ethnicity’ throughout the paper.

      11) The sentence starting line 253 is also unclear. How is schizophrenia PGS a 'more direct genetic predictor of PLE' and compared to what other measure?

      Thank you for pointing this out. Please note that our adjustment (or sensitivity analyses) was based on the reported associations between PLEs and the risk for schizophrenia: schizophrenia PGS is associated with a cognitive deficit in psychosis patients (Shafee et al., 2018) and individuals at risk of psychosis (He et al., 2021), and with psychotic-like experiences (more so than PGS for psychotic-like experiences) (Karcher et al., 2021). We added these references for clarification [line 307~309]. We believe that, because of the adjustment, our results from the mixed linear model show the sensitivity and specificity of the association between cognitive phenotype PGS and PLEs.

      He, Q., Jantac Mam-Lam-Fook, C., Chaignaud, J., Danset-Alexandre, C., Iftimovici, A., Gradels Hauguel, J., . . . Chaumette, B. (2021). Influence of polygenic risk scores for schizophrenia and resilience on the cognition of individuals at-risk for psychosis. Translational Psychiatry, 11(1). doi:10.1038/s41398-021-01624-z

      Karcher, N. R., Paul, S. E., Johnson, E. C., Hatoum, A. S., Baranger, D. A. A., Agrawal, A., . . . Bogdan, R. (2021). Psychotic-like Experiences and Polygenic Liability in the Adolescent Brain Cognitive Development Study. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging. doi:10.1016/j.bpsc.2021.06.012

      Shafee, R., Nanda, P., Padmanabhan, J. L., Tandon, N., Alliey-Rodriguez, N., Kalapurakkel, S., . . . Robinson, E. B. (2018). Polygenic risk for schizophrenia and measured domains of cognition in individuals with psychosis and controls. Translational Psychiatry, 8(1). doi:10.1038/s41398-018-0124-8

      12) Please include a statement on the assumptions made when using the method used in this study and developed by Miao 2022, explain what evidence you have to support these assumptions and how this method, which I believe was developed for RCTs, can be applied to observational data.

      We specified the assumptions for the causal inference method proposed by Miao et al. (2022) and why it is applicable to our study. Also, we noted that this novel method was developed to identify the causal effects of multiple treatment variables within non-randomized, observational data [line 309~319].

      13) Some of the statements are potentially misleading. E.g., I would be very cautious to claim that the methods applied allowed the authors to estimate 'unbiased associations against potential (even unobserved) confounding variables'. There are many concerns, such as selection bias, attrition, reverse causation, genetic confounding, etc., that cannot be addressed satisfactorily using these data and methods.

      Thank you for pointing this out. We deleted statements like ‘unbiased estimates’ and used expressions such as ‘adjustment for observed/unobserved confounding’ instead.

      Nevertheless, please note that due to some limitations in the data (e.g., confounders), an analytic approach should be robust enough to handle potential violations of assumptions. This was the point we wanted to emphasize: in contrast to the majority of studies using the ABCD study, which employ simplistic GLM or conventional SEM with only latent variable modeling, our study provides less biased, thus more accurate, estimates through the use of sophisticated modeling for confounding effects (instead of simplistic GLM) and IGSCA (instead of conventional simplistic SEM). We hope our study may help improve the analytical approach in this field.

      14) I would be equally cautious to claim that the ABCD study is representative. Please add information on the whole ABCD cohort to Table 1 and describe any relevance with respect to attrition effects or representativeness.

      Thank you for highlighting this issue. We previously characterized the ABCD Study as representative of the US population, given its aim to ensure representativeness by recruiting from a broad range of school systems located near each of its 21 research sites, chosen for their geographic, demographic, and socioeconomic diversity. Using epidemiological strategies, a stratified probability sample of schools was selected for each site. This procedure took into account sex, race/ethnicity, socioeconomic status, and urbanicity to reduce potential sampling biases at the school level. Based on these strategies, previous research (e.g., Thompson et al., 2019; Zucker et al., 2018) has referred to the ABCD Study as ‘representative.’ However, we overlooked the fact that “not all 9-year-old and 10-year-old children in the United States had an equal chance of being invited to participate in the study,” and therefore, it should not be deemed fully representative of the US population (Compton et al., 2019). Heeding your suggestion, we have removed all descriptions of the ABCD Study being representative.

      Compton, W. M., Dowling, G. J., & Garavan, H. (2019). Ensuring the Best Use of Data: The Adolescent Brain Cognitive Development Study. JAMA Pediatrics, 173(9), 809-810. doi:10.1001/jamapediatrics.2019.2081

      Thompson, W. K., Barch, D. M., Bjork, J. M., Gonzalez, R., Nagel, B. J., Nixon, S. J., & Luciana, M. (2019). The structure of cognition in 9 and 10 year-old children and associations with problem behaviors: Findings from the ABCD study’s baseline neurocognitive battery. Developmental Cognitive Neuroscience, 36, 100606. doi:10.1016/j.dcn.2018.12.004

      Zucker, R. A., Gonzalez, R., Feldstein Ewing, S. W., Paulus, M. P., Arroyo, J., Fuligni, A., . . . Wills, T. (2018). Assessment of culture and environment in the Adolescent Brain and Cognitive Development Study: Rationale, description of measures, and early data. Developmental Cognitive Neuroscience, 32, 107-120. doi:10.1016/j.dcn.2018.03.004

      15) The imputation methods need to be explained in more detail / more clearly. What concrete variables were included? Why was 50% of the sample excluded despite imputation? How similar is the study sample to the overall ABCD cohort - and to the US population in general (i.e., is this a representative dataset)?

      Thank you for mentioning this point. We clarified the method and detailed processes of the imputation (e.g., R package VIM, number of missing observations for each study variable such as genotypes, follow-up observations, and positive environment) [Methods; line 167~176].

      The final samples had significantly higher cognitive intelligence, parental education, family income, and family history of psychiatric disorders, and significantly lower Area Deprivation Index, percentage of individuals below 125% of the poverty level, and family financial adversity (p<0.05). As you have noted above, these results also show the limited representativeness of the data used in our study. We fully acknowledge that our study sample, as well as the overall ABCD cohort, is not representative of the US population in general.

      16) There are a range of unclear statements (e.g., 'Supportive parenting and a positive school environment had the largest total impact on PLEs than genetic or environmental factors' - isn't parenting an environmental factor?).

      Thank you for mentioning this point. We clarified seemingly vague expressions and unclear statements. We corrected the sentence you noted as ‘Supportive parenting and a positive school environment had the largest total impact on PLEs than any other genetic or environmental factors’ [line 57~58].

      17) The authors' conclusion (that these findings have policy implications for improving school and family environmental) are not fully supported by the evidence. E.g., genetic effects were equally large.

      Thank you for pointing this out. Our description should be clearer. Our models consistently show that the combined environmental effects of positive family/school environment and family/neighborhood SES exceed the genetic effects. We suggest that these findings may have policy implications for “improving the school and family environment and promoting local economic development” [line 62~64].

      To clarify, we newly added “Despite the undeniable genetic influence on PLEs, when we combine the total effect sizes of neighborhood and family SES, as well as positive school environment and parenting behavior (∑|β|=0.2718~0.3242), they considerably surpass the total effect sizes of cognitive phenotypes PGSs (|β|=0.0359~0.0502)” [line 510~513]. Based on these results, we suggest that our findings hold potential policy implications for “preventative strategies that target residential environment, family SES, parenting, and schooling, a comprehensive approach that considers the entire ecosystem of children's lives, to enhance children's cognitive ability and mental health” in the Discussion [line 507~510].

      Admittedly, our results do not directly demonstrate a causal effect wherein an intervention in the school or family environmental variables would necessarily lead to a significantly meaningful positive impact on a child's cognitive intelligence and mental health. We do not make such a claim in this paper. However, we anticipate that further integrative analyses akin to ours might help identify potential causal or prescriptive effects. We hope this perspective will be recognized as one of the contributions of our study. We leave the final decision to the discerning judgment of the editors and reviewers.

      18) Many citations do not support the statements made and are sometimes used rather vaguely. For example, I believe Judd 2020 and Okbay 2022 did not use a PGS of cognitive capacity, but of educational attainment. Plomin 2018 and Harden 2020 are reviews, but the primary studies should be cited instead. Which reference exactly is supporting the statement that cognitive capacity PGS links to brain morphometry?

      Thank you very much for your precise observations. We thoroughly checked all citations and updated the references for each statement.

      We deleted Plomin & von Stumm (2018) and Harden & Koellinger (2020) and cited relevant original research articles (e.g., Lee et al., 2018; Okbay et al., 2022; Abdellaoui et al., 2022) instead. We also specified the references supporting the statement that educational attainment PGS links to brain morphometry (Judd et al., 2020; Karcher et al., 2021). As Okbay et al. (2022) used the PGS of cognitive intelligence (which presented the analyses results in their supplementary materials) as well as educational attainment, we decided to continue citing this reference [line 131~141].

      19) Citations are formatted inconsistently.

      We apologize for the inconsistency of the citation formatting. We formatted all citations in APA 7th style, using EndNote v20. We checked that all citations maintain consistency according to the reference style.

      20) Re line 281, I believe effect sizes are 'up to twice as large', but not consistently twice as large as suggested in the text.

      Thank you for mentioning this point. We corrected the sentence as ‘The effect sizes of EA PGS on children's PLEs were larger than those of CP PGS’ [line 342~343].

      21) Please add to the results a short statement on what covariates these analyses were controlled for.

      Thank you for giving us this comment. We added that we used sex, age, marital status, BMI, family history of psychiatric disorders, and ABCD research sites as covariates in the Results section [line 329~331].

      22) Cho 2020 does not provide recommendations on FIT values (line 315). Please provide another reference and explain how these FIT values should be interpreted.

      Thank you for mentioning this point. We added the correct reference for FIT values (Hwang, Cho, & Choo, 2021). We also added that the FIT values range from 0 to 1, and a larger FIT value indicates more variance of all variables is explained by the specified model (e.g., FIT=0.50 denotes that the model explains 50% of the total variance of all variables) [line 291~293].
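      As a rough illustration of how a FIT-style index can be read, the sketch below computes the proportion of total variance explained as one minus the ratio of the residual to the total sum of squares. This is a generic explained-variance calculation under our own assumptions: the function name `fit_index` and the toy numbers are hypothetical, and the code is not the IGSCA estimation procedure itself.

```python
def fit_index(observed, predicted):
    """Proportion of total variance explained: 1 - SS_residual / SS_total.

    `observed` and `predicted` are lists of rows (one row per case, one
    column per variable). Columns are centred before computing SS_total.
    Hypothetical helper for illustration only.
    """
    n_rows = len(observed)
    n_cols = len(observed[0])
    # Column means of the observed data.
    means = [sum(row[j] for row in observed) / n_rows for j in range(n_cols)]
    # Total sum of squares around the column means.
    ss_total = sum(
        (row[j] - means[j]) ** 2 for row in observed for j in range(n_cols)
    )
    # Residual sum of squares between observed and model-implied values.
    ss_resid = sum(
        (o[j] - p[j]) ** 2
        for o, p in zip(observed, predicted)
        for j in range(n_cols)
    )
    return 1.0 - ss_resid / ss_total


# Toy data (hypothetical): three cases, two variables.
observed = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
mean_only = [[2.0, 4.0]] * 3  # a "model" that predicts only the column means

perfect_fit = fit_index(observed, observed)   # model reproduces the data
null_fit = fit_index(observed, mean_only)     # model explains no variance
```

      Under this reading, a value of 0.50 would mean that half of the total variance of all variables is accounted for by the specified model, which matches the interpretation given in the response above.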

      23) Regarding Figure 2, please add factor loadings to this figure and explain what the difference between the hexagon and circular shapes are. Please also add the autocorrelations between the 3 PLE measures. I assume these were also modelled statistically, given the strong correlations between time points?

      Figure 2B needs reworking.

      It is unclear what the x-axis of Figure 2C represents. Proportion of R2 or effect size? SM table 2 provides key information, which should be added to Figure 2.

      Thank you for pointing this out. We added factor loadings to the corrected figure (Figure 3A and 3B). We also added that the X-axis of Figure 3C represents standardized effect sizes.

      24) I suggest adding units directly to Table 1, not in the legend. Was genetic or self-reported ethnicity used in this table? List age in years, not months?

      Thank you for your suggestion. We added the units of age and family history of psychiatric disorders directly inside Table 1. We used genetic ethnicity in Table 1, as we only used genetic ethnicity (but not self-reported ethnicity) throughout our study. This is noted on the last row of Table 1. We listed age in chronological months, which is how each child’s age at each point of data collection is coded in the ABCD Study.

      25) Please include exact p-values in Table 2.

      Thank you for your suggestion. We highly appreciate the reviewer’s comment on the importance of showing exact p-values in the analysis results. Unfortunately, we cannot estimate the standard errors based on normal-theory approximations to obtain the exact p-values of our IGSCA model results. This is described in detail in the original paper of the IGSCA method (Hwang et al., 2021): “Like GSCA and GSCAM, IGSCA is also a nonparametric or distribution-free approach in the sense that it estimates parameters without recourse to distributional assumptions such as multivariate normality of indicators. As a trade-off of no reliance on distributional assumptions, it cannot estimate the standard errors of parameter estimates based on asymptotic (normal-theory) approximations. Instead, it utilizes the bootstrap method (Efron, 1979, 1982) to obtain the standard errors or confidence intervals of parameter estimates nonparametrically.”

      Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7, 1–26. doi:10.1214/aos/1176344552

      Efron, B. (1982). The jackknife, the bootstrap and other resampling plans. Philadelphia, PA: SIAM. doi:10.1137/1.9781611970319

      Hwang, H., Cho, G., Jung, K., Falk, C. F., Flake, J. K., Jin, M. J., & Lee, S. H. (2021). An approach to structural equation modeling with both factors and components: Integrated generalized structured component analysis. Psychological Methods, 26(3), 273-294. doi:10.1037/met0000336
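      The nonparametric bootstrap mentioned in the quoted passage can be sketched as follows. This is a minimal, generic percentile bootstrap for an arbitrary statistic, written under our own assumptions: the function name `bootstrap_ci`, the resample count, and the toy sample are hypothetical, and the code does not reproduce the IGSCA software. In an SEM setting the resampled units would be whole cases (rows), and the statistic would be a re-estimated model parameter.

```python
import random
import statistics


def bootstrap_ci(data, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: standard error and (1 - alpha) confidence interval.

    Resamples `data` with replacement `n_boot` times, recomputes `statistic`
    on each resample, and summarises the resulting distribution.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    estimates.sort()
    std_error = statistics.stdev(estimates)
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return std_error, (lower, upper)


# Toy sample (hypothetical values).
sample = [2.1, 2.5, 1.9, 3.0, 2.7, 2.2, 2.8, 2.4, 2.6, 2.3]
std_error, (lower, upper) = bootstrap_ci(sample, statistics.mean)
```

      Because the interval comes from the empirical distribution of the resampled estimates, no normal-theory approximation is needed, which is why such a procedure yields confidence intervals rather than exact analytic p-values.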

      26) There are way too many indigestible tables presented in the supplementary materials, which are also not referenced in the main manuscript.

      We appreciate your insightful observation. As you rightly identified, we inadvertently failed to reference Table S2 in the main text. We have since corrected this omission in the Results section for the IGSCA (SEM) analysis [line 376]. The remainder of the supplementary tables (Table S1, S3~S7) have been appropriately cited in the main manuscript. We recognize that the quantity of tables provided in the supplementary materials is substantial. However, given the comprehensiveness and complexity of our analyses, these tables offer intricate results from each analysis. We deem these results, which include valuable findings from sensitivity analyses and confound testing, too significant to exclude from the supplementary materials. That said, we are open to, and would greatly welcome, any further suggestions to ensure clarity and ease of comprehension. Your guidance in this matter is highly valued.

      27) Figure S1 is unclear, possibly due to the journal formatting. Is this one figure presented on two pages? Clarify which PGS is listed in Figure S1 and in any case, please add both PGSs.

      Thank you for mentioning this point. Figure S1 presents two correlation matrices: the first is the correlation matrix of the component / factor variables in the IGSCA model, and the second is that of the observed variables used to construct the relevant component / factor variables in the IGSCA model. We labeled these matrices Figure S1-A and Figure S1-B. We also corrected the figure legend as “A. Correlation between all component / factor variables of the IGSCA model. B. Correlation between all observed variables used to construct the relevant component / factor variables in the IGSCA model.” Since Figure S1-A presents correlations between the components and latent factors, it lists a single PGS variable constructed from the CP PGS and EA PGS. On the other hand, Figure S1-B presents correlations between the observed variables. Thus, both CP PGS and EA PGS are listed in this correlation matrix.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study expands on current knowledge of allosteric diversity in the human kinome by C-terminal splicing variants using as a paradigm DCLK1. The authors provide solid evolutionary and some mechanistic evidence how C-terminal isoform specific variants generated by alternative splicing can regulate catalytic activity by means of coupling specific phosphorylation sites to dynamical and conformational changes controlling active site and substrate pocket occupancy, as well as protein-protein interactions. The data will be of interest to researchers in the kinase and signal transduction field.

      We thank the editor for coordinating the review of our manuscript and the reviewers for their valuable feedback. We have significantly revised the manuscript in response to the reviewers’ comments. Our point-by-point response to each comment is presented below. We have uploaded both a clean draft of our revised manuscript and a version with the revisions highlighted in yellow. We hope the revised manuscript is now acceptable for publication in eLife. We have additionally updated the preprint on bioRxiv and have included the link here: https://www.biorxiv.org/content/10.1101/2023.03.29.534689v2.

      Reviewer #1

      Summary

      In the study by Venkat et al. the authors expand the current knowledge of allosteric diversity in the human kinome by c-terminal splicing variants using as a paradigm DCLK1. In this work, the authors provide evolutionary and some mechanistic evidence about how c-terminal isoform specific variants generated by alternative splicing can regulate catalytic activity by means of coupling specific phosphorylation sites to dynamical and conformational changes controlling active site and substrate pocket occupancy, as well as interfering with protein-protein interacting interfaces that altogether provides evidence of c-terminal isoform specific regulation of the catalytic activity in protein kinases.

      The paper is overall well written, the rationale and the fundamental questions are clear and well explained, and the evolutionary and MD analyses are very detailed and well explained. The methodology applied in terms of the biochemical and biophysical tools falls a bit short in some places, and some comments and suggestions are given in this respect. It would be very useful if the authors could monitor protein auto-phosphorylation as a functional readout, for example by using phospho-specific antibodies to monitor activity. Overall I think this is a study that brings some new aspects and concepts that are important for the protein kinase field, in particular the allosteric regulation of the catalytic core by c-terminal segments, and how evolutionary cues generate more sophisticated mechanisms of allosteric control in protein kinases. However, a revision is recommended.

      Major Comments

      The authors explain in the introduction the role of the T688 autophosphorylation site in the function of DCLK1.2. This site, when phosphorylated, has a detrimental impact on catalytic activity and inhibits phosphorylation of the DCX domain, allowing the interaction with microtubules. In the paper they show how this site is generated by alternative splicing and intron skipping in DCLK1.2. However, there is no further functional evidence for this among the functional experiments presented in this study.

      1) What is the effect of a non-phosphorylable T688 mutant in terms of stability and enzymatic activity? What would be the impact of this mutant in the overall auto-phosphorylation reaction?

      The role of T688 phosphorylation in DCLK1 function has been explored in previous studies (Agulto et al, 2020: PMID: 34310279), although it is only relevant to DCLK1.2 splice variants, since this site is lacking in DCLK1.1. These studies showed that mutation of T688 to an alanine increases total kinase autophosphorylation (i.e., autoactivity) and the subsequent phosphorylation of DCX domains, which in turn decreases microtubule binding. Given this information, our goal was to use an evolutionary perspective to investigate this, alongside less well-characterized aspects of DCLK autoregulation, including co-conserved residues in the catalytic domain and C-terminal tail. However, to address the reviewer’s question about a non-phosphorylatable T688 mutant, we performed MD simulations of the T688A and T688E (a phosphomimic) mutants and included a new supplementary figure (Figure 5-supplement 3), which shows that the two mutants slightly destabilize the C-tail relative to WT (1 and 2 angstrom increases in RMSF for T688E and T688A, respectively), but by themselves cannot dislodge the C-tail from the ATP binding pocket. Thus, other co-conserved interactions, as revealed by our analysis, are likely to contribute to the autoregulation of the kinase domain by the C-terminal tail. We have incorporated these observations into the revised results section.

      Furthermore, to address the reviewer’s question in terms of site-specific autophosphorylation as a marker of DCLK1.2 activity, we have now performed a much more detailed phosphoproteomic analysis of a panel of purified DCLK1.2 proteins after purification from E. coli (Figure 8-figure supplement 2). This showed that we are only able to detect Thr 688 phosphorylated in our ‘activated’ DCLK1.2 mutants, and not in the autoinhibited WT DCLK1.2 version of the protein. This apparent contradiction does not necessarily discount Thr 688 as an important regulatory hotspot, but, together with the MD simulations, may imply a smaller contribution of pThr 688 in facilitating/maintaining DCLK1.2 auto-inhibition than previously anticipated, especially in the context of the numerous other stabilizing amino acid contacts that we describe between the C-tail and the ATP-binding pocket. We do, however, propose a mechanism for pThr688 as a potential ATP mimic based on MD analysis. However, we only found MS-based evidence for phosphorylation at this (and other sites in the same peptide) in highly active DCLK1.2 mutants, in which the C-tail remains uncoupled from the ATP-binding site, even in the presence of this regulatory PTM. We acknowledge that better understanding of DCLK biology will require a detailed appraisal of how the DCLK auto-inhibited states are subsequently physiologically regulated (PTMs, protein-protein interaction etc.), but this is beyond the scope of our current evolutionary investigation, and the absence of phosphospecific antibodies makes this challenging currently. We intend to expand upon our current work by assessing the relative contribution of multiple DCLK phosphorylation sites (including, but not limited to, Thr 688) with regard to cellular DCLK auto-regulation in future studies, in part by generating such site-specific phospho-antibodies.
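      The RMSF comparison cited for the T688A and T688E simulations relies on a standard per-atom fluctuation measure, which can be sketched as below. This is an illustrative calculation under our own assumptions: the function name `rmsf` and the two-frame toy trajectory are hypothetical, the frames are assumed to be already aligned to a common reference, and the code is not the analysis pipeline used for the manuscript.

```python
import math


def rmsf(trajectory):
    """Root-mean-square fluctuation per atom.

    `trajectory` is a list of frames; each frame is a list of (x, y, z)
    tuples, one per atom. Frames are assumed to be pre-aligned.
    RMSF_i = sqrt( mean over frames of |r_i(t) - <r_i>|^2 ).
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean_pos = [
            sum(frame[i][k] for frame in trajectory) / n_frames
            for k in range(3)
        ]
        # Mean squared deviation from that average position.
        msd = sum(
            sum((frame[i][k] - mean_pos[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


# Toy trajectory (hypothetical): atom 0 is static, atom 1 moves +/- 1 along x.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
]
fluctuations = rmsf(toy)
```

      Comparing such per-residue values between a mutant and the wild-type trajectory is one way to express statements like "a 1 to 2 angstrom increase in RMSF" for a given region.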

      2) Have the authors made an equivalent T687/688 tandem in DCLK1.1 instead of the two prolines?

      This is a good point. We have not considered introducing a T687/688 tandem mutation into DCLK1.1 (at the equivalent position to that of DCLK1.2), primarily because the amino acid compositions of their respective C-tail domains are so highly divergent across the tail (due to alternative splicing, as discussed in our paper). As discussed in our present study, there are numerous contacts made between specific amino acids in the regulatory C-tail and the kinase domain of DCLK1.2, which functionally occlude ATP binding, and thus change catalytic output. It is these contacts, which are determined by the specific amino acid sequence identity, and not the extended length of the DCLK1.2 C-tail per se, that drive autoinhibition. The alternate amino acid sequence identity of the C-tail of DCLK1.1 does not enable such contacts to form, which we believe explains the different activities of the two isoforms.

      Furthermore, our mutational analysis reveals clearly that Thr688 and several other sites are more highly autophosphorylated in the artificially activated DCLK1.2 constructs than WT DCLK1.2, and as such it remains our hypothesis that introduction of the tandem phosphorylation sites into DCLK1.1 is unlikely to be sufficient to impose an auto-inhibitory conformation of the enzyme.

      3) Could T688 autophosphorylation be used as a functional readout to evaluate DCLK1.2 activity?

      We agree with the reviewer’s suggestion about using autophosphorylation (including potentially Thr688 for DCLK1.2) as a functional read out for DCLK1 activity. In our present study, we identify phosphorylated peptides containing pThr688 only in the mutationally activated DCLK1.2 variants. We have now taken this analytical approach further and performed a detailed comparative phosphoproteomic characterisation of all of our DCLK1 constructs, where we observe marked differences in the overall phosphorylation profiles of the mutant DCLK1.2 (and DCLK1.1) proteins relative to the less phosphorylated WT DCLK1.2 kinase. This manifests as a depletion in the total number of confidently assigned phosphorylation sites within the kinase domain and C-tail of WT DCLK1.2, and also as a depletion in the abundance of phosphorylated peptides for a given site. To help visualise this, individual phosphorylation sites have been schematically mapped onto DCLK1, which has been included as a new extended supplementary figure (Figure 8-figure supplement 2). For comparative analysis of phosphosite abundance, we could only select peptides that could be directly compared between all mutants (identical amino acid sequences) and those found to be phosphorylated in all proteins (these are Ser660 and Thr438); these are now shown in figure supplement 2 as a table. These site occupancies follow what we see with respect to the increased catalytic activity between DCLK1.1 and DCLK1.2 mutants versus DCLK1.2. We also detect increased phosphorylation of DCLK1.1 and activated DCLK1.2 mutants in comparison to (autoinhibited) DCLK1.2, supporting the hypothesis that these mutants are relieving the autoinhibited conformation.

      4) What is the evidence that the c-terminal specific interactions described here are intra-molecular rather than inter-molecular? Have the authors looked at the monodispersion and molecular mass in solution of the different proteins evaluated in this study? Basically, are the proteins in solution monomers or dimers/oligomers?

      Analysis of symmetry mates in the crystal structure of DCLK1.2 (PDB ID: 6KYQ) provides no evidence for inter-molecular interactions. Furthermore, to evaluate oligomerization status in solution, we conducted analytical size exclusion chromatography (SEC), and our analysis reveals that both DCLK1.1 and DCLK1.2 predominantly exist as monomers in solution (Figure 3-Supplements 1-3). These results suggest that the C-terminal tail interactions are primarily intra-molecular.

      5) (Figure 3) Did the authors look at the mono-dispersion of the protein preparation? Did the SEC profile result in a single peak or multiple peaks? Could the authors show the chromatogram? How many species are there in solution? Was the tag removed from the recombinant proteins or not?

      Yes, as mentioned above, the SEC profile resulted in a single peak for both DCLK1.1 and DCLK1.2, which was confirmed as DCLK1 by subsequent SDS-PAGE. We have included the chromatogram and gels in supporting data (Figure 3-supplements 1-3) in the revised manuscript and updated the Methods section. ‘The short N-terminal 6-His affinity tag present on all other DCLK1 proteins described in this paper was left in situ on recombinant proteins, since it does not appear to interfere with DSF, biochemical interactions or catalysis.’

      6) The authors should perform Michaelis-Menten saturation kinetics, as shown in Figure 3C for the WT, when comparing all the functional variants analysed in the study, so that we can compare the catalytic rates and enzymatic constants kcat, Km and catalytic efficiency (kcat/Km), also presented in a table.

      Thank you for your suggestion. We have performed the requested comparative kinetics analyses for selected functional DCLK1 variants at the same concentration as suggested, using our real-time assay to determine Vmax for peptide phosphorylation as a function of ATP, but at a fixed substrate concentration (we are unable to assess Vmax above 5 µM peptide for technical reasons). The results of these analyses have been included in the revised version of Figure 8-Supplement 1, where they support differences in both Vmax and Km[ATP]; the ratio of these values very clearly points to differences in activities falling into ‘low’ or ‘high’. This kinetic analysis fully supports our initial activity assays, where mutations predicted to uncouple the auto-inhibitory C-tail rescue DCLK1.2 activity to levels similar to DCLK1.1 towards a common substrate.
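      For readers less familiar with how Vmax and Km are extracted from such saturation curves, the sketch below fits the Michaelis-Menten equation v = Vmax·[S]/(Km + [S]) using the Hanes-Woolf linearisation. It is an illustration under our own assumptions: the function name `fit_michaelis_menten`, the ATP concentrations, and the "true" parameters are hypothetical, noise-free synthetic values, not data from the manuscript, and the authors' actual fitting procedure is not described in this response. For noisy experimental data, direct nonlinear regression is generally preferred over linearised fits.

```python
def fit_michaelis_menten(substrate, velocity):
    """Estimate (Vmax, Km) via the Hanes-Woolf linearisation.

    Rearranging v = Vmax*[S]/(Km + [S]) gives
        [S]/v = [S]/Vmax + Km/Vmax,
    so a straight-line fit of [S]/v against [S] has
        slope = 1/Vmax and intercept = Km/Vmax.
    """
    x = list(substrate)
    y = [s / v for s, v in zip(substrate, velocity)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Ordinary least-squares slope and intercept.
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Synthetic, noise-free example (hypothetical units and values).
atp = [5.0, 10.0, 25.0, 50.0, 100.0, 250.0]
true_vmax, true_km = 10.0, 20.0
rates = [true_vmax * s / (true_km + s) for s in atp]

vmax, km = fit_michaelis_menten(atp, rates)
ratio = vmax / km  # a Vmax/Km ratio, as used to separate 'low' and 'high' activity
```

      With a known enzyme concentration, kcat would follow as Vmax divided by that concentration, and the catalytic efficiency as kcat/Km.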

      Minor Comments

      It is very interesting how the IBS together with the pT688 mimics ATP in the case of DCLK1.2 to reach full occupancy of the active site. On Figure 8 you evaluate residues of the GRL and IBS interface to probe such interactions.

      1) Did the authors look at the T688 non-phosphorylable mutant?

      See our response to Major Comment 1 above. In addition, due to the absence of T688 in DCLK1.1, we did not look at the T688A mutant of DCLK1.2 biochemically, partly because it has been characterized in previous studies, and partly because this site is preceded by another Thr residue. The lack of a selective antibody towards this site makes it difficult to evaluate the role of T688 phosphorylation specifically with respect to DCLK cellular functions and interactions. Therefore, we focused our in vitro efforts on understanding how mutations in the IBS impact the catalytic activity of DCLK1.2 by comparing different variants to DCLK1.1.

      2) Classification of DCLK C-terminal regulatory elements.

      It would be useful to connect the different regulatory elements described in this study to a specific functional and biological setting where these different switches play a role e.g. microtubule interactions and dynamics, cell cycle, cancer, etc..

      While the primary focus of our paper is on the mechanism of allosteric regulation of DCLK1, we have indeed touched upon the potential implications of the various regulatory elements of the tail on functions such as microtubule binding and phenotypic effects like cancer progression. However, we acknowledge that a comprehensive understanding of these effects would necessitate a more detailed investigation. This could potentially involve the integration of RNA-seq data with extensive cell assays to evaluate phenotypic effects. We believe that such a future study would be a valuable extension of our current work and could provide further insights into the functional roles of DCLK1.

      3) (Figure 3) Could the authors explain the differences in yield between the WT and the D531A mutant. Apparently, it [the yield] does not appear to be caused by a lower stability as indicated by the Tm. Could the authors comment on this? It is important to compare different samples in parallel, in the same experiment and side by side. This applies to the thermal shift data comparing WT and a D531A mutant on panel D and also on panel C a comparison between WT and D531A as negative control should be shown.

      WT and D533A (kinase-dead) were indeed analysed in parallel, but have been split into two panels to make the data easier to interpret. The modest difference in yield is likely explained by prep-to-prep experimental variation. Our experience shows that many protein kinase yields vary between active and kinase-dead variants, likely due to bacterial toxicity related to enzyme activity. With regard to thermal stability, we would like to emphasize that Differential Scanning Fluorimetry (DSF) is, to our mind, a more informative and quantitative measure of protein stability than yield from bacteria, because both proteins are assessed in purified form at the same concentration. We believe that the DSF data provide a more accurate representation of the real stability differences between the WT and D533A mutant.

    1. Background

      Ilan Gronau: This manuscript describes updates made to GADMA, which was published two years ago. GADMA uses likelihood-based demography inference methods as likelihood-computation engines, and replaces their generic optimization technique with a more sophisticated technique based on a genetic algorithm. The version of GADMA described in this manuscript has several important added features. It supports two additional inference engines, more flexible models, additional input and output formats, and it provides better values for the hyper-parameters used by the genetic algorithm. This is indeed a substantial improvement over the original version of GADMA. The manuscript clearly describes the different added features to GADMA, and then demonstrates them with a series of analyses. These analyses establish three main things: (1) they show that the new hyper-parameters improve performance; (2) they show how GADMA can be used to compare performance of different approaches to calculate data likelihood for demography inference; (3) they showcase new features of GADMA (supporting model structure and inbreeding inference). Overall, the presentation is very clear and the results are interesting and compelling. Thus, despite being a publication about a method update, it shows substantial improvement, provides interesting new insights, and will likely lead to expansion of the user base for GADMA.

      The only major comment I have is about the part of the study that optimizes the hyper-parameters. The hyper-parameter optimization is a very important improvement in GADMA2. The setup for this analysis is very good, with three inference engines, four data sets used for training and six diverse data sets used for testing. However, because of complications with SMAC for discrete hyper-parameters, the analysis ends up considering six separate attempts. The comparison between the hyper-parameters produced by these six attempts is mostly done manually across data sets and inference engines. This somewhat defeats the purpose of the well-designed setup. Eventually, it is very difficult for the reader to assess the expected improvement of the final suggested values of hyper-parameters (attempt 2) relative to the default ones. I have two comments/suggestions about this part.

      First, I'm wondering if there is a formal way to compare the eventual parameters of the six attempts across the four training sets. I can see why you would need to run SMAC six separate times to deal with the discrete parameters. However, why do you not use the SMAC score to compare the final settings produced by these six runs?

      Second, as a reader, I would like to see a single table/figure summarizing the improvement you get using whatever hyper-parameters you end up suggesting in the end compared to the default setting used in GADMA1. This should cover all the inference engines and all the data sets somehow in one coherent table/figure. Using such a table/figure, you could report improvement statistics, such as the average increase in log-likelihood, or average decrease in convergence times. These important results get lost in the many improved figures and tables.

      These are my main suggestions for revisions of the current version. I also have some more minor comments that the authors may wish to consider in their revised version, which I list below.

      Introduction:

      para 2: the survey of demography inference methods focuses on likelihood-based methods, but there is a substantial family of Bayesian inference methods, such as MPP, Ima, and G-PhoCS. Bayesian methods solve the parameter estimation problem by Bayesian sampling. I admit that this is somewhat tangential to what GADMA is doing, but this distinction between likelihood-based methods and Bayesian methods probably deserves a brief mention in the introduction.

      para 2,3: you mention a result from the original GADMA paper showing that GADMA improves on the optimization methods implemented by current demography inference methods. Readers of this paper might benefit from a brief summary of the improvement you were able to achieve using the original version of GADMA. Can you add 2-3 sentences providing the highlights of the improvement you were able to show in the first paper?

      para 3: The statement "GADMA separates two regular components" is not very clear. Can you rephrase to clarify?

      Materials and methods - Hyper-parameter optimization:

      I didn't fully understand what you use for the cost function in SMAC here. It seems to me that there are two criteria: accuracy and speed. You wish the final model to be as accurate as possible (high log likelihood), but you want to obtain this result with few optimization iterations. Can you briefly describe how these two objectives are addressed in your use of SMAC? It's also not completely clear how results from different engines and different data sets are incorporated into the SMAC cost. Can you provide more details about this in the supplement?

      para 2: "That eliminate three combinations" should be "This eliminates three combinations".

      para 3: "Each attempt is running" should be "Each attempt ran".

      para 3: "We take 200×number of parameters as the stop criteria". Can you clarify? Does this mean that you set the number of GADMA iterations to 200 times the number of demographic model parameters? Why should it be a linear function of the number of parameters? The following text explains the justification, but

      Table 1: I would merge Table S2 with this one (by adding the ranges of all hyper-parameters as a first column). It's important to see the ranges when examining the different selections.

      Materials and methods - Performance test of GADMA2 engines:

      para 2: "ROS-STRUCT-NOMIG" should be "DROS-STRUCT-NOMIG". Also, "This notation could be read" - maybe replace by "This notation means" to signal that you're explaining the structure notation.

      Para 4 (describing comparisons for momi on Orangutan data): "ORAN-NOMIG model is compared with three …". You also consider ORAN-STRUCTNOMIG in the momi analysis, right?

      Results - Performance test of GADMA2 engines:

      Inference for the Drosophila data set under model with migration: you mention that the models with migration obtain lower likelihoods than the models without migration. You cannot directly compare likelihoods in these two models, since the likelihood surface is not identical. So, I'm not sure that the fact that you get higher likelihoods in the models without migration is a clear enough indication for model fit. The fact that the inferred migration rates are low is a good indication for that. It also seems that, despite converging to models with very low migration rates, the other parameters are inferred with higher noise. For example, the size of the European bottleneck is significantly increased in these inferences compared to that of the NOMIG. So, potentially the problem here is that more time is required for these complex models to converge.

      Inference for the Drosophila data set under structured model (2,1): the values inferred by moments and momentsLD appear to neatly fit the true values. However, it is not straightforward to compare an exponential increase in population size to an instantaneous increase. Maybe this can be done by some time-averaged population size, or the average time until coalescence in the two models? This will allow you to quantify how well the two exponential models fit the true model with instantaneous increase.

      Inference for the Orangutan data set under structured model (2,1) without migration: you argue that a constant population size is inferred for Bor by moments and momi because of the restriction on population sizes after the split. You base this claim on a comparison between the log-likelihoods obtained in this model (STRUCT-NOMIG) and the standard model (NOMIG) in which you add this restriction. I didn't fully understand how you can conclude from this comparison that the constant size inferred for Bor is due to the restriction on the initial population size after the split. I think what you need to do to establish this is run the STRUCT model without this restriction and see that you get exponential decrease. Can you elaborate more on your rationale? A detailed explanation should appear in the supplement and a brief summary in the main text.

      Inference for the Orangutan data set with models with pulse migration: This is a nice result showing that the more pulses you include, the better the estimates become. However, your main example in the main text uses the inferred migration rates. This is a poor example, because migration rates in a pulse model cannot be compared to rates in a continuous model. If migration is spread along a longer time range, then you expect the rates to decrease. So, there is no expectation of getting the same rates. You do expect, however, to get other parameters reasonably accurate. It seems like this is done with 7 pulses, but not so much with one pulse. This should be the main focus of the discussion of these results.

      Results - inference of inbreeding coefficients:

      When you describe the results you obtained for the cabbage data set, you say "the population size for the most recent epoch in our results is underestimated (6 vs 592 individuals) for model 1 without inbreeding and overestimated (174,960,000 vs. 215,000 individuals) for model 2 with inbreeding". The usage of under/overestimated is not ideal here, because it would imply that the original dadi estimates are more correct. You should probably simply say that they are lower/higher than estimates originally obtained by dadi. Or maybe even suggest that the original estimates were over/underestimated?

      Supplementary materials:

      Page 4, para 2: "Figure ??" should be "Figure S1".

      Page 4, para 4: Can you clarify what you mean by "unsupervised demographic history with structure (2, 1)"?

      Page 22, para 2: "Compared to dadi and moments engines momentsLD provide slightly worse approximations for migration rates". I don't really see this in Supplementary Table S16. Estimates seem to be very similar in all methods. Am I missing anything? You make the same statement again in the STRUCT-MIG model (page 23).

      Page 22, para 4: "The best history for the ORAN-NOMIG model with restriction on population sizes is -175,106 compared to 174,309 obtained for the ORAN-STRUCT-NOMIG mod". There is a missing minus sign before the second log likelihood. You should also specify that this refers to the moments engine. Also see comment above about this result.
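      The suggestion above to compare an exponential size change with an instantaneous one through a time-averaged population size can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used by the reviewer or by GADMA: it summarizes one epoch by its arithmetic mean size and by its harmonic mean size (the latter is the usual coalescent-relevant average, since the coalescence rate scales with 1/N). The function names and all population sizes are hypothetical.

```python
import math


def exponential_means(n_start, n_end):
    """Arithmetic and harmonic time-averaged size for exponential change.

    The size changes exponentially from n_start to n_end over one epoch;
    the epoch length cancels out of both averages.
    """
    if n_start <= 0 or n_end <= 0:
        raise ValueError("population sizes must be positive")
    if n_start == n_end:
        return float(n_start), float(n_start)
    log_ratio = math.log(n_end / n_start)
    arithmetic = (n_end - n_start) / log_ratio
    harmonic = log_ratio * n_start * n_end / (n_end - n_start)
    return arithmetic, harmonic


def step_means(n_start, n_end, change_fraction):
    """Same averages for an instantaneous change.

    The size is n_start for the first `change_fraction` of the epoch
    and n_end for the remainder.
    """
    if not 0.0 <= change_fraction <= 1.0:
        raise ValueError("change_fraction must lie in [0, 1]")
    f = change_fraction
    arithmetic = f * n_start + (1.0 - f) * n_end
    harmonic = 1.0 / (f / n_start + (1.0 - f) / n_end)
    return arithmetic, harmonic


# Hypothetical sizes: growth from 1,000 to 10,000 individuals.
exp_arith, exp_harm = exponential_means(1_000, 10_000)
step_arith, step_harm = step_means(1_000, 10_000, change_fraction=0.5)
print(f"exponential: arithmetic={exp_arith:.1f}, harmonic={exp_harm:.1f}")
print(f"step:        arithmetic={step_arith:.1f}, harmonic={step_harm:.1f}")
```

      Comparing the two models on the same summary (for example, the harmonic means) gives a single number per model, which is one possible way to quantify how closely an inferred exponential history matches a true instantaneous change.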

  3. learn-us-east-1-prod-fleet01-xythos.content.blackboardcdn.com
    1. Freedom for the Filipinos challenging US occupatio

      I think this is a really blunt message regarding the different meanings of freedom to people in different situations, and how those ideas can conflict. It reveals that a differing idea of freedom can be seen as an attack on the freedom accepted by someone else. In this case, the US soldiers see the Filipinos' desire for independence as an infringement upon their ideas of freedom. It is up to society and each individual which side of the conflict they wish to be on. I think today we can agree with the Filipinos' side, but there may be some who think otherwise. Personally, I struggle to understand the soldiers, but their social development and ideologies are entirely different from my own. So, in that regard, historical circumstances play a big role in understanding the many complexities of freedom. It's very complex and quite hard to articulate.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Kagermeier et al. present a novel and interesting study that attempts to model a severe neurodevelopmental disorder, pontocerebellar hypoplasia type 2a, using neocortical and cerebellar organoids. Brain organoids are an appropriate and promising approach to elucidate disease mechanisms in neurodevelopmental diseases. The authors show a reduction in the size of the organoids which is more pronounced in the cerebellar compared to neocortical organoids. While this finding is interesting and reminiscent of the clinical PCH2a phenotype, i.e., cerebellar hypoplasia, the study is very preliminary and the conclusions of the manuscript are not supported by the data. Additional information and further experiments are necessary to support the claims made.

      Major concerns:

      1. hiPSC lines show considerable inter- and intra-individual variability and therefore the size differences observed between these control and patient-derived organoids may arise from differences in the hiPSC lines used. While the data sufficiently demonstrates the pluripotency of the multiple novel hiPSC lines, major concerns remain as to the appropriateness of the control hiPSC lines. The manuscript should include a table describing the age and sex matching as well as mode of reprogramming for all control and patient lines. Patient and control lines should be matched as closely as possible. Furthermore, figure legends should clearly indicate which clones and lines are shown in the various figure panels.

      We agree with the reviewer that hiPSC variability is an important concern in the field. In order to minimize such effects, all iPSC lines used in this study were generated following the same protocol in the same lab. All cell lines are derived from male donors, thus eliminating sex-based variability. Further, there is no report of sex-based variance in the clinical phenotype of PCH2a children, and this finding is further corroborated by an ongoing natural history study in our research team. While it would be ideal to also have age-matched controls, this is not possible for ethical reasons, as skin biopsies from healthy children cannot easily be obtained to match the pediatric PCH2a cases. However, based on the literature, we believe that epigenetic age is erased upon reprogramming (Strassler et al 2018, Studer et al 2015). Following the reviewer’s recommendation, we provide a table that clearly indicates the origin of all six cell lines used (see Methods section), and information on the respective lines was added to the figure legends as suggested by the reviewer.

      As the hiPSC lines used are not isogenic, it is important that the authors characterise these lines further. This should include a quantification of the rates of proliferation and apoptosis in all used hiPSC lines, as these might impact the growth rate of the embryoid bodies / organoids.

      We thank the reviewer for raising this concern. To address the variability of hiPSC lines, we performed an extensive characterization of pluripotency, proliferation and cell cycle dynamics of all six hiPSC lines through immunocytochemistry against the pluripotency marker OCT4 and the proliferation marker Ki-67, and through EdU incorporation experiments. We further assessed the apoptosis rate of hiPSCs by staining against the apoptotic marker cCas3. These experiments were carried out in three consecutive passages of all iPSC lines, providing statistical power to the analyses. None of the experiments revealed significant differences between PCH2a and control iPSC lines (see Figure 2).

      The authors state that the hiPSC lines have been characterised by SNP arrays to show that no genomic / chromosomal aberrations have been accrued due to reprogramming. The manuscript should include information as to when the SNP array was performed (i.e., immediately after reprogramming, after initial passaging, etc) and also include the results of the SNP array as additional information. What passage were the hiPSC when the presented experiments were carried out?

      In agreement with this comment, we provide data of SNP arrays that were performed to ensure the chromosomal integrity of all cell lines (see supplement). Further, we added details on passages of the cell lines in the respective figure legends as suggested by the reviewer. In brief, all cell lines were kept below passage 20 and were subjected to pluripotency testing before differentiations were started.

      Given that TSEN54 is broadly and strongly expressed in the developing nervous system, the very limited staining of the organoids for TSEN54 in Figure 2 is surprising. Can the authors provide an explanation for the fact that TSEN54 is only expressed in a small subset of cells? Which cell types are these? Moreover, high-magnification images should be shown to demonstrate the subcellular staining pattern of TSEN54. Quantification of TSEN54 protein levels by immunoblotting would also be beneficial.

      Related to this observation, it is puzzling that the large size differences that the authors observe in their organoids would be driven by such a small number of TSEN54-expressing cells. How do the authors explain this discrepancy?

      We thank the reviewer for this comment. We have carefully assessed transcriptomic datasets of human cerebellar development, which demonstrate that TSEN54 is in fact not strongly but moderately expressed in the developing human nervous system. Additionally, TSEN54 is expressed in various cell types rather than being limited to a subset (Aldinger et al 2021, Sepp et al 2021). We agree with this reviewer and reviewer 3 that Western blotting or other types of quantification would be informative, as would investigation of the subcellular localization of the protein. However, these questions go beyond the scope of the current manuscript, which aims to present a disease model. We have therefore decided to remove the characterization of TSEN54 expression in organoids from our revised manuscript.

      The generated organoids need to be better characterised with a broader range of markers using both qPCR and immunostaining. At the moment, their identity as "cortical" and "cerebellar" organoids remain unconvincing. This is particularly true for cerebellar organoids, which are challenging to generate and are not widely used. The authors should include additional markers (for example, see PMIDs 25640179, 29397531, 32117945) and immunostaining should clearly show expected staining patterns.

      In Figure 5, it appears that some markers (e.g., SATB2) are expressed differently between control and patient lines, yet this is not commented on by the authors who conclude that control and patient lines show differentiation into organoids.

      We thank the reviewer for this suggestion. We performed further immunostainings using the markers that were used in other cerebellar organoid papers (Muguruma et al 2015, Silva et al 2020, Watson et al 2018), as the reviewer suggested. In detail, we added immunohistochemistry experiments on Day 30 and Day 50 of differentiation for the early Purkinje cell markers OLIG2 and SKOR2. We also included ATOH1 as a marker for rhombic lip-derived granule cells. For the neocortical organoids, we believe that the performed characterization is sufficient, since the protocol we used is well established and widely used, as also indicated by the reviewer. We agree that the cellular composition of the organoids should be investigated in detail (for instance using single-cell transcriptomics). However, we believe this is out of the scope of this manuscript, which describes the establishment of a brain-region-specific model platform.

      The authors attempt to look into a potential mechanism for the size differences observed between control and patient organoids. However, only cleaved caspase-3 is used as a marker for apoptosis and no differences were observed. The authors should include further markers for potential cell death. In addition, immunostaining for proliferation markers (i.e., KI67) should be performed to evaluate whether the difference in organoid size could stem from decreased proliferation rather than increased cell death.

      We agree with the reviewer and included a quantification of the proliferation marker Ki-67 within the SOX2-positive population of cerebellar and neocortical organoids, as well as a quantification of SOX2-positive areas within the organoids (Figure 6). We observed significant differences in proliferation between PCH2a and control cerebellar organoids. Moreover, we also analyzed the morphology of organoids and quantified the thickness and number of rosettes, and found significant differences between control and PCH2a cerebellar organoids, corroborating the notion that proliferation is altered in cerebellar organoids. Neocortical organoids do not show any significant differences in proliferation and Sox2+ structures. Only the thickness of the Sox2+ areas is slightly decreased in neocortical PCH2a organoids compared to controls. In order to deepen our analysis of a possible increase in apoptosis in PCH2a organoids, we also quantified cCas3 in Sox2+ structures (Figure 5), as also suggested by Reviewer 2. These analyses did not show any significant differences between PCH2a and control organoids. We therefore suggest that at the early stages of differentiation studied here, proliferative differences are the main reason for the size differences between PCH2a and control organoids.

      Reviewer #1 (Significance (Required)):

      The authors present an innovative approach to study neurodevelopmental disorders using brain organoids and should be of interest to researchers and clinicians working on neurodevelopmental diseases. However, the data presented are too limited to support any conclusions about the phenotype observed. Furthermore, questions remain about the used methodology and more work is needed to demonstrate the successful generation of both cortical and cerebellar organoids.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Please find enclosed my recommendation for the paper submitted by Kagermeier et al entitled 'Human organoid model of PCH2a recapitulates brain region-specific pathology'. It describes the development of a human model for PCH2a and its characterization. My overall assessment of the paper is 'Major revision', which is explained below.

      Although the paper is very well written and clearly interesting in that it describes the generation and initial analyses of a human organoid model for PCH2a, it should be revised such that it proves the points it is trying to make. The authors are meticulous in their studies, combining cellular characterization and a thorough initial screen of organoid (both cerebellar and cortical) integrity, yet hardly any mechanistic data are provided. Nevertheless, if the authors are able to add additional experiments and address the points raised, the reviewer may be willing to consider a more positive outcome.

      Major concerns

      1) The overall quality of the figures is poor. There is a lot of overexposure such that often cellular or tissue structures are blended. It starts with Figure 1 G and H but can be observed throughout the manuscript. Deconvolution would greatly enhance their results.

      We are thankful for this comment and we have improved the quality of all microscopy images.

      2) Especially Figures 4 and 5 could have been complemented with quantitative data. They furthermore seem more like supplemental figures, as these are just proof-of-principle stainings. No conclusions can be drawn from the panels except that all markers are there in the various conditions. And while they are showing a neural rosette in Fig 4A, just tiny ones can be observed in 4B. It is also not clear what the whole-mount IHC adds in comparison to the IHC on sections. It is also strange that there is still a lot of SOX2 in the CALB/MAP2-positive area, but again this is hard to appreciate at this magnification.

      We agree with the reviewer that so far we presented qualitative proof-of-principle stainings that demonstrate cerebellar and neocortical differentiation, respectively. In order to address the comment of the reviewer, we improved the quality of the images and also provided higher magnification and enhanced resolution. Additionally, we now provide detailed quantifications of SOX2+ and Ki67+ neural progenitor cells and show that differences observed between PCH2a and control cerebellar organoids may explain the size differences observed between organoids (Figure 6). Our study provides the basis for more in-depth analysis of differences in differentiation and cell type composition between PCH2a and control organoids in the future, for example through single-cell RNAseq.

      3) If the authors would like to prove the point that cerebellar/cortical development is hampered, more functional assays could have been done. Nothing is analysed regarding the fraction of progenitor cells present (such as the percentage of Tbr2+ IPC in VZ/CP). Furthermore, if there is a suspicion that the number of cells is affected (which is also not shown), proliferation/cell cycle exit experiments using BrdU/EdU should have been performed. Early cell cycle exit still cannot be ruled out and should have been tested by the Ki67-/EdU+ percentage of a certain fraction of progenitor cells (e.g. the PAX6+ pool).

      We thank the reviewer for this valuable suggestion and agree that it would be interesting to carry out respective experiments. In this study, we show the establishment of a brain-region-specific organoid platform as a disease model for PCH2a and are only at the beginning of deciphering the underlying mechanism. In the revised manuscript, we quantified Ki-67+/Sox2+ cells in proliferative zones in the organoids. We believe that future studies including BrdU / EdU incorporation assays as well as scRNA-seq will answer the questions raised here and decipher the disease-causing mechanism on both cellular and molecular levels but are beyond the scope of this manuscript.

      4) Instead, the authors chose to only perform a cCas3 staining. From the panels in Figure 6 it is hard to appreciate which cells are actually cCas3+. Also, the analyses were performed on the total pool of cells, while it might have been more interesting to look for cell death in the various progenitor pools (e.g. the SOX2+ pool).

      We agree with the reviewer that a more in-depth analysis of apoptotic cell populations is interesting and performed cCas3/Sox2+ quantification for cerebellar and neocortical organoids. We did not observe significant differences in cCas3 expression within the SOX2+ cell population (Figure 5).

      Minor concerns

      1) It would greatly enhance the review process if line numbers are added

      We have added line numbers to the manuscript.

      2) On general concepts (such as the generation of organoids in the context of disease) more references could have been added

      We have added more references and discussed the topic of brain organoids as disease models as suggested by this reviewer (Eichmüller & Knoblich 2022, Khakipoor et al 2020, Velasco et al 2020).

      Figures

      Fig. 1: In A, the square is clearly visible and not similar to B. An annotation of which is the control and which is the patient is missing in the figure. The arrows are hardly visible; I would make them slightly bigger and remove the black outer lining. Figure 1C can easily go to the Supplemental material. In Fig 1D it is hard to appreciate the staining; a close-up with a bright-field microscope will help. E-I: Most of the panels, but especially G and H, are overexposed. In J, it is hard to appreciate the TSEN54 staining. Maybe separate channels and a merge?

      We thank the reviewer for bringing these details to our attention. We have changed the arrows in the figure to enhance their visibility. Further we have adjusted the quality of the images overall. Lastly, we have made a comment in the figure legend clearly stating which scan came from which child. The described square was added to hide facial features of the imaged individuals hence they are not identical.

      Fig. 3: This would usually go into the supplementals.

      Since organoid size is a major first readout when modeling a disorder that is characterized by a reduction of the volume of specific brain regions, we decided to keep this readout in the main text.

      Fig 4/5: Lack of quantitative data and poor quality of figures (overexposure).

      Fig 6: Many of the SOX2 panels are overexposed

      We thank the reviewer for the suggestions on the figures and addressed the concerns in the revised manuscript.

      CROSS-CONSULTATION COMMENTS

      I completely agree with reviewers #1 and #3. It is good to notice that we are overall on the same page.

      Reviewer #2 (Significance (Required)):

      The authors definitely made an excellent start to model PCH2a. Three controls and three patient lines are good to begin with, but isogenic controls using one parental line and a patient line where the mutation is fixed would have been ideal. It is interesting that there seems to be a brain-area-specific pathology of the phenotype. Yet, more thorough analyses could have been performed, such as proliferation, differentiation and cell cycle exit experiments. As for now, the mostly descriptive data are only scratching the surface and little can be concluded on the molecular framework they are trying to solve.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      In this study Kagermeier et al. use human cerebellar and neocortical organoids to investigate the effects of the PCH2a-causing homozygous TSEN54c.919G>T variant on the neurodevelopment of different brain regions. They reveal a substantial growth defect in both neocortical and cerebellar regions with a more profound phenotype in the cerebellum. They continue to investigate major cell types of neurodevelopment in both regions and briefly potential mechanisms underlying the phenotypes. The study is well conceived and addresses the current gap of disease-modeling in cerebellar organoids; nevertheless, some major claims are not sufficiently substantiated in the current version. Below, I provide suggestions on how to improve the manuscript with some additional minor comments that might help with readability and accessibility of the work.

      Major comments:

      1. TSEN54 expression levels: The authors compare RNA and protein expression levels for TSEN54 to investigate the mutation's effect. For this the authors use qPCR on iPSCs and organoids of different age and immunostainings and conclude "we did not find differences in expression between cell and tissue types". There are some issues with this analysis as explained below:

      -The qPCR data (Fig. 2B) is first normalized to a housekeeping gene (GAPDH), however, then all organoid data are additionally normalized to the respective iPSC line. Thus, in case there is already a difference on iPSC level, this normalization might mask any difference in the organoids. It is unclear why this approach was chosen, and it seems more appropriate to show the data just normalized to GAPDH than additionally normalizing to the iPSCs, or at least to show first that iPSCs do not have differences in TSEN54 expression. Furthermore, even though apparently not statistically significant there seems to be a strong trend of lower TSEN54 levels in PCH2a in neocortical organoids, but even more so in cerebellar organoids. In my view this would fit very well with the study and should be further explored before concluding there is no statistical difference. Considering the high error bars of the cerebellar organoid samples, a higher N-number might be necessary to reach statistical significance in the difference in expression. Most importantly, it would be appropriate to show single data points where possible and to mark the different cell lines (as done in other figures), as otherwise it is not possible to judge whether there is a cell line bias in the data.

      -The evidence for protein expression of TSEN54 is immunofluorescence stainings for all conditions. As there is no quantification, the authors should not conclude differences, or the lack thereof, based on this qualitative data. Furthermore, in the one example shown, the PCH2a cerebellar condition (Fig 2D) seems to show lower expression levels compared with other conditions. This could be due to the selected image, as all other examples include large neural rosettes with strong staining in the center of the rosettes. Furthermore, it is unclear what cell line these stainings come from, even whether the PCH2a cerebellar and neocortical stainings come from the same cell line. Thus, the authors should select comparable examples for all conditions, and ideally provide staining examples (e.g., as supplementary data) for the other replicates to ensure expression in all replicates. If the authors want to comment on differences in protein expression, maybe a quantitative approach (e.g., quantitative western blot) would be more appropriate. Otherwise, the statements should be adjusted to not conclude whether TSEN54 protein levels differ or not.

      -Irrespective of the above comments the conclusion of the section "TSEN54 expression in cerebellar and neocortical organoids", that currently reads "we did not find differences in expression between cell and tissue types" should be changed, as the authors did not investigate whether there are cell type-specific differences of TSEN54 expression.

      We thank the reviewer for this comment. We agree that the provided data is not suitable for quantitative analysis of TSEN54 expression. Please also see our related response to the similar concern raised by reviewer 1. Thanks to these suggestions, we have decided to exclude the TSEN54 expression data from the current manuscript as a detailed analysis should be part of an extensive future study.
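      The normalization concern raised above can be illustrated with a minimal sketch. All Ct values below are made-up numbers chosen for illustration, not data from the study, and the helper names are hypothetical. The sketch assumes the common 2^-ΔCt convention and shows that normalizing each organoid sample to its own iPSC line can hide a difference that is already present at the iPSC stage.

```python
# Illustrative only: hypothetical Ct values, not measurements from the study.

def rel_expr(ct_target, ct_ref):
    """Expression of the target gene relative to the reference gene: 2^-(dCt)."""
    return 2.0 ** -(ct_target - ct_ref)

# (TSEN54 Ct, GAPDH Ct) per sample. In this toy data set the PCH2a samples
# sit one cycle higher than controls at both stages (i.e. half the expression).
ct = {
    ("control", "iPSC"): (24.0, 18.0),
    ("control", "organoid"): (25.0, 18.0),
    ("PCH2a", "iPSC"): (25.0, 18.0),
    ("PCH2a", "organoid"): (26.0, 18.0),
}

# Step 1: normalize to the housekeeping gene only.
gapdh_norm = {key: rel_expr(*values) for key, values in ct.items()}

def fold_vs_ipsc(group):
    """Step 2: additionally normalize the organoid value to the same group's iPSC value."""
    return gapdh_norm[(group, "organoid")] / gapdh_norm[(group, "iPSC")]

# GAPDH-only normalization keeps the group difference visible ...
organoid_ratio = gapdh_norm[("PCH2a", "organoid")] / gapdh_norm[("control", "organoid")]
print("PCH2a / control (GAPDH-normalized organoids):", organoid_ratio)

# ... whereas the additional iPSC normalization yields identical fold changes.
print("fold vs. iPSC, control:", fold_vs_ipsc("control"))
print("fold vs. iPSC, PCH2a:  ", fold_vs_ipsc("PCH2a"))
```

      In this toy case both groups show the same fold change relative to their iPSCs, although the GAPDH-normalized organoid values differ two-fold between groups.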

      Organoid growth analysis:

      The organoid growth analysis in Figure 3 and supplementary Figure 2 shows the main phenotype of the study that seems to be very strong. The authors use unpaired t-tests to compare within the different timepoints. Unfortunately, I think this approach might not be appropriate as even though the Welch correction does not rely on similar SDs in the compared groups (Control vs. PCH2a), it still assumes that all data points within each group share the same variance. However, this is not the case, as e.g., the control condition includes three groups (Control-1 to -3), that between groups might have different variance as such not all datapoints are independent from each other. Potentially ANOVA analyses controlling for cell line and timepoint might be more appropriate. Or additionally, the authors could consider using the linear regression analysis in Supplementary Figure 2 to further investigate the difference in organoid growth by e.g., comparing the slope of the regression lines. This might be more appropriately reflecting the growth deficit over time than simply comparing each timepoint individually. Expanding on this analysis the regression analysis requires some more information on the fit (intercept, slope, R-squared of the model), which would help clarifying the growth dynamics in the different systems and conditions.

      We thank the reviewer for the suggestions on statistical analysis and adjusted our approach accordingly. Briefly, we performed a 3-way ANOVA on the growth curves, which revealed no significant differences between the different lines within each group (Control or PCH2a) at the different time points. Additionally, we added the linear regression model to the results (see Figure 3 and supplementary table 2 for the information on the curve fit).
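      The fit information requested above (slope, intercept, R-squared) can be obtained as in the following minimal sketch. It is an ordinary least-squares fit written in plain Python, not the authors' analysis pipeline; the time points and sizes are hypothetical placeholders, and `fit_line` is an illustrative helper name. A formal test of whether the slopes differ would additionally need a model with a group-by-time interaction term, which is beyond this sketch.

```python
# Illustrative only: hypothetical organoid sizes (arbitrary units), not study data.

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

days = [10, 30, 50, 90]                # assumed sampling days
control_size = [1.0, 2.0, 3.0, 5.0]    # made-up group means
pch2a_size = [1.0, 1.5, 2.0, 3.0]      # made-up group means

control_fit = fit_line(days, control_size)
pch2a_fit = fit_line(days, pch2a_size)

for name, (slope, intercept, r2) in (("Control", control_fit), ("PCH2a", pch2a_fit)):
    print(f"{name}: slope={slope:.3f}/day, intercept={intercept:.2f}, R^2={r2:.3f}")

# The slope ratio summarizes the growth deficit over the whole time course.
print("slope ratio PCH2a/Control:", round(pch2a_fit[0] / control_fit[0], 3))
```

      Comparing slopes in this way condenses the growth deficit into one number per group instead of one comparison per time point.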

      The growth ratio analysis (Figure 3D) is essential to the major claim of the paper that the organoids replicate the region-specific differences. As the authors performed all experiments with matching cell lines this could additionally strengthen the argument by generating the ratio of size differences for each cell line separately (instead of just for all PCH2a lines together). This would allow comparison of the same genetic background in both cerebellar and neocortical condition and further corroborate the region-specific severity of the phenotype. Potentially, this would also enable to test these differences statistically.

      We appreciate the suggestion to compare the differentiation protocols by line. Below we display the line-by-line analysis between the two differentiation protocols at D30 (A), D50 (B), and D90 (C). In order to visualize the differences in size between the two protocols more clearly, we have generated ratios of the average organoid sizes between neocortical and cerebellar organoids (D). The analysis corroborates our previous visualizations and statistics (3-way ANOVA) by showing that PCH lines produce neocortical and cerebellar organoids that differ in size more than those of control lines. The differences are most pronounced at D30 and D90. However, we believe that this analysis does not add additional value to our manuscript and have therefore decided not to include it in the revised version.
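      The per-line size ratio described above could be computed as in this minimal sketch. The mean areas are hypothetical placeholders, not values from the study, and the dictionary layout is only one possible way to organize such data.

```python
# Illustrative only: hypothetical mean organoid areas (arbitrary units) at one
# time point, keyed by cell line and differentiation protocol.
mean_area = {
    "Control-1": {"neocortical": 4.0, "cerebellar": 2.0},
    "Control-2": {"neocortical": 4.5, "cerebellar": 2.25},
    "PCH-2": {"neocortical": 3.0, "cerebellar": 1.0},
    "PCH-3": {"neocortical": 3.2, "cerebellar": 0.8},
}

# Ratio of neocortical to cerebellar size within each line, so that both
# protocols are compared on the same genetic background.
ratios = {
    line: area["neocortical"] / area["cerebellar"]
    for line, area in mean_area.items()
}

for line, ratio in ratios.items():
    print(f"{line}: neocortical/cerebellar = {ratio:.2f}")
```

      A larger ratio in a patient line than in the control lines would indicate that the cerebellar organoids of that line are more strongly reduced in size than its neocortical organoids.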

      Additionally, all growth analyses for the neocortical organoids (Figure 3C, Supplementary Figure 2B and C) seem to lack the PCH-1 cell line and only contain PCH-2 and PCH-3. This cell line should be added or commented on why it was excluded from the analyses.

      We agree with the reviewer. Unfortunately, we experienced contamination in that specific differentiation and therefore cannot provide the data. We have made a related comment in the manuscript. Since all differentiations were performed in parallel, adding this line at a later time point would add additional confounders and is therefore undesirable.

      Potential mechanism of the phenotype (apoptosis analysis):

      In Figure 6 the authors investigate the hypothesis that increased apoptosis contributes to the phenotypes. In the cleaved Caspase 3 staining there appear to be no differences. Unfortunately, the analysis apparently only includes one replicate (one organoid?) per cell line and condition. Considering the variability in the data shown this seems inappropriately low and should ideally contain ~3 replicates per cell line condition to judge technical and biological variability if the authors want to make the point that there is no "significant difference between PCH2a and control organoids at any time point in both cerebellar and neocortical organoids". Otherwise, this claim does not seem to be substantiated enough by the data.

      Finally, due to the absence of a phenotype related to apoptosis the authors conclude that the phenotypes may be due to "deficits in the proliferation of progenitor cells". Although this is mentioned in the introduction and the discussion, there is no evidence in the current study that supports this interesting idea. By adding relatively straightforward co-staining experiments for e.g., SOX2 (progenitors) and Ki67 (proliferating cells), the authors could provide further evidence for this hypothesis using existing organoid sections. This would support this speculative idea and could add a more mechanistic insight to the study, thereby making it more exciting.

      To address this concern, we have now added a table to the supplement that describes in detail which organoids / batches / cell lines were used for which experiment (Supplementary table 3). In addition to our previous cCas3 quantifications, we quantified cCas3 within the population of SOX2-positive cells, as suggested by Reviewer 2 (Figure 5).

      To assess the alternative hypothesis, that proliferation deficits account for the size differences observed between organoids, we also performed quantifications of SOX2-positive zones in the organoids at D30 and D50 of differentiation as well as quantifications of Ki-67 positive cells within the SOX2-positive population. For cerebellar organoids we found significant differences in these experiments (Figure 6). We believe that this data supports the hypothesis of aberrant proliferation in PCH2a cerebellar organoids explaining the size differences.

      Minor comments:

      • Cell line and quality control: The authors recruit three male patients with PCH2a and reprogram iPSCs. These cell lines are subjected to a well performed extensive quality control. However, it is unclear what cell lines the stainings (e.g., Fig. 1D to I) originate from. Furthermore, the supplementary qPCR analysis (Supplementary Figure 1) includes only the PCH-1 line, and additionally two cell lines that are not explained (F-CO and hESC-I3). It is unclear what the relevance of showing the qPCR of these cell lines is. To ensure proper QC for all used cell lines the authors should provide data for all cell lines (PCH-1 to -3 and control-1 to -3), or at least summarize (e.g., in a table) what QC metrics were applied to which cell line. Most importantly, this information is completely lacking for the control cell lines and the QC is just mentioned in the text. Unfortunately, it is unclear where the control cell lines originate from, and some basic information would be required to judge whether they are appropriate controls: are they iPSC or ESC, were they reprogrammed with a similar paradigm as the PCH2a cells, what is the gender of the control cell lines (all PCH2a cell lines are apparently male)?

      In line with a similar comment from reviewer 1, we have included a table that provides information on the origin of all six cell lines used in the revised manuscript (methods section). Further, we provide SNP-Array data on all cell lines as supplementary material. We also performed a detailed characterization of pluripotency, proliferation and cell cycle dynamics of all six hiPSC lines through immunocytochemistry against the pluripotency marker OCT4 and the proliferation marker Ki-67, and through EdU incorporation experiments (Figure 2). We further assessed the apoptosis rate of hiPSCs by staining against the apoptotic marker cCas3. None of these experiments revealed significant differences between PCH2a and control iPSC lines (see Figure 2). In line with the suggestion of this reviewer, we removed the qPCR analysis of iPSCs from the manuscript.

      • To make the study more approachable for a medical audience and to judge the variability in phenotype presentation among the recruited patients it would be appreciated if more information on the patients would be provided. The authors write: "We identified three individuals that display the genetic, clinical and brain imaging features previously described for PCH2a.". This information including age/date of birth, as well as other medically relevant information could be provided in the supplementary figure (e.g., is there a difference in disease burden among the different patients?). This would allow judging the recruited cohort better.

      We thank the reviewer for this insightful comment. We provided a table with detailed clinical information (supplementary table 1).

      • According to the method section the cerebellar and neocortical organoids were cultured in very different medium especially at later timepoints. While neocortical organoids were kept in a neural maintenance medium based on Neurobasal-A, cerebellar organoids were kept in a medium based on BrainPhys. These media contain very different levels of nutrients, especially of glucose (25mM vs 2.5mM, Bardy et al. 2015). This can have a strong phenotype on proliferation of progenitors and proliferative phenotypes (e.g., see Eichmüller et al. 2022). Especially as the authors claim that there is a difference in the PCH2a phenotypes between brain regions, it should be excluded that this is due to medium differences at later timepoints. When investigating the growth curves of Figure 3B and C it seems like the major difference in growth speed seems to be that neocortical organoids grow faster in early timepoints (<d30), but similar at later timepoints, which would exclude effects of the media at late timepoints. Nevertheless, considering the strong effect media glucose concentration can have the authors should investigate whether there is an effect at growth speed at later timepoints by comparing control organoids. This could also strengthen the region-specific phenotype due to PCH2a.

      We agree that media composition can greatly influence growth dynamics of cells in 2D and 3D. However, in this study we assess the differences between two groups: the PCH2a and control iPSC-derived organoids. The differences we describe are in relation to the respective control group and iPSCs were generated following the same protocol in the same lab. We believe that by following two protocols and comparing the three PCH2a to the three control lines within each protocol predominantly, we account for different media composition possibly changing growth dynamics.

      • Staining examples shown and presentation: In several figures the authors could improve the presentation of the staining examples with some changes:

      o Cell line information for images: as the authors only ever note the condition (PCH2a or Control) but not the cell line it is unclear if the stainings all come from one cell line or from multiple different cell lines. This prevents comparing the different differentiation conditions. Additionally, for major conclusions the authors should consider including supplemental stainings or further information on how reproducible the results shown are (how many cell lines and batches were used?).

      We thank the reviewer for these suggestions. We added information on cell lines and passages for all experiments shown in this study in the figure legends. Moreover, we also added a table providing information on n-numbers for all experiments (supplementary table 3).

      o Selection of examples: in several cases (Fig 2C/D, 4A, 6A/B) the selected images depict very different regions, e.g., one condition shows a large rosette, while in the other condition no rosette can be seen. It would be more appropriate to show matching examples where possible.

      We agree with the reviewer and have chosen matched regions of interest in the figure panels in the revised version of the manuscript. Please note that for cerebellar organoids we observed a significant difference in the timepoint of appearance of these rosette-like structures. Therefore, an exact matching of regions of interest was not possible due to biological differences between the samples, which we have also quantified (Figure 6).

      o Color code of stainings: Colors do not match throughout the manuscript in immunofluorescence images. E.g., Fig. 4 uses blue, green, red, magenta and Fig. 5 uses blue, green, magenta, cyan. It would be preferable to adhere to one color code. Considering that a significant fraction of the population has red-green color blindness, the latter color code seems more appropriate as it should ensure readability also for color-blind audiences.

      We are thankful for this comment. We changed the color code to make figures more widely accessible.

      • Small typos:

      o Figure 1 legend: last sentence "The" instead of "Th"

      o Supplementary Figure 1B: PCH-2 is named "PCH-22"

      o Supplementary Figure 2: As in the main figure for neocortical organoids the PCH-1 condition is missing (see comment on organoid growth curves). Additionally, the color/shape code of the plots in B does not always match the legend (e.g., size in left plot is different and color of PCH-3 in middle and left plot differs from legend and right plot).

      o It is unclear why the cortical organoids are referred to as "neocortical organoids" in the figures and the text. The methods and the reference in the methods as well as all major papers rather use the word "cortical".

      We addressed these suggestions and thank the reviewer for bringing these to our attention. Unfortunately, we could not include data on PCH-1 in the neocortical differentiation due to a contamination in this batch. We made sure to run all the batches presented here in parallel so that all conditions are equivalent, which prevents us from including a different batch at a later time point.

      We believe that in the context of our study, it is important to highlight cortical organoids as neocortical organoids, because we are also showing cerebellar organoids and there is also a cerebellar cortex.

      References:

      Bardy, C. et al. Neuronal medium that supports basic synaptic functions and activity of human neurons in vitro. Proc National Acad Sci 112, E3312 (2015).

      Eichmüller, O. L. et al. Amplification of human interneuron progenitors promotes brain tumors and neurological defects. Science 375, (2022).

      CROSS-CONSULTATION COMMENTS

      I agree with the comments of the other reviewers and as they are mostly matching, this reinforces the importance to improve certain aspects of the manuscript. As there are no deviating issues I do not comment specifically on any reviewer comments.

      Reviewer #3 (Significance (Required)):

      This work uses organoid technology to shed light on brain region-specific phenotypes in PCH2a. Brain organoids have drastically changed the way we study human neurological diseases (Eichmüller and Knoblich 2022); however, most brain organoid research has focused on cortical organoids. Cerebellar organoid protocols have existed for some time (Muguruma et al. 2015, Silva et al. 2020, Nayler et al. 2021) but have not yet been applied to uncover new disease biology. Especially considering the important role of human-specific cerebellar processes in specific developmental disorders (Haldipur et al. 2021) and cancer (Hendrikse et al. 2022, Smith et al. 2022), disease modeling in human cerebellar organoids holds great potential for understanding disease biology. The work by Kagermeier et al. demonstrates that human cerebellar organoids recapitulate brain region-specific growth deficits and thus is an important step forward for disease modeling. Therefore, this work will be interesting to researchers working on brain development and disease modeling, especially in in-vitro systems. Nevertheless, the mechanistic insight of the study is limited, as is the insight into how human-specific processes might be involved in the pathogenesis of PCH2a. Therefore, it will be interesting to see how this disease model will be used in the future to investigate the cell types and mechanisms involved in the PCH2a phenotype.

      Personal field of expertise: Brain organoids and disease modeling in organoids especially of neurodevelopmental diseases. Analysis of organoids with stainings, as well as sequencing techniques, and bioinformatics.

      References:

      Eichmüller, O. L. & Knoblich, J. A. Human cerebral organoids - a new tool for clinical neurology research. Nat Rev Neurol 1-20 (2022) doi:10.1038/s41582-022-00723-9.

      Haldipur, P. et al. Evidence of disrupted rhombic lip development in the pathogenesis of Dandy-Walker malformation. Acta Neuropathol 142, 761-776 (2021).

      Hendrikse, L. D. et al. Failure of human rhombic lip differentiation underlies medulloblastoma formation. Nature 609, 1021-1028 (2022).

      Muguruma, K., Nishiyama, A., Kawakami, H., Hashimoto, K. & Sasai, Y. Self-Organization of Polarized Cerebellar Tissue in 3D Culture of Human Pluripotent Stem Cells. Cell Reports 10, 537-550 (2015).

      Nayler, S., Agarwal, D., Curion, F., Bowden, R. & Becker, E. B. E. High-resolution transcriptional landscape of xeno-free human induced pluripotent stem cell-derived cerebellar organoids. Sci Rep-uk 11, 12959 (2021).

      Silva, T. P. et al. Scalable Generation of Mature Cerebellar Organoids from Human Pluripotent Stem Cells and Characterization by Immunostaining. J Vis Exp (2020) doi:10.3791/61143.

      Smith, K. S. et al. Unified rhombic lip origins of group 3 and group 4 medulloblastoma. Nature 609, 1012-1020 (2022).

      References by the authors

      Aldinger KA, Thomson Z, Phelps IG, Haldipur P, Deng M, et al. 2021. Spatial and cell type transcriptional landscape of human cerebellar development. Nat Neurosci 24: 1163-75

      Eichmüller OL, Knoblich JA. 2022. Human cerebral organoids — a new tool for clinical neurology research. Nature Reviews Neurology 18: 661-80

      Khakipoor S, Crouch EE, Mayer S. 2020. Human organoids to model the developing human neocortex in health and disease. Brain Res 1742: 146803

      Muguruma K, Nishiyama A, Kawakami H, Hashimoto K, Sasai Y. 2015. Self-organization of polarized cerebellar tissue in 3D culture of human pluripotent stem cells. Cell Rep 10: 537-50

      Sepp M, Leiss K, Sarropoulos I, Murat F, Okonechnikov K, et al. 2021.

      Silva TP, Fernandes TG, Nogueira DES, Rodrigues CAV, Bekman EP, et al. 2020. Scalable Generation of Mature Cerebellar Organoids from Human Pluripotent Stem Cells and Characterization by Immunostaining. J Vis Exp

      Strassler ET, Aalto-Setala K, Kiamehr M, Landmesser U, Krankel N. 2018. Age Is Relative-Impact of Donor Age on Induced Pluripotent Stem Cell-Derived Cell Functionality. Front Cardiovasc Med 5: 4

      Studer L, Vera E, Cornacchia D. 2015. Programming and Reprogramming Cellular Age in the Era of Induced Pluripotency. Cell Stem Cell 16: 591-600

      Velasco S, Paulsen B, Arlotta P. 2020. 3D Brain Organoids: Studying Brain Development and Disease Outside the Embryo. Annu Rev Neurosci 43: 375-89

      Watson LM, Wong MMK, Vowles J, Cowley SA, Becker EBE. 2018. A Simplified Method for Generating Purkinje Cells from Human-Induced Pluripotent Stem Cells. Cerebellum 17: 419-27

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary: In this study Kagermeier et al. use human cerebellar and neocortical organoids to investigate the effects of the PCH2a-causing homozygous TSEN54c.919G>T variant on the neurodevelopment of different brain regions. They reveal a substantial growth defect in both neocortical and cerebellar regions with a more profound phenotype in the cerebellum. They continue to investigate major cell types of neurodevelopment in both regions and briefly potential mechanisms underlying the phenotypes. The study is well conceived and addresses the current gap of disease-modeling in cerebellar organoids; nevertheless, some major claims are not sufficiently substantiated in the current version. Below, I provide suggestions on how to improve the manuscript with some additional minor comments that might help with readability and accessibility of the work.

      Major comments: 1. TSEN54 expression levels: The authors compare RNA and protein expression levels for TSEN54 to investigate the mutation's effect. For this the authors use qPCR on iPSCs and organoids of different age and immunostainings and conclude "we did not find differences in expression between cell and tissue types". There are some issues with this analysis as explained below: -The qPCR data (Fig. 2B) is first normalized to a housekeeping gene (GAPDH), however, then all organoid data are additionally normalized to the respective iPSC line. Thus, in case there is already a difference on iPSC level, this normalization might mask any difference in the organoids. It is unclear why this approach was chosen, and it seems more appropriate to show the data just normalized to GAPDH than additionally normalizing to the iPSCs, or at least to show first that iPSCs do not have differences in TSEN54 expression. Furthermore, even though apparently not statistically significant there seems to be a strong trend of lower TSEN54 levels in PCH2a in neocortical organoids, but even more so in cerebellar organoids. In my view this would fit very well with the study and should be further explored before concluding there is no statistical difference. Considering the high error bars of the cerebellar organoid samples, a higher N-number might be necessary to reach statistical significance in the difference in expression. Most importantly, it would be appropriate to show single data points where possible and to mark the different cell lines (as done in other figures), as otherwise it is not possible to judge whether there is a cell line bias in the data. -The evidence for protein expression of TSEN54 is immunofluorescence stainings for all conditions. As there is no quantification, the authors should not conclude differences, or the lack thereof, based on this qualitative data. 
      Furthermore, in the one example shown, the PCH2a cerebellar condition (Fig 2D) seems to show lower expression levels compared with other conditions. This could be due to the selected image, as all other examples include large neural rosettes with strong staining in the center of the rosettes. Furthermore, it is unclear what cell line these stainings come from, even whether the PCH2a cerebellar and neocortical stainings come from the same cell line. Thus, the authors should select comparable examples for all conditions, and ideally provide staining examples (e.g., as supplementary data) for the other replicates to ensure expression in all replicates. If the authors want to comment on differences in protein expression, maybe a quantitative approach (e.g., quantitative western blot) would be more appropriate. Otherwise, the statements should be adjusted to not conclude whether TSEN54 protein levels differ or not. -Irrespective of the above comments the conclusion of the section "TSEN54 expression in cerebellar and neocortical organoids", that currently reads "we did not find differences in expression between cell and tissue types" should be changed, as the authors did not investigate whether there are cell type-specific differences of TSEN54 expression.

      1. Organoid growth analysis: The organoid growth analysis in Figure 3 and supplementary Figure 2 shows the main phenotype of the study that seems to be very strong. The authors use unpaired t-tests to compare within the different timepoints. Unfortunately, I think this approach might not be appropriate as even though the Welch correction does not rely on similar SDs in the compared groups (Control vs. PCH2a), it still assumes that all data points within each group share the same variance. However, this is not the case, as e.g., the control condition includes three groups (Control-1 to -3), that between groups might have different variance as such not all datapoints are independent from each other. Potentially ANOVA analyses controlling for cell line and timepoint might be more appropriate. Or additionally, the authors could consider using the linear regression analysis in Supplementary Figure 2 to further investigate the difference in organoid growth by e.g., comparing the slope of the regression lines. This might be more appropriately reflecting the growth deficit over time than simply comparing each timepoint individually. Expanding on this analysis the regression analysis requires some more information on the fit (intercept, slope, R-squared of the model), which would help clarifying the growth dynamics in the different systems and conditions. The growth ratio analysis (Figure 3D) is essential to the major claim of the paper that the organoids replicate the region-specific differences. As the authors performed all experiments with matching cell lines this could additionally strengthen the argument by generating the ratio of size differences for each cell line separately (instead of just for all PCH2a lines together). This would allow comparison of the same genetic background in both cerebellar and neocortical condition and further corroborate the region-specific severity of the phenotype. 
Potentially, this would also enable to test these differences statistically. Additionally, all growth analyses for the neocortical organoids (Figure 3C, Supplementary Figure 2B and C) seem to lack the PCH-1 cell line and only contain PCH-2 and PCH-3. This cell line should be added or commented on why it was excluded from the analyses.

      2. Potential mechanism of the phenotype (apoptosis analysis): In Figure 6 the authors investigate the hypothesis that increased apoptosis contributes to the phenotypes. In the cleaved Caspase 3 staining there appear to be no differences. Unfortunately, the analysis apparently only includes one replicate (one organoid?) per cell line and condition. Considering the variability in the data shown this seems inappropriately low and should ideally contain ~3 replicates per cell line condition to judge technical and biological variability if the authors want to make the point that there is no "significant difference between PCH2a and control organoids at any time point in both cerebellar and neocortical organoids". Otherwise, this claim does not seem to be substantiated enough by the data. Finally, due to the absence of a phenotype related to apoptosis the authors conclude that the phenotypes may be due to "deficits in the proliferation of progenitor cells". Although this is mentioned in the introduction and the discussion, there is no evidence in the current study that supports this interesting idea. By adding relatively straight forward co-staining experiments for e.g., SOX2 (progenitors) and Ki67 (proliferating cells), the authors could provide further evidence for this hypothesis using existing organoid sections. This would support this speculative idea and could add a more mechanistic insight to the study, thereby making it more exciting.

      Minor comments: - Cell line and quality control: The authors recruit three male patients with PCH2a and reprogram iPSCs. These cell lines are subjected to a well performed extensive quality control. However, it is unclear what cell lines the stainings (e.g., Fig. 1D to I) originate from. Furthermore, the supplementary qPCR analysis (Supplementary Figure 1) includes only the PCH-1 line, and additionally two cell lines that are not explained (F-CO and hESC-I3). It is unclear what the relevance of showing the qPCR of these cell lines is. To ensure proper QC for all used cell lines the authors should provide data for all cell lines (PCH-1 to -3 and control-1 to -3), or at least summarize (e.g., in a table) what QC metrics were applied to which cell line. Most importantly, this information is completely lacking for the control cell lines and the QC is just mentioned in the text. Unfortunately, it is unclear where the control cell lines originate from, and some basic information would be required to judge whether they are appropriate controls: are they iPSC or ESC, were they reprogrammed with a similar paradigm as the PCH2a cells, what is the gender of the control cell lines (all PCH2a cell lines are apparently male)?

      • To make the study more approachable for a medical audience and to judge the variability in phenotype presentation among the recruited patients it would be appreciated if more information on the patients would be provided. The authors write: "We identified three individuals that display the genetic, clinical and brain imaging features previously described for PCH2a.". This information including age/date of birth, as well as other medically relevant information could be provided in the supplementary figure (e.g., is there a difference in disease burden among the different patients?). This would allow judging the recruited cohort better.

      • According to the method section the cerebellar and neocortical organoids were cultured in very different media, especially at later timepoints. While neocortical organoids were kept in a neural maintenance medium based on Neurobasal-A, cerebellar organoids were kept in a medium based on BrainPhys. These media contain very different levels of nutrients, especially of glucose (25mM vs 2.5mM, Bardy et al. 2015). This can have a strong effect on progenitor proliferation and proliferative phenotypes (e.g., see Eichmüller et al. 2022). Especially as the authors claim that there is a difference in the PCH2a phenotypes between brain regions, it should be excluded that this is due to medium differences at later timepoints. When investigating the growth curves of Figure 3B and C, the major difference in growth speed seems to be that neocortical organoids grow faster at early timepoints (<d30), but similarly at later timepoints, which would exclude effects of the media at late timepoints. Nevertheless, considering the strong effect media glucose concentration can have, the authors should investigate whether there is an effect on growth speed at later timepoints by comparing control organoids. This could also strengthen the region-specific phenotype due to PCH2a.

      • Staining examples shown and presentation: In several figures the authors could improve the presentation of the staining examples with some changes:
        o Cell line information for images: as the authors only ever note the condition (PCH2a or Control) but not the cell line it is unclear if the stainings all come from one cell line or from multiple different cell lines. This prevents comparing the different differentiation conditions. Additionally, for major conclusions the authors should consider including supplemental stainings or further information on how reproducible the results shown are (how many cell lines and batches were used?).
        o Selection of examples: in several cases (Fig 2C/D, 4A, 6A/B) the selected images depict very different regions, e.g., one condition shows a large rosette, while in the other condition no rosette can be seen. It would be more appropriate to show matching examples where possible.
        o Color code of stainings: Colors do not match throughout the manuscript in immunofluorescence images. E.g., Fig. 4 uses blue, green, red, magenta and Fig. 5 uses blue, green, magenta, cyan. It would be preferable to adhere to one color code. Considering that a significant fraction of the population has red-green color blindness, the latter color code seems more appropriate as it should ensure readability also for color-blind audiences.

      • Small typos:
        o Figure 1 legend: last sentence "The" instead of "Th"
        o Supplementary Figure 1B: PCH-2 is named "PCH-22"
        o Supplementary Figure 2: As in the main figure for neocortical organoids the PCH-1 condition is missing (see comment on organoid growth curves). Additionally, the color/shape code of the plots in B does not always match the legend (e.g., size in left plot is different and color of PCH-3 in middle and left plot differs from legend and right plot).
        o It is unclear why the cortical organoids are referred to as "neocortical organoids" in the figures and the text. The methods and the reference in the methods as well as all major papers rather use the word "cortical".

      References: Bardy, C. et al. Neuronal medium that supports basic synaptic functions and activity of human neurons in vitro. Proc National Acad Sci 112, E3312 (2015). Eichmüller, O. L. et al. Amplification of human interneuron progenitors promotes brain tumors and neurological defects. Science 375, (2022).

      CROSS-CONSULTATION COMMENTS I agree with the comments of the other reviewers and as they are mostly matching, this reinforces the importance to improve certain aspects of the manuscript. As there are no deviating issues I do not comment specifically on any reviewer comments.

      Significance

      This work uses organoid technology to shed light on brain region-specific phenotypes in PCH2a. Brain organoids have drastically changed the way we study human neurological diseases (Eichmüller and Knoblich 2022); however, most brain organoid research has focused on cortical organoids. Cerebellar organoid protocols have existed for some time (Muguruma et al. 2015, Silva et al. 2020, Nayler et al. 2021) but were not yet applied to uncover new disease biology. Especially considering the important role of human-specific cerebellar processes in specific developmental disorders (Haldipur et al. 2021) and cancer (Hendrikse et al. 2022, Smith et al. 2022), disease modeling in human cerebellar organoids holds great potential for understanding disease biology. The work by Kagermeier et al. demonstrates that human cerebellar organoids recapitulate brain region-specific growth deficits and thus is an important step forward for disease modeling. Therefore, this work will be interesting to researchers working on brain development and disease modeling, especially in in-vitro systems. Nevertheless, the mechanistic insight of the study is limited, as is the insight into how human-specific processes might be involved in the pathogenesis of PCH2a. Therefore, it will be interesting to see how this disease model will be used in the future to investigate the cell types and mechanisms involved in the PCH2a phenotype.

      Personal field of expertise: Brain organoids and disease modeling in organoids especially of neurodevelopmental diseases. Analysis of organoids with stainings, as well as sequencing techniques, and bioinformatics.

      References:

      Eichmüller, O. L. & Knoblich, J. A. Human cerebral organoids - a new tool for clinical neurology research. Nat Rev Neurol 1-20 (2022) doi:10.1038/s41582-022-00723-9.

      Haldipur, P. et al. Evidence of disrupted rhombic lip development in the pathogenesis of Dandy-Walker malformation. Acta Neuropathol 142, 761-776 (2021).

      Hendrikse, L. D. et al. Failure of human rhombic lip differentiation underlies medulloblastoma formation. Nature 609, 1021-1028 (2022).

      Muguruma, K., Nishiyama, A., Kawakami, H., Hashimoto, K. & Sasai, Y. Self-Organization of Polarized Cerebellar Tissue in 3D Culture of Human Pluripotent Stem Cells. Cell Reports 10, 537-550 (2015).

      Nayler, S., Agarwal, D., Curion, F., Bowden, R. & Becker, E. B. E. High-resolution transcriptional landscape of xeno-free human induced pluripotent stem cell-derived cerebellar organoids. Sci Rep-uk 11, 12959 (2021).

      Silva, T. P. et al. Scalable Generation of Mature Cerebellar Organoids from Human Pluripotent Stem Cells and Characterization by Immunostaining. J Vis Exp (2020) doi:10.3791/61143.

      Smith, K. S. et al. Unified rhombic lip origins of group 3 and group 4 medulloblastoma. Nature 609, 1012-1020 (2022).

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      We thank all three Reviewers for their thorough assessment of our manuscript and their constructive comments and suggestions.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this study, the authors generate several variants of actin that are internally tagged with short peptide tags. They identify one particular position that is able to tolerate various tags of 5-10 amino acids and still shows largely unaltered behavior in cells. They study incorporation of their tagged actins into filaments, characterize the interactions of G-actin variants with different associated proteins and show that retrograde actin flow in lamellipodia and the wound healing response of epithelial cells is not affected by the tagged variants. They then apply the tagged actin to study subcellular distribution of different actin isoforms in mammalian and yeast cells.

      The identification of a specific site in the actin protein that tolerates variable peptide insertions is very exciting and of fundamental interest for all research fields that deal with cytoskeletal rearrangements and cellular morphogenesis. The result demonstrating the functionality of actin variants with peptides inserted between aa 229 and 230 are generally convincing and well done. In particular, the generation of CRISPR/Cas9 genome edited versions of beta- and gamma actin are impressive. I therefore generally support publication of this study. There are however several technical and conceptual issues that should be addressed to improve quality and scope of the study. I listed some specific comments below:

      We thank the Reviewer for their constructive comments and general support for publication of our study.

      Major points

      - The biggest issue I have is the last section on the application of tagged actins to study isoform functions. In principle the application is very clear as there are simply no alternative ways to study isoform distribution in live cells. However, the experimental data are simply not convincing. What the authors define as "cortex" in Fig. 5A seems to rather represent cytosolic background mixed with radial fibers. I am not convinced that even the antibody staining with a relatively clear differential distribution of beta and gamma really shows a genuine accumulation of one isoform on stress fibers. It seems to me that the beta-actin staining has a higher cytosolic background and is generally weaker (gamma nicely labels transverse arcs), which reduces signal/noise and therefore yields a relatively increased level in areas with less-bundled actin. My suggestion is to select more clearly defined actin structures and to use micro-patterned cells to normalize the otherwise obstructing variability in actin organization. Possible structures would be cortical arcs in bow-shaped cells, lamellipodial edges (HT1080 seem to make very nice and large lamellipodia) or cell-cell contacts (confluent monolayer, provided cells don't grow on top of each other). Stress fibers are possible but need to be segmented very precisely and I did not see any details on this in the methods section. For Fig. 5D: I assume cells were used where only one isoform was tagged? This is technically weak and the double-normalization is probably blurring any difference that might be occurring. Why not use a double-tagging strategy with ALFA/FLAG or ALFA/AU5 tags to exploit the constructs introduced in the previous figures? Also, the unique selling point of the strategy is the possibility of actual live imaging of specific isoforms. Cells that have stably integrated double tags and then transiently express nanobodies for ALFA and either AU5 or FLAG (or other if those don't exist) would make this possible. Considering the work already done in this manuscript, such an approach should actually be possible - did the authors attempt this or is there a reason it is not discussed? If double tagged cells are not possible for some reason it should at the very least be possible to combine ALFA-detection with the specific antibody against the other isoform and get rid of the double normalization.

      We thank the Reviewer for the various suggestions regarding the comparison between the localization of the tagged and native isoforms. In our reply below, we will separately discuss the possibilities and our considerations for fixed samples and live cell imaging. We apologize for the lengthy response but for transparency reasons, we would like to give a thorough overview of our efforts for isoform-specific localization in cells, something for which we have limited space in the manuscript.

      Fixed samples:

      It was a significant experimental challenge to compare the labeling of the β- and γ-actin specific antibodies with our internally tagged actin system (Fig. 5A-D). The reason for this is that the labeling of the samples with the β- and γ-actin specific antibodies requires treatment with methanol (Dugina et al., J Cell Sci, 2009), most likely to disturb the interaction of actin with actin-binding proteins that prevent the binding of the antibodies due to steric hindrance. Methanol treatment, however, precludes the co-labeling with phalloidin, likely due to changes in the tertiary/quaternary protein structure of F-actin. Initially, we put a lot of effort into trying to combine phalloidin labeling with the actin-specific antibodies, but even very brief methanol treatment (seconds), before or after phalloidin labeling, completely prevents/reverses the binding of phalloidin. Importantly, the ALFA tag labeling was also suboptimal after methanol treatment.

      The fact that we could not perform these double labelings led us to perform different ratio calculations for the β- and γ-actin antibody and the ALFA tag labeling. In the case of the antibody immunofluorescence labeling, we simply divided the signal of the β-actin and γ-actin since we could simultaneously label the isoforms in the same cell. In the case of the ALFA tag labeling, we used phalloidin for independent signal normalization and then performed a second normalization. Although this complicates the normalization procedure (ALFA tag signal of β- and γ-actin is first normalized to total F-actin and then a ratio is calculated) and understandably leads to some confusion, this was the only way forward to obtain the results presented in the manuscript.
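
      The two ratio calculations described above can be illustrated with a minimal sketch. The function names and the example intensities are hypothetical and only serve to show the order of operations; the sketch assumes that all intensities have already been background-corrected and it is not the analysis code used for the manuscript.

```python
def antibody_ratio(beta_signal, gamma_signal):
    """Direct beta/gamma ratio; both isoforms are labeled in the same cell."""
    if gamma_signal <= 0:
        raise ValueError("gamma_signal must be positive")
    return beta_signal / gamma_signal


def alfa_ratio(alfa_beta, phalloidin_beta, alfa_gamma, phalloidin_gamma):
    """Two-step normalization for ALFA-tagged isoforms.

    Step 1: normalize each ALFA signal to total F-actin (phalloidin)
            measured in the same region.
    Step 2: take the ratio of the two normalized values.
    """
    if min(phalloidin_beta, phalloidin_gamma, alfa_gamma) <= 0:
        raise ValueError("denominators must be positive")
    beta_norm = alfa_beta / phalloidin_beta      # step 1, beta-actin cells
    gamma_norm = alfa_gamma / phalloidin_gamma   # step 1, gamma-actin cells
    return beta_norm / gamma_norm                # step 2


# Hypothetical intensities (arbitrary units) for one region type.
direct = antibody_ratio(2.0, 4.0)
two_step = alfa_ratio(10.0, 20.0, 30.0, 40.0)
```

      In this sketch the second calculation involves two divisions by independently measured phalloidin signals, which illustrates why the ALFA-based ratio carries more measurement noise than the direct antibody ratio.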

      The Reviewer points out that “What the authors define as "cortex" in Fig. 5A seems to rather represent cytosolic background mixed with radial fibers.”. In our images, we observe very little cytosolic background from both antibody stainings. More importantly, for the quantitative analysis, the fluorescence intensity values were corrected for the background values observed in cytosolic areas so even if the signal is present, it should not affect our analysis. We do admit though that we could have been more careful with the term “cortex” since the observed signal could indeed be a mix of radial fibers and the actin cortex. The reviewer further states that “I am not convinced that even the antibody staining with a relatively clear differential distribution of beta and gamma really shows a genuine accumulation of one isoform on stress fibers.” Although the differences are small, we consistently observe a differential fluorescence intensity of β- and γ-actin in actin-based structures with a relatively stronger signal of γ-actin in stress fibers (Fig. 5C). Since we always normalize the fluorescent signal intensity per cell, this strongly indicates a genuine accumulation of one isoform over the other in specific actin-based structures. This observation is very consistent in our experiments and also aligns with many published studies where differences in the localization of β- and γ-actin are reported in various cell types (Pasquier et al., Vasc Cell, 2015; van den Dries et al., Nat Comms, 2019; Malek et al., Int J Mol Sci, 2020). As for the segmentation, we mentioned in the Methods section that we selected small regions (0.5x0.5mm) that exclusively contain stress fiber or “cortex” regions. The regions shown in Fig. 5B are therefore larger than the analyzed regions, something which we will better indicate in the revised manuscript.

      Planned revision: We will provide a more detailed explanation of our quantitative analysis in the Methods section so that it is clearer how our normalization procedure was performed. Furthermore, we will adapt Fig. 5A-B such that it better visualizes how we defined the regions for quantification. As per the Reviewer’s suggestion, we will also apply a different experimental method to show that the tagged isoforms properly localize to actin-based structures. For this, we will attempt to use micropatterned cells to induce clearly defined actin-based structures (the crossbows as suggested by the Reviewer) and also explore the possibilities of investigating the differential localization in double-tagged cells. We will also reconsider the use of the term “cortex” for the region that is pointed out in Fig. 5A-B.

      Live cell imaging:

      We agree with the Reviewer that it would be very valuable to attempt simultaneous live cell imaging of two isoforms. Yet, for this, we would need two tag/fluorophore systems that allow the visualization of internally tagged isoforms in living cells. As presented in our original manuscript, we have successfully inserted many different epitope tags (FLAG/AU1/AU5/ALFA) in the T229/A230 position to demonstrate the versatility of our tagging approach. Yet, despite significant efforts to identify a second tag/fluorophore system that would allow isoform-specific live cell imaging, we only succeeded in designing one strategy to perform live cell imaging, i.e. with the ALFA tag (Götzke et al., Nat Comms, 2019). Part of the reason for this is that so far, no high-affinity nanobodies have been generated against the classical epitope tags (FLAG, AU5 etc.). This is an established challenge since classical epitope tags are typically linear/unstructured while nanobodies require folded secondary structures for epitope recognition such as alpha helices (the ALFA tag was specifically designed as such).

      Besides the successful ALFA tag approach, we have tried the following additional approaches for live cell imaging: 1) full-length GFP, 2) full-length GFP with linker, 3) GFP11 (to complement with GFP1-10; Cabantous et al., Nat Biotech, 2005), 4) GFP11 with linker, 5) FLAG Frankenbodies (Zhao et al., Nat Comms, 2019; Liu et al., Genes Cells, 2021) in FLAG IntAct cells and 6) tetracysteine/FlAsH labeling. Importantly, each of these additional internally tagged actins, except for those that contained full-length GFP, showed a high colocalization with the cytoskeleton, again demonstrating the versatility of the T229/A230 position to tag actin. Unfortunately, none of these approaches satisfactorily visualized the actin isoforms in living cells. We will therefore briefly summarize our findings here.

      (1-2, integration of full-length GFP and GFP with linker) Perhaps not surprisingly, integrating the entire coding sequence of GFP, or GFP flanked by linkers (each 5AA in length), within the T229/A230 position did not result in a proper localization of actin.

      (3-4, integration of GFP11 and GFP11 with linker) Next, we assessed the localization of the GFP11 tagged actin versions (GFP11: 16AA, GFP11+linker: 26AA). Because GFP11 is not visible without GFP1-10 complementation, we additionally tagged these actins at the N-terminus, simply as a proof of concept, to see where the internally tagged actins would end up. Interestingly, both GFP11-actin and GFP11+linker-actin properly integrated within the cytoskeleton as demonstrated by the FLAG staining. This again demonstrates the versatility of the T229/A230 position and strongly suggests that even the integration of 26AA within this position only minimally affects the polymerization of actin into the cytoskeleton.

      (3-4) After confirming the proper integration of GFP11-actin and GFP11+linker-actin, we continued by expressing GFP1-10 in these cells. Unfortunately, this resulted in no or only very minimal localization of the actin to the cytoskeleton, demonstrating that GFP complementation hampers the integration into the cytoskeleton.

      (5, use of FLAG Frankenbodies) We also expressed FLAG Frankenbodies in our FLAG IntAct cells in an attempt to visualize the isoforms in living cells. FLAG Frankenbodies are single chain antibodies fused to GFP and can be expressed in cells to visualize FLAG-tagged proteins (Liu et al., Genes Cells, 2021). Although a cytoskeletal labeling was indeed discernible in some cells, the FLAG Frankenbody signal overlapped much less with the total actin signal than the FLAG immunofluorescence labeling did, indicating that the incorporation of the FLAG-tagged actin was much lower in the presence of the FLAG Frankenbody. Also, a significant fraction of the cells demonstrated a homogeneous cytosolic signal.

      (6, use of tetracysteine/FlAsH) Although the tetracysteine tag/FlAsH system is widely known to induce artefacts, we still aimed to evaluate it for live cell imaging of IntAct actins. Similar to GFP11, we first determined the integration of tetracysteine-actin into the cytoskeleton with the use of an additional N-terminal FLAG tag and demonstrated that it was properly integrated into the actin cytoskeleton. Unfortunately, after brief incubation with FlAsH-EDT2, we noted 1) a significant amount of background fluorescence, preventing proper actin visualization and 2) that the cells became static, indicating toxicity of the FlAsH-EDT2 compound. Titrating down the amount of FlAsH-EDT2 did not alleviate these drawbacks and only resulted in less fluorescence.

      Overall, based on these experiments, we concluded that the T229/A230 position itself is very versatile, as demonstrated by the proper localization of the GFP11-actin variants and the TetraCys-actin. At the same time, none of these tag/fluorophore systems properly visualized actin in living cells. Although we are unsure what the reason is for this, it is easily imaginable that the on/off kinetics of the split GFP system and the FLAG Frankenbodies are suboptimal to allow for the rapid and continuous integration of actin monomers into the F-actin cytoskeleton. We therefore also concluded that currently, the ALFA tag/nanobody system is apparently unique in its ability to visualize epitope tagged actin in living cells (as shown in the manuscript). For simultaneous visualization of multiple isoforms, we rely on progress on the development of novel nanobody-based tags, something we hope the Reviewer will agree is outside the scope of the current work.

      - The authors make a point of comparing the internally tagged actin to N-terminal tags that are mostly functional but have been shown to affect translational efficiency. I would strongly suggest to include N-terminally tagged actin as control for all assays in this study. Also for the physiological assays (retrograde flow, wound healing), a positive control is missing that shows some effect. Previous studies showed defects with transiently expressed actin with an N-terminal GFP. As retrograde flow measurements are very sensitive to the exact position of the kymographs and wound healing assays are a very crude and indirect readout, such a positive control is essential.

      We acknowledge that N-terminally tagged actin has been used extensively for actin research (especially before the introduction of Lifeact). For our studies, however, we were specifically interested in whether the internally tagged actins show similar characteristics to wildtype actin. We have not included N-terminally tagged actin in all of our experiments, since this would not affect our conclusions with respect to the functionality of our internally tagged actins. We expect that the comparison between internally and N-terminally tagged actins could be very instrumental for future investigations, for example to further establish the importance of actin N-terminal modifications in the differential regulation of actin isoforms. Yet, we consider this comparison outside the scope of the current manuscript. For now, the results in the manuscript provide evidence that our approach is unique in that it allows isoform-specific tagging without manipulating the N-terminus. As such, our internal tagging system complements the already existing repertoire of actin reporting methods (N-terminal fusion, Lifeact, F-Tractin, actin nanobodies) and allows researchers to study so far unknown properties of actin variants.

      - Expression of tagged actins in yeast is a very nice idea but it would be far more informative to express the tagged forms as the only copy of actin. This can either be done by directly replacing endogenous actin gene in S. cerevisiae, or (if the tagged versions are not viable) - using the established plasmid shuffle system (express actin on counter-selectable plasmid, then knock out endogenous copy and introduce additional plasmid with tagged actin, then force original plasmid out). In the presence of endogenous S. cerevisiae actin the shown effects are very hard to interpret as nothing is known about relative protein levels (endogenous vs. introduced). Also, if constitutive expression of the ALFA nanobody is harmful for integration into cables, why not perform inducible expression of the nanobody and observe labeling after induction. For the live imaging a robust cable marker is needed, like Abp140-GFP. Finally, indicate the sequence differences between the used actin forms in yeast (supplementary figure with sequence alignment and clear indication of all variations).

      We thank the reviewer for their positive comments and feedback regarding expression of IntAct variants in yeast. Currently, we have expressed IntAct as an extra copy in the presence of native Act1 of S. cerevisiae. All the IntAct variants have been expressed under a commonly used constitutive TEF1 promoter. We agree with the Reviewer that it would be valuable to attempt to express the tagged forms as the only copy of actin.

      Planned revisions:

      1) As per the Reviewer’s suggestion, we will attempt to make yeast strains with IntAct as the sole expressed actin copy by using the well-established 5-FOA-based plasmid shuffle system in yeast. We will use a ∆act1 strain containing wildtype act1 in a centromeric ura-plasmid described in Harrer et al., 2007 (generously shared by Prof. Jessica and Prof. Amberg at Upstate Medical University of New York, USA) and express IntAct exogenously via additional plasmids. Shuffling of these strains on 5-FOA will cause the loss of the ura-plasmid containing the wildtype act1 copy and will determine whether yeast cells are able to survive with IntAct as the sole source of actin. If the cells do survive with IntAct as the sole copy, we will perform subsequent analysis to assess actin cytoskeleton organization under these conditions.

      2) As the reviewer has mentioned, expression of NbALFA during live-cell imaging experiments hindered incorporation of IntAct into linear actin cables in yeast (Suppl. Fig. S13). As per the reviewer’s suggestion, we will now try to create an inducible-expression system for the NbALFA-mNG and observe its effects on incorporation into formin-made actin cables after induction. We have already created NbALFA-mNG constructs under galactose-inducible GALS and GAL1 promoters and are currently constructing yeast strains for these experiments.

      3) We will add an extra supplementary Figure to indicate the sequence differences of the various actin variants that we have expressed in yeast.

      - As the authors clearly show good integration of several tagged actins into filaments I would expand the structural characterization: perform alpha fold predictions of actin monomer structures including the various tags to show the expected orientation. It is striking that the only integration site that seems to work well is at the last position of a short helix, indicating that the orientation of the integrated peptide might be fixed in space and be optimal to minimize interference. Also, a docking of the tag onto the recently published cryoEM structures of the actin filament should be shown to indicate where it resides compared to tropomyosin or the major groove where most side binding proteins seem to bind.

      We already performed AlphaFold predictions of the tagged actin monomers, but we have decided not to include these predictions in the manuscript for two reasons. First and foremost, while the prediction confidence of the non-tagged region is very high (pLDDT > 90), the prediction confidence of the tagged region is very low. According to the AlphaFold documentation (https://alphafold.ebi.ac.uk/faq), pLDDT values below 70 should be treated with caution and values below 50 should not be interpreted. Intriguingly, the low confidence aligns with the fact that for both tags, the prediction does not match known features of the tag. The FLAG tag should be a linear/unstructured region in order to be recognized by the antibody and the ALFA tag should organize into an alpha helix (Götzke et al., Nat Comms, 2019). Yet, in the prediction, the FLAG tag partially continues as an alpha helix and the ALFA tag is only a small helix with part of the tag being unstructured. The second, more minor reason for not including the predictions is that AlphaFold does not predict to what extent the tag is flexible, which means that even if the tagged region is predicted correctly, it is difficult to say whether the region will interfere with binding of proteins.
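
      To make the pLDDT cut-offs mentioned above explicit, the following minimal sketch bins a per-residue pLDDT value into the confidence bands used in the AlphaFold documentation (above 90, 70 to 90, 50 to 70, below 50). The function names, the band labels and the handling of values that fall exactly on a boundary are assumptions made for illustration.

```python
def plddt_band(plddt):
    """Return a confidence band for a per-residue pLDDT value (0-100).

    Band labels and boundary handling are illustrative assumptions.
    """
    if not 0 <= plddt <= 100:
        raise ValueError("pLDDT must lie between 0 and 100")
    if plddt > 90:
        return "very high"
    if plddt >= 70:
        return "confident"
    if plddt >= 50:
        return "low"       # treat with caution
    return "very low"      # should not be interpreted


def is_interpretable(plddt):
    """True if the value is at or above the 'do not interpret' cut-off of 50."""
    return plddt_band(plddt) != "very low"


# Hypothetical per-residue values for a well-folded core and a tag region.
bands = [plddt_band(v) for v in (95.0, 82.5, 60.0, 41.0)]
```

      Applied to a predicted structure, such a binning would flag which residues of an inserted tag fall in the range where structural interpretation is not recommended.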

      Despite the low prediction confidence, we used the published actin-tropomyosin cryoEM structure (von der Ecken et al., Nature, 2015) to replace WT actin with ALFA tag actin and the results are shown below. Again, although results should be interpreted with caution, the tag does not seem to obstruct monomer-monomer interactions within an F-actin filament and also the tropomyosin binding surface is relatively distant from the tag region, suggesting that these interactions are likely not disturbed by introducing the tag.

      - For any claims regarding usability of tagged variants for isoform research it would be very important to characterize the known posttranslational modifications of tagged actin variants - are the differences between beta and gamma maintained on this level as well?

      Planned revision: Following the Reviewer’s suggestion, we will perform a western blot analysis to compare posttranslational modification (arginylation) of tagged and wildtype actins.

      Technical issues

      - There is no scale for the color coding in Fig. 5A, B

      We deliberately did not add a numerical scale because the images are normalized which means that presenting the actual numbers might be misleading. The numbers could be interpreted as if they actually present the amount of β-actin relative to γ-actin which is not the case due to staining differences and the normalization procedure.

      - The y-scales for Fig. 5C and D need to be identical to allow direct comparison

      Planned revision: We will adapt the scale of Fig. 5D to make it identical to Fig. 5C. Following the other suggestions of the reviewer, we will also critically evaluate our normalization procedure and present those numbers in Fig. 5C-D if the values turn out to be different.

      - Pearson coefficient should not be normalized to a control value as its already a dimensionless parameter. Always report actual R-value - also remove R2 values for Pearson as this makes no sense in this context (not sure if it was a typo or intended).

      We normalized the Pearson coefficient values for visual representation of the results. The majority of the raw coefficient values (more than 80%) are between 0.20 and 0.75 (see raw values in the associated Excel file). Theoretically, Pearson coefficient values are possible between 1 (or -1 for negative correlations) and 0. The much smaller window in our values as compared to the theoretical window (0.55 vs 1) led us to normalize the values such that they can be presented on a scale from “maximum expected colocalization” to “minimum expected colocalization”. In this way, the differences between the various tagged actins are much better appreciated in the Figure. As to reporting the R2, the Reviewer is correct. Reporting the R2 was an inadvertent mistake on our side and we will correct it.

      Planned revision: We will change the R2 in the text to PCC or Pearson Correlation Coefficient.

      *- All values on subcellular regions (like stress fiber or cortex) depend critically on the way these regions were thresholded or identified. Provide all details on how this was done in the methods section and ensure that adequate background subtraction and normalization is applied. Optimally, an unbiased (AI or automated) approach based on simple image statistics is used for this to avoid personal bias. *

      Planned revision: As also indicated above, we will add new experiments to better compare the localization of the isoforms in tagged and parental cells. These new experiments will also be accompanied by a more detailed explanation of how the regions were selected and quantified.

      - In Fig. 2A only heterozygous FLAG-actin cells are used. Why not use a homozygous line (for both beta and gamma actin)? The nice band shift of the FLAG version would allow the precise quantification of the fraction of total actin covered by beta and gamma actin, which then could provide some additional info for the apparently weaker beta staining in Fig. 5 (if beta expression is simply weaker). This would be a very simple and useful advantage of the internal tags that could be widely applied.

      In Fig. 2A, we used the heterozygous FLAG-actin cells to directly compare the production of β-actin from the knock-in allele and the wildtype allele in the same cells. The fact that the two bands observed in this western blot analysis (upper and lower) are almost the same (with the FLAG band being a bit more intense) provides the strongest indication that the tag does not interfere with the expression of actin. In Suppl. Fig. 5D, we show that the expression of β-actin is also unaffected in the hemizygous FLAG actin cells, which exclusively express tagged actin.

      Planned revision: As per the Reviewer’s suggestion, we will also add a western blot analysis on the expression of both actin isoforms and total actin in hemizygous cells.

      *- Fig. 3: control with N-terminal tag is missing. Also, why is it not possible to assay filament binding factors like Myosin, Filamin or alpha actinin - instead of co-IP a simple co-sedimentation assay with cell extracts in F-buffer should pick up any major difference in decoration of filaments containing the ALFA tag. Using two speeds for centrifugation it might even be possible to observe effects on filament bundling. The best approach for this would of course be to purify tagged actins and perform in vitro assays but this is clearly beyond the scope of what the authors intended here. I personally think that a broad acceptance of the marker will only come once the biochemistry has been sufficiently characterized so this is a future direction I would strongly encourage. *

      We kindly refer to our response on Page 5/6 for why we have not included the N-terminal control.

      Planned revision: The co-sedimentation assay is an excellent suggestion by the reviewer. Following the Reviewer’s suggestion, we will perform F/G-actin fractionation and assess the presence of several F-actin associated proteins in the F-actin fraction.

      - Fig. 2A has no loading control

      We show this western blot to indicate that the WT actin and tagged actin are expressed at similar levels in the heterozygous knock-in cells. For this, no loading control is needed because we only compare the intensity of the upper band (tagged actin) with the lower band (WT actin).

      - The RPE-1 data are confusing as several constructs show very different localization (completely cytosolic) to HT1080 cells and there is no possible explanation given for this. Maybe simply remove this data set?

      We agree with the reviewer that the differences in the localization between some of the internally tagged actins between the HT1080 and RPE1 cells might be confusing, especially for the A230-A231 variant for example. Yet, the fact that also in these cells, the T229-A230 variant performs equally well as compared to N-terminally tagged actin is an important confirmation that this variant is properly integrated into actin-based structures, independent of cell type. This makes the support for choosing this variant to continue with our studies stronger. A possible explanation for the differences is that RPE1 cells in general tend to form more stress fibers as compared to the HT1080. Since the localization to stress fibers is different between the internally tagged actins, this may explain the differences observed in colocalization.

      __Planned revision: __We will add a short text, in the Results or the Discussion, on the differences between the colocalization values between HT1080 and RPE1 cells.

      *- The angle measurements for lamellipodial actin are not very meaningful: the angle is determined for the radial bundles, which do not correspond to the Arp2/3 angle of single filaments and is likely the result of different nucleation factors. I would suggest removing this. If angle measurements are really intended, cryoEM needs to be performed. *

      We apologize for this misapprehension from our side which is also noted by the other two reviewers. In the treadmilling videos of the lamellipodia in HT1080 cells, which were obtained using Airyscan super-resolution microscopy, we clearly observe a consistent filament formation at a constant angle, something which we interpreted as the angle between the mother filament and the daughter filament. After consulting the literature, we indeed have to admit that this cannot be interpreted as such and we will remove these datasets.

      Planned revision: We will remove the datasets with the angle measurements (Suppl. Fig. 7A-B) from our manuscript.

      - Replace all SEM with SD values - use at least 3 biological replicates (4D SEM of n=2)

      Planned revision: We will carefully check our statistics and revise where appropriate.

      Minor points

      - Intro: after listing all the details already understood on actin isoforms it is not very convincing to simply state the molecular principles remain largely unclear (l 34) - maybe better "there is no way to study actin dynamics due to current limitations of specific antibodies to fixed samples". An interesting option would actually be to develop nanobodies that are isoform specific.

      We will rephrase the text in the introduction. Regarding the development of isoform-specific nanobodies: although this sounds like a promising way forward, it would likely not result in isoform-specific targeting in living cells. Similar to the antibodies, isoform-specific nanobodies would have to be generated against the N-terminus which, under native conditions, is likely not available because it is occupied by actin-binding proteins. Also, since the N-terminus is not structured, it may be extremely challenging to generate nanobodies against these epitopes.

      *- L 71: "involved" in the kinetics is not a good term - maybe affects or regulates.... *

      We will rephrase the text.

      - L148: "suspect" instead of "expect" - this clonal variation is actually a big danger of the employed approach as possible defects in actin organization could be masked by compensatory changes - it would generally be good to show critical data for at least 3 independent clones to rule out dominant selection effects.

      We will rephrase. We agree that clonal variation could be a danger if actin levels are to be investigated. For future follow-up studies, we plan to make additional cell lines to avoid clone-specific conclusions.

      ***Referees cross-commenting** *

      *I completely agree with the comments by reviewer 2 on the various missing controls - adding several or all of those will make the results much more convincing. The key for the adaptation of any new actin probe will be the level of confidence researchers have in the documented effects. Even some negative effects on actin behavior (I am sure there will be some) should not prevent usage of the strategy as long as there is robust and convincing documentation of those effects. I also agree that including some basic in vitro characterization will go a long way to convince people directly working on actin (there is a very high level of biochemical understanding in that field). *

      Planned revision: We will perform the essential controls as suggested by Reviewer 2. Furthermore, for future experiments, we envisage producing and purifying internally tagged actins and investigating their binding properties in in vitro reconstitution assays. We have already started optimizing these approaches through our ongoing collaboration (KD, SP).

      Reviewer #1 (Significance (Required)):

      *Significance: Very useful finding that can be applied to any question related to actin-dependent cellular processes (morphogenesis, cell division, cell polarization, cell migration etc.) *

      *Strength: main finding convincing, strong genome edited cell lines *

      *Limitations: application to study of isoforms very limited and data not convincing, statistics and image quantifications need improvement *

      *Advance: identify new location for integral tagging of actin, which was not really possible before. The main relevance is for fundamental cell biology but the approach can also be applied to the study of disease variants in actin. *

      Audience: general cell biology - very broad interest

      __Reviewer #2 (Evidence, reproducibility and clarity (Required)): __

      Actin is highly sensitive to modifications, and tagging it with fluorescent proteins or even smaller motifs can affect its function. The most well-known example of this is that fission yeast where actin has been replaced with GFP-actin are inviable (Wu and Pollard, Science 2005) because the labeled actin cannot incorporate into the formin-dependent filaments that make up the cytokinetic ring. Subsequent experiments revealed that formins filter out GFP-actin monomers, as well as monomers that are labeled with smaller fluorescent motifs (Chen et al, J. Structural Biology 2012). Further, attempts to make mammalian cell lines where GFP-beta-actin was knocked into one allele resulted in extreme down-regulation of the GFP-labeled actin, indicating that there is some implicit toxicity with the labeled version. To my knowledge, all attempts at making homozygous GFP-actin knock-ins have been unsuccessful. Therefore, while GFP-actin or other labeled variants can be over-expressed in many different cell types with some success, there is always the question of how faithfully the labeled actin represents bona fide actin localization and dynamics.

      To address this van Zwam et al. have developed a clever strategy of screening actin for internal motifs that can tolerate incorporation of a tag without affecting its function. They appear to have found a good candidate, named IntAct, and provide evidence that this tagging position allows the actin to be functional in both human and yeast cells. The work is very promising, and many of the assays performed satisfy the criteria of rigor and reproducibility. Importantly, the authors have created knock-in human cell lines where the tagged actin is expressed at normal levels, including a double allele knock-in that is viable and has normal proliferation and motility. Additionally, the authors show that labeled S. cerevisiae actin can incorporate into actin cables, which are formin dependent. IntAct constructs were shown to interact with several well-known actin binding proteins and localized well to many different actin structures. There was also interesting data obtained from tagging both beta and gamma actin in human cells. However, as an actin scientist eager for new probes to visualize actin in cells, there are still questions about the functionality of these probes. Addressing these issues, listed below, would alleviate the concerns I still have about IntActs after going through the manuscript. IntActs have the potential to have a large impact on cytoskeletal research if it can be rigorously documented that they are functionally as close to unlabeled actin as possible.

      We thank the Reviewer for their constructive comments and general positive evaluation of our study.

      *Reviewer #2 (Significance (Required)): *

      Concerns:

      1. There are no negative controls performed for either the fixed or live-cell imaging of IntAct. Since the fixed cell data is heavily reliant on the presence of FLAG-labeled puncta at actin filaments, it is important to show that the immunocytochemistry protocol doesn't produce anything that would mimic the localization of actin. For the live cell data, there has been no effort made to show that the binding of the nanobody to the ALFA tag on IntAct is specific.

      Planned revision: We will add the following controls to exclude that any of the labeling procedures produces anything that would mimic the localization of actin: 1) immunofluorescence staining of the used tags (FLAG/ALFA) in cells that do not have tagged actins; 2) expression of ALFA-Nb-GFP and ALFA-Nb-mScarlet in cells that do not have tagged actins; 3) expression of free GFP in cells that have tagged actins. We will co-stain these cells with phalloidin to visualize F-actin and determine if any signal is specifically localized to the actin cytoskeleton.

      2. The homozygous ALFA-tagged IntAct cells have a 50% reduction in the amount of actin expression (Fig. 2D). What is the F:G ratio in these cells? The F:G measurement is only shown for the FLAG-tagged heterozygous IntAct cells, which have the worst co-localization with phalloidin (Fig. 2F) and were not used for subsequent figures. I appreciate that motility and proliferation were measured and shown to not be affected (Fig. 4D,E), but in our lab reducing the amount of polymerized actin by 50% (which may be more in ALFA-tagged IntAct cells if the F:G changes) has catastrophic effects on other cytoskeletal and organelle systems. Since the homozygous ALFA IntAct cells are the main ones used in the manuscript, they should be the ones that are fully characterized.

      We would like to point out that the reduction is only 20-25 percent, depending on the specific western blot analysis and the loading control. Still, the Reviewer is correct about the necessity of the F:G actin measurements of the ALFA-tagged IntAct cells, and we therefore included those as Suppl. Fig. 9 in the original manuscript (text on page 9). The quantification of these assays clearly demonstrated that the F:G actin ratio in the ALFA-tagged IntAct cells is the same as in parental cells.

      3. It is not addressed whether expressing the ALFA-Nb-GFP construct in ALFA-IntAct cells alters actin properties. This is essential information for live cell imaging experiments.

      Planned revision: We have already performed proliferation and migration experiments in cells that stably express the ALFA-Nb-GFP. These data indicated that proliferation and migration are not affected by the presence of the nanobody and these data will be included in the revised manuscript. To note, in the original manuscript, we already showed that treadmilling of actin at the lamellipodia is not affected by the presence of the ALFA-Nb-GFP.

      4. It is not addressed how much of the ALFA-IntAct gets labeled with ALFA-Nb-GFP and how uniform the labelling is.

      We do not understand this specific request of the Reviewer. To our knowledge, it is not possible to assess how much of a probe (in this case the ALFA-Nb-GFP) binds the target (in this case the ALFA-IntAct actins) in living cells. This is not only the case for the ALFA-Nb-GFP but also for any other probe. As an example, when expressing Lifeact, we also do not know how many of the actin molecules within F-actin get labeled with Lifeact and how uniform the labeling is. From the results of the live-cell imaging we can only conclude that the binding is at least effective enough that we can readily observe and discern all the actin-based structures that are also observed by Lifeact (see Suppl. Fig. 8 for Lifeact-GFP/ALFA-Nb-mScarlet cotransfection). Whether the regions that do not have F-actin only contain ALFA-Nb-GFP that is bound to actin monomers or also contain a significant fraction of free ALFA-Nb-GFP seems an issue that cannot be addressed.

      5. To assess lamellipodia architecture, "branched actin angle" is measured using AiryScan imaging of actin filaments. This type of microscopy does not offer the ability to image individual actin filaments; what is actually being measured is the orientation of actin bundles to each other. It should be impossible to image the orientation of actin filaments in Arp2/3 dendritic networks and it is surprising that the measurements average to 70 degrees. A suitable substitute for this would be to measure the size and amount of F-actin in phalloidin-stained lamellipodia using kymograph analysis.

      We apologize for this misapprehension from our side which is also noted by the other two reviewers. In the treadmilling videos of the lamellipodia in HT1080 cells, which were obtained using Airyscan super-resolution microscopy, we clearly observe a consistent filament formation at a constant angle, something which we interpreted as the angle between the mother filament and the daughter filament. After consulting the literature, we indeed have to admit that this cannot be interpreted as such and we will remove these datasets.

      Planned revision: We will remove the datasets with the angle measurements (Suppl. Fig. 7A-B) from our manuscript.

      6. Was it possible to make an IntAct gene substitution in yeast?

      Planned revision: We thank the reviewer for this interesting question. As also suggested by Reviewer 1, we are now constructing yeast strains with IntAct as the sole expressed actin copy by using the well-established plasmid shuffle system in yeast. The results of these experiments will determine the ability of IntAct to completely substitute actin function in yeast.

      Also, while this is not necessary for this manuscript, making a fission yeast strain where actin has been substituted with IntAct and demonstrating that IntAct gets incorporated into the cytoplasmic ring and into Cdc12p-polymerized filaments would alleviate MANY potential concerns people would have about these probes by directly assessing situations where other labeled actins have been documented to fail. Along the same lines, it would have been nice to see a comparison in some of the assays of ALFA-IntAct and GFP-actin or another labeled actin variant.

      We thank the reviewer for their constructive feedback and completely agree that it is important to document how IntAct behaves in scenarios where other labelled actins have failed. As a proof of principle, IntAct incorporates into both formin-made linear and Arp2/3-made branched actin filaments in yeast (Fig. 5E, Suppl. Fig. 14). These data show that the IntAct labelling strategy is the first to achieve good integration into both these structures, as previous efforts with labelled actin such as GFP-actin failed to incorporate into formin-made actin filaments (Doyle et al., PNAS, 1996). Thus, we believe that IntAct does perform better than other labelled actins in yeast, although further optimizations are required to overcome limitations regarding incorporation into actin cables in the presence of the ALFA nanobody.

      Planned revision: We have already extended the applicability of IntAct to another well-known fungal model system, the fission yeast Schizosaccharomyces pombe (S. pombe). We expressed IntAct variants of human β- and γ-actin, budding yeast actin (Sc-IntAct) and fission yeast actin (Sp-IntAct) from an exogenous plasmid under the native S. pombe actin promoter in an S. pombe strain that constitutively expresses the Nb-ALFA-mNG. Live-cell microscopy of S. pombe cells expressing these proteins revealed that all IntAct variants localize to actin patch-like structures located at the cell poles and the cell division site (during cytokinesis). These structures show dynamics similar to those previously reported for actin patches of S. pombe (Pelham et al., Nat Cell Biol, 2001). These preliminary results suggest that IntAct proteins localize only to the branched actin networks found in the actin patches of S. pombe, as we had previously observed for the budding yeast S. cerevisiae (Fig. S13 in manuscript). The mechanism underlying this exclusion from the linear actin cable network in both budding and fission yeast remains unknown and may reflect an inherent specificity and sensitivity of yeast formins. Our current and future experiments will express IntAct variants in the absence of the ALFA nanobody and determine the level of incorporation into actin cables, patches, and the actomyosin ring.

      Planned revision: We have also already performed a quantitative analysis to ascertain the effect of Sc-IntAct expression on the dynamics of cortical actin patches, which represent sites of endocytosis in yeast (Young et al., J Cell Biol, 2004; Winter et al., Curr Biol, 1997). We compared actin cortical patch lifetimes between wildtype cells and cells expressing Sc-Act1 or Sc-IntAct as an extra copy. We used Abp1-3xmCherry as a marker for actin patches and quantified the time window between the appearance and disappearance of a patch (actin patch lifetime) from time-lapse microscopy experiments. Our preliminary results indicate that actin patch lifetimes are unaffected by exogenous expression of either Sc-Act1 or Sc-IntAct, suggesting that IntAct does not negatively influence or alter actin patch dynamics. These observations suggest its applicability as a direct visualization strategy for actin at the cortical patches in budding yeast alongside existing surrogate markers like Abp1, Arc15, etc. (Goode et al., Genetics, 2015; Wirshing et al., J Cell Biol, 2023).
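
      The patch lifetime measure described above (time between appearance and disappearance of a patch) can be sketched as follows. This is only an illustration under stated assumptions: the input format (a mapping from a patch identifier to the frame indices in which the marker was detected), the function name `patch_lifetimes`, and the example tracks are hypothetical, and a patch last seen in frame k is assumed to have disappeared at frame k + 1.

```python
from statistics import mean


def patch_lifetimes(tracks, frame_interval_s):
    """Return the lifetime in seconds of each tracked patch.

    tracks: dict mapping a patch id to the list of frame indices in which
            the patch marker was detected (hypothetical input format).
    frame_interval_s: time between consecutive frames, in seconds.
    """
    lifetimes = {}
    for patch_id, frames in tracks.items():
        if not frames:
            continue  # a patch that was never detected has no lifetime
        first_seen, last_seen = min(frames), max(frames)
        # Assumption: the patch disappears one frame after it was last seen.
        lifetimes[patch_id] = (last_seen - first_seen + 1) * frame_interval_s
    return lifetimes


# Hypothetical tracks from a movie acquired at one frame per second.
example_tracks = {
    "patch_1": list(range(3, 13)),   # detected in frames 3..12
    "patch_2": list(range(5, 13)),   # detected in frames 5..12
}

lifetimes = patch_lifetimes(example_tracks, frame_interval_s=1.0)
mean_lifetime = mean(lifetimes.values())
print(lifetimes, mean_lifetime)
```

      In practice, patches that are already present in the first frame or still present in the last frame would usually be excluded, because their true appearance or disappearance was not observed.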

      __Reviewer #3 (Evidence, reproducibility and clarity (Required)): __

      *Summary: *

      This paper tackles a new strategy to tag actin in cells, by identifying that incorporation of a tag of moderate size in subdomain 4 of actin minimally affects actin dynamics in cells, and does not perturb its interaction with known partners, as observed in pull-down assays.

      *Major comments: *

      The paper is interesting and experiments are convincing.

      *My main concerns are the following : *

      - Varland et al. report a phosphorylation on Thr229: I think the authors should mention and discuss this potential PTM that could be affected in IntAct.

      We thank the Reviewer for pointing this out. We are aware of this review that includes phosphorylation on Thr229 as a possible PTM. Yet, this PTM is only reported in one of the Tables of the Review and not further discussed in the text. It is also unclear how the authors determined that Thr229 is a possible phosphorylation site except for the notion that this residue is a threonine and exposed at the surface of the actin molecule. Together with the fact that there is no evidence from primary studies that Thr229 is phosphorylated, we therefore decided to not include it in our discussion.

      - The sequence in subdomain 4 (the alpha helix containing T229A230) is extremely conserved in animals, as well as in between the 6 human actin isoforms. This usually indicates a strong selection pressure on the residues. I think the authors should discuss how surprising it is that the T229A230 position can accommodate various tags while it is probably the place of interaction with other proteins and is playing an important role in the mechanical structural integrity of the actin itself.

      We thank the Reviewer for bringing up this important point. To a certain extent, the conservation argument is true for all of the residues/domains in actin. Any manipulation will change a conserved part of the actin molecule in one way or another and thereby potentially modify its function. This is also evident from the fact that for most of the internally tagged actins, we observed a very poor colocalization with the actin cytoskeleton (Fig. 1). While for the T229/A230, we have not observed any major effects yet, this certainly does not mean that no further changes or defects will be uncovered in future experiments. Nonetheless, since our approach is unique with respect to the fact that it allows isoform-specific tagging without manipulating the N-terminus, our internal tagging system complements the already existing repertoire of actin reporting methods (N-terminal fusion, Lifeact, F-Tractin, actin nanobodies) and allows researchers to study so far unknown properties of actin variants. We have already included in the discussion that, at this point, we can only speculate as to why this variant performs much better than the others (Page 16 of the manuscript) and that possible explanations are the location at the inner domain and the higher structural plasticity of this region as compared to the rest of the molecule, as found during an alanine mutagenesis screen (Rommelaere et al., Structure, 2003).

      - It is now well established that actin plays active and important roles in the nucleus : is ALFA-actin correctly translocated to the nucleus ?

      Planned revision: This is an interesting suggestion. We will perform nuclear-cytosol fractionation experiments and determine whether ALFA-actin is still correctly translocated to the nucleus.

      *- OPTIONAL: one may regret that there are no classical in vitro assays, such as pyrene assays, to assess some kinetics parameters on epitope-tagged actins. I guess this would make the paper a bit too large. Although, it will prove useful to better understand how much formin activity is affected (see below) *

      For further biochemical characterization and a detailed investigation of the precise assembly kinetics of the tagged actins, we (KD, SP) are already working together to set up in vitro reconstitution experiments. Yet, as also indicated by the Reviewer, we consider these experiments outside of the scope of the current work.

      *Minor comments: *

      Below are points that could be addressed by the authors to improve the manuscript readability and highlight some important points that are sometimes missing or are not properly discussed:

      -line 40 "...but the distinct N-terminal epitope is not available under native conditions preventing" is a bit too obscure. Can the authors say clearly what is meant by 'native conditions'?

      In our understanding, the term ‘native’ is generally used when referring to conditions in which proteins are in their natural state, without alterations due to heat or denaturants, and possibly also still interacting with their binding partners. We will rephrase to better indicate that in this specific case, we mean that the region that harbors the N-terminus is usually occupied by actin-binding proteins, preventing the binding of the antibody due to steric hindrance.

      - figure 1A : make a clearer correspondence between the number shown in panel A and the amino acid numbers displayed in panel C and G.

      Planned revision: This is a good point, we will add extra annotation in the graph to better link the panels with each other. We will also add additional annotation in Fig. 1D-F for the same purpose.

      - figure 1A : it could be informative to indicate subdomains in this panel.

      Planned revision: We will add the numbers for the subdomains in Fig. 1A.

      - figure 1C : normalized correlation cell : I am not sure I understand how the normalization of the Pearson coefficient is done. It is therefore not clear how it can be >1 or >-1. This should be clearly explained in the method section of the paper.

      __Planned revision: __We will better explain the normalization procedure in the Methods section.

      - figure S4 : comes a bit too early when ALFA-actin has not been yet introduced in the main text. Please, reposition this part or provide data with the FLAG-tag version.

      Planned revision: This is a good point and completely overlooked by us. We will introduce this Figure later such that the ALFA tag is already introduced.

      - section starting line 121 : this section should be better motivated = Why are different tags being tested ? This comes later in the discussion, but the reader fails at following the reasoning/motivation here.

      Planned revision: We will add extra motivation for why we added multiple tags.

      - figure 2D, line 145 "We also evaluated actin protein expression in the homozygous ALFA-β-actin cells and this showed that the total amount of β-actin was slightly lower in the ALFA-β-actin cells compared to parental HT1080 cells (Fig. 2C-D)." 'Slightly' is neither a very quantitative nor an accurate term. Please rephrase. Besides, a statistical test for the paired data would also be informative. Besides, data in figure S6B-D indeed show a correlated increase in the expression of Gamma-actin that compensates for the decrease in the Beta-actin level in ALFA-Beta-actin. Can the authors explain why they conclude otherwise?

      Planned revision: This indeed is an important point and we will change the phrasing of this section to provide a more quantitative and accurate description of the western blot quantifications.

      - figure S7B: I am not sure anyone has ever reported measurement of angle of branched actin filament using epifluorescence microscopy. I would remove this panel, or the authors should explain how this measurement can be done objectively.

      We apologize for this misapprehension from our side which is also noted by the other two reviewers. In the treadmilling videos of the lamellipodia in HT1080 cells, which were obtained using Airyscan super-resolution microscopy, we clearly observe a consistent filament formation at a constant angle, something which we interpreted as the angle between the mother filament and the daughter filament. After consulting the literature, we indeed have to admit that this cannot be interpreted as such and we will remove these datasets.

      Planned revision: We will remove the datasets with the angle measurements (Suppl. Fig. 7A-B) from our manuscript.

      *- Figure 2F : can the authors comment on the (significant ?) lower value for FLAG-tag actin ? *

      The lower value for FLAG-tag actin likely has to do with the properties of the antibody and its suitability for immunofluorescence. For reasons that we do not know, we usually detect more background for the FLAG tag antibody as compared to the other antibodies/ALFA tag nanobody. Since the Pearson correlation coefficient quickly decreases with suboptimal labeling, this is likely the reason that the values for FLAG-actin are lower as compared to the other tagged actins. Importantly, in our biochemistry experiments (F/G-actin), we detect no difference between FLAG-actin and ALFA-actin, indicating that it is the immunofluorescence and the sensitive Pearson correlation analysis, rather than the integration of actin, that causes this difference.
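
      The sensitivity of the Pearson coefficient to background can be illustrated with a toy calculation. The sketch below is not an analysis of our images: it assumes that nonspecific staining adds a pixel-wise signal that is independent of the actin signal, and all names (`pearson`, `stained`) and values are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


rng = random.Random(0)  # fixed seed so the toy example is reproducible

# Simulated reference channel (for instance phalloidin), one value per pixel.
actin = [rng.random() for _ in range(2000)]


def stained(signal, background_level):
    """Tag channel: the true signal plus independent, non-negative background."""
    return [s + background_level * rng.random() for s in signal]


pcc_clean = pearson(actin, stained(actin, background_level=0.0))
pcc_low_bg = pearson(actin, stained(actin, background_level=0.5))
pcc_high_bg = pearson(actin, stained(actin, background_level=2.0))
print(pcc_clean, pcc_low_bg, pcc_high_bg)
```

      In this toy model the tagged protein is identical in all three cases; only the amount of uncorrelated background differs, and the coefficient drops as the background increases.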

      - line 205 "The results from these experiments show that both DIAPH1 and FMNL2 associate with ALFA-β-actin (Fig. 3D),". It is not so obvious that these formins directly interact with monomeric actin via their FH2 domains in co-immunoprecipitation assays. It might very well be mediated by the interaction with profilin, that in turn binds to the FH1 domain of formins. For me, this assay is not a correct proof that epitope-labelled actin does not interfere with formin activity.

      Planned revision: The point that the co-immunoprecipitation does not demonstrate direct interactions between formins and actin is well taken. We, however, do not claim that this assay proves that formin activity, or formin-based integration of actin monomers, is similar with tagged actin as compared to wildtype actin. Nonetheless, we will critically re-evaluate the relevant passages and rephrase the text to avoid any confusion.

      - figure 5C&D : both graphs should use the same scale for the y-axis for easier comparison.

      Planned revision: We will adapt the scale of Fig. 5D to make it identical to Fig. 5C. Following the other suggestions of the Reviewer (and of Reviewer #1), we will also critically evaluate our normalization procedure and present those numbers in the Figures if the values turn out to be different.

      - figure 5D: I think the way the ratio is performed is misleading. Why not look at the Beta/Gamma ratio using the isoform specific antibodies used in parental cells, and show the results for ALFA-Beta-actin and for ALFA-Gamma-actin separately ?

      We kindly refer to our answer to Reviewer #1 on Page 2 for a detailed explanation on the experimental challenge of comparing the localization of wildtype and tagged actin isoforms.

      Planned revision: We will critically evaluate our normalization procedure and present those numbers in the Figures if the values turn out to be different. Furthermore, we will add a different experimental method to show that the tagged isoforms properly localize to actin-based structures. For this, we will attempt to use micropatterned cells to induce clearly defined actin-based structures and also explore the possibilities of investigating the differential localization in double-tagged cells.

      *- The limitation observed for unbranched cables in yeast that nanobody-tagged ALFA-actin does not incorporate correctly should be discussed and stressed further in the discussion, as it might prove to be a strong limitation for live-cell imaging to reliably study any type of actin networks. *

      We acknowledge the reviewer’s concern regarding the inability of ALFA-tagged actin to incorporate into yeast actin cables when NbALFA is co-expressed and will discuss this point further in the revised manuscript. We have now observed the same limitation for fission yeast actin cables as well. Combined, these observations may reflect a tighter control and sensitivity of yeast formins towards any perturbations in actin size (since NbALFA binds to the ALFA tag with picomolar affinity). To address this issue, and as also suggested by Reviewer 1, we are now creating yeast strains with inducible control of NbALFA expression under GALS/GAL1 promoters and will observe the labelling of actin structures with this approach. Additionally, expression of variants of NbALFA with high dissociation rates may also allow labelling of actin cables and would certainly be worth a try in the future. A structural comparison between mammalian and yeast formins may be required to shed some light on the molecular basis of this fundamental difference.

      However, since this limitation is overcome in the absence of the nanobody (Fig. 5E, Suppl. Fig. 14), we believe that with additional modifications and fast developments in imaging technologies, it can be overcome in the future. Thus, IntAct as a labeling strategy represents an advancement over existing labelled actins, with the most important aspect being the identification of the T229/A230 residue pair as permissive for integration of various tags, even as large as the GFP11 fragment including a linker (26AA) (Reviewer Fig. 2). Importantly, the T229/A230 site is conserved across many organisms (such as Chlamydomonas reinhardtii, Cryptococcus neoformans, etc.) and may act as a framework to study the actin cytoskeleton, especially in organisms where known surrogate markers like phalloidin and Lifeact may not work or work only suboptimally.

      *Reviewer #3 (Significance (Required)): *

      *General assessment: *

      *This paper provides a new tagging strategy to monitor actin activity in cells, by specifically inserting the tag along the amino acid sequence. *

      *Advance: *

      *This is a very useful tool, as most existing available probes bind to actin in regions that are common to many other actin binding proteins. The authors provide extensive experiments to validate that tagged-actin are functional and do not perturb the actin expression level, actin network architecture nor dynamics. *

      *Audience: *

      *This research paper will be of interest to a rather broad audience (many cell biologists) that are either studying actin dynamics or know that actin is involved in the cell functions they study. *

      *Expertise: *

      *My expertise is in vitro actin biochemistry. *

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      This paper tackles a new strategy to tag actin in cells, by identifying that incorporation of a tag of moderate size in subdomain 4 of actin minimally affects actin dynamics in cells, and does not perturb its interaction with known partners, as observed in pull-down assays.

      Major comments:

      The paper is interesting and experiments are convincing.

      My main concerns are the following :

      • Varland et al, is reporting a phosphorylation on Thr229 : I think the authors should mention and discuss this potential PTM that could be affected in IntAct.
      • The sequence in subdomain 4 (the alpha helix containing T229A230) is extremely conserved in animals, as well as in between the 6 human actin isoforms. This usually indicates a strong selection pressure on the residues. I think the authors should discuss how surprising it is that the T229A230 position can accommodate various tags while it is probably the place of interaction with other proteins and is playing an important role in the mechanical structural integrity of the actin itself.
      • It is now well established that actin plays active and important roles in the nucleus : is ALFA-actin correctly translocated to the nucleus ?
      • OPTIONAL: one may regret that there are no classical in vitro assays, such as pyrene assays, to assess some kinetics parameters on epitope-tagged actins. I guess this would make the paper a bit too large, although it would prove useful to better understand how much formin activity is affected (see below).

      Minor comments:

      Below are points that could be addressed by the authors to improve the manuscript readability and highlight some important points that are sometimes missing or are not properly discussed :

      • line 40 "...but the distinct N-terminal epitope is not available under native conditions preventing" is a bit too obscure. Can the authors say clearly what is meant by 'native conditions' ?
      • figure 1A : make a clearer correspondence between the number shown in panel A and the amino acid numbers displayed in panel C and G.
      • figure 1A : it could be informative to indicate subdomains in this panel.
      • figure 1C : normalized correlation cell : I am not sure I understand how the normalization of the Pearson coefficient is done. It is therefore not clear how it can be >1 or <-1. This should be clearly explained in the method section of the paper.
      • figure S4 : comes a bit too early when ALFA-actin has not been yet introduced in the main text. Please, reposition this part or provide data with the FLAG-tag version.
      • section starting line 121 : this section should be better motivated = Why are different tags being tested ? This comes later in the discussion, but the reader fails at following the reasoning/motivation here.
      • figure 2D, line 145 "We also evaluated actin protein expression in the homozygous ALFA-β-actin cells and this showed that the total amount of β-actin was slightly lower in the ALFA-β-actin cells compared to parental HT1080 cells (Fig. 2C-D)." 'Slightly' is not a very quantitative nor accurate term. please rephrase. Besides, a statistical test for the paired data would also be informative. Besides, data in figure S6B-D indeed show a correlated increase in the expression of Gamma-actin that compensate for the decrease in the Beta-actin level in ALFA-Beta-actin. Can the authors explain why they conclude otherwise ?
      • figure S7B: I am not sure anyone has ever reported measurement of the angle of branched actin filaments using epifluorescence microscopy. I would remove this panel, or the authors should explain how this measurement can be done objectively.
      • Figure 2F : can the authors comment on the (significant ?) lower value for FLAG-tag actin ?
      • line 205 "The results from these experiments show that both DIAPH1 and FMNL2 associate with ALFA-β-actin (Fig. 3D),". It is not so obvious that these formins directly interact with monomeric actin via their FH2 domains in co-immunoprecipitation assays. It might very well be mediated by the interaction with profilin, which in turn binds to the FH1 domain of formins. For me, this assay does not provide correct proof that epitope-labelled actin does not interfere with formin activity.
      • figure 5C&D : both graphs should use the same scale for the y-axis for easier comparison.
      • figure 5D: I think the way the ratio is performed is misleading. Why not look at the Beta/Gamma ratio using the isoform specific antibodies used in parental cells, and show the results for ALFA-Beta-actin and for ALFA-Gamma-actin separately ?
      • The limitation observed for unbranched cables in yeast that nanobody-tagged ALFA-actin does not incorporate correctly should be discussed and stressed further in the discussion, as it might prove to be a strong limitation for live-cell imaging to reliably study any type of actin networks.

      Significance

      General assessment:

      This paper provides a new tagging strategy to monitor actin activity in cells, by specifically inserting the tag along the amino acid sequence.

      Advance:

      This is a very useful tool, as most existing available probes bind to actin in regions that are common to many other actin binding proteins. The authors provide extensive experiments to validate that tagged-actin are functional and do not perturb the actin expression level, actin network architecture nor dynamics.

      Audience:

      This research paper will be of interest to a rather broad audience (many cell biologists) that are either studying actin dynamics or know that actin is involved in the cell functions they study.

      Expertise:

      My expertise is in vitro actin biochemistry.

  4. Aug 2023
    1. The assumption is that the Grand Canyon is a remarkably interesting and beautiful place and that if it had a certain value P for Cárdenas, the same value P may be transmitted to any number of sightseers

      Not everyone values the same exact things. Each human being sees, feels, and reacts differently. I think Percy is trying to explain that we as humans value different things based on the relativity that it has to us. For some they value the Grand Canyon because that's what they like and some value insulin because that's what they need.

    2. As Mounier said, the person is not something one can study and provide for; he is something one struggles for. But unless he also struggles for himself, unless he knows that there is a struggle, he is going to be just what the planners think he is.

      I think stereotyping is also another way to think of this. We can't always assume we know someone by how they present themselves because then when we try to know them, our preconceived notions are thrown off. If we don't try to break out of the mold, break out of comfort, then we may always be under control of stereotypes and what others assume of us.

    1. Many students will indeed respond to a scolding by behaving better, but for others, scolding may be a reward for misbehavior that actually increases it.

      This concept was something I felt I related to personally while reading. I have spent the last 4 years working as a full time paraprofessional/teaching assistant...3 years in first grade and one year in kindergarten. Reading this segment brought me back to my time spent in a classroom, and I almost immediately thought of a particular student who fit this scenario very well. This student would at times act out and of course, wanting to maintain classroom rules, we would correct this behavior. At first glance many would think this was the appropriate thing to do. But as the year went on we discovered the more we responded to these negative behaviors, the more he did them. Psychologically, he knew that if he misbehaved, he would get a reaction from us. He didn't seem to care whether it was a negative or positive one, he was just initiating behaviors he knew would get a reaction, which this portion of text highlights and I enjoyed being able to read something that I felt I could personally connect to.

    2. Some people think that good teachers are born that way. Outstanding teachers sometimes seem to have a magic, a charisma that mere mortals could never hope to achieve. Yet research has begun to identify the specific behaviors and skills that make a “magic” teacher

      I chose to comment on this particular portion of text, because I do not believe there is such a thing as a good or bad teacher. I believe instead of using the term "good teacher" we should instead practice labeling educators as "knowledgable". For example, instead of saying Miss McGuire is a really good teacher, we could say she is a very knowledgable teacher. Everyone has a different definition of what a good teacher is, and by looking at how knowledgable they are on current teaching practices or how knowledgable they are about successful classroom management skills, it separates the idea of being good vs bad simply because they don't teach something the way their observer or peers may.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank both reviewers and the editor for their time and effort in carefully reviewing and comprehending our manuscript. We are grateful for their thorough assessment, as well as the insightful questions and suggestions they have provided. We have taken into account the questions and comments raised by the reviewers, and we have incorporated the necessary revisions accordingly. In the following pages the reviewers’ comments are italicized. Our replies are in normal script.

      In addition to revisions suggested by reviewers we also added a new summary schematic (Fig 8) and minor changes to acknowledgments.

      Reviewer 1

      This is a very strong study with few concerns. Regarding DN1+ T cell function, the authors assessed IFN-γ and activation markers, but it is unclear if the cells are polyfunctional (produced high levels of other cytokines at 6 weeks) or if there were changes in the humoral response (serum Ab titers or size/ number of germinal centers.)

      Thank you for your thorough assessment of our work and your kind comments.

      a. We observed a decreased IFN-γ and TNF-α production in antigen experienced DN1 T cells compared to naïve DN1 T cells, which is consistent with findings in Tfh cells.

      b. We tested for anti-MA IgM and IgG production but did not observe an increase in these antibodies in the vaccinated setting. It is possible that additional inflammatory stimulation, such as from an adjuvant or infection, may be necessary to trigger sufficient antibody level for detection using ELISA.

      c. We did not measure the number or size of germinal centers in this study, but future investigations could explore this aspect.

      Reviewer 2

      1. Authors elaborate the introduction solely highlighting the relevance of antigen persistence in the context of vaccination. However, it is well known that several mycobacterial antigens (Lipids and proteins) can cause detrimental responses when overexposed to the immune system. In this regard, it would be appropriate to introduce the possibility of the occurrence of exhaustion when prolonged exposure to antigens is happening, which is the main theme of this paper.

      Thank you for bringing these points to our attention. We have added a paragraph in the discussion section (page 15-16, line 372-386), addressing the implications of our findings in relation to exhaustion in the context of antigen persistence during chronic viral infections. We have also provided an example involving the lipid trehalose 6,6’-dibehenateled (TDM), a known virulence factor for Mtb, which has been utilized in several subunit vaccines without demonstrating significant toxicity.

      1. Authors need to provide more information about the source of MA. It is briefly mentioned in the materials and methods section that it was obtained from Sigma. If that is the case, it would be ideal to show the integrity of the polysaccharide in term of balance and abundance between different MA species.

      We obtained M. tuberculosis MA from Sigma, which comprises α-, keto-, and methoxy MA forms with an average combined lipid tail length of 80 carbons. MA-specific T cells that preferentially recognize these three forms of MA have been identified in humans. We have provided more detailed information regarding the MA in the Materials and Methods section (page 17, line 429-431).

      1. Building on the previous comment, MA is a complex mixture of polysaccharides including multiple lengths of fatty acids and modifications. Could the authors comment on the potential variability of MA structure and potential impact on immune responses?

      The binding capacities of Group 1 CD1-restricted T cells can be influenced by various factors, including specific head groups, lipid tail length, and structure of the lipid tail. Notably, DN1 T cells have been shown to have higher binding affinities towards keto and methoxy MA, while displaying weaker binding to α-MA (Van Rhijn et al., 2017, Eur. J. Immunol. 47:1525). In our study, we successfully utilized a mixture of MA to activate DN1 T cells, indicating that the required subtypes of MA were present in sufficient quantities to elicit this activation. In future investigations focusing on the polyclonal immune response, incorporating a mixture of MA and possibly other Mtb lipid antigens will enable a broader spectrum of T cell activation. This, in turn, is expected to enhance the overall effectiveness and robustness of protection in challenge experiments.

      1. How do the authors explain the lack of stimulation of cell proliferation induced by MA-PLGA formulation? Does this result contradict previous findings?

      This study represents the first instance of utilizing PLGA as a delivery system for a lipid antigen via a pulmonary vaccine route, despite its previous applications in numerous other vaccine formulations. Therefore, we do not think our findings contradict any existing research in the field. It is worth noting that the immunogenicity of PLGA can be influenced by the specific polymer chemistry and formulation, which may account for potential variations in the observed effects. We have added additional text to the discussion (page 13, line 310 – 313) to address this point.

      1. Fig 3. Authors switch to IT administration simply arguing against the limitation of IN delivery regarding its low volume. However, administration via IN could be done in an iterative manner. According to this change, this reviewer asks whether the performance of MA-PLGA could now be comparable to BCN-MA using IT instead.

      PLGA possesses an inherent background adjuvant effect, which may not be ideal for precisely stimulating group 1 CD1-restricted T cells, as a considerable proportion of these T cells exhibit some level of autoreactivity (Li, et al, 2011, Blood 118:3870, De Lalla et al., 2011, Eur. J. Immunol. 41:602; de Jong et al, 2010, Nat. Immunol. 11:1102). Notably, our observations revealed that blank PLGA-NP exerted a significant stimulatory effect on both mouse (DN1) and human (M11) MA-specific T cells (Fig. 2A-D). This underscores the advantage of the BCN system, which lacks detectable adjuvant effects and enables a more controlled, dose-dependent augmentation of T cell responses with increasing concentrations of loaded MA. Therefore, we did not further evaluate the impact of PLGA-MA using the IT route of vaccination.

      1. What would be the reasons of the no role of encapsulating NP in the persistence of MA?

      In this study, we have provided evidence to support the notion that encapsulation plays a role in antigen persistence, as demonstrated in Fig. 5A-C. Specifically, we directly compared the persistence of MA when delivered encapsulated in BCNs versus without encapsulation in BCNs, using DC pulsing and IT vaccination as the delivery methods. Our results indicate that at 6 weeks post-vaccination, MA encapsulated in BCNs can activate DN1 T cells, while free MA does not. These findings may initially appear to be contradictory to those depicted in Fig. 5D-F, where antigen persistence is observed following vaccination with attenuated Mtb. However, we propose that the attenuated Mtb bacteria may function similarly to nanoparticles by encapsulating and containing MA, thereby facilitating its persistence within the host. We appreciate the opportunity to clarify these points (page 15, line 364-367). Encapsulation within PEG-PPS NP may also contribute to two additional mechanisms. First, we have demonstrated that PEG-PPS NPs target myeloid cell populations (Burke et al., 2022, Nat. Nano. 17:319), such as alveolar macrophages, that can serve as antigen persistence depots as well as present CD1b/MA complexes on their surfaces. NPs allow more efficient delivery to these cells, whereas otherwise the lipid would bind to albumin, HDL, LDL, and other lipid carriers in blood for a broader, non-specific biodistribution, which would include cells less efficient at antigen persistence or presentation. Second, we previously demonstrated that the BCN nanostructure is highly stable within cells, supporting a slow intracellular release (Bobbala et al., 2020, Nanoscale 12:5332). This could assist with a more sustained presentation of lipid antigen by targeted cells in contrast to free form lipid or NPs (like PLGA) that rapidly degrade within cells. Indeed, low levels of fluorescently tagged BCNs were still detectable 6 weeks post-vaccination (Fig. 6B). 
      Our future studies will further investigate this hypothesis.

      1. Authors need to discuss to what extent the MA location into AM is route dependent.

      The localization of MA within alveolar macrophages (AMs) in the lung is likely specific to intratracheal (IT) vaccination. Therefore, mice vaccinated subcutaneously (SC) or intravenously (IV) may possess distinct antigen persistence depots. We have made modifications to the discussion section to further emphasize this point (page 15, line 359-364).

      1. Also, AM are programmed to sustain low immune responses because of their unique location in the lung. In fact, Mtb uses this to replicate while immune response is mounted. In this regard, accumulation of MA into this compartment may not be relevant for the overall immune response. In other words, what would be the contribution of this population to the T cell activation?

      It is likely that AMs primarily function as antigen depots and do not directly contribute to the activation of DN1 T cells. This assertion is supported by our findings, as co-culturing AMs with DN1 T cells alone did not result in T cell activation (Fig. 6E). However, we observed that the presence of hCD1Tg-expressing bone marrow-derived dendritic cells was necessary for DN1 T cell activation in vitro, which likely reflects a similar phenomenon occurring in vivo.

      1. Could the T cells responses measured be due to the reduced fraction of DC loaded with BCN-MA at initial time points?

      Regarding the T cell response observed in Fig. 5A-C, where we used DCs to deliver either free MA or MA-BCN, we took steps to address potential differences in loading capacity between the two at initial time points. Specifically, DCs were pulsed with a concentration of 10 µg/mL for free MA and 5 µg/mL of MA-BCN (the figure legend has been modified to clarify this point, page 37, line 962 - 963). To ensure approximate equivalence in loading, we examined the immune response one week after vaccination and found no statistically significant difference between the two methods.

    1. Reviewer #1 (Public Review):

      Murphy, Fancy and Skene performed a reanalysis of snRNA-seq data from Alzheimer Disease (AD) patients and healthy controls published previously by Mathys et al. (2019), arriving at the conclusion that many of the transcriptional differences described in the original publication were false positives. This was achieved by revising the strategy for both quality control and differential expression analysis. I believe the authors' intention was to show the results of their reanalysis not as a criticism of the original paper (which can hardly be faulted for their strategy which was state-of-the-art at the time and indeed they took extra measures attempting to ensure the reliability of their results), but primarily to raise awareness and provide recommendations for rigorous analysis of sc/snRNA-seq data for future studies.

      STRENGTHS:

      The authors demonstrate that the choice of data analysis strategy can have a vast impact on the results of a study, which in itself may not be obvious to many researchers.

      The authors apply a pseudobulk-based differential expression analysis strategy (essentially, adding up counts from all cells per individual and comparing those counts with standard RNA-seq differential expression tests), which is (a) in line with latest community recommendations, (b) different from the "default options" in most popular scRNA-seq analysis suites, and (c) explains the vastly different number of DEGs identified by the authors and the original publication. The recommendation of this approach together with a detailed assessment of the DEGs found by both methodologies could be a useful finding for the research community. Unfortunately, it is currently not fully substantiated and is confounded with concurrent changes in QC measures (see weaknesses).
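
      The aggregation step described above can be sketched in a few lines. This is only an illustration under assumed inputs (the cell names, donor labels, and gene names are hypothetical), not the code used in the reanalysis.

```python
from collections import defaultdict


def pseudobulk(cell_counts, cell_to_individual):
    """Sum per-cell gene counts into one count vector per individual."""
    totals = defaultdict(lambda: defaultdict(int))
    for cell, counts in cell_counts.items():
        individual = cell_to_individual[cell]
        for gene, count in counts.items():
            totals[individual][gene] += count
    # Convert nested defaultdicts into plain dicts for easier inspection.
    return {ind: dict(genes) for ind, genes in totals.items()}


# Hypothetical toy data: three cells from two donors, two genes.
cell_counts = {
    "cell1": {"GENE_A": 3, "GENE_B": 0},
    "cell2": {"GENE_A": 1, "GENE_B": 2},
    "cell3": {"GENE_A": 0, "GENE_B": 5},
}
cell_to_individual = {"cell1": "donor1", "cell2": "donor1", "cell3": "donor2"}

bulk = pseudobulk(cell_counts, cell_to_individual)
print(bulk)
```

      After this step the unit of replication is the individual rather than the cell, so the resulting table would be handed to a bulk RNA-seq differential expression test (for example edgeR or DESeq2), typically one cell type at a time.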

      The authors show a correlation between the number of DEGs and the number of cells assessed, which indicates a methodological shortcoming of the original paper's approach (actually, the authors of the original paper already acknowledged that the lesser number of DEGs for rare cell types was a technical artefact). To be educational for the reader it would be important to provide more information about the DEGs that were "found" and those that were "lost". Given vast inter-individual heterogeneity in humans, it is likely that the study was underpowered to find weaker differences using the pseudobulks (Fig. 1B shows that only genes with more than 4-fold change were found "significant").

      All code and data used in this study are publicly available to the readers.

      WEAKNESSES:

      The authors interpret the fact that they found fewer DEGs with their method than the original paper as a good thing by making the assumption that all genes that were not found were false positives. However, they do not prove this, and it is likely that at least some genes were not found due to a lack of statistical power and not because they were actually "incorrect". The original paper also performed independent validations of some genes that were not found here.

      I am concerned that the only DEGs found by the authors are in the rare cell types, foremost the rare microglia (see Fig. 1f). It is unclear to me how many cells the pseudo-bulk counts were based on for these cell types, but it seems that (a) there were few and (b) there were quite few reads per cell. If both are the case, the pseudobulk counts for these cell populations might be rather noisy and the DEG results are liable to outliers with extreme fold changes.

      The authors claim they improved the quality control of the dataset. While I do not think they did anything wrong per se, the authors offer no objective metric to assess this putative improvement. This is another major weakness of the paper as it confounds the results of the improved (?) differential analysis strategy and dilutes the results. I detail this weakness in the two following points:

      Removing low-quality cells: The authors apply a new QC procedure resulting in the removal of some 20k more cells than in the original publication. They state "we believe the authors' quality control (QC) approach did not capture all of these low quality cells" (l. 26). While all the QC metrics used are very sensible, it is unclear whether they are indeed "better". For instance, removal with a mitochondrial count of <5% seems harsh and might account for a large proportion of additional cells filtered out in comparison to the original analysis. There is no blanket "correct cutoff" for this percentage. For instance, the "classic" Seurat tutorial https://satijalab.org/seurat/articles/pbmc3k_tutorial.html uses the 5% threshold chosen by the authors, an MAD-based selection of cutoff arrived at 8% here https://www.sc-best-practices.org/preprocessing_visualization/quality_control.html, another "best practices" guide chooses 10% by default https://bioconductor.org/books/3.17/OSCA.basic/quality-control.html#quality-control-discarded, etc. Generally, the % of mitochondrial reads varies a lot between datasets. As far as I can tell, the original paper did not use a fixed threshold but instead used a clustering approach to identify cells with an "abnormally high" mitochondrial read fraction. That also seems reasonable. Overall, I cannot assess whether the new QC is really more appropriate than the original analysis and the authors do not provide any evidence in favor of their strategy.
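
      To make the contrast between a fixed and a data-driven cutoff concrete, here is a minimal sketch of a MAD-based upper threshold. The percentages are made-up values, the choice of 3 MADs is an assumption, and the MAD is left unscaled (some implementations multiply it by a constant such as 1.4826); none of this is taken from the manuscript or the original paper.

```python
import statistics


def mad_upper_cutoff(values, n_mads=3.0):
    """Upper outlier cutoff: median + n_mads * median absolute deviation."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    return median + n_mads * mad


# Hypothetical per-cell mitochondrial read percentages.
pct_mito = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 6.0, 12.0, 30.0]

fixed_cutoff = 5.0                         # fixed 5% threshold
mad_cutoff = mad_upper_cutoff(pct_mito)    # threshold derived from the data

kept_fixed = [v for v in pct_mito if v <= fixed_cutoff]
kept_mad = [v for v in pct_mito if v <= mad_cutoff]
print(f"fixed cutoff {fixed_cutoff}% keeps {len(kept_fixed)} cells")
print(f"MAD cutoff {mad_cutoff}% keeps {len(kept_mad)} cells")
```

      On these toy values the data-driven cutoff is more lenient than the fixed one, but with a different distribution it could just as well be stricter, which is why a single blanket percentage is hard to justify across datasets.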

      Batch correction: "Dataset integration has become a standard step in single-cell RNA-Seq protocols" (l. 29). While it is true that many authors now choose to perform an integration step as part of their analysis workflow, this is by no means uncontroversial as there is a risk of "over-integration" and loss of true biological differences. Also, there are many different methods for dataset integration out there, which will all have different results. More importantly, the authors go on "we found different cell type proportions to the authors (Fig. 1a) which could be due to accounting for batch effects" but offer no support for the claim that the batch effects are indeed related to the observed differences. An alternative explanation would be a selective loss/gain of certain cell types during quality control. The original paper stated concerns about losing certain cell types (microglia, which do not seem to be differentially abundant in the original paper / new analysis).

      Relevant literature is incompletely cited. Instead of referring to reviews of best practices and benchmarks comparing methods for batch correction and or differential analysis, the authors only refer to their own previous work.

      Due to a lack of comparison with other methods and due to the fact that the authors' methodology was only applied to a single dataset, the paper presents merely a case study, which could be useful but falls short of providing a general recommendation for a best practice workflow.

      APPRAISAL:

      The manuscript could help to increase awareness of data analysis choices in the community, but only if the superiority of the methodology was clearly demonstrated. The recommended pseudobulk differential expression approach along with the indication of drastic differences that this might have on the results is the main output of the current manuscript, but it is difficult to assess unequivocally how this influenced the results because the differential analysis comes after QC and cell type annotation, which have also been changed in comparison to the original publication. In my opinion, the purpose of the paper might be better served by focusing on the DE strategy without changing QC and instead detailing where/how DEGs were gained/lost and supporting whether these were false positives.

    1. Reviewer #1 (Public Review):

      Summary: This paper performs fine-mapping of the silkworm mutants bd and its fertile allelic version, bdf, narrowing down the causal intervals to a small interval of a handful of genes. In this region, the gene orthologous to mamo is impaired by a large indel, and its function is later confirmed using expression profiling, RNAi, and CRISPR KO. All these experiments convincingly show that mamo is necessary for the suppression of melanic pigmentation in the silkworm larval integument.

      The authors also use in silico and in vitro assays to probe the potential effector genes that mamo may regulate.

      Strengths: The genotype-to-phenotype workflow, combining forward (mapping) and reverse genetics (RNAi and CRISPR loss-of-function assays) to link mamo to pigmentation, is extremely convincing.

      Weaknesses:

      1) The last section of the results, entitled "Downstream target gene analysis" is primarily based on in silico genome-wide binding motif predictions. While the authors identify a potential binding site using EMSA, it is unclear how much this general approach over-predicted potential targets. While I think this work is interesting, its potential caveats are not mentioned. In fact the Discussion section seems to trust the high number of target genes as a reliable result. Specifically, the authors correctly say: "even if there are some transcription factor-binding sites in a gene, the gene is not necessarily regulated by these factors in a specific tissue and period", but then propose a biological explanation that not all binding sites are relevant to expression control. This makes a radical short-cut that predicted binding sites are actual in vivo binding sites. This may not be true, as I'd expect that only a subset of binding motifs predicted by Positional Weight Matrices (PWM) are real in vivo binding sites with a ChIP-seq or Cut-and-Run signal. This is particularly problematic for PWM that feature only 5-nt signature motifs, as inferred here for mamo-S and mamo-L, simply because we can expect many predicted sites by chance.

      2) The last part of the current discussion ("Notably, the industrial melanism event, in a short period of several decades ... a more advanced self-regulation program") is flawed with important logical shortcuts that assign "agency" to the evolutionary process. For instance, this section conveys the idea that phenotypically relevant mutations may not be random. I believe some of this is due to translation issues in English, as I understand that the authors want to express the idea that some parts of the genome are paths of least resistance for evolutionary change (e.g. the regulatory regions of developmental regulators are likely to articulate morphological change). But the language and tone are made worse by the mention that in another system, a mechanism involving photoreception drives adaptive plasticity, making it sound like the authors want to make a Lamarckian argument here (inheritance of acquired characteristics), or a point about orthogenesis (e.g. the idea that the environment may guide non-random mutations).
      Because this last part of the current discussion suffers from confused statements on modes and tempo of regulatory evolution and is rather out of topic, I would suggest removing it.
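The concern raised in point 1 about 5-nt motifs can be illustrated with a back-of-the-envelope calculation: in random sequence, a fixed motif of length k is expected at roughly one position in 4^k per strand. The Python sketch below assumes independent, equally frequent bases and a hypothetical 1 Mb search space. Both are simplifications chosen for illustration, and the helper name `expected_chance_hits` is invented here.

```python
def expected_chance_hits(seq_length, motif_length, both_strands=True):
    """Expected count of exact matches to one fixed motif in random DNA.

    Assumes independent, equally frequent bases (p = 0.25 each), which real
    genomes violate; treat the result as an order-of-magnitude guide only.
    """
    positions = seq_length - motif_length + 1
    if positions <= 0:
        return 0.0
    per_position = 0.25 ** motif_length
    strands = 2 if both_strands else 1
    return positions * per_position * strands

# Hypothetical search space: 1 Mb of sequence, 5-nt motif
hits_5nt = expected_chance_hits(1_000_000, 5)
# The same search space with a 10-nt motif, for comparison
hits_10nt = expected_chance_hits(1_000_000, 10)
```

Under these assumptions a 5-nt motif is expected on the order of a couple of thousand times per megabase by chance alone, whereas a 10-nt motif is expected only about twice. A PWM that tolerates mismatches would presumably match more often still, which is why independent in vivo binding evidence matters for short motifs.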

      In any case, it is important to highlight here that while this manuscript is an excellent genotype-to-phenotype study, it has very few comparative insights on the evolutionary process. The finding that mamo is a pattern or pigment regulatory factor is interesting and will deserve many more studies to decipher the full evolutionary study behind this Gene Regulatory Network.

      Minor Comment:

      The gene models presented in Figure 1 are obsolete, as there are more recent annotations of the Bm-mamo gene that feature more complete intron-exon structures, including for the neighboring genes in the bd/bdf intervals. It remains true that the mamo locus encodes two protein isoforms.
      An example of the Bm-mamo locus annotation can be found at: https://www.ncbi.nlm.nih.gov/gene/101738295
      RNAseq expression tracks (including from larval epidermis) can be displayed in the embedded genome browser from the link above using the "Configure Tracks" tool.

      Based on these more recent annotations, I would say that most of the work on the two isoforms remains valid, but FigS2, and particularly Fig.S2C, need to be revised.

  5. cqpress-sagepub-com.lmc.idm.oclc.org
    1. Proponents see two main advantages: One is that police, as generalists, are not trained to respond to every type of domestic or mental health crisis. Having others carry part of the load should free officers up to respond when and where they are really needed, such as violent situations, Travis says. Cherelle Parker, a City Council member in Philadelphia, agrees, saying: “We're not asking police officers to become psychiatrists, psychologists and therapists — we can get those who are experts in those areas to address those issues. When mental and behavioral health is needed, we now have another vehicle that we can use.”

      I think it is good that we are realizing police cannot do everything, just as a doctor or a nurse does not do everything. Yes, you can have a general practitioner, but there are still tasks and jobs they don't do. I feel this same strategy would allow police to specialize in the cases where they are needed, or to have others who are more proficient complete those tasks.

    1. The social media landscape continues to evolve dramatically, with new social networks like TikTok entering the field as well as existing platforms like Instagram and Telegram gaining markedly in popularity among young audiences. As social natives shift their attention away from Facebook (or in many cases never really start using it), more visually focused platforms such as Instagram, TikTok, and YouTube have become increasingly popular for news among this group. Use of TikTok for news has increased fivefold among 18–24s across all markets over just three years, from 3% in 2020 to 15% in 2022, while YouTube is increasingly popular among young people in Eastern Europe, Asia-Pacific, and Latin America.

      I remember when YouTube had to do something about fake news after the big event on January 6th. They made rules to take down videos with false information. That's good because we want to know the right stuff. Also, TikTok gives users content creation freedom and more freedom of speech, and we can make our own videos and say what we think. But sometimes, that can also be a problem. Since we can post anything, some things might not be true, like gossip about famous people or even important things like politics. TikTok does not always check if things are real before they spread; however, I do sometimes see a warning on a video if it may cause bodily harm when attempted at home.

    2. Here, we aim to unpack these new behaviours as well as to dismantle some broad narratives of ‘young people’. Instead, we consider how social natives (18–24s) – who largely grew up in the world of the social, participatory web – differ meaningfully from digital natives (25–34s) – who largely grew up in the information age but before the rise of social networks – when it comes to news access, formats, and attitudes.2 These groups are critical audiences for publishers and journalists around the world, and for the sustainability of the news, but are increasingly hard to reach and may require different strategies to engage them.

      This part of the article really caught my attention. I'm 28, and I remember a time when social media was almost nonexistent. I think it was really interesting watching social media platforms become a staple. I can definitely relate to the sentence, "... (digital natives) grew up in the information age...", because that's how I viewed the online world. If I wanted to learn about something new or keep up with topics, I had to search and dig through many websites to find one that I not only liked but also trusted.

    3. Here, we aim to unpack these new behaviours as well as to dismantle some broad narratives of ‘young people’. Instead, we consider how social natives (18–24s) – who largely grew up in the world of the social, participatory web – differ meaningfully from digital natives (25–34s) – who largely grew up in the information age but before the rise of social networks – when it comes to news access, formats, and attitudes.2

      Social natives may be more interactive on social media and more inclined to socialize on their devices instead of in person. This is my hypothesis because I see a lot of younger people Facetiming and utilizing social media more than I do, and even though I could be as active as them, I am just not as inclined to participate on a more personal account. I would be more inclined to promote a company I work for on social media than to interact with it for myself as much as the younger generation does. I think that digital natives may be more skilled at media literacy because they have some background knowledge of what existed before the floods of mass misinformation on social media platforms like Twitter and Facebook, while also having seen how misinformation was perpetuated in the media for the generation older than them.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their time, the positive reviews and the useful comments. We answer below and explain the changes made to the manuscript. The comments of the reviewers are in italics.

      Reviewer #1

      1. 'For GWAS, the strains that were fertile after 20 generations were considered non-Mrt.' One aspect of Fig 1D that could be clarified are the dots at generation 21. If these represent strains that were always fertile at generation 21, then perhaps give these a different color to indicate that sterility was never observed?

      Response: This is a good idea. We added colors in Figure 1, which makes it clearer.

      We also provide a different color for surviving replicates in all relevant figures.

      1. 'The mean Mrt values of strains ranged from sterile at 3 generations to fertile after 20 generations at 25°C, with a skewed distribution toward high values (Figure 1B).' Based on Table S2, part of the explanation for this skewed distribution in later generations is that some strains became sterile rapidly for some blocks, whereas the same strain did not become sterile in other blocks. For example, JU1200, JU360, PB303. I suggest providing a second color for Fig. 1D for strains that sometimes displayed sterility and sometimes did not.

      Response: We now colored the isolates that never became sterile, with the same color code as in panel B. Because we stopped the scoring at G20 and code fertility at G20 as '21', those with a mean below 21 show some sterility in at least one case.

      Because the number of generations at which we stopped the phenotyping (20) is arbitrary, the fact a line stayed fertile at 20 generations in one replicate is not very meaningful, especially considering that the number of replicates is not the same for all strains. The key point of the variance graph is to show that the strains with the most variance are those with high but

      For those that were sometimes fertile and sometimes sterile, I suggest creating a graph in Figure 1 that shows generations at sterility or lack of sterility, color coded by block. This will allow the significance of strains with high generation Mrt values to be better appreciated for readers who do not look at the supplementary table.

      Response: Yes, we added this graph in Figure S1. This is indeed useful.

      1. The GWAS section could benefit from a simple explanation of the premise of GWAS for non-specialist readers.

      Response: Yes, we added: "A genome-wide association study (GWAS) is a genetic mapping that uses the natural diversity of a panel of organisms of a given species to test for statistical independence between the allelic state of polymorphic markers and the phenotype of interest (Andersen and Rockman 2022). A statistical association between the marker and the phenotype indicates that a polymorphism tightly linked to the marker in the data (i.e. in linkage disequilibrium with it) causes the variation in phenotype. For statistical reasons, GWAS can only detect polymorphisms that are at intermediate frequencies in the panel, i.e. cases where both alleles occur at frequencies higher than 5%. We only used such polymorphisms in the GWAS (see Methods)."

      And further down:

      "To diminish the multiple testing burden, the initial analysis in Figure 1E used a restricted set of markers, after pruning those that were in high linkage to each other."
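The frequency criterion described in this response (both alleles present at more than 5% in the panel) amounts to a minor-allele-frequency filter applied before association testing. The Python sketch below shows one way such a filter could look. The genotype encoding, the toy panel, and the helper names `minor_allele_frequency` and `keep_marker` are assumptions for illustration and are not taken from the authors' Methods.

```python
def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among strains with a called genotype.

    `genotypes` holds 0 (reference), 1 (alternative) or None (missing) per
    strain; homozygous calls are assumed since C. elegans is largely selfing.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / len(called)
    return min(alt_freq, 1.0 - alt_freq)

def keep_marker(genotypes, threshold=0.05):
    """Keep a marker only if both alleles exceed the frequency threshold."""
    return minor_allele_frequency(genotypes) > threshold

# Toy panel of 20 strains (invented data)
rare = [1] + [0] * 19          # alternative allele in 1 of 20 strains
common = [1] * 6 + [0] * 14    # alternative allele in 6 of 20 strains
kept = [name for name, g in (("rare", rare), ("common", common)) if keep_marker(g)]
```

In this toy panel only the marker named `common` passes, since an allele carried by a single strain out of twenty sits exactly at the threshold rather than above it. This mirrors the point made elsewhere in the response that alleles private to one strain cannot be detected by the GWAS.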

      1. One problem might be that the Mrt phenotype is widespread among wild strains. To the authors' credit, they consider results observed in different laboratories as valid, even when the results do not agree. If the Mrt phenotype is influenced by the environment, then some laboratory environments might result in 'false negative' Mrt results that could be ignored in favor of positive results from another lab that appear strong. Might focusing on strains with a set of strong positive results from one lab allow the authors to draw stronger GWAS conclusions?

      2. The authors perform GWAS based on the variance of the Mrt phenotype data. Would the GWAS data be more illuminating if the authors only considered strains that become sterile fairly rapidly, within 10 generations? The authors might then have a second category that included strains that become sterile from generation 11-20. If the genetic basis for the Mrt phenotypes is the same, then GWAS of strains that become sterile in less than 10 generations might yield similar peaks as GWAS for strains that become sterile between generations 11-20.

      Response: These two comments are strongly related so we answer them together. Note that the GWAS is not mapping the variance values but the Mrt values themselves.

      We actually initially only used block 1 (a single replicate, all strains performed in parallel in our laboratory) and also detected the chromosome III association using a categorical variable (threshold at 11), but decided to show the results with all data to maximize power, taking into account the generation value and block effects.

      We investigated other ways to code the data (e.g. categorically) and removing the strains of the most variable middle category, as proposed by the reviewer. This changed the p values and the rank of the markers on chromosome III but not the overall result.

      In summary, we did a variety of tests, which pointed to chromosome III, a region that was validated using crosses (Figure 2).

      Note that in the revision, we updated the GWAS plot and fine mapping table as we noticed a few problems in our previous mapping. 1) We removed 3 isolates that were classified in Lee et al. 2021 as divergent. 2) We included strains that had been lost in the pipeline because their names did not match CeNDR isotypes. This increased the significance of the chromosome III peak.

      Response: There was no comment 6.

      1. 'We did not investigate whether a second locus present in JU775 on the right arm of Chr III might have a lesser effect.'

      Response: We are not sure what the reviewer meant. Considering the difficulties with the stronger effect locus, we did not try to study loci with a weaker effect.

      1. It might be interesting to test the memory of growth on beneficial bacteria on JU4134, which had a Mrt phenotype that was strongly suppressed by the beneficial bacteria.

      Response: We agree that testing other strains would be useful but given the duration of such experiments (30 generations and two weeks of preparation before), we respectfully decline to perform this experiment that does not seem strictly necessary.

      1. The Mrt phenotype of mutants in small RNA inheritance and histone modifying enzymes 'appears however distinct from that of the prg-1/piwi mutant (for which the cause of sterility is debated), especially the latter does not show temperature dependence and is suppressed by starvation.' While it is true that the cause of sterility is debated for the prg-1/piwi mutant, this mutant is defective for small RNA silencing and likely has parallels with some defects in histone modifying enzymes. Anecdotal reports suggest that starvation might affect the Mrt phenotype or longevity of histone modifying enzyme mutants. Moreover, the cause of sterility is not clear for small RNA inheritance and histone modifying enzyme mutants. It is fair to say that the distinction between temperature-sensitivity or lack of temperature sensitivity of small RNA mutants is not understood. Could the authors please comment here about whether any of the wild strains display sterility at 20°C.

      Response: The temperature-dependence of the wild isolates is progressive between 20-25°C. We previously showed that strains with a very strong Mrt phenotype, such as QX1211, can display sterility at 20°C (Figure 1B in Frézal et al. 2018). However, its Mrt phenotype is still temperature-dependent as the sterility occurs much earlier at 25°C.

      1. If intracellular bacteria are simply somatic, then how is it that they are transmitted to progeny? If they are released into the environment and then consumed by hatched larvae, this is soma-to-soma transmission.

      Response: These microsporidia (which are eukaryotes related to fungi) are indeed transmitted horizontally. To make this clear, we added: "colonizing its intestinal cells and being transmitted horizontally via defecation and ingestion of spores". The soma-to-germline interaction concerns the effect of microsporidia on germline maintenance.

      Minor: 1. 'We measured the mortal germline (Mrt) phenotype'. Mortal Germline (Mrt)

      Response: It is unclear as to whether phenotypes start with a capital letter when they are in full words. We did write phenotypes in previous works with a capital letter but have changed because C. elegans nomenclature rules (https://cgc.umn.edu/nomenclature) suggest that they should not: "Phenotypic characteristics can be described in words, e.g., dumpy animals or uncoordinated animals." For the mortal germline phenotype in particular, we find several ways to write it in articles (with 0, 1 or 2 capital letters, including the three reviewers). We are happy to change it if required.

      Reviewer #2

      Major comments: The authors claimed that the variants causing Mrt exist at intermediate frequency in the natural population but the evidence supporting this claim is rather limited.

      Response: Thank you for this comment as it helped us clarify the manuscript.

      To better explain the notion of intermediate frequency in the GWAS, we added an explanation of the principle of the GWAS (see above) and again in the Discussion: "The intermediate frequency of the candidate alleles derives from the GWAS approach, which cannot detect rare alleles, such as set-24, that are present in a single strain of the dataset."

      We also illustrated the frequency by adding a plot (Fig. 1F) showing the association of the most associated candidate SNP, with a visual depiction of the frequency. We further added in Results: "For SNPs with a high significance (p-4) in the fine mapping, the frequency of the Mrt associated allele was comprised between 21 and 41% in our GWAS strain set (Table S3); as an example, the Mrt allele of the associated SNP shown in Figure 1F (III:4677491) displayed a frequency of 29% in the restricted strain set. Over the global wild strain set with genotypes at CeNDR in 2020, these numbers are 17-58% and 39%, respectively. "

      To strengthen the claim, the authors should examine the distribution and frequency (perhaps coupled with phylogenetic analysis) of the Ch III haplotype in the wild isolates. The authors should also examine the GWAS peak for the signature of balancing selection (e.g., dN/dS ratio).

      Response: Thank you for this comment. The different associated SNPs in Table S3 differ in their allele frequency (Table S3), hence they belong to different haplotypes. We added a supplementary Figure S2 with an analysis of the haplotype structure. Those at a low frequency (around 20%) belong to the same haplotype (e.g. JU775 and MY10) but some associated alleles are present in more haplotypes (40-50%), such as JU1793. Even if we neglect recombination, the history of mutations in the region is complex and there is not a single associated haplotype. We now show the genotypes of these different haplotypes at all SNPs in Table S3. We also added Table S4 that shows the co-occurrence of relevant haplotypes in local populations.

      Concerning tests of balancing selection, without knowing the causal polymorphism and linked haplotype, this is far reaching. We only feel confident to say that the causal polymorphism(s) is present at a significant frequency. We added however the fact that irrespective of which polymorphisms are causal, both alleles were found to coexist locally.

      Results: relevant text was added at the end of the GWAS section.

      Discussion: "The co-occurrence of relevant chromosome III haplotypes on multiple continents and in local populations (Table S4) is suggestive of balancing selection; however, a linked locus other than that causing the Mrt phenotype may be involved."

      Does JU775 carry polymorphisms in genes that are known to be involved in Mrt? These genes may genetically interact with the Ch III variant, as suggested by the partial penetrant phenotypes of the introgressed lines. It would be helpful to have a table summarize the variation in these genes.

      Response: It is difficult to deduce much from a genomic variant analysis, so we refrain from showing tables of polymorphisms beyond that used for the fine GWAS mapping in Table S3. For example, a non-synonymous SNP may or may not alter protein activity and cis-regulatory elements are difficult to assess. Moreover, an obviously null allele may be compensated by another polymorphism in the background. The JU775 alleles and bam files are publicly available from CeNDR (Erik Andersen's lab): https://caendr.org/data/data-release/c-elegans/latest

      It is curious to me that for experiments with HT115, the expression of the RNAi vectors was induced with IPTG. Is this step necessary? It is known that even the backbone of L4440 could trigger a non-specific RNAi response (PMID: 30838421). I wonder if activating exogenous RNAi response is required for Mrt rescue.

      Response: Indeed: this experiment was initially aimed at testing RNAi sensitivity of JU775, thus IPTG was added on the plate (Figure 7, panel B). We therefore repeated the memory experiment with OP50 and without IPTG, with a similar result (Figure 7, panel A).

      In figure 7, it appears that the worms transferred from MG1655/HT115 to OP50 showed an even stronger rescue (higher Mrt value) than the ones constantly on MG1655/HT115. This suggests to me that fluctuations in food composition may strongly affect epigenetic inheritance. Please clarify as this is very interesting, if true.

      Response: Note: This answers the comment above (IPTG is not required).

      We indeed noticed this strong rescue but do not wish to make a point of it, as we made no attempt to reproduce this result in the exact same conditions. The experiment in panel B does not show this effect.

      Optional - Numerous studies have shown that SKN-1 regulates metabolism in response to food composition and availability (PMID: 23040073). Additionally, some recent studies have indicated a role of SKN-1 in epigenetic inheritance triggered by exogenous RNAi. In particular, SKN-1 promotes stress-induced epigenetic resetting (PMID: 33729152). I wonder if SKN-1 modulates Mrt based on bacterial diet.

      Response: We tested skn-1b/c hypomorphic and gain-of-function mutants in the N2 background on E. coli OP50 and did not see an effect of the skn-1 allele.

      Minor comments Line 47: typo "...they defined..."

      Response: We did mean "thus defined".

      Line 100-101: weird sentence structure. Please consider rephrasing.

      Response: We simplified to "a wild C. elegans strain can keep the memory of its culture on a suppressing bacterial strain."

      Line 138-139: I don't quite understand what "intermediate-frequency chromosome III alleles" means here. Some SNPs were found in Ch III 4-6Mb? Please expand.

      Response: We rephrased to: "because this isolate carries the chromosome III alleles associated in the GWAS analysis with the Mrt phenotype (Table S3)."

      Line 213 - it was unclear to me why the assay was performed at 23C instead of 25C. I later learned in the method section that microsporidia cannot be cultured at 25C. I think it will be helpful to add that information when microsporidia is introduced to improve clarity.

      Response: We added: "We used a temperature of 23°C because these microsporidia kill C. elegans too rapidly at 25°C."

      Reviewer #3.

      Minor points 1. Could the authors please define "experimental blocks"

      Response: We added the following sentence in Results: "Each Mrt assay started at a certain date constitutes an experimental block."

      1. Legend to supplementary snp table should be completed: define AF, impact, modifier, moderate, AA1, AA2...

      Response: This is added in the first sheet of the table. We also simplified the table and removed some of these columns.

      1. Please define "intermediate-frequency allele"

      Response: We added in Results: "GWAS can only detect polymorphisms that are at intermediate frequencies in the panel, i.e. cases where both alleles occur at frequencies higher than 5%." We also added below: "For SNPs with a high significance (p-4) in the fine mapping, the frequency of the Mrt associated allele was comprised between 21 and 41% in our GWAS strain set (Table S3); as an example, the Mrt allele of the associated SNP shown in Figure 1F (III:4677491) displayed a frequency of 29% in the restricted strain set."

      1. Figure 7 legend: Authors should be more specific in describing the figure: After 10 (A panel), 13 or 20 generations (B panel) on the K-12 strain... What is E. coli OP50 start 'G10'? the 15° stock?

      Response: We changed to: "After 10 (A panel), 13 or 20 generations (B panel) on the K-12 strain" and added some details in:

      "A control from a 15°C culture maintained without starvation ("15°C stock") was bleached in parallel (labeled "E. coli OP50 start "G10" " in the graph of panel A)."

      Optional: Did the authors attempt to rescue the Mrt phenotype with individual metabolites (eg Vit B12...)? These are not straight forward experiments and most likely part of a future study.

      Response: We indeed tested several metabolites that are known to differ in C. elegans raised on E. coli OP50 versus K-12 strains for their effect on the Mrt phenotype. None was able to rescue the mortal germline phenotype. However, especially in these long multigenerational experiments, it is difficult to know whether the metabolites are stable. We monitored vitamin B12 activity by using an acdh-1::GFP reporter that is known to be repressed by vitamin B12 - so we are confident of this negative result, which we now show in Figure S4. As cell wall lipopolysaccharide (LPS) differ between E. coli K-12 and B strains, we also tested the E. coli LPS mutants, which had no eff

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The cerebral cortex, or surface of the brain, is where humans do most of their conscious thinking. In humans, the grooves (sulci) and bumps (convolutions) have a particular pattern in a region of the frontal lobe called Broca's area, which is important for language. Specialists study features imprinted on the internal surfaces of braincases in early hominins by casting their interiors, which produces so-called endocasts. A major question about hominin brain evolution concerns when, where, and in which fossils a humanlike Broca's area first emerged, the answer to which may have implications for the emergence of language. The researchers used advanced imaging technology to study the endocast of a hominin (KNM-ER 3732) that lived about 1.9 million years ago (Ma) in Kenya to test a recently published hypothesis that Broca's remained primitive (apelike) prior to around 1.5 Ma. The results are consistent with the hypothesis and raise new questions about whether endocasts can be used to identify the genus and/or species of fossils.

      We would like to thank Rev. 1 for their comments on our paper.

      Reviewer #2 (Public Review):

      The authors tried to support the hypothesis that early Homo still had a primitive condition of Broca's cap (the region in fossil endocasts corresponding to Broca's area in the brain), being more similar to the condition in chimpanzees than in humans. The evidence from the described individual points to this direction but there are some flaws in the argumentation.

      We are grateful to Rev. 2 for their comments, although we partially agree with some of them.

      First, we would like to rectify the statement of Rev. 2 that we “tried to support the hypothesis that early Homo still had a primitive condition of Broca's cap”. Our aim was to test this hypothesis, not to try to validate it.

      First, only one human and one chimpanzee were used for comparison, although we know that patterns of brain convolutions (and in addition how they leave imprints in the endocranial bones) are very variable.

      We understand the point raised by Rev. 2 about the variation of brain convolutions in humans and chimpanzees. We used atlases published by Connolly (1950), Falk et al. (2018) and de Jager et al. (2019, 2022) to analyse the endocast of KNM-ER 3732 and compare it to the extant human and chimpanzee cerebral conditions. However, in Figure 2, for the sake of clarity only two Homo and Pan specimens were used to illustrate the comparison (as has been done in other published papers, e.g., Carlson et al., 2011, Science; Gunz et al., 2020, Sci Adv). In the revised version, we modified the manuscript to explain further our approach (line 156) “We used brain and endocast atlases published in Connolly (1950), Falk et al. (2018) and de Jager et al. (2019, 2022; see also www.endomap.org) for comparing the pattern identified in KNM-ER 3732 to those described in extant humans and chimpanzees. To the best of our knowledge, these atlases are the most extensive atlases of extant human and chimpanzee brains/endocasts available to date and are widely used in the literature to explore variability in sulcal patterns. In Figure 2, the extant human and chimpanzee conditions are illustrated by one extant human (adult female) and one extant chimpanzee (adult female) specimens from the Pretoria Bone Collection at the University of Pretoria (South Africa) and in the Royal Museum for Central Africa in Tervuren (Belgium), respectively (Beaudet et al., 2018).”.

      Second, the evidence from this fossil specimen adds to the evidence of previously described individuals but still does not fully prove the hypothesis.

      We tempered our discussion by concluding that (line 116) “Overall, the present study not only demonstrates that Ponce de León et al.’s (2021) hypothesis of a primitive brain of early Homo cannot be rejected, but also adds information […]”.

      Third, there is a vicious circle in using primitive and derived features to define a fossil species and then using (the same or different) features to argue that one feature is primitive or derived in a given species. In this case, we expect members of early Homo to be derived compared to their predecessors of the genus Australopithecus and that's why it seems intriguing and/or surprising to argue that early Homo has primitive features. However, we should expect that there is some kind of continuum or mosaic in a time in which a genus "evolves into" another genus. This discussion requires far more discussions about the concepts we use, maybe less discussion about what is different between the two groups but more discussion about the evolutionary processes behind them.

      We fully agree with Rev. 2 on this aspect. We believe that identifying these differences/similarities between fossil and extant hominids constitutes the first step toward a better understanding of the evolutionary mechanisms. Our work indeed suggests a certain continuity between genera and raises questions about the genus concept and how to interpret the specimens currently attributed to early Homo. In the revised version of the manuscript we included a reference to this possible scenario (line 134): “[…] or to the absence of a definite threshold between the two genera based on the morphoarchitecture of their endocasts (Wood and Collard, 1999).”.

      Fourth, the data of convolutional imprints presented are rather subjective when identifying which impressions represent which brain convolutions. Not seeing an impression does not necessarily mean that the corresponding brain feature did not exist. Interestingly, the manuscript does not mention and discuss at all the frontoorbital sulcus. This is a sulcus that usually runs from the orbital surface of the frontal lobe up to divide the inferior frontal gyrus in chimpanzees, a condition totally different than in humans who do not have a frontoorbital sulcus. Could such a sulcus be identified, this would provide a far more convincing argument for a primitive condition in this specimen. In Australopithecus sediba, e.g., the condition in this region seems to be a mosaic in which some aspects of the morphology seem to be more modern while one of the sulcal impressions can well be interpreted as a short frontoorbital sulcus. For this specimen, by the way, I would come back to my third point above: some experts in the field might argue that this specimen could belong to Homo rather than Australopithecus...

      We agree that the presence of a fronto-orbital sulcus would be more conclusive. However, this sulcus has not been identified in KNM-ER3732 and the region in which we would expect to find it is not preserved. As demonstrated by Ponce de León et al. (2021), because of the topographic relationships between sulci (and cranial structures), it is possible to interpret imprints on endocasts and the evolutionary polarity of some traits even in the absence of landmarks such as the fronto-orbital sulcus. In Australopithecus sediba the main derived feature of the endocast corresponds to the ventrolateral bulge in the left inferior frontal gyrus, and not to the sulcal pattern itself (Carlson et al., 2011 Science). However, the discussion around the taxonomic status of this taxon confirms the urgent need for reconsidering specimens from that time period and clarifying the mosaic-like or concerted evolution of the derived Homo-like traits within our lineage. Regarding the subjective nature of this approach, we invite readers to examine the specimen on MorphoSource (https://www.morphosource.org/concern/media/000497752?locale=en) and to request access from the National Museums of Kenya to the physical or virtual specimen in order to falsify our hypothesis.

      According to my arguments above, I think that this manuscript might revive interesting discussions about this topic but it is not likely to settle them because the data presented are not strong enough to fully support the hypothesis.

      We would be more than happy to consider new/other specimens with similar chronological and geographical contexts and to further investigate this hypothesis in the future.

      Reviewer #3 (Public Review):

      The authors provide a detailed analysis of the sulcal and sutural imprints preserved on the natural endocast and associated cranial vault fragments of the KNM-ER3732 early Homo specimen. The analyses indicate a primitive ape-like organization of this specimen's frontal cortex. Given the geological age of around 1.9 million years, this is the earliest well-documented evidence of a primitive brain organization in African Homo.

      In the discussion, the authors re-assess one of the central questions regarding the evolution of early Homo: was there species diversity, and if yes, how can we ascertain it? The specimen KNM-ER1470 has assumed a central role in this debate because it purportedly shows a more advanced organization of the frontal cortex compared to other largely coeval specimens (Falk, 1983). However, as outlined in Ponce de León et al. 2021 (Supplementary Materials), the imprints on the ER1470 endocranium are unlikely to represent sulcal structures and are more likely to reflect taphonomic fracturing and distortion. Dean Falk, the author of the 1983 study, basically shares this view (personal communication). Overall, I agree with the authors that the hypothesis to be tested is the following: did early Homo populations with primitive versus derived frontal lobe organizations coexist in Africa, and did they represent distinct species?

      I greatly appreciate that the authors make available the 3D surface data of this interesting endocast.

      We are grateful to Rev. 3 for their comments and for contextualizing our finding. We would also like to point out that, although the 3D surface can be viewed on MorphoSource, permission from the National Museums of Kenya has to be requested for studying the specimen and getting access to the physical specimen and/or the 3D model.

      Reviewer #1 (Recommendations For The Authors):

      Holloway, Broadfield & Yuan (2004) estimate ER 3732 as having a cranial capacity of 750 cc, which is larger than chimps and australopiths and similar to ER 1470 (752 cc, same reference). (That for Dmanisi 2282 is somewhat smaller at around 650 cc.) Cranial capacities should be mentioned along with added discussion about possible allometric scaling of (increased) numbers of sulci with increasing brain size as well as possible shifts in locations of sulci relative to cranial sutures in larger-brained (including due to ontogenetic maturation) individuals/species. Could these variables (especially brain size) be relevant for your discussion/conclusions?

      We thank Rev. 1 for their suggestion. We included the estimate by Holloway et al. (2004) (line 95): “Holloway et al. (2004) estimated the endocranial volume as about 750-800 cc but insisted on the low reliability of their estimate.”. Additionally, we raised the possibility of an allometric effect as a topic for future discussion (line 149): “In parallel, the possibility of allometric scaling and influence of brain size on sulcal patterns in early Homo has to be further explored.”.

      From the two figures, it appears that the authors produced a virtual endocast from the cranial remains of ER 3732 and compared its features with those seen on a virtual reproduction of the corresponding natural endocast. If so, this needs to be clarified in the text, not just the figures.

      We thank Rev. 1 for their suggestions that were integrated.

      Reviewer #3 (Recommendations For The Authors):

      While the sulcal imprints on the left hemisphere can be interpreted unambiguously, the anatomical assignment of those on the right side may need to be reconsidered, as they are more ambiguous. For example, the postcentral sulcus (pt) almost touches the middle frontal sulcus, which is an unlikely natural configuration.

      We agree that the configuration on the right hemisphere is intriguing, especially when compared to the extant human and chimpanzee atlases. As such, we decided to change the label for what we think could be the inferior frontal sulcus and leave a question mark instead.

      I encourage the authors to include:

      • a posterior view in Figure 1, and mark the lambdoid suture, parts of which seem to be preserved especially on the left side. This will help the readership to better understand which parts of the endocranial morphology are preserved.

      • a scale bar would be of great utility to appreciate the small size of this specimen. The distance from bregma to the Broca cap seems to be short, indicating an endocranial volume much smaller than the published estimate of 750 ccm. Perhaps the authors can provide a new estimate, which would provide further support for the arguments proposed in the discussion section, especially the question of any presence of Australopithecus at Koobi Fora.

      We included a posterior view of the specimen and a scale bar in Figure 1 and modified the legend accordingly. Unfortunately, we were not able to identify with certainty the feature that could correspond to the lambdoid suture. We might see the impression where the parietal bone meets the occipital bone, but there is a risk of misidentification (which is an issue frequently raised in the literature, see for example Gunz et al. 2020 Sci Adv). Concerning the endocranial volume, in the revised version of the manuscript we included the estimate by Holloway et al. (2004). Because the specimen only preserves the superior part, we are reluctant to provide an estimate of the total volume. However, we agree that this would be an interesting feature to integrate in the interpretation of this specimen.

      Minor points

      • This sentence needs to be clarified: «The superior temporal sulcus nearly intersects the lateral fissure on the right hemisphere».

      • The terms «Broca's region» and «orbital cap» need some more context. Do the authors mean «Broca's cap» in either instance?

      We clarified/modified when needed, thank you very much.

      We included minor corrections in addition to those recommended by the reviewers:

      -Lines 50, 74, 142, 149: “Broca’s area” instead of “Broca’s cap”

      -Line 73: “in the pre-1.5 Ma Homo specimen” instead of “in pre-1.5 Ma Homo specimen”

      -Line 100: we specified “in human brains and endocasts”

      -Line 120: “sulcal pattern” instead of “sulcal patterns”

      -Line 144: “behaviors” (plural)

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      Major comments:

      1. A control group of mice fed chow diet is needed to distinguish the effects of the genotype from those caused by diet. What is the phenotype of regular chow-fed mice in terms of energy metabolism and thermogenesis?

      We are sincerely grateful to Reviewer 1 for raising an important question regarding the need for a control group of mice fed chow diet.

      To address this concern, we have conducted experiments on mice fed a regular chow diet and measured their phenotype in terms of energy metabolism and thermogenesis. In addition, to be sure that the phenotype is also present when compared with littermates, we included both chow-fed CD4-Cre mice and littermates (MKK3/6f/f) as controls. Our findings reveal that MKK3/6CD4-KO mice fed a chow diet presented increased brown adipose tissue (BAT) thermogenesis compared with CD4-Cre mice and littermates. This phenotype is similar to that observed in HFD-fed mice. These results also show that the phenotype holds in the comparison with littermates, which adds an extra control to the study.

      To further investigate the effect on energy metabolism, we utilized metabolic cages. The data from these experiments align with the increased thermogenesis observed in MKK3/6CD4-KO mice fed a chow diet, as they also demonstrated increased energy expenditure. We thank the reviewer for this suggestion as we believe that these new data strengthen our conclusion significantly.

      We have incorporated these essential findings into Supplementary Figure 2C-D of the manuscript.

      1. While an increase in BAT temperature (as demonstrated here by infrared imaging) is in line with increased thermogenesis, it will be critical to verify this hypothesis by indirect calorimetry. Energy expenditure, food intake, and activity measures should be added for regular and DIO mice. Please follow the guidelines for ANCOVA analysis and measurements explained in PMID: 22205519 and PMID: 21177944.

      We are grateful to Reviewer 1 for bringing up an essential point concerning the need to verify our hypothesis on increased BAT temperature and thermogenesis through indirect calorimetry. We acknowledge the importance of including energy expenditure, food intake, and activity measures for both regular and DIO mice to strengthen our study.

      To address this valuable suggestion, we analyzed mice on a chow diet in metabolic cages. The data from these experiments align with the increased thermogenesis observed in MKK3/6CD4-KO mice fed a chow diet, as they also demonstrated increased energy expenditure, without differences in food intake or locomotor activity. We thank the reviewer for this suggestion as we believe that these new data strengthen our conclusion significantly. These new data are now in Supplementary Figure 2A-B.

      In addition, we have initiated a new experimental group of age-matched mice on HFD, which we will carefully feed for 8 weeks. Following this dietary period, we will subject the mice to metabolic cage analysis, allowing us to obtain accurate data on energy expenditure, food intake, and activity levels. These additional measurements will provide a comprehensive understanding of the metabolic changes induced by MKK3/6 deficiency in T cells under different dietary conditions.
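
      The ANCOVA approach recommended in the reviewer's comment can be illustrated with a minimal sketch. It assumes a simple linear model in which energy expenditure depends on body mass (the covariate) and genotype (the group factor), fitted by ordinary least squares. The function names, variable names, units, and all numbers are hypothetical placeholders, not data or code from the study.

```python
# Minimal ANCOVA-style sketch (illustrative assumption, not the study's analysis):
# energy expenditure (EE) is modelled as
#   EE = b0 + b_mass * body_mass + b_group * group
# and fitted by ordinary least squares via the normal equations.

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ancova_fit(body_mass, group, ee):
    """Return (intercept, mass_slope, group_effect) from the OLS fit."""
    rows = [[1.0, bm, float(g)] for bm, g in zip(body_mass, group)]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ee)) for i in range(k)]
    return tuple(solve_linear(xtx, xty))

# Hypothetical placeholder values: body mass in g, EE in kcal/h,
# group 0 = control, group 1 = knockout.
body_mass = [24.0, 26.5, 28.0, 30.5, 23.5, 25.0, 27.5, 29.0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
ee = [0.44, 0.47, 0.48, 0.51, 0.49, 0.50, 0.53, 0.54]

intercept, mass_slope, group_effect = ancova_fit(body_mass, group, ee)
print(f"group effect adjusted for body mass: {group_effect:.3f} kcal/h")
```

      The group coefficient is the genotype difference in energy expenditure at equal body mass, which is the quantity of interest when the groups differ in weight. A full analysis would also report standard errors and test for a mass-by-group interaction.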

      1. That the phenotype is still seen at isothermal housing is interesting but should be backed up by direct assessment of thermogenic capacity (see PMID: 21177944). In the end, it could also be increased heat loss, independently of heat production. Whether the browning is cause or consequence then remains unclear.

      Thank you for raising this important point. Indeed, it is essential to corroborate the observed phenotype with direct assessments of thermogenic capacity to gain a comprehensive understanding of the underlying mechanisms. The study mentioned in PMID: 21177944 highlights the significance of evaluating thermogenesis directly to support the findings.

      Following your suggestion, we plan to house the animals at 30 ºC for four weeks and subsequently inject norepinephrine to evaluate thermogenic capacity while measuring brown adipose tissue (BAT) activation. This approach should provide valuable insights into the thermogenic potential of the animals under isothermal conditions.

      However, we will not be able to conduct the experiment in metabolic cages because our system cannot be operated at 30 ºC. For this reason, we will use BAT temperature as the readout for this experiment.

      1. Regarding the in vitro data, a thermogenic phenotype should be functionally verified by Seahorse analysis.

      We thank Reviewer 1 for raising an important point concerning the need for functional verification of the thermogenic phenotype observed in our in vitro data using Seahorse analysis.

      In response to this valuable suggestion, we performed Seahorse analysis in differentiated adipocytes treated with or without IL-35 for 48 hours. The results demonstrated a slight increase in basal metabolism and a heightened response to isoproterenol (ISO) stimulation of β3 adrenergic receptors in adipocytes after IL-35 treatment. These findings provide functional evidence supporting the thermogenic phenotype induced by IL-35 in adipocytes.

      We have thoughtfully included this essential data in Figure 2 of this revision plan, allowing reviewers and the scientific community to comprehensively evaluate and validate the functional implications of our findings.

      1. Mechanistically, there is no epistasis-type experiment showing that IL-35 influences Ucp1 levels via ATF2, so the data remain associative in nature.

      Thank you for your valuable comment. We agree that establishing a mechanistic link between IL-35 and Ucp1 levels will strengthen the manuscript.

      To delve deeper into the mechanism through which IL-35 influences Ucp1 expression, we focused on the role of ATF2, a transcription factor known to be involved in regulating UCP1 levels (PMID: 11369767 and PMID: 15024092). In our investigation, we treated adipocytes with IL-35 both in the presence and absence of an inhibitor targeting the ATF2 pathway. The results were illuminating as we observed a significant reduction in the expression of Ucp1 when the ATF2 pathway was inhibited.

      These findings indicate that ATF2 is indeed a crucial mediator of the effects of IL-35 on Ucp1 levels. By inhibiting the ATF2 pathway, we demonstrate a direct functional link between IL-35 and the expression of Ucp1, providing mechanistic insights into the regulatory role of IL-35 in thermogenesis. We included new results in Figure 7F.

      1. What are other consequences of injecting IL-35? Is it good or bad? What is the therapeutic potential in DIO mice? Also, in these experiments (Fig. 7) indirect calorimetry as described would be supportive of the claims.

      Regarding the consequences of injecting IL-35, we have already performed experiments to analyze its effect. Our findings indicate that IL-35 increases thermogenesis in BAT (Figure 7), suggesting that it may play a role in promoting energy expenditure, which could be beneficial in combating diet-induced obesity (DIO) in mice. Importantly, we did not observe any negative effects of IL-35 in our experiments.

      Based on these promising results, we anticipate that IL-35 has therapeutic potential in DIO mice. By promoting thermogenesis in BAT, IL-35 may offer a novel approach to manage obesity and related metabolic disorders. However, we acknowledge that further comprehensive studies are needed to fully understand its therapeutic benefits and potential side effects.

      In future work, we plan to evaluate a targeted delivery system for IL-35. We are currently generating IL-35-loaded metal-organic frameworks (MOFs) labeled with adipose tissue-specific peptides. This strategy aims to enhance the delivery of IL-35 to adipose tissue, potentially maximizing its effects in the relevant areas.

      Minor comments:

      1. The authors claim that their HFD-fed MKK3/6CD4-KO mice are protected against hyperglycemia, but only fasted/fed blood glucose tests are performed. Lower glucose levels could be explained due to a hyperinsulinemic state in response to growing insulin resistance in the presence of HFD. It would be sensible to perform both glucose and insulin tolerance tests to back up your statement.

      Thank you for your insightful comment. We agree that to support our claim of protection against hyperglycemia in HFD-fed MKK3/6CD4-KO mice, further tests are necessary beyond fasted/fed blood glucose measurements.

      In response to your suggestion, we conducted both glucose tolerance tests (GTT) and insulin tolerance tests (ITT) in HFD-fed MKK3/6CD4-KO mice. We did not observe differences in glucose tolerance, but the ITT showed significantly enhanced insulin sensitivity compared to control mice. These findings provide evidence that the protection against hyperglycemia in HFD-fed MKK3/6CD4-KO mice is not solely due to a hyperinsulinemic state, but rather indicates genuine improvements in glucose handling and insulin response.

      We have thoughtfully included these crucial data in the revised version of the manuscript, both in the main text and Supplementary Figure 4. We extend our appreciation to the reviewer for this valuable suggestion, which has enhanced the scientific rigor and completeness of our study.

      1. Please provide the loading control for p38 and S6 blots (Figure 6G).

      Thank you for the comment. The loading control we used for P p38 and P S6 blots in Figure 6G is β-actin. Due to the limited amount of sample available, we can only use β-actin as the loading control. The sample amount obtained is very limited, and we can only provide enough lysate to run a couple of blots from the same sample. Running several western blots with the same sample is almost impossible given the constraint of the sample availability. We apologize for this limitation, but it is necessary to avoid using too many mice for ethical reasons, as the samples come from a large number of mice.

      1. Statistical test from Figure 7B should be a t-test, since it is only comparing 2 variables (PBS vs IL-35), and not a 2-way ANOVA as described in the legend.

      We sincerely thank the reviewer for the comment. It was indeed a mistake in the text. While we have performed a t-test, there was an error in the legend that we have now corrected. We apologize for any confusion this may have caused and appreciate the opportunity to rectify the oversight.
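
      The point about comparing two groups (PBS vs IL-35) with a t-test can be sketched as follows. This is only an illustration: the response does not state which t-test variant was used, so Welch's unequal-variance form is an assumption here, the function name is hypothetical, and the values are placeholders rather than measurements from Figure 7B.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of the mean, group a
    vb = variance(b) / len(b)  # squared standard error of the mean, group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical BAT temperatures in degrees Celsius (placeholder values only)
pbs = [36.1, 36.4, 36.0, 36.3, 36.2]
il35 = [36.8, 37.1, 36.9, 37.3, 37.0]

t_stat, dof = welch_t(il35, pbs)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

      With a single factor at two levels there is no second factor or interaction to estimate, which is why a two-sample test rather than a two-way ANOVA fits this design.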

      1. Label correctly the panels in the figures -examples: Fig 3, panels C and D are interchanged; reference in the text to Fig S1G even though the figure only has panels A-F; Fig 7 legend refers to the statistical test of panel E when the figure only has A-D.

      We sincerely apologize for any mistakes in our manuscript that may have caused difficulties while reading the article and potentially led to misleading results. We are grateful to Reviewer #1 for bringing these errors to our attention. Thanks to their diligent review, we have been able to identify and rectify the issues in our manuscript. The necessary corrections have been made, ensuring the accuracy and reliability of our research. We greatly appreciate the reviewer's valuable feedback and contribution to improving the quality of our work.

      1. There are several typos along the text, please revise (example: page 4;line 4 -"tremorgenic")

      We apologize for the presence of any typos in the initial version of the article. We have thoroughly revised the manuscript to correct these errors. Thank you for bringing this to our attention and helping us improve the accuracy and clarity of our work.

      Reviewer #1 (Significance):

      The manuscript is well written, and the research conducted properly, even though a thorough analysis of energy metabolism in mice and cells is missing and the mechanistic claims are based on relatively thin data.

      The immune system and inflammation play important roles for obesity and insulin resistance, yet the roles they play in thermogenic adipocytes remain unclear. This work adds novel aspects to this relationship.

      Reviewer #2 (Evidence, reproducibility and clarity):

      This manuscript by Nikolic et al sought to investigate the role of p38 activation in adipose tissue Treg cells and obesity. They found that the expression of p38a, its upstream kinase MKK6, and downstream substrate ATF2 was upregulated specifically in adipose T cells associated with human obesity. They generated T cell-specific knockout MKK3/6 in mice and found these animals were protected from diet-induced obesity as a result of increased BAT thermogenesis. Mechanistically, loss of p38a activation promoted adipose tissue accumulation of Treg cells, leading to elevated IL-35 availability and UCP1 expression.

      Major comments:

      1. They attributed the obesity protection to energy expenditure; however, food intake and intestinal absorption were never tested. Immune cells particularly Treg cells are important modulators of nutrient uptake.

      We are sincerely grateful to Reviewer #2 for this crucial comment, highlighting the importance of assessing not only energy expenditure but also food intake and intestinal absorption in our study.

      In response to this valuable suggestion, we have initiated an HFD experiment to comprehensively examine food intake and intestinal absorption. For food intake analysis, we are employing metabolic cages, which will allow us to monitor and quantify the amount of food consumed by the mice accurately. Additionally, we plan to follow the methodology outlined in the study by Kraus et al. (PMID: 27110587) to measure lipid content in feces, enabling us to evaluate intestinal absorption.

      By conducting these additional experiments, we aim to gain a deeper understanding of the potential role of Treg cells, known immune modulators of nutrient uptake, in our observed obesity protection phenotype.

      1. At thermoneutrality, BAT is inactive even though UCP1 expression is still present (not activated). That MKK3/6 deficiency in T cells still confers protection against obesity at thermoneutrality suggests it regulates other energy balance components in addition to BAT thermogenesis.

      Thanks for the comment. We believe that the effects of IL-35 on thermogenesis are likely partly mediated by alternative mechanisms, as we did not observe an increase in UCP1 gene expression in BAT in vivo (Figure 3D of the manuscript), and the increase in thermogenesis is still present even at thermoneutrality where UCP1 is inactive (Figure 4E of the manuscript). This suggests that IL-35 might regulate other alternative pathways that control BAT thermogenesis.

      While our current findings provide valuable insights, further experiments may be necessary to fully understand the underlying mechanisms. For instance, conducting experiments with transgenic mice expressing IL-35 or using IL-35 knockout (KO) mice could shed more light on the specific pathways through which IL-35 exerts its effects on thermogenesis and energy balance.

      In conclusion, we hypothesize that IL-35's effects on thermogenesis are mediated partly by alternative mechanisms beyond UCP1 activation, and its ability to enhance thermogenesis even at thermoneutrality highlights its potential as a regulator of energy balance. We plan to pursue the experiments outlined above in follow-up studies. This is now discussed in our manuscript.

      1. Loss of adipose Treg cells (such as Pparg KO, Foxp3-DTR) did not lead to obvious obesity phenotypes. Gain-of-function Treg cells (such as adoptive transfer, IL-2/IL-2 Ab) did not result in profound obesity protection as observed in MKK3/6 CD4-KO mice. It suggests that MKK3/6 KO in T cells causes other immune defects (besides Tregs).

      We agree with the referee's assessment regarding the lack of obvious obesity phenotypes in the above-mentioned animal models. The results we observed in our MKK3/6CD4-KO mice suggest that the p38 signaling pathway in T cells may modulate their function, leading to an upregulation of IL-35 expression, which could be a contributing factor to the significant obesity protection observed in MKK3/6CD4-KO mice. We believe that IL-35's effects on energy balance and thermogenesis are critical components of the observed protection against obesity in this model.

      Regarding the studies with PPAR KO in Treg cells, it is important to note that they did not specifically focus on the effect on thermogenesis. While they observed a general tendency of increased fat deposition when treated with a PPAR agonist in the Treg-deficient PPAR KO mice, these findings were not extensively studied in that particular paper. Thus, additional research is necessary to specifically evaluate thermogenesis in these mice and further understand the role of PPAR in Treg-mediated thermogenic processes.

      We also acknowledge the presence of contradictory results from loss-of-function experiments of Treg cells in mice. The observed metabolic changes may be context-dependent, and the impact of Treg cells on metabolism might vary under different physiological conditions. For instance, in lean conditions where adipose tissue inflammation is low, a decrease in VAT Treg cells might not lead to significant metabolic changes. However, under certain circumstances, such as obesity, VAT Treg cells may play a critical role in regulating metabolism. In this context, increasing a population that is reduced during obesity could result in improved metabolic performance.

      In conclusion, our findings suggest that the lack of p38 activation in Treg cells may prevent the dramatic down-regulation and loss of function observed in Treg cells during obesity. This preservation of Treg function could be a significant factor driving the observed protection against obesity in MKK3/6CD4-KO mice.

      While further studies are required to elucidate the precise timing and spatial aspects of the specific functions of adipose-resident Treg cells, it is evident that these cells play a crucial role in maintaining immune and metabolic homeostasis. They achieve this, in part, by regulating adipose inflammation, insulin sensitivity, lipolysis, and thermogenesis. This is now discussed in our manuscript.

      1. The increase in IL-35 seemed to be very moderate, compared to the metabolic phenotypes. It raises the question if IL-35 is responsible for BAT activation and reduced weight gain. It is unclear what systemic and local levels of IL-35 were reached after recombinant IL-35 treatment (Fig. 7B). IL-35 antibody blockade experiment in KO mice is recommended.

      Physiological changes in cytokines can indeed have a significant impact on the metabolic profile due to their continuous and intricate interactions. Even minor alterations in the overall cytokine milieu can result in substantial changes in metabolism (doi.org/10.1073/pnas.1215840110). In fact, it is well-established that in humans, small changes in cytokine profiles between genders, in obesity, and during aging can play a critical role in the development of pathology. These cytokines often operate in a chronic manner, exerting long-term effects on various physiological processes (doi.org/10.1038/s41467-020-14396-9).

      In summary, the dynamic interplay of cytokines in metabolism can lead to significant metabolic changes even with subtle alterations in their levels. While the increase in IL-35 may appear moderate, our findings using recombinant IL-35 indicate that IL-35 increases thermogenesis in BAT, suggesting that it may play a role in promoting energy expenditure, which could be beneficial in combating diet-induced obesity (DIO) in mice. Importantly, we did not observe any negative effects of IL-35 in our experiments.

      1. IL-35 induced p-ATF2 is acute and transient (Fig. 7D) and it was able to increase BAT temperature in just 4 h (Fig. 7B). However, Ucp1 transcription and translation generally take much longer time (e.g. 2d in Fig. 7C). IL-35 may increase energy expenditure through UCP1-independent mechanisms.

      Thanks for the comment. As previously mentioned, we believe that the effects of IL-35 on thermogenesis might be mediated by alternative mechanisms, as we did not observe an increase in UCP1 gene expression in BAT, and the increase in thermogenesis is still present even at thermoneutrality where UCP1 is inactive. This suggests that IL-35 might regulate other alternative pathways that control BAT thermogenesis.

      While our current findings provide valuable insights, further experiments may be necessary to fully understand the underlying mechanisms. In follow-up studies, we plan to investigate the specific pathways through which IL-35 affects thermogenesis and energy balance, for instance by using transgenic mice expressing IL-35 or IL-35 knockout (KO) mice. This is now discussed in our manuscript.

      Minor comments:

      1. The gating of Treg cells should exclude CD25- cells. Single positive (CD25+ or Foxp3+) cells are progenitors of Tregs. In addition to number, phenotypic activation of Treg cells should also be determined.

      Thank you for the comment. We have reanalyzed our data excluding CD25- cells, and the results are now included in Figure 5A of the manuscript and in the new Supplementary Figure 7 of the revised manuscript. We also checked CD69+ and KLRG1+ Treg cells and observed no differences between genotypes. These figures are also included in this revision plan (Figures 5 and 6).

      1. ATF is also important for adipogenesis, is the adipogenic differentiation of BAT SVF cells affected by MKK3/6 KO or IL-35 treatment?

      We appreciate the reviewer's observation regarding the importance of ATF in adipogenesis. To investigate this aspect further, we performed in vitro differentiation of adipocytes and treated them with IL-35 in the presence or absence of an inhibitor targeting the upstream activator of ATF.

      The results were compelling, as IL-35 treatment led to an increase in the expression of adipogenic markers, including Pparg, Adipoq, Leptin, and Perilipin. In contrast, inhibiting ATF activation resulted in a reduction of these adipogenic markers. These findings provide strong evidence that ATF plays a significant role in mediating the effects of IL-35 on adipogenesis.

      We have thoughtfully included these essential data in Figure 7G of the manuscript. We extend our gratitude to the reviewer for their keen observation, which has enhanced the scientific depth and completeness of our study.

      1. Metabolic cage experiments are desired to determine whole-body energy balance, including food intake, physical activity, and heat production.

      To address this valuable suggestion, we have taken immediate action. We utilized metabolic cages in mice under chow diet. The data from these experiments align with the increased thermogenesis observed in MKK3/6CD4-KO mice fed a chow diet, as they also demonstrated increased energy expenditure, without differences in food intake or locomotor activity. We thank the reviewer for this suggestion as we believe that these new data strengthen our conclusion significantly. The new data are included in Supplementary figure 2 A-B.

      In addition, we have initiated a new experimental group of age-matched mice on HFD, which we will carefully feed for 8 weeks. Following this dietary period, we will subject the mice to metabolic cage analysis, allowing us to obtain accurate data on energy expenditure, food intake, and activity levels. These additional measurements will provide a comprehensive understanding of the metabolic changes induced by MKK3/6 deficiency in T cells under different dietary conditions.

      1. Total UCP1 expression (both RNA and protein) in the whole BAT from an animal should determined (since BAT is smaller in KO mice).

      Thank you for this comment. Yes, we have measured UCP1 expression in the whole BAT from the animals. These data are shown in Figures 3C and 3D and here. Although in vitro studies indicated that IL35 increases UCP1 in adipocytes, we were not able to detect an increase of this protein in BAT.

      We believe that the effects of IL35 on thermogenesis are likely partly mediated by alternative mechanisms, as we did not observe an increase in UCP1 gene expression in BAT in vivo, and the increase in thermogenesis is still present even at thermoneutrality where UCP1 is inactive (Figure 4E of the manuscript). This suggests that IL35 might regulate other alternative pathways that control BAT thermogenesis.

      1. Fig. 6C, IL-35-expressing Treg cells should be quantified from adipose tissue.

      We appreciate the referee's suggestion to quantify IL-35-expressing Treg cells from adipose tissue in Fig. 6C. While we agree that this would be valuable information, we encountered technical challenges that made it impractical to measure IL-35 directly in Treg cells from the visceral adipose tissue (VAT).

      One of the main technical challenges we encountered is the low number of Treg cells present in the adipose tissue, making it difficult to obtain sufficient cell material for accurate quantification of IL-35. Treg cells are relatively rare compared to other immune cell populations in the adipose tissue, and their extraction and analysis can be technically demanding.

      Reviewer #2 (Significance):

      The manuscript is innovative in defining the novel role of p38 activation in the T cell compartment and its metabolic regulation. The involvement of Treg cells in adipose tissue homeostasis has been well documented and Treg cell-derived IL-35 has been demonstrated in immune regulation. The authors provided a relatively thorough description of the altered metabolism in these Mkk3/6 CD4-KO mice; however, the reviewer has doubts if Treg cells and IL-35 are primary mechanisms of the observed protection from obesity. The manuscript would be much stronger if the model were Treg cell-specific KO and/or IL-35 deficiency in Treg cells reverses obesity resistance conferred by MKK3/6 deficiency. It is also suspected that BAT thermogenesis is not the major reason, as BAT deficiency or UCP1 KO results in much milder phenotypes in mice, even at thermoneutrality.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Specific comments:

      1. It's important to use proper controls for mouse metabolic studies. The authors stated that CD4-Cre and MKK3/6 CD4-KO mice are all in the C57B/6L background. However, it would appear that these two lines were bred separately. The difference in the genetic background, although minor, can lead to the observed phenotype, notably weight gain. Since the metabolic phenotypes seem to be driven by the weight difference, it is even more critical to include additional controls to validate the findings. For instance, crossing MKK3/6 f/f with one copy of CD4-Cre with MKK3/6 f/f to generate age-matched MKK3/6 CD4-KO and MKK3/6 f/f controls should be used to repeat major in vivo studies similar to those in Fig. 2-4.

      We thank the reviewer for the comment. Although every control is important when using conditional mice, several papers indicate that Cre-expressing lines have effects of their own that could be important in metabolism, and several articles strongly recommend using Cre+ lines as controls. For that reason, we used the Cre-expressing line as a control, because we believe it is the most appropriate one (Jonkers and Berns, 2002). In fact, the Jackson Laboratory recommends using the Cre-expressing line as a control to avoid side effects that Cre overexpression could have in the tissue of interest (https://biokamikazi.files.wordpress.com/2014/07/cre-lox-imp-notes.pdf).

      However, as this reviewer suggested, we checked that similar results were obtained using littermates as controls and we have now included these data in the manuscript (Supplementary Figure 2D).

      1. The assessment of adipose tissue immune cell population in Fig. 5 was conducted after HFD-induced obesity. As mentioned above, the change in Treg and M2 cell percentage could be due to the body weight difference. The experiment should be repeated (with proper controls) in normal chow and after a few weeks of HFD when Treg numbers start to decline.

      Thank you for the comment. We are currently performing a short HFD experiment to check Treg and M2 cell populations in adipose tissue using littermates as controls.

      In addition, we checked those cell populations in adipose tissue infiltrates in mice fed a chow diet and observed no differences in the M2 macrophage population between genotypes, while the percentage of Treg cells was actually lower in ND-fed MKK3/6CD4-KO mice (Fig. 12 of the revision plan). This result suggests that the higher accumulation of Treg cells in mice lacking p38 activation in T cells is specific to the obese state and strengthens our hypothesis that DIO protection in MKK3/6CD4-KO mice is due to the Treg cell population.

      1. Data related to the mechanistic link in Fig. 6/7 are not robust and require a large amount of additional work to substantiate the claim. First of all, the role of IL-35 in BAT thermogenesis remains unclear. It's somewhat surprising to see a single dose of IL-35 i.v. injection is sufficient to increase BAT temperature in Fig. 7B. Minimally, the authors need to demonstrate that IL-35 treatment (perhaps after a few daily doses) is able to increase browning/beiging of fat cells and improve cold tolerance when placing the mice at 4 degrees for several hours (and up to 3 days). Serum FGF21 level should also be measured after/during IL-13 treatment. Secondly, ATF2 knockout or knockdown in brown preadipocytes should be employed to demonstrate that IL-35 induced UCP1 and FGF21 expression is ATF2 dependent. Another key experiment is to use IL-35 deficient Treg model to definitively demonstrate the requirement of Treg IL-35 to maintain thermogenesis. However, this can be done in a follow-up study.

      We are grateful for all the insightful comments provided by Reviewer #3. We understand the concern, but we are limited in performing several sequential i.v. injections in our animal facility by our ethical permissions. In light of this constraint, we have devised an alternative approach to evaluate the role of IL-35 in adaptive thermogenesis.

      To address this, we conducted a cold tolerance test in both control mice and MKK3/6CD4-KO mice, which express higher levels of IL-35. Our findings revealed that MKK3/6CD4-KO mice exposed to cold conditions were able to preserve their body and brown adipose tissue (BAT) temperature, while the temperature of control CD4-Cre mice gradually dropped during the cold challenge.

      The data from this cold tolerance test support our hypothesis and demonstrate the role of IL-35 in promoting adaptive thermogenesis, leading to enhanced temperature maintenance in MKK3/6CD4-KO mice. These observations have been included in Figure 7B of the manuscript, and detailed results are available in Figure 11 of this revision plan.

      We appreciate the reviewer's valuable input, which has encouraged us to explore alternative experimental approaches to address the research question effectively.

      We agree with Reviewer #3 that using an IL-35-deficient Treg model would be a great approach to confirm our results, but we think that the additional experiments we have performed strengthen our finding that IL-35 has a novel role in controlling adipose tissue thermogenesis.

      Reviewer #3 (Significance):

      Dissipating energy as heat through brown or beige adipocyte-mediated thermogenesis is believed to be an effective way to combat obesity. The current study aims to characterize the p38 signaling pathway in T cells as a potential target to modulate browning or beiging of adipose tissues. This would be of interest to the basic biomedical research community, particularly in the area of immunometabolism. A major limitation is the concern of improper controls for the mouse models, which makes data interpretation difficult. In addition, the mechanistic studies lack in depth analyses to support the conclusion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for the constructive and detailed feedback provided. We have adopted the proposed changes to improve the manuscript clarity and accessibility. The following revisions are included in the revised manuscript:

      Reviewer #1 (Public Review):

      The analytical framework is not sufficiently explained in the main text.

      We think the reviewer is referring to the conceptual framework mentioned in the introduction. In the previously submitted manuscript, we did not provide details because the framework is published elsewhere. However, we agree with the reviewer that a short explanation may be helpful, and we have included one in the resubmitted manuscript.

      The significance of findings in relation to functional changes is not clear. What are the consequences of enrichment of RNA transport or ribosome biogenesis pathways between pesticides and recovery stages, for example?

      We thank the reviewer for this suggestion. In the previously submitted manuscript, we included an explanation of the central functions these pathways can alter (e.g. metabolism and infection response). These functions are self-explanatory. However, we have elaborated on the consequence that the disruption of these pathways can cause in the resubmitted manuscript.

      The impact of individual biocides and climate variables, and their additive effects, are assessed but there is no information offered on non-additive interactions (e.g., synergistic, antagonistic).

      This was a misunderstanding based on our use of the term synergistic in this context. The approach by which we define a synergistic or joint effect of two environmental variables on a taxonomic group is explained in the methods section. This analysis is based on climate variables and biocide types contributing the largest covariances in the correlation analysis explained in Supplementary Fig. 5; Step 4. The combined effect of two environmental variables on a taxon was considered to be significant if the biocide type and the climate variable were each significantly correlated with the taxon over the same time window, and their average Pearson correlation was > 0.5 with padj < 0.05 (SWC analysis with 10,000 permutations). The biocide type and the climate variable were interpreted to have a joint effect on a given taxon if the linear combination of the biocide type and the climate variable had a larger Pearson correlation coefficient than each of the correlations between the family and the biocide type and the family and the climate variable individually, in the same time interval with padj < 0.05 (with 10,000 permutations in the SWC analysis). We realise that the use of synergistic or additive was not correct in this context and have replaced the term synergistic with joint effect throughout the manuscript.
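
      The joint-effect criterion described above can be sketched in code. This is only an illustrative sketch under stated assumptions, not the authors' SWC implementation: the function names are hypothetical, the "linear combination" is assumed to be the sum of the two z-scored variables, correlations are compared by absolute value, and the permutation p-value is left unadjusted for multiple testing.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def zscore(x):
    """Standardise a sequence to zero mean and unit (population) variance."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / n)
    return [(a - m) / sd for a in x]


def perm_pvalue(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def joint_effect(taxon, biocide, climate, n_perm=10_000, alpha=0.05, seed=0):
    """Flag a joint effect when the combined variable correlates with the
    taxon more strongly than either variable alone (assumed reading)."""
    r_b = pearson(taxon, biocide)
    r_c = pearson(taxon, climate)
    # Assumption: equal-weight sum of standardised variables.
    combined = [a + b for a, b in zip(zscore(biocide), zscore(climate))]
    r_joint = pearson(taxon, combined)
    p_joint = perm_pvalue(combined, taxon, n_perm=n_perm, seed=seed)
    joint = abs(r_joint) > max(abs(r_b), abs(r_c)) and p_joint < alpha
    return {"r_biocide": r_b, "r_climate": r_c,
            "r_joint": r_joint, "p_joint": p_joint, "joint": joint}
```

      An equal-weight sum is assumed here because a least-squares fit of the two variables would, within the same sample, always correlate with the taxon at least as strongly as either variable alone, which would make the comparison uninformative.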

      The level of confidence associated with results is not made explicit. The reader is given no information on the amount of variability involved in the observations, or the level of uncertainty associated with model estimates.

      As we didn’t use traditional statistical approaches, confidence level estimation in the traditional sense is not possible. Instead, we used permutation tests and adjusted P-values to identify significant correlations in the data. These approaches are more robust than traditional statistics for integrating and discovering complex, group-wise patterns among high-dimensional datasets. While most forms of machine learning require large sample sizes, sCCA uses fewer observations to identify the most correlated components among data matrices and captures the multivariate variability of the most important features.
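
      For readers unfamiliar with adjusted P-values, the sketch below shows one common adjustment, the Benjamini-Hochberg procedure. The response does not state which adjustment was used, so the choice of method and the function name are assumptions made purely for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used only as an example; the source does not name
    the adjustment method applied to the permutation p-values.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

      A correlation would then be retained only if its adjusted value falls below the chosen threshold (padj < 0.05 in the analysis described here).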

      The major implications of the findings for regulatory ecological assessment are missed. Regulators may not be primarily interested in identifying past "ecosystem shifts". What they need are approaches which give greater confidence in monitoring outcomes by better reflecting the ecological impact of contemporary environmental change and ecosystem management. The real value of the work in this regard is that: (1) it shows that current approaches are inappropriate due to the relatively stable nature of the indicators used by regulators, despite large changes in pollutant inputs; (2) it presents some better alternatives, including both taxonomic and functional indicators; and (3) it provides a new reference (or baseline) for regulators by characterizing "semi-pristine" conditions.

      We thank the reviewer for this suggestion, which we have included in the main text (L451-461).

      Reviewer #2 (Public Review):

      Results - They are brief and should expand some more. Particularly, there are no results regarding metabarcoding data (number of reads, filtering etc.). These details are important to know the quality of the data which represents the bulk of the analyses. Even the supplementary material gives little information on the metabarcoding results (e.g. number of ASVs - whether every ASV of each family were pooled etc.).

      We thank the reviewer for this suggestion. We have included a paragraph in results reporting read numbers and other statistics. The filtering criteria and handling of samples can be found in methods (L658-661; L670-675). As explained in methods the taxonomy was assigned using qiime feature-classifier classify-sklearn and used at family level where possible. When classification was not possible at family level because of incomplete/missing information in the online database or a poor match to reference database, the lowest classification possible was used.
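
      The fallback rule described above (use the family where available, otherwise the lowest rank that could be assigned) can be sketched as follows. The function name is hypothetical, and the input is assumed to be a semicolon-separated taxonomy string with rank prefixes such as `p__` and `f__`; the actual format depends on the reference database used.

```python
def family_or_lowest(taxonomy, sep=";"):
    """Return the family name if assigned, else the lowest assigned rank.

    Assumption: ranks look like 'd__Bacteria; p__...; f__...', and an
    empty name (e.g. 'f__') means the rank could not be classified.
    """
    lowest = "Unassigned"
    for rank in taxonomy.split(sep):
        prefix, _, name = rank.strip().partition("__")
        if not name:
            # Classification stops at the first unassigned rank.
            break
        lowest = name
        if prefix == "f":
            return name
    return lowest
```

      For example, a string that stops at class level returns the class name, so every ASV is still grouped at the finest level its classification supports.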

      The drivers of biodiversity change section could be restructured and include main text tables showing the families positively or negatively correlated with the different variables (akin to table S2 but simplified).

      As there are over 180 unique families/taxonomic units correlated with at least one biocide or environmental variable, a simplified version of this table would be too large to include in the main text. Therefore, we prefer to keep this information in supplementary table 2 complete with correlation statistics.

      We thank the reviewers for providing detailed feedback on the manuscript and respond to their suggestions as follows:

      Reviewer #1 (Recommendations For The Authors):

      Thank you for the opportunity to review your manuscript, which I found interesting and enjoyable to read. Here are some suggestions for improving it.

      Remove spaces before citations in text.

      Lines 51-53: "Community-level biodiversity reliably explained freshwater ecosystem shifts whereas traditional quality indices (e.g. Trophic Diatom Index) and physicochemical parameters proved to be poor metrics for these shifts." Seems to be the wrong way around / not clear???

      Rephrased to clarify.

      Line 54: Should be "...advocates the use of..." or "...demonstrates the advantages of..."

      Done, thanks for the suggestion.

      Line 62: Spell out numbers <10, i.e. "sixth mass extinction"

      Done, thank you.

      Lines 66-72: These sentences lack clarity. It's not clear that "experimental manipulation of biodiversity" hasn't involved investigation of "multi-trophic changes". By the third of these four sentences it is not clear what "they" is referring to. And in the fourth sentence, "these holistic studies" are not defined. Perhaps it would suffice to say that experiments have so far focused primarily on a single trophic level and largely neglected freshwater systems.

      We have rephrased to improve clarity.

      Line 81: Delete unnecessary bracket

      Done, thank you.

      Line 82: "a minority of freshwater ecosystems" sounds as if you're saying that few freshwater ecosystems are represented in BioTIME, which seems obvious and would also apply to terrestrial and marine systems. Do you mean that freshwater ecosystems are not well represented in the data?

      We have clarified the sentence, thanks.

      Line 106: Resolve issue with citation in text at the end of the sentence (repeated at line 109 and possibly other lines).

      Done, thank you.

      Line 116: By ">1999s" do you mean 1990s?

      This was a typo; it was supposed to be >1999.

      Line 120: The reader would benefit greatly from a brief explanation of explainable network models and multimodal learning in the introduction. Why are these the right tools to use? How do they work in this context? Figure 1 helps to some extent but needs more commentary in the text.

      We have included an explanation of the explainable network models and multimodal learning and how their use can be beneficial to the study of diverse data types.

      Line 144: Here and throughout the text the language could be much more efficient and readable. "Alpha diversity" does not require a definite article. Furthermore, when referring to significance it is convention to state the p-value, test statistic and test used.

      As there are different p-values for each barcode, we have included them in legend to Supplementary Fig. 1 to avoid crowding the main text. We prefer to leave the text unchanged for this reason.

      Line 155: "The primary producer's composition" is grammatically awkward and less suitable than "the composition of primary producers". This kind of awkwardness occurs again at line 285 ("diatom's") and possibly in other parts of the manuscript.

      Thanks, corrected.

      Line 169: The statement that this family was "relatively more abundant" needs a little more explanation. What is it relative to - other groups or to previous stages?

      More abundant than in the other phases – the sentence has been modified.

      Line 179: Nested brackets are unnecessary and affect readability. This could simply be a new sentence, i.e. "For example, Nitrospiraceae (nitrite oxidizers)..."

      Done, thanks.

      Line 215: "Functional biodiversity", which implies that some biodiversity is functional and some not, does not seem an appropriate term to describe the results you present in this section. Simply "functioning of the prokaryotic community" would suffice.

      Thanks, done.

      Line 214-233: This section may be inaccessible for many readers. For example, what are Kegg Orthologs and what role do they play in the functioning of a lake ecosystem? The explanation comes later in the paragraph but there needs to be a gentler introduction before diving into specific technical concepts.

      We appreciate this comment and have included a short explanation of what KEGG and KO terms mean.

      Supplementary Figure 3: It would be helpful to superimpose the lake stages here, as done in Figure 2.

      The figure has been updated with coloured data points corresponding to each phase, as in supplementary figure 1.

      Line 265: Should be "19 of which were identified..."

      Done, thanks.

      Line 284: "Predominantly" rather than "prominently"?

      Done

      Line 242-316: This section is good in that it identifies and ranks individual biocides and climate variables but there is no information on non-additive interactions (e.g., synergistic, antagonistic). Could the authors at least comment on why this was not done or not necessary, and what uncertainties this omission could introduce into the results?

      This was a misunderstanding based on our use of the term synergistic in this context. The approach by which we define a synergistic or joint effect of two environmental variables on a taxonomic group is explained in the methods section. This analysis is based on climate variables and biocide types contributing the largest covariances in the correlation analysis explained in Supplementary Fig. 5; Step 4. The combined effect of two environmental variables on a taxon was considered to be significant if the biocide type and the climate variable were each significantly correlated with the taxon over the same time window, and their average Pearson correlation was > 0.5 with padj < 0.05 (SWC analysis with 10,000 permutations) – this is shown in Supplementary Fig. 5; Step 6. The biocide type and the climate variable were interpreted to have a joint effect on a given taxon if the linear combination of the biocide type and the climate variable had a larger Pearson correlation coefficient than each of the correlations between the family and the biocide type and the family and the climate variable individually, in the same time interval with padj < 0.05 (with 10,000 permutations in the SWC analysis). We have replaced synergistic with joint effect to avoid confusion.

      Figure 4: These 3-D plots are very hard to read. Without additional features (e.g. shadows on each plane, or lines connecting points to planes) it is impossible for the viewer to tell where the points are located on each axis.

      We have created interactive 3D plots here: https://environmental-omicsgroup.github.io/Biodiversity_Monitoring/.

      Figure 5: Legend entry should be "summer precipitation" not "precipitations". "Additive effect" rather than "joint effect" would be more consistent with the main text.

      “Precipitations” has been updated to “precipitation” where relevant throughout. We left ‘joint effect’ and unified the main text, responding to a previous comment of this reviewer on the meaning of synergistic effects in our study.

      Line 348: Doesn't your approach also require specialist skills? I often feel that the "traditional" versus "molecular" monitoring debate misses this point. Some comment on the training and development needs for those interested in applying the sedaDNA approach would be welcome. Otherwise it is an unfair comparison.

      Whereas the application of high throughput sequencing technologies requires training, these technologies are well established with publicly available standard operating procedures. As compared to direct observations, high throughput sequencing provides replicable results regardless of the operator. Moreover, the application of metabarcoding to sedaDNA or more generally eDNA can be outsourced to established environmental services, removing the need for training if it is a limiting factor. The above has been included in discussion.

      Line 391: "Significantly did" what? "Did significantly change over time" would be better.

      Done, thanks.

      Line 407: Should be "an indicator of..." and "did not significantly change over time..."

      Done, thanks.

      Line 408-410: Regulators are not necessarily interested in identifying past "ecosystem shifts", so this does not seem to be the best way to contrast the capabilities of the sedaDNA approach with those of LTDI2. The real value of this work, in my opinion, is threefold. First, it shows that the reliance on diatoms as indicators of ecological status is inappropriate due to the relatively stable nature of diatom communities in the face of large environmental changes. Second, it presents some better alternatives, including both taxonomic and functional indicators. And third, it provides a new reference point for regulators by characterising "semi-pristine" conditions.

      Thanks for the insightful suggestion. We agree with the reviewer on the advantages and have spelled them out in the resubmitted manuscript.

      Line 445: What are "housekeeping functions"? I checked the Cuenca-Cambronero paper cited but did not find the term there.

      Housekeeping functions are essential basic cellular functions that are evolutionarily conserved. They are more commonly present in public databases because they have been characterised in a number of model species (e.g. Drosophila, C. elegans and Mus musculus). Our reference is not to the Cuenca-Cambronero paper, but to Mi et al., describing the reference database PANTHER. We included the definition of housekeeping functions in the main text.

      Line 449: Briefly state the main functional changes found here.

      Examples have been included.

      Lines 451-452: Whilst this statement may be found in the cited source, most readers I suspect would not identify with it. Indeed, one could argue that most of freshwater ecology has been dedicated to this very task (documenting chemical impacts on biodiversity)! A more balanced view is needed here.

      The sentence the reviewer refers to also includes a reference to climate change. Climate change and chemical pollution are the two most common causes of biodiversity loss, and not only in freshwater ecosystems.

      Lines 463-466: These examples both point to non-additive (synergistic) effects, which were not assessed in the current study.

      Please refer to our explanation above about the inappropriate use of synergistic and, here, additive. We have altered the text throughout to use joint effects as we do not investigate synergistic, antagonistic and additive effects as traditionally described in ecology.

      Lines 472-474: This sentence is unclear. Do you mean that this approach surpasses others in terms of reliability? If so, I don't believe this has been demonstrated in the paper.

      We apologise. The word ‘reliability’ should not have been in the text. We have improved the clarity of this sentence.

      Lines 474-482: In these sentences it is unclear whether or not you are talking about your method or contrasting it with another method(s). If the latter, which method or methods are you referring to?

      We have fixed this sentence to better reflect that our algorithm provides a high degree of confidence that surpasses state-of-the-art analyses, which predominantly identify patterns of co-occurrence of taxa within communities (e.g. Correlation-Centric Network).

      Line 631: Should be "Physico-chemical variables". I have not extensively checked the rest of the methods for such errors.

      Thank you, the text has been changed where present.

      Reviewer #2 (Recommendations For The Authors):

      Introduction Line 80 remove extra ')'

      Done, thank you.

      Line 81 rephrase e.g includes few freshwater ecosystems

      We modified this sentence, also following the suggestion of Reviewer #1.

      Line 83 although, instead of whereas?

      Done, thanks.

      Line 106 formatting reference issue

      Line 109 same as above

      Thank you, noted.

      Results

      Line 141 - 144 how was the sampling of the sediment performed over the 100 year core? Every year? Every 5 years? Or were they pooled to represent the (as of yet unlisted) phases?

      The reviewer is correct that details are not provided here. They are in methods. We have added some text to explain the basic concepts of how the core was obtained and sliced and refer the reader to the method section for more details.

      Line 154 the authors have not yet explicitly listed the lake phases, so it is difficult to refer to them now.

      Noted, the addition of a short explanation at the beginning of the results section should take care of this issue.

      Line 216 - may be worth briefly explaining KEGG orthologs and how these relate to functional biodiversity.

      We thank the reviewer. Also responding to a similar comment from Reviewer #1, we included a description of KO terms and their links to functional biodiversity.

      Lines 249 - 260 instead of a supplementary table, it could remain in the main text

      Supplementary table 2 is a multi-tab table including information for each region amplified here. It is not possible to include this table in the main text.

      Materials and Methods Due to the formatting of the manuscript (results & discussion before materials and methods), many of the results are not clearly understood without having to visit the M&M section. Particularly, how the biocide types were obtained (Historic records plus persistence of DDT in sediments). This could be resolved by including a few sentences on how the data was gathered in the results section. Overall, materials and methods are sufficient, however, it is not clear how many of the 37 metabarcoding samples correspond to which of the lake phases. Finally, I suggest a better organization of M&Ms by having subheadings for each section. For example, under Biodiversity fingerprinting across 100 years, one subheading could be DNA extraction and sequencing, another subheading could be bioinformatics.

      We thank the reviewer for the suggestion. To alleviate the issues linked to the methods section coming after the results section, we have introduced a short explanation of the sediment core and the lake phases at the beginning of the results section. A description of the climate and chemical data has been included at the beginning of the section ‘Drivers of biodiversity change’ in results. Subheadings were introduced in methods as suggested.

    1. Author Response

      Reviewer #1 (Public Review):

      In the best genetically and biochemically understood model of eukaryotic DNA replication, the budding yeast, Saccharomyces cerevisiae, the genomic locations at which DNA replication initiates are determined by a specific sequence motif. These motifs, or ARS elements, are bound by the origin recognition complex (ORC). ORC is required for loading of the initially inactive MCM helicase during origin licensing in G1. In human cells, ORC does not have a specific sequence binding domain and origin specification is not specified by a defined motif. There have thus been great efforts over many years to try to understand the determinants of DNA replication initiation in human cells using a variety of approaches, which have gradually become more refined over time.

      In this manuscript Tian et al. combine data from multiple previous studies using a range of techniques for identifying sites of replication initiation to identify conserved features of replication origins and to examine the relationship between origins and sites of ORC binding in the human genome. The authors identify a) conserved features of replication origins e.g. association with GC-rich sequences, open chromatin, promoters and CTCF binding sites. These associations have already been described in multiple earlier studies. They also examine the relationship of their determined origins and ORC binding sites and conclude that there is no relationship between sites of ORC binding and DNA replication initiation. While the conclusions concerning genomic features of origins are not novel, if true, a clear lack of colocalization of ORC and origins would be a striking finding.

      Thank you. That is where the novelty of the paper lies.

      However, the majority of the datasets used do not report replication origins, but rather broad zones in which replication origins fire. Rather than refining the localisation of origins, the approach of combining diverse methods that monitor different objects related to DNA replication leads to a base dataset that is highly flawed and cannot support the conclusions that are drawn, as explained in more detail below.

      We are using the narrowly defined SNS-seq peaks as the gold standard origins and making sure to focus in on those that fall within the initiation zones defined by other methods. The objective is to make a list of the most reproducible origins. Unlike what the reviewer states, this actually refines the dataset to focus on the SNS origins that have also been reproduced by the other methods in multiple cell lines. We will change the last box of Fig. 1A to say: Identify reproducible SNS-seq origins that are contained in IZs defined by Repli-seq, OK-seq and Bubble-seq. These are the “shared origins”. This and the Fig. 2B (as it is) will make our strategy clearer.

      Methods to determine sites at which DNA replication is initiated can be divided into two groups based on the genomic resolution at which they operate. Techniques such as bubble-seq, ok-seq can localise zones of replication initiation in the range ~50kb. Such zones may contain many replication origins. Conversely, techniques such as SNS-seq and ini-seq can localise replication origins down to less than 1kb. Indeed, the application of these different approaches has led to a degree of controversy in the field about whether human replication does indeed initiate at discrete sites (origins), or whether it initiates randomly in large zones with no recurrent sites being used. However, more recent work has shown that elements of both models are correct i.e. there are recurrent and efficient sites of replication initiation in the human genome, but these tend to be clustered and correspond to the demonstrated initiation zones (Guilbaud et al., 2022).

      These different scales and methodologies are important when considering the approach of Tian et al. The premise that combining all available data from five techniques will increase accuracy and confidence in identifying the most important origins is flawed for two principal reasons. First, as noted above, of the different techniques combined in this manuscript, only SNS-seq can actually identify origins rather than initiation zones. It is the former that matters when comparing sites of ORC binding with replication origin sites if a conclusion is to be drawn that the two do not co-localise.

      Exactly. So the reviewer should agree that our method of finding SNS-seq peaks that fall within initiation zones actually refines the origins to find the most reproducible origins. We are not losing the spatial precision of the SNS-seq peaks.

      Second, the authors give equal weight to all datasets. Certainly, in the case of SNS-seq, this is not appropriate. The technique has evolved over the years and some earlier versions have significantly different technical designs that may impact the reliability and/or resolution of the results e.g. in Foulk et al. (Foulk et al., 2015), lambda exonuclease was added to single stranded DNA from a total genomic preparation rather than purified nascent strands, which may lead to significantly different digestion patterns (i.e. underdigestion). Curiously, the authors do not make the best use of the largest SNS-seq dataset (Akerman et al., 2020) by ignoring these authors' separation of core and stochastic origins. By blending all data together any separation of signal and noise is lost. Further, I am surprised that the authors have chosen not to use data and analysis from a recent study that provides subsets of the most highly used and efficient origins in the human genome, at high resolution (Guilbaud et al., 2022).

      1) We are using the data from Akerman et al., 2020: Dataset GSE128477 in Supplemental Table 1. We can examine the core origins defined by the authors to check its overlap with ORC binding.

      2) To take into account the refinement of the SNS-seq methods through the years, we actually included in our study only those SNS-seq studies after 2018, well after the lambda exonuclease method was introduced. Indeed, all 66 of the SNS-seq datasets we used were obtained after the lambda exonuclease digestion step. To reiterate, we recognize that there may be many false positives in the individual origin mapping datasets. Our focus is on the true positives, the SNS-seq peaks that have some support from multiple SNS-seq studies AND fall within the initiation zones defined by the independent means of origin mapping (described in Fig. 1A and 2B). These true positives are most likely to be real and reproducible origins and should be expected to be near ORC binding sites.

      We will change the last box of Fig. 1A to say: Identify reproducible SNS-seq origins that are contained in IZs defined by Repli-seq, OK-seq and Bubble-seq. These are the “Shared origins”.

      Ini-seq by Torsten Krude and co-workers (Guilbaud et al., 2022) does NOT use lambda exonuclease digestion. So using Ini-seq defined origins is at odds with the suggestion above that we focus only on SNS-seq datasets that use lambda exonuclease. However, Ini-seq identifies a much smaller subset of SNS-seq origins, so we will do the analysis with just that smaller set in the revision of the paper.

      References:

      Akerman I, Kasaai B, Bazarova A, Sang PB, Peiffer I, Artufel M, Derelle R, Smith G, Rodriguez-Martinez M, Romano M, Kinet S, Tino P, Theillet C, Taylor N, Ballester B, Méchali M (2020) A predictable conserved DNA base composition signature defines human core DNA replication origins. Nat Commun, 11: 4826

      Foulk MS, Urban JM, Casella C, Gerbi SA (2015) Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins. Genome Res, 25: 725-735

      Guilbaud G, Murat P, Wilkes HS, Lerner LK, Sale JE, Krude T (2022) Determination of human DNA replication origin position and efficiency reveals principles of initiation zone organisation. Nucleic Acids Res, 50: 7436-7450

      Reviewer #2 (Public Review):

      Tian et al. perform a meta-analysis of 113 genome-wide origin profile datasets in humans to assess the reproducibility of experimental techniques and shared genomics features of origins. Techniques to map DNA replication sites have quickly evolved over the last decade, yet little is known about how these methods fare against each other (pros and cons), nor how consistent their maps are. The authors show that high-confidence origins recapitulate several known features of origins (e.g., correspondence with open chromatin, overlap with transcriptional promoters, CTCF binding sites). However, surprisingly, they find little overlap between ORC/MCM binding sites and origin locations.

      Overall, this meta-analysis provides the field with a good assessment of the current state of experimental techniques and their reproducibility, but I am worried about: (a) whether we've learned any new biology from this analysis; (b) how binding sites and origin locations can be so mismatched, in light of numerous studies that suggest otherwise; and (c) some methodological details described below.

      Major comments:

      Line 26: "0.27% were reproducibly detected by four techniques" -- what does this mean? Does the fragment need to be detected by ALL FOUR techniques to be deemed reproducible?

      If the reproducible SNS-seq peaks are included in the reproducible initiation zones found by the other methods, then we consider it reproducible across datasets. The strategy is to focus our analysis on the most reproducible SNS-seq peaks that happen to be in reproducible initiation zones. It is the best way to confidently identify a very small set of true positive origins.

      And what if the technique detected the fragment in only 1 of N experiments conducted; does that count as "detected"?

      A reproducible SNS-seq origin has been reproduced above a statistical threshold of 20 reproductions. A threshold of reproduction in 20 datasets out of 66 SNS-seq datasets gives an FDR of <0.1. This is explained in Fig. 2a and Supplementary Fig. S2. For the initiation zones, we considered a Zone even if it appears in only 1 of N experiments, because N is usually small. This relaxed method for selecting the initiation zones gives the best chance of finding SNS-seq peaks that are reproduced by the other methods.

      Later in Methods, the authors (line 512) say, "shared origins ... occur in sufficient number of samples" but what does sufficient mean?

      Sufficient means that an SNS-seq origin was reproducibly detected in ≥ 20 datasets and was included in any initiation zone defined by the three other techniques.

      Then on line 522, they use a threshold of "20" samples, which seems arbitrary to me. How are these parameters set, and how robust are the conclusions to these settings? An alternative to setting these (arbitrary) thresholds and discretizing the data is to analyze the data continuously; i.e., associate with each fragment a continuous confidence score.

      We explained Fig. 2a and Supplementary Fig. S2 in the text as follows: The occupancy score of each origin defined by SNS-seq (Supplementary Fig. 2a) counts the frequency at which a given origin is detected in the datasets under consideration. For the random background, we assumed that the number of origins confirmed by increasing occupancy scores decreases exponentially (see Methods and Supplementary Table 2). Plotting the number of origins with various occupancy scores when all SNS-seq datasets published after 2018 are considered together (the union origins) shows that the experimental curve deviates from the random background at a given occupancy score (Fig. 2a). The threshold occupancy score of 20 is the point where the observed number of origins deviates from the expected background number (with an FDR < 0.1) (Fig. 2a). In the Methods: In other words, the number of observed origins with occupancy score greater than 20 is 10 times more than expected in the background model. This approach is statistically sound and described by us in (Fang et al. 2020).
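
      The occupancy-score thresholding described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the exponential background is fitted through two low-occupancy points purely for simplicity (the actual fitting procedure is given in the Methods and Supplementary Table 2), and the FDR is approximated per score as expected/observed.

```python
from collections import Counter
import math


def occupancy_scores(datasets):
    """Count, for each origin ID, the number of datasets in which it appears."""
    counts = Counter()
    for origins in datasets:
        counts.update(set(origins))  # each dataset contributes at most once
    return counts


def origins_per_score(scores):
    """Histogram: occupancy score -> number of origins with exactly that score."""
    return Counter(scores.values())


def fit_exponential_background(hist, fit_scores=(1, 2)):
    """Fit N(s) = a * exp(-b * s) through two low-occupancy points.

    Assumption for illustration only: low-occupancy origins are dominated
    by background, so they are used to anchor the exponential decay.
    """
    s1, s2 = fit_scores
    b = math.log(hist[s1] / hist[s2]) / (s2 - s1)
    a = hist[s1] * math.exp(b * s1)
    return a, b


def threshold_score(hist, a, b, fdr=0.1):
    """Smallest score where expected (background) / observed drops below fdr."""
    for s in sorted(hist):
        expected = a * math.exp(-b * s)
        if hist[s] > 0 and expected / hist[s] < fdr:
            return s
    return None
```

      With real data, `datasets` would hold one collection of origin identifiers per SNS-seq dataset, and the returned score would play the role of the threshold of 20 quoted above.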

      Line 20: "50,000 origins" vs "7.5M 300bp chromosomal fragments" -- how do these two numbers relate? How many 300bp fragments would be expected given that there are ~50,000 origins? (i.e., how many fragments are there per origin, on average)? This is an important number to report because it gives some sense of how many of these fragments are likely nonsense/noise. The authors might consider eliminating those fragments significantly above the expected number, since their inclusion may muddle biological interpretation.

      I think we confused the reviewer by the way we wrote the abstract. The 50,000 origins that are mentioned in the abstract is the hypothetical expected number of origins that have to fire to replicate the whole 6x10^9 base diploid genome based on the average inter-origin distance of 10^5 bases (as determined by molecular combing). The 7.5M 300 bp fragments are the genomic regions where the 7.5M union SNS-seq-defined origins are located. Clearly, that is a lot of noise, some because of technical noise and some due to the fact that origins fire stochastically. Which is why our paper focuses on a smaller number of reproducible origins, the 20,250 shared origins. Our analysis is on the 20,250 shared origins, and not on all 7.5M union origins. Thus, we are not including the excess of non-reproducible (stochastic?) origins in our analysis.

      The revised abstract in the revised paper will say: “Based on experimentally determined average inter-origin distances of ~100 kb, DNA replication initiates from ~50,000 origins on human chromosomes in each cell-cycle. The origins are believed to be specified by binding of factors like the Origin Recognition Complex (ORC) or CTCF or other features like G-quadruplexes. We have performed an integrative analysis of 113 genome-wide human origin profiles (from five different techniques) and 5 ORC-binding site datasets to critically evaluate whether the most reproducible origins are specified by these features. Out of ~7.5 million union origins identified by 66 SNS-seq datasets, only 0.27% were reproducibly contained in initiation zones identified by three other techniques (20,250 shared origins), suggesting extensive variability in origin usage and identification in different circumstances.”

      Line 143: I'm not terribly convinced by the PCA clustering analysis, since the variance explained by the first 2 PCs is only ~25%. A more robust analysis of whether origins cluster by cell type, year etc is to simply compute the distribution of pairwise correlations of origin profiles within the same group (cell type, year) vs the correlation distribution between groups. Relatedly, the authors should explain what an "origin profile" is (line 141). Is the matrix (to which PCA is applied) of size 7.5M x 113, with a "1" in the (i,j) position if the ith fragment was detected in the jth dataset?

      The reviewer is correct about how we did the PCA, and we have now included the description in the Methods. We will also do the pairwise correlations the way the reviewer suggests (a) by techniques, (b) by cell types (SNS-seq), (c) by year of publication (SNS-seq).
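
      The binary origin-profile matrix and the within-group versus between-group correlation comparison suggested by the reviewer could look roughly like this. It is a sketch under stated assumptions: the function names, the use of Pearson correlation on 0/1 vectors, and the toy group labels are illustrative choices, not taken from the manuscript.

```python
from itertools import combinations
import math


def presence_matrix(datasets, fragments):
    """Rows = fragments, columns = datasets; entry is 1 if the fragment was detected."""
    return [[1 if f in d else 0 for d in datasets] for f in fragments]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors (nan if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def within_between(matrix, labels):
    """Split pairwise dataset correlations into same-group and different-group lists."""
    columns = list(zip(*matrix))  # one 0/1 profile per dataset
    within, between = [], []
    for i, j in combinations(range(len(columns)), 2):
        r = pearson(columns[i], columns[j])
        (within if labels[i] == labels[j] else between).append(r)
    return within, between
```

      Here `labels` would hold the technique, cell type, or publication year of each dataset, and the two returned lists can then be compared as distributions.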

      It's not clear to me what new biology (genomic features) has been learned from this meta-analysis. All the major genomic features analyzed have already been found to be associated with origin sites. For example, the correspondence with TSS has been reported before:

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6320713/

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6547456/

      So what new biology has been discovered from this meta-analysis?

      The new biology can be summarized as: (a) We can identify a set of reproducible (in multiple datasets and in multiple cell lines) SNS-seq origins that also fall within initiation zones identified by completely independent methods. These may be the best origins to study in the midst of the noise created by stochastic origin firing. (b) The overlap of these True Positive origins with known ORC binding sites is tenuous. So either all the origin mapping data, or all the ORC binding data has to be discarded, or this is the new biological reality in mammalian cancer cells: on a genome-wide scale the most reproduced origins are not in close proximity to ORC binding sites, in contrast to the situation in yeast. (c) All the features that have been reported to define origins (CTCF binding sites, G quadruplexes etc.) could simply be from the fact that those features also define transcription start sites (TSS), and origins prefer to be near TSS because of the favorable chromatin state.

      Line 250: The most surprising finding is that there is little overlap between ORC/MCM binding sites and origin locations. The authors speculate that the overlap between ORC1 and ORC2 could be low because they come from different cell types. Equally concerning is the lack of overlap with MCM. If true, these are potentially major discoveries that butt heads with numerous other studies that have suggested otherwise. More needs to be done to convince the reader that such a mis-match is true. Some ideas are below:

      Idea 1) One explanation given is that the ORC1 and ORC2 data come from different cell types. But there must be a dataset where both are mapped in the same cell type. Can the authors check the overlap here? In Fig S4A, I would expect the circles to not only strongly overlap but to also be of roughly the same size, since both ORC's are required in the complex. So something seems off here.

      We agree with the reviewer that there is something “off here”. Either the techniques that report these sites are all wrong, or the biology does not fit into the prevailing hypothesis. One secret in the ORC ChIP field that our lab has struggled with for quite some time is that the various ORC subunits do not necessarily ChIP-seq to the same sites. The poor overlap between the binding sites of subunits of the same complex either suggests that the subunits do not always bind to the chromatin as a six-subunit complex or that all the ChIP-seq data in the literature is suspect. We provide in Supplementary Fig. S4A examples of true positive complexes (SMARCA4/ARID1A, SMC1A/SMC3, EZH2/SUZ12), whose subunits ChIP-seq to a large fraction of common sites. As shown in Supplementary Fig. S4C, we do not have ORC1 and ORC2 ChIP-seq data from the same cell type. We have ORC1 ChIP-seq and SNS-seq data from HeLa cells and ORC2 ChIP-seq and origins from K562 cells, and so will add the proximity/overlap of the binding sites to the origins in the same cell type in the revision.

      Idea 2) Another explanation given is that origins fire stochastically. One way to quantify the role of stochasticity is to quantify the overlap of origin locations performed by the same lab, in the same year, in the same experiment, in the same cell type -- i.e., across replicates -- and then compute the overlap of mapped origins. This would quantify how much mis-match is truly due to stochasticity, and how much may be due to other factors.

      A given lab may have superior reproducibility compared to the entire field. But the notion of stochasticity is well accepted in the field because of this observation: the average inter-origin distance measured by single molecule techniques like molecular combing is ~100 kb, but the average inter-origin distance measured on a population of cells (same cell line) is ~30 kb. The only explanation is that in a population of cells many origins can fire, but in a given cell on a given allele, only one-third of those possible origins fire. This is why we did not worry about the lack of reproducibility between cell lines, labs etc., but instead focused on those SNS-seq origins that are reproducible over multiple techniques and cell lines.
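
      The "one-third" figure follows directly from the two inter-origin distances quoted above. A small worked calculation, with a hypothetical helper name, makes the step explicit:

```python
def fraction_of_origins_used(single_molecule_kb, population_kb):
    """Fraction of population-level origins that fire on any one DNA molecule.

    Origin density is the inverse of the inter-origin distance, so the ratio
    of densities equals population_kb / single_molecule_kb.
    """
    origins_per_mb_single = 1000 / single_molecule_kb
    origins_per_mb_population = 1000 / population_kb
    return origins_per_mb_single / origins_per_mb_population


# ~100 kb by molecular combing versus ~30 kb across a cell population
fraction = fraction_of_origins_used(single_molecule_kb=100, population_kb=30)
print(f"{fraction:.2f}")
```

      The ratio is 0.3, that is, roughly one in three of the origins seen across the population fires on a given allele in a given cell.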

      Idea 3) A third explanation is that MCMs are loaded further from origin sites in human than in yeast. Is there any evidence of this? How far away does the evidence suggest, and what if this distance is used to define proximity?

      MCMs, of course, have to be loaded at an origin at the time the origin fires because MCMs provide the core of the helicase that starts unwinding the DNA at the origin. Thus, the lack of proximity of MCM binding sites with origins can be because the most detected MCM sites (where MCM spends the most time in a cell population) do not correspond to where it is first active to initiate origin firing. This has been discussed. MCMs may be loaded far from the origin site, but because of their ability to move along the chromatin, they have to move to the origin site at some point to fire the origin.

      Idea 4) How many individual datasets (i.e., those collected and published together) also demonstrate the feature that ORC/MCM binding locations do not correlate with origins? If there are few, then indeed, the integrative analysis performed here is consistent. But if there are many, then why would individual datasets reveal one thing, but integrative analysis reveal something else?

      We apologize for this oversight. In the revised manuscript we will discuss PMC3530669, PMC7993996, PMC5389698, PMC10366126. None of them have addressed what we are addressing, which is whether the small subset of the most reproducible origins are proximal to ORC or MCM binding sites, but the discussion is essential.

      Idea 5) What if you were much more restrictive when defining "high-confidence" origins / binding sites. Does the overlap between origins and binding sites go up with increasing restriction?

      We will make origins more restrictive by selecting those reproduced by 30-60 datasets. The number of origins will of course fall, but we will measure whether the proximity to ORC or MCM-binding sites increases/decreases in a statistically rigorous way.
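
      One way to picture the proposed sweep over stricter reproducibility thresholds is the sketch below. The function name, the single-chromosome coordinates, and the 1 kb proximity window are assumptions made for illustration; the manuscript's own definition of proximity and its statistical test are not reproduced here.

```python
from bisect import bisect_left


def fraction_near_sites(origins, sites, min_score, window=1000):
    """Fraction of origins with occupancy >= min_score lying within `window` bp of a site.

    origins: list of (position, occupancy_score) on one chromosome
    sites:   sorted list of binding-site positions on the same chromosome
    Returns None if no origin passes the threshold.
    """
    kept = [pos for pos, score in origins if score >= min_score]
    if not kept:
        return None
    near = 0
    for pos in kept:
        i = bisect_left(sites, pos)
        # only the nearest site on each side can be the closest one
        neighbours = sites[max(i - 1, 0):i + 1]
        if any(abs(pos - s) <= window for s in neighbours):
            near += 1
    return near / len(kept)
```

      Calling this for `min_score` values from 20 up to 60 would show whether the fraction of origins near ORC or MCM sites rises, falls, or stays flat as the origin set becomes more restrictive.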

      Overall, I have the sense that these experimental techniques may be producing a lot of junk. If true, this would be useful for the field to know! But if not, and there are indeed "unexplored mechanisms of origin specification" that would be exciting. But I'm not convinced yet.

      It would be nice in the Discussion for the authors to comment about the trade-offs of different techniques; what are their pros and cons, which should be used when, which should be avoided altogether, and why? This would be a valuable prescription for the field.

      Thanks for the suggestion. We will do what the reviewer suggests: use cell type-specific data wherever origins have been defined by at least two methods in the same cell type, specifically reporting the percent of shared origins amongst the datasets to compare whether some methods correlate better with each other. ORC ChIP-seq and MCM ChIP-seq data do not define origins: they define the binding sites of these proteins. Thus we will discuss why the ChIP-seq sites of these protein complexes should not be used to define origins.

      Reviewer #3 (Public Review):

      Summary: The authors present a thought-provoking and comprehensive re-analysis of previously published human cell genomics data that seeks to understand the relationship between the sites where the Origin Recognition Complex (ORC) binds chromatin, where the replicative helicase (Mcm2-7) is situated on chromatin, and where DNA replication actually begins (origins). The view that these should coincide is influenced by studies in yeast where ORC binds site-specifically to dedicated nucleosome-free origins where Mcm2-7 can be loaded and remains stably positioned for subsequent replication initiation. However, this is most certainly not the case in metazoans where it has already been reported that chromatin binding sites of ORC, Mcm2-7, and origins do not necessarily overlap, likely because ORC loads the helicase in transcriptionally active regions of the genome and, since Mcm2-7 retains linear mobility (i.e., it can slide), it is displaced from its original position by other chromatin-contextualized processes (for example, see Gros et al., 2015 Mol Cell, Powell et al., 2015 EMBO J, Miotto et al., 2016 PNAS, and Prioleau et al., 2016 G&D amongst others). This study reaches a very similar conclusion: in short, they find a high degree of discordance between ORC, Mcm2-7, and origin positions in human cells.

      Strengths: The strength of this work is its comprehensive and unbiased analysis of all relevant genomics datasets. To my knowledge, this is the first attempt to integrate these observations and the analyses employed were suited for the questions under consideration.

      Thank you for recognizing the comprehensive and unbiased nature of our analysis. The fact that the comprehensive view fails to move the field forward, cited as the major weakness, is actually a strength. It should be viewed in the light that we cannot even find evidence to support the primary hypothesis: that the most reproducible origins must be near ORC and MCM binding sites. This finding will prevent the unwise adoption of ORC or MCM binding sites as surrogate markers of origins and may perhaps stimulate the field to try and improve methods of identifying ORC or MCM binding until the binding sites are found to be proximal to the most reproducible origins. The last possibility is that there are ORC- or MCM-independent modes of defining origins, but we have no evidence of that.

      Weaknesses: The major weakness of this paper is that this comprehensive view failed to move the field forward from what was already known. Further, a substantial body of relevant prior genomics literature on the subject was neither cited nor discussed. This omission is important given that this group reaches very similar conclusions as studies published a number of years ago. Further, their study seems to present a unique opportunity to evaluate and shape our confidence in the different genomics techniques compared in this study. This, however, was also not discussed.

      We will do what the reviewer suggests: use cell type-specific data wherever origins have been defined by at least two methods in the same cell type, specifically reporting the percent of shared origins amongst the datasets to compare whether some methods correlate better with each other. Thanks for the suggestion. ORC ChIP-seq and MCM ChIP-seq data do not define origins: they define the binding sites of these proteins. Thus, we will discuss why the ChIP-seq sites of these protein complexes should not be used to define origins.

      We do not cite the SNS-seq data before 2018 because of the concerns discussed above about the earlier techniques needing improvement. We will discuss other genomics data that we failed to discuss.

      We will cite the papers the reviewer names:

      Gros, Mol Cell 2015 and Powell, EMBO J. 2015 discuss the movement of MCM2-7 away from ORC in yeast and flies and will be cited. MCM2-7 binding to sites away from ORC and being loaded in vast excess of ORC was reported earlier on Xenopus chromatin in PMC193934, and will also be cited.

      Miotto, PNAS, 2016: publishes ORC2 ChIP-seq sites in HeLa (data we have used in our analysis), but does not measure ORC1 ChIP-seq sites. They say: “ORC1 and ORC2 recognize similar chromatin states and hence are likely to have similar binding profiles.” This is a conclusion based on the fact that the ChIP-seq sites in the two studies are in areas with open chromatin; it is not a direct comparison of binding sites of the two proteins.

      Prioleau, G&D, 2016: This is a review that compared different techniques of origin identification but has no primary data to say that ORC and MCM binding sites overlap with the most reproducible origins.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This study investigates the context-specificity of facial expressions in three species of macaques to test predictions for the 'social complexity hypothesis for communicative complexity'. This hypothesis has garnered much attention in recent years. A proper test of this hypothesis requires clear definitions of 'communicative complexity' and 'social complexity'. Importantly, these two facets of a society must not be derived from the same data because otherwise, any link between the two would be trivial. For instance, if social complexity is derived from the types of interactions individuals have, and different types of signals accompany these interactions, we would not learn anything from a correlation between social and communicative complexity, as both stem from the same data.

      The authors of the present paper make a big step forward in operationalising communicative complexity. They used the Facial Action Coding System to code a large number of facial expressions in macaques. This system allows decomposing facial expressions into different action units, such as 'upper lid raiser', 'upper lip raiser' etc.; these units are closely linked to activating specific muscles or muscle groups. Based on these data, the authors calculated three measures derived from information theory: entropy, specificity and prediction error. These parts of the analysis will be useful for future studies.

      The three species of macaque varied in these three dimensions. In terms of entropy, there were differences with regard to context (and if there are these context-specific differences, then why pool the data?). Barbary and Tonkean macaques showed lower specificity than rhesus macaques. Regarding predicting context from the facial signals, a random forest classifier yielded the highest prediction values for rhesus monkeys. These results align with an earlier study by Preuschoft and van Schaik (2000), who found that less despotic species have greater variability in facial expressions and usage.

      Crucially, the three species under study are also known to vary in terms of their social tolerance. According to the highly influential framework proposed by Bernard Thierry, the members of the genus Macaca fall along a graded continuum from despotic (grade 1) to highly tolerant (grade 4). The three species chosen for the present study represent grade 1 (rhesus monkeys), grade 3 (Barbary macaques), and grade 4 (Tonkean macaques).

      The authors of the present paper define social complexity as equivalent to social tolerance - but how is social tolerance defined? Thierry used aggression and conflict resolution patterns to classify the different macaque species, with the steepness of the rank hierarchy and the degree of nepotism (kin bias) being essential. However, aggression and conflict resolution are accompanied by facial gestures. Thus, the authors are looking at two sides of the same coin when investigating the link between social complexity (as defined by the authors) and communicative complexity. Therefore, I am not convinced that this study makes a significant advance in testing the social complexity for communicative complexity hypothesis. A further weakness is that - despite the careful analysis - only three species were considered; thus, the effective sample size is very small.

      Social tolerance in macaques is defined by various covarying traits, among which rates of counter-aggression and conflict resolution are only two of many included (see Thierry 2021 for a recent discussion and review). We do not deviate from Thierry’s definition of social tolerance. We simply highlight that the constellation of behavioral traits in the most tolerant macaque species results in a social environment where the outcome of social interactions is more uncertain (see introduction lines 102-114). As we argue throughout the paper, higher uncertainty can be used as a proxy for higher complexity and thus we conclude that the most tolerant macaque species have the highest social complexity. While most social behavior in macaques is accompanied by some facial behavior, we were careful to define social contexts only from the body language/behavior (e.g., lunge for aggression, grooming for affiliation) of the individuals involved and ignored the facial behavior used (see method lines 371-381). Therefore, the facial behavior of macaques (communication signals) was not used in defining either social tolerance (and by extension complexity) or the social context in which it was used. We feel like this appropriately minimizes any elements of circularity in the analysis of social and communicative complexity.

      Regarding the effective sample size of three species, we agree that it is small, and it is a limitation of this study. However, the methodology we used is applicable to any species for which FACS is available (including other non-human primates, dogs, and horses), and therefore, we hope that other datasets will complement ours in the future. Nevertheless, we now acknowledge this limitation in the discussion (lines 314-317).

      Reviewer #2 (Public Review):

      This is a well-written manuscript about a strong comparative study of diversity of facial movements in three macaque species to test arguments about social complexity influencing communicative complexity. My major criticism has to do with the lack of any reporting of inter-observer reliability statistics - see comment below. Reporting high levels of inter-observer reliability is crucial for making clear the authors have minimized chances of possible observer biases in a study like this, where it is not possible to code the data blind with regard to comparison group. My other comments and questions follow by line number:

      We agree that inter-observer coding reliability is an important piece of information. We now report in more detail the inter-observer reliability tests that we conducted on lines 384-392.

      38-40. Whereas I am an advocate of this hypothesis and have tested it myself, the authors should probably comment here, or later in the discussion, about the reverse argument - greater communicative complexity (driven by other selection pressures) could make more complicated social structures possible. This latter view was the one advocated by McComb & Semple in their foundational 2005 Biology Letters comparative study of relationships between vocal repertoire size and typical group size in non-human primate species.

      It is true that an increase in communicative complexity could allow/drive an increase in social complexity. Unfortunately our data is correlational in nature and we cannot determine the direction of causality. We added such a statement to the discussion (lines 311-314).

      72-84 and 95-96. In the paragraph here, the authors outline an argument about increasing uncertainty / entropy mapping on to increasing complexity in a system (social or communicative). In lines 95-96, though, they fall back on the standard argument about complex systems having intermediate levels of uncertainty (complete uncertainty roughly = random and complete certainty roughly = simple). Various authors have put forward what I think are useful ways of thinking about complexity in groups - from the perspective of an insider (i.e., a group member, where greater randomness is, in fact, greater complexity) vs from the perspective of an outsider (i.e., a researcher trying to quantify the complexity of the system, where it is relatively easy to explain a completely predictable or completely random system but harder to do so for an intermediately ordered or random system). This sort of argument (Andrew Whiten had an early paper that made this argument) might be worth raising here or later in the discussion? (I'm also curious where the authors' sentiments lie for this question - they seem to touch on it in lines 285-287, but I think it's worth unpacking a little more here!)

      In this study we used three measures of uncertainty (entropy, context specificity, and prediction error) to approximate complexity. However, maximum entropy or uncertainty would be achieved in a system that is completely random (and thus be considered simple). Therefore, the species with the highest entropy values, or unpredictability, could be interpreted as having a simpler communication system than a species with a moderately high entropy/unpredictability value. Our argument is that animal communication systems cannot possibly be random, otherwise they would not have evolved as signals. In systems where we know the highest entropy (or unpredictability) will not be due to randomness, as is the case with animal social interactions and communication, we can conclude that the system with the highest uncertainty is the most complex. We have now expanded upon this point in the discussion (lines 286-294). See also response to reviewer 1 below.

      115-129. See also:

      Maestripieri, D. (2005). "Gestural communication in three species of macaques (Macaca mulatta, M. nemestrina, M. arctoides): use of signals in relation to dominance and social context." Gesture 5: 57-73.

      Maestripieri, D. and K. Wallen (1997). "Affiliative and submissive communication in rhesus macaques." Primates 38(2): 127-138.

      On that note, it is probably worth discussing in this paragraph and probably later in the discussion exactly how this study differs from these earlier studies of Maestripieri. I think the fact that machine learning approaches had the most difficulty assigning crested data to context is an important methodological advance for addressing these sorts of questions - there are probably other important differences between the authors' study here and these older publications that are worth bringing up.

      Our study differs from these two studies in that they classified facial behavior into discrete categories (e.g., bared-teeth, lip-smack), whereas we adopted a bottom-up approach and made no a priori assumptions about which movements are relevant. We broke facial behavior down into its individual muscle movements (i.e., Action Units). Measuring facial behavior at the level of individual muscle movements allows for a more detailed and objective description of the complexity of facial behavior. This is a general point in advancing the study of facial behavior that is discussed in the introduction (lines 60-71) and discussion (lines 206-208). The reason we don’t draw a direct comparison with the studies above is that they had a slightly different focus. Our study was more focused on the complexity of the (facial) communication system in general rather than on comparing whether the different species use the same facial behavior in the same/different social contexts.

      220-222. What is known about visual perception in these species? Recent arguments suggest that more socially complex species should have more sensitive perceptual processing abilities for other individuals' signals and cues (see Freeberg et al. 2019 Animal Behaviour). Are there any published empirical data to this effect, ideally from the visual domain but perhaps from any domain?

      This is an interesting point. We are not aware of any studies showing differences in visual perception within the macaque genus. Both crested macaques and rhesus macaques are able to discriminate between individuals and facial expressions in match-to-sample tasks with comparable performances (Micheletta et al., 2015a, 2015b; Parr et al. 2008; Parr & Heintz, 2009). Similarly, several macaque species are sensitive to gaze shifts from conspecifics (Tomasello et al. 1998; Teufel et al. 2010; Micheletta & Waller, 2012).

      274-277. I am not sure I follow this - could not different social and non-social contexts produce variation in different affective states such that "emotion"-based signals could be as flexible / uncertain as seemingly volitional / information-based / referential-like signals? This issue is probably too far away from the main points of this paper, but I suspect the authors' argument in this sentence is too simplified or overstated with regard to more affect-based signals.

      Emotion-based signals could, in theory, also produce flexible signals and it is possible that some facial expressions reflect an emotional state. However, some previous studies have suggested that facial expressions are only used as a display of emotion, rather than such signals having evolved for a different function such as announcing future intentions. In our study we found that macaques used, in some cases, the same facial expressions (i.e. combination of Action Units) in at least two different social contexts that, presumably, differed in their emotional valence. Thus, it is unlikely that particular facial expressions are bound to a single emotion. We think that this is an important point to make even though it is slightly beyond the scope of our paper.

      288 on. Given there are only three species in this study, the chances of one of the species being the 'most complex' in any measure is 0.33. Although I do not believe this argument I am making here, can the authors rule out the possibility that their findings related to crested macaques are all related to chance, statistically speaking?

      We are not aware of a way to rule out this possibility. However, we believe that we are appropriately cautious throughout the paper and acknowledge that having only investigated three species is a limitation of this study in the discussion (lines 314-317, see also our response to reviewer 1 above).

      329-330. The fact that only one male rhesus macaque was assessed here seems problematic, given the balance of sexes in the other two species. Can the authors comment more on this - are the gestures they are studying here identical across the sexes?

      We agree it would have been preferable to collect data on more than one male rhesus macaque, but that was unfortunately not possible. We are not aware of any studies showing differences in the use of facial behavior between male and female rhesus macaques. If differences exist, most likely these would occur in a sexual/mating context. However, in our study we only considered affiliative (non-sexual), submissive, and aggressive contexts, where we have no a priori reason to believe that there are sex differences.

      354-371. Inter-observer reliability statistics are required here - one of the authors who did not code the original data set, or a trained observer who is not an author, could easily code a subset of the video files to obtain inter-observer reliability data. This is important for ruling out potential unconscious observer biases in coding the data.

      We agree this is an important piece of information. We now report in more detail the inter-observer reliability tests that we conducted on lines 384-392:

      “An agreement rating of >0.7 was considered good [Ekman et al 2002] and was necessary for obtaining certification. To obtain a MaqFACS coding certification, AVR, CP, and PRC coded 23 video clips of rhesus macaques and the MaqFACS codes were compared to the data of other certified coders (https://animalfacs.com).

      The mean agreement ratings obtained were 0.85, 0.73, 0.83 for AVR, CP, and PRC, respectively. In addition, AVR and CP coded 7 videos of Barbary macaques with a mean agreement rating of 0.79. AVR and PRC coded 10 videos of crested macaques with a mean agreement rating of 0.74.”
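
      As an illustration of what an agreement rating of this kind measures, here is a minimal sketch. It assumes the rating follows the agreement index described in the FACS manual (twice the number of Action Units both coders scored, divided by the total number of Action Units scored by the two coders); the function names and the example Action Unit codes are hypothetical and are not taken from the manuscript.

```python
def agreement_index(coder_a, coder_b):
    """Assumed FACS-style agreement: 2 * shared AUs / total AUs scored by both coders."""
    a, b = set(coder_a), set(coder_b)
    total = len(a) + len(b)
    if total == 0:
        # Neither coder scored any Action Unit: treat as full agreement.
        return 1.0
    return 2 * len(a & b) / total


def mean_agreement(clip_pairs):
    """Average the agreement index over a list of (coder_a, coder_b) codings, one per clip."""
    scores = [agreement_index(a, b) for a, b in clip_pairs]
    return sum(scores) / len(scores)


# Hypothetical codings of two video clips by two coders.
clips = [
    ({"AU1", "AU2", "AU26"}, {"AU1", "AU26", "AU12"}),  # 2 shared of 6 scored
    ({"AU9", "AU10"}, {"AU9", "AU10"}),                 # identical codings
]
print(round(mean_agreement(clips), 2))
```

      Under this assumed definition, a mean value above 0.7 across clips would count as good agreement in the sense quoted above.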

      Reviewer #1 (Recommendations For The Authors):

      Given the long debate on the concept of information exchange in animal communication, I would also recommend being more careful with the term 'exchanges of information' (line 271). Perhaps it's better to be agnostic in the context of this paper.

      As suggested, we have now changed the phrasing to focus on the behavior of the animals, rather than suggesting that information is being exchanged (lines 270-273).

      Line 281: "This result confirms the assumption that facial behaviour in macaques is not used randomly": the authors are knocking down a straw man. Nobody who has ever studied animal communication would consider that signals occur randomly. Otherwise, they would not have evolved as signals.

      Indeed, nobody claims that animal communication signals are used randomly. Although it may be taken for granted, we feel it is worthwhile to reiterate this point, given that we used relative entropy and prediction error as measures of complexity. For instance, maximum entropy or unpredictability would be achieved in a system that is completely random (and thus be considered simple). Therefore, the species with the highest entropy values, or lowest predictability, could be interpreted as having a simpler communication system than a species with a moderately high entropy value. But if we are working under the assumption that animal communication systems cannot possibly be random, then we can conclude that the species whose communication system has the highest entropy is in fact the most complex. We tried to make this justification clearer in the discussion (lines 285-294).

      I did not follow why there is a higher reliance on facial signals when predation pressure is higher. Apart from the fact that the authors cannot address this question, they may want to reconsider this idea altogether.

      We now expand on the logic of why predation pressure might affect the use of facial signals (see lines 308-309): “When predation pressure is higher, reliance on facial signals could be higher than, for example vocal signals, such as to not draw attention of predators to the signaller.”

      Technical comments:

      One methodological issue that requires clarification is what the units of analysis are. The authors write that each row in their analysis denoted an observation time of 500 ms. How many rows did the authors assemble? The authors mention a sample size of > 3000 social interactions in the abstract. How did they define social interactions? And how many 'time windows' of 500 ms were obtained? Did they take one window per interaction or several? If several, then how was this move accounted for in the analysis? The reporting needs to be more accurate here. Most likely, the bootstrapping took care of biases in the data, but still, this information needs to be provided.

      We have now added some additional information to the method section. Social interactions for each context had the following definitions: “Social context was labeled from the point of view of the signaler based on their general behavior and body language (but not the facial behavior itself), during or immediately following the facial behavior. An aggressive context was considered when the signaler lunged or leaned forward with the body or head, charged, chased, or physically hit the interaction partner. A submissive context was considered when the signaler leaned back with the body or head, moved away, or fled from the interaction partner. An affiliative context was considered when the signaler approached another individual without aggression (as defined previously) and remained in proximity, in relaxed body contact, or groomed either during or immediately after the facial behavior. In cases where the behavior of the signaler did not match our context definitions, or displayed behaviors belonging to multiple contexts, we labeled the social context as unclear. Social context was determined from the video itself and/or from the matching focal behavioral data, if available.” (lines 371-382). The total duration of all social interactions per social context, and thus the number of 500ms windows/rows, have been added to Table 1 (lines 395-397). There were several 500ms windows per social interaction. All 500ms time blocks per interaction were used in the statistical analyses in order to retain all the variation and complexity of the facial behavior (Action Unit combinations) used by the macaques (lines 403-405). Indeed the bootstrapping procedure was used to account for any biases in the data.

      Overall, I would recommend providing more information on the actual behaviour of the animals. The paper is strong in handling highly derived indices representing the behaviour, but the reader learns little about the animals' behaviour. Thus, it would be great if statements about the entropy ratio were translated into what these measures represent in real life. For context specificity, this is clear, but for entropy, not so much.

      A high entropy ratio essentially suggests that a species uses a high variety of unique facial behavior/signals and all signals in the repertoire are used roughly equally often (rather than one facial behavior being used 90% of the time and others rarely used). We have tried our best to better explain this point in the introduction (lines 75-81) and discussion (lines 215-222). Discussing exactly what these signals are and what they mean was beyond the scope of this paper.
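
      The intuition can be made concrete with a small sketch. It assumes the entropy ratio is the observed Shannon entropy divided by the maximum entropy possible for the same number of distinct signals; the exact normalization used in the paper may differ, and the signal labels below are hypothetical.

```python
import math
from collections import Counter


def entropy_ratio(observations):
    """Observed Shannon entropy divided by the maximum for the same number of distinct signals.

    Returns 0.0 when only one signal type is used and approaches 1.0 when all
    signal types are used equally often.
    """
    counts = Counter(observations)
    n = sum(counts.values())
    k = len(counts)
    if k <= 1:
        return 0.0
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(k)


# Hypothetical repertoires of three signals observed 100 and 90 times.
skewed = ["A"] * 90 + ["B"] * 5 + ["C"] * 5   # one signal dominates
even = ["A", "B", "C"] * 30                   # all signals equally frequent

print(round(entropy_ratio(skewed), 2), round(entropy_ratio(even), 2))
```

      In this sketch the skewed repertoire yields a low ratio and the evenly used repertoire a ratio close to 1, which is the contrast described above.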

      Line 106: nepotism, not kinship

      Changed as suggested (line 106).

      Line 113: I would avoid statements about how a monkey society is perceived by its members.

      We think that noting how individuals may perceive their social environment is worthwhile when defining social complexity, so have retained this point but changed the phrasing to be more speculative (lines 112-113).

      Line 329: I was very surprised that only one male was represented in the data for rhesus monkeys. The authors try to wriggle their way out of this issue in the supplementary material ("Therefore, we have no a priori reason to expect an overall difference in the diversity and complexity of facial behaviour between the sexes"), but I think this is a major shortcoming of the analysis. They should ascertain whether there are no sex differences in the other two species regarding their variables of interest. They could then make a very cautious case for there being no sex differences in rhesus either. But of course, they would not know for sure.

      As with our response to reviewer 2 above, we agree that it would have been preferable to collect data on more than one male rhesus macaque, but that was unfortunately not possible. We are not aware of any studies showing differences in the use of facial behavior between male and female rhesus macaques. If differences exist, most likely these would occur in a sexual/mating context. However, in our study we only considered affiliative (non-sexual), submissive, and aggressive contexts, where we have no a priori reason to believe that there are sex differences. Looking at sex differences in the use of facial behavior would be a worthwhile study on its own, but it is outside the scope of this paper.

      This paper would make a stronger contribution if it focussed on the comparative analysis of facial expressions and removed the attempt of testing the social complexity for communicative complexity hypothesis.

      A comparative analysis of the contextual use of specific facial movements is important. But this paper is focused on making a more general comparison of the communication style and complexity across species. The social complexity hypothesis for communicative complexity is one of the key theoretical frameworks for such an investigation and allows us to frame our study in a broader context. We contribute important data on 3 species with methods that can be replicated and extended to other species. Therefore, we believe that it is a worthy contribution to investigations of the evolution of complex communication.

      REFERENCES

      Micheletta, J., J. Whitehouse, L.A. Parr, and B.M. Waller. ‘Facial Expression Recognition in Crested Macaques (Macaca nigra)’. Animal Cognition 18 (2015): 985–90. https://doi.org/10/f7fvnh.

      Micheletta, Jérôme, Jamie Whitehouse, Lisa A. Parr, Paul Marshman, Antje Engelhardt, and Bridget M. Waller. ‘Familiar and Unfamiliar Face Recognition in Crested Macaques (Macaca nigra)’. Royal Society Open Science 2 (2015): 150109. https://doi.org/10/ggx9k9.

      Parr, L. A., and M. Heintz. ‘Facial Expression Recognition in Rhesus Monkeys, Macaca mulatta’. Animal Behaviour 77 (2009): 1507–13. https://doi.org/10/bbsp5n.

      Parr, L.A., M. Heintz, and G. Pradhan. ‘Rhesus Monkeys (Macaca mulatta) Lack Expertise in Face Processing’. Journal of Comparative Psychology 122 (2008): 390–402. https://doi.org/10/d7w6bv.

      Micheletta, J., and B.M. Waller. ‘Friendship Affects Gaze Following in a Tolerant Species of Macaque, Macaca nigra’. Animal Behaviour 83 (2012): 459–67. https://doi.org/10/c4f8n2.

      Thierry, B. ‘Where Do We Stand with the Covariation Framework in Primate Societies?’ Am. J. Biol. Anthropol. 128 (2021): 5–25. https://doi.org/10.1002/ajpa.24441.

      Tomasello, M., J. Call, and B. Hare. ‘Five Primate Species Follow the Visual Gaze of Conspecifics’. Animal Behaviour 55 (1998): 1063–69. https://doi.org/10/bmq7xh.

      Teufel, C., A. Gutmann, R. Pirow, and J. Fischer. ‘Facial Expressions Modulate the Ontogenetic Trajectory of Gaze-Following among Monkeys’. Developmental Science 13 (2010): 913–22. https://doi.org/10/b6j5r7.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful for the helpful comments of both reviewers and have revised our manuscript with them in mind.

      One of the main issues raised was that readers may by default assume that our models are correct. We in fact made it very clear in our discussion that the models are merely hypotheses that will need testing by “wet” experiments and we do not therefore agree that even readers unfamiliar with AF would assume that the models must be correct. It was also suggested that readers could be reassured by including extensive confidence estimates such as PAE plots. As it happens, every single model described in the manuscript had reasonably high PAE scores and more crucially the entire collection of output files, including PAE data, are readily accessible on Figshare at https://doi.org/10.6084/m9.figshare.22567318.v2, a fact that the reviewers appear to have overlooked. The Figshare link is mentioned three times in the manuscript. Embedding these data within the manuscript itself would in our view add even more details and we have therefore not included them in our revised manuscript. Likewise, it is rather simple for any reader to work out which part of a PAE matrix corresponds to an interaction observed in the corresponding pdb prediction. Besides which, it is our view that the biological plausibility and explanatory power of models is just as important as AF metrics in judging whether they may be correct, as is indeed also the case for most experimental work.

      Another important point was that the manuscript was too long and not readable. Yes, it is long and it could well be argued that we could have written a different type of manuscript, focusing entirely on what is possibly the simplest and most important finding, namely that our AF models suggest that in animal cells Wapl appears to form a quaternary complex with SA, Pds5, and Scc1 in a manner suggesting that a key function of Wapl’s conserved CTD is to sequester Scc1’s N-terminal domain after it has dissociated from Smc3. For right or for wrong, we decided that this story could not be presented on its own but also required 1) an explanation for how Scc1 is induced to dissociate from Smc3 in the first place and 2) an explanation of why the quaternary complex predicted for animal cells was not initially predicted for fungi such as yeast. The yeast situation was an exception that clearly needed explaining if the theory was to have any generality and it turned out that delving into the intricate details of the genetics of releasing activity in yeast was eventually required and yielded valuable new insights. We also believe that our work on the recruitment of Eco/Esco acetyl transferases to cohesin and the finding that sororin binds to the Smc3/Scc1 interface also provided important insight into how releasing activity is regulated. We acknowledge that the paper is indeed long but do not think that it is badly written. It is above all a long and complex story that in our view reveals numerous novel insights into how cohesin’s association with chromosomes is regulated, and we have endeavoured to eliminate any excessive speculation. We feel it is not our fault that cohesin uses complex mechanisms.

      Notwithstanding these considerations, we have in fact simplified a few sections and removed one or two others but acknowledge that we have not made substantial cuts.

      It was pointed out that a key feature of our modelling, namely the predicted association of Wapl’s C-terminal domain with SA/Scc3’s CES is inconsistent with published biochemical data. The AF predictions for this interface are universally robust in all eukaryotic lineages and crucially fully consistent with published and unimpeachable genetic data. We note that any model that explains all findings is bound to be wrong for the very simple reason that some of these findings will prove to be incorrect. There is therefore an art in Science of judging which data must be explained and accommodated and which should be ignored. In this particular case, we chose to ignore the biochemistry. Time will tell whether our judgement proves correct.

      Last but not least, it was suggested that we might provide some experimental support for our proposed SA/Scc3-Pds5-Scc1-WaplC quaternary complex. We are in fact working on this by introducing cysteine pairs (that can be crosslinked in cells) into the proposed interfaces but decided that such studies should be the topic of a subsequent publication. It would be impossible with the resources available to our labs to follow up all of the potential interactions and we therefore decided to exclude all such experiments.

      We are grateful for the detailed comments provided by both reviewers, many of which were very helpful, and in many but not all cases have amended the manuscript accordingly.

      With regard to the more specific comments:

      Reviewer #1 (Recommendations For The Authors):

      1) One concern is that observed interfaces/complexes arise because AF-multimer will aim to pack exposed, conserved and hydrophobic surfaces or regions that contain charge complementarity. The risk is that pairwise interaction screens can result in false positive & non-physiological interactions. It is therefore important to report the level of model confidence obtained for such AF calculations:

      A) The authors should color the key models according to pLDDT scores obtained as reported by AF. This would allow the reader to judge the estimated accuracy of the backbone and side chain rotamers obtained. At least for the key models and interactions it would be important to know if the pLDDT score is >90 (Correct backbone and most rotamers) or >70 (only backbone is correct).

      B) It would also be important to report the PAE plots to allow estimation of the expected position error for most of the important interactions. pLDDT coloring and PAE plots can be shown side-by-side as shown in other published data (e.g. https://pubmed.ncbi.nlm.nih.gov/35679397/, Supplementary data).

      C) The authors should include a Table showing the confidence of template modeling scores for the predicted protein interfaces as ipTM, ipTM+pTM as reported by AlphaFold-multimer. Ideally, they would also include DockQ scores but this may not be essential. Addition of such scores would help classification into Incorrect, Acceptable or of high quality. For example, line 1073 et seq the authors show a model of a SCC1-SA and ESCO1 complex (Fig. 37). Are the modeling scores for these interfaces high? It does not help that the authors show cartoons without side chains. Can the authors provide a close-up view of the two interfaces? Are the amino acids indeed packed in a manner expected for a protein interface? Can we exclude the possibility that the prediction is obtained merely because the sequence segments (e.g. in ESCO1 & ESCO2) are hydrophobic and conserved?

      We do not agree that including this level of detail to the text/figures of the manuscript would be suitable. All the relevant data for those who may be sceptical about the models are readily available at https://doi.org/10.6084/m9.figshare.22567318.v2. In our view, the cartoon versions of the models are easier for a reader to navigate. Anyone interested in the molecular details can look at the models directly.

      Importantly, no amount of statistical analysis can completely validate these models. What is required are further experiments, which will be the topic of further work from our and, I dare say, from other laboratories.
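
      For readers who do consult the deposited output files, the confidence metrics discussed in points A to C can be summarized along the following lines. This is a minimal sketch, not part of the manuscript: the pLDDT bands are the ones quoted in point A, the combined score assumes the AlphaFold-Multimer ranking convention of 0.8 x ipTM + 0.2 x pTM, and the function names and example values are hypothetical.

```python
def plddt_band(plddt):
    """Map a pLDDT value (0-100) to the bands quoted in point A above."""
    if plddt > 90:
        return "backbone and most rotamers expected to be correct"
    if plddt > 70:
        return "backbone expected to be correct"
    return "low confidence"


def model_confidence(iptm, ptm):
    """Combined ipTM+pTM score, assuming the AlphaFold-Multimer weighting."""
    return 0.8 * iptm + 0.2 * ptm


# Hypothetical values for a single predicted complex.
print(plddt_band(93.5))
print(round(model_confidence(iptm=0.5, ptm=1.0), 2))
```

      None of these numbers replaces experimental validation; they only indicate how much weight a given prediction can bear.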

      D) When they predict an interaction between the SA2:SCC1 complex and Sororin's FGF motif, they find that only 1/5 models show an interaction and that the interaction is dissimilar to that seen of CTCF. Again, it would be helpful to know about modeling scores. Can they show a close-up view of the SORORIN FGF binding interface to see if a realistic binding mode is obtained? Can they indicate the relevant region on the PAE plot?

      Given that AF greatly favours other interactions of Sororin’s FGF motif over its interaction with SA2-Scc1, we do not agree that dwelling on the latter would serve any purpose.

      2) Line 996: AF predicts with high confidence an interaction between Eco1 & SMC3hd. What are the ipTM (& DockQ if available) scores. Would the interface score High, Medium or Acceptable?

      As mentioned, see https://doi.org/10.6084/m9.figshare.22567318.v2.

      3) Line 1034 et seq: Eco1/ESCO1/ESCO2 interaction with PDS5. Interface scores need to be shown to determine that the models shown are indeed likely to occur. If these interactions have low model confidence, Fig. 36 and discussion around potential relevance to PDS5-Eco1 orientation relative to the SMC3 head remains highly speculative and could be expunged.

      See https://doi.org/10.6084/m9.figshare.22567318.v2. It should be clear that the predictions are very similar in fungi and animals. Crucially, we know that Pds5 is essential for acetylation in vivo, so the models appear plausible from a biological point of view.

      4) Considering the relatively large interface between ECO1 and SMC3, would the author consider the possibility that in addition to acetylating SMC3's ATPase domain, ECO1 remains bound to cohesin-DNA complex, as proposed for ESCO1 by Rahman et al (10.1073/pnas.1505323112)?

      This is certainly possible but we would not want to indulge in such speculation.

      5) E.g. Line 875 but also throughout the text: As there is no labeling of the N- and C-termini in the Figures, it is frequently unclear what the authors are referring to when they mention that AF models orient chains in a certain manner.

      Good point. This has been amended. However, the positions of the N- and C-termini are all available at https://doi.org/10.6084/m9.figshare.22567318.v2.

      6) Fig19B: PAE plots: authors should indicate which chains correspond to A, B, C. Which segment corresponds to the TYxxxR[T/S]L motif? Can they highlight this section on the PAE plot?

      Good point and amended in the revised manuscript.

      Minor comments:

      1) Line 440: the WAPL YSR motif is not shown in Fig. 14A

      2) Line 691: Scc3 spelling error.

      3) Line 931: Sentence ending '... SCC3 (SCC3N).' requires citation.

      4) Line 1008: Figure reference seems wrong. It should read: Fig. 34A left and right. Fig. 34B does not contain SCC1.

      Many thanks for spotting these. Hopefully, all corrected.

      5) Fig. 41 can be removed as it shows the absence of the interaction of Sororin with SMC1:SCC1. Sufficient to mention in the text that Sororin does not appear to interact with SMC1:SCC1.

      This is possible but we decided to leave this as is.

      Reviewer #2 (Recommendations For The Authors):

      Minor points

      (1) Are there any predicted models in which one of the two dimer interfaces of the hinge is open when the coiled coils are folded back, as seen in the cryo-EM structure of human cohesin-NIPBL complex in the clamped state?

      No AF runs ever predicted half opened hinges. It is possible that the introduction of mutations in one of the two interfaces might reveal a half-opened state and we ought to try this. However, it would not be appropriate for this manuscript, we believe.

      (2) Structures of the SA-Scc1 CES bound to [Y/F]xF motifs from Sgo1 and CTCF have been reported, suggesting that a similar motif could interact with SA/Scc3. Surprisingly, AF did not predict an interaction between Scc3/SA and Wapl FGF motifs, which only bind to the Pds5 WEST region. On the other hand, AF predicted interactions of the Sororin FGF motif with both Pds5 WEST and SA CES. Can the authors comment on this Wapl FGF binding specificity? What will happen if a Wapl fragment lacking the CTD is used in the prediction?

      This seems to be an academic point as the CTD is always present.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      1) The authors need to validate that RAP1-HA still retains its essential function. As indicated above, if RAP1-HA still retains its essential functions, cells carrying one RAP1-HA allele and one deleted allele are expected to grow the same as WT cells. These cells should also have the WT VSG expression pattern, and RAP1-HA should still interact with TRF.

      We demonstrated that C-terminally HA-tagged RAP1 co-localizes with telomeres by a combination of immunofluorescence and fluorescence in situ hybridization (Cestari and Stuart, 2015, PNAS), and co-immunoprecipitates telomeric and 70 bp repeats (Cestari et al. 2019, Mol Cell Biol). We also showed by immunoprecipitation and mass spectrometry that HA-tagged RAP1 interacts with nuclear and telomeric proteins, including PIP5Pase (Cestari et al. 2019). Others have also tagged T. brucei RAP1 with HA without disrupting its nuclear localization (Yang et al. 2009, Cell), all of which indicates that the HA-tag does not affect protein function. As for the suggested experiment, there is no guarantee that cells lacking one allele of RAP1 will behave as wildtype, i.e., normal growth and repression of VSG genes. Also, less than 90% of T. brucei TRF was reported to interact with RAP1 (Yang et al. 2009, Cell), which might be indirect via their binding to telomeric repeats rather than direct protein-protein interactions.

      2) The authors need to remove the His6 tag from the recombinant RAP1 fragments before the EMSA analysis. This is essential to avoid any artifacts generated by the His6-tagged proteins.

      Our controls show that the His-tag is not interfering with RAP1-DNA binding. We show in Fig 3C-G by EMSA and in Fig S5 by EMSA and microscale thermophoresis that His-tagged full-length rRAP1 does not bind to scrambled telomeric dsDNA sequences, which demonstrates that His-tagged rRAP1 does not bind nonspecifically to DNA. Moreover, in Fig 3G and Fig S5, we show that His-tagged rRAP1(1-300) also does not bind to 70 bp or telomeric repeats. In contrast, the full-length His-tagged rRAP1, rRAP1(301-560), or rRAP1(561-855) bind to 70 bp or telomeric repeats (Fig 3C-G). Since all proteins were His-tagged, the His tag cannot be responsible for the DNA binding. We have worked with many different His-tagged proteins for nucleic acid binding and enzymatic assays without any interference from the tag (Cestari and Stuart, 2013, JBC; Cestari et al. 2013, Mol Cell Biol; Cestari and Stuart, 2015, PNAS; Cestari et al. 2016, Cell Chem Biol; Cestari et al. 2019, Mol Biol Cell).

      3) More details need to be provided for ChIPseq and RNAseq analysis regarding the read numbers per sample, mapping quality, etc.

      Table S3 includes information on sequencing throughput and read length. Mapping quality was included in the Methods section “Computational analysis of RNA-seq and ChIP-seq”, starting at line 499. In summary, we filtered reads to keep primary alignments (eliminating supplementary and secondary alignments). We also analyzed ChIP-seq with MAPQ ≥20 (99% probability of correct alignment) to distinguish RAP1 binding to specific ESs, including silent vs active ES. We included Fig S4 to show the effect of filtering alignments on the active vs silent ESs. We used MAPQ ≥30 to analyze RNA-seq mapping to VSG genes, including those in subtelomeric regions. Our scripts are available at https://github.com/cestari-lab/lab_scripts. We also included in the Methods, lines 522-524: “Scripts used for ChIP-seq, RNA-seq, and VSG-seq analysis are available at https://github.com/cestari-lab/lab_scripts. A specific pipeline was developed for clonal VSG-seq analysis, available at https://github.com/cestari-lab/VSG-Bar-seq.”
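
      The MAPQ thresholds quoted above follow from the Phred-scaled definition of mapping quality, MAPQ = -10 log10(probability the alignment is wrong). The sketch below is illustrative only and is not taken from the authors' scripts: the record layout (a dict with `name`, `mapq`, `is_secondary`, and `is_supplementary` keys) and both function names are assumptions made for this example.

```python
def mapq_to_p_correct(mapq: float) -> float:
    """Convert a Phred-scaled MAPQ to the probability the alignment is correct."""
    # MAPQ = -10 * log10(P_wrong)  =>  P_wrong = 10 ** (-MAPQ / 10)
    return 1.0 - 10.0 ** (-mapq / 10.0)


def keep_confident_primary(records, min_mapq):
    """Keep primary alignments whose MAPQ is at or above min_mapq.

    `records` is a hypothetical list of dicts; a real pipeline would read
    these fields from a BAM file instead.
    """
    return [
        r for r in records
        if not r["is_secondary"]
        and not r["is_supplementary"]
        and r["mapq"] >= min_mapq
    ]


if __name__ == "__main__":
    for q in (10, 20, 30):
        # MAPQ 10, 20, 30 correspond to 90%, 99%, 99.9% respectively
        print(q, round(mapq_to_p_correct(q), 4))
```

      A stricter threshold trades read depth for confidence, which matters most in repetitive regions such as expression sites, where many reads align equally well to several loci.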

      4) The authors should revise the Discussion section to clearly state the authors' speculations and their working models (the latter of which need solid supporting evidence). Specifically, statements in lines 218 - 219 and lines 224-226 need to be revised.

      The statement “likely due to RAP1 conformational changes” in line 228 discusses how binding of PI(3,4,5)P3 could affect RAP1 Myb and MybL domains binding to DNA. We did not make a strong statement but discussed a possibility. We believe that it is beneficial to the reader to have the data discussed, and we do not feel this point is overly speculative. For lines 224-226 (now 234-235), the statement refers to the finding of RAP1 binding to centromeric regions by ChIP-seq, which is a new finding but not the focus of this work. To make it clear that it does not refer to telomeric ESs, we edited: “The finding of RAP1 binding to subtelomeric regions other than ESs, including centromeres, requires further validation.” Since RAP1 binding to centromeres is not the focus of the work, future studies are necessary to follow up, and we believe it is appropriate in the Discussion to be upfront and highlight this point to the readers.

      Our model is based on the data presented here but also on scientific literature. We have reviewed the Discussion to prevent broad speculations. When discussing a model, we stated (line 245): “The scenario suggests a model in which …”, to state that this is a working model. Similarly, in Results (line 201) we included: “Our data suggest a model in which…”.

      5) The authors should revise the title to reflect a more reasonable conclusion of the study.

      We agree that the title should be changed to imply a direct role of PI(3,4,5)P3 regulation of RAP1, which is not captured in the original title. This will provide more specific information to the readers, especially those broadly interested in telomeric gene regulation and RAP1. The new title is: PI(3,4,5)P3 allosteric regulation of repressor activator protein 1 controls antigenic variation in trypanosomes

      6) The authors are recommended to provide an estimation of the expression level of the V5-tagged PIP5pase from the tubulin array in reference to the endogenous protein level.

      The relative mRNA level of the exclusively expressed PIP5Pase mutant compared to the wildtype is available in Data S1, RNA-seq. The Mut PIP5Pase allele’s relative expression level is 0.85-fold that of the WT allele (both from tubulin loci). We also showed by Western blot the WT and Mut PIP5Pase protein expression (Cestari et al. 2019, Mol Cell Biol). Concerning PIP5Pase endogenous alleles, we compared normalized RNA-seq counts per million from the conditional null PIP5Pase cells exclusively expressing WT or the Mut PIP5Pase alleles (Data S1, this work) to our previous RNA-seq of single-marker 427 strain (Cestari et al. 2019, Mol Cell Biol). We used the single-marker 427 because the conditional null cells were generated in this strain background. The PIP5Pase WT and Mut mRNAs expressed from tubulin loci are 1.6 and 1.3-fold the endogenous PIP5Pase levels in single-marker 427, respectively. We included a statement in the Methods, lines 275-278: “The WT or Mut PIP5Pase mRNAs exclusively expressed from tubulin loci are 1.6 and 1.3-fold the WT PIP5Pase mRNA levels expressed from endogenous alleles in the single marker 427 strain. The fold-changes were calculated from RNA-seq counts per million from this work (WT and Mut PIP5Pase, Data S1) and our previous RNA-seq from single marker 427 strain (24).”
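
      The fold-change comparison described above can be sketched as a ratio of counts-per-million (CPM) values from two libraries. This is a minimal illustration, not the authors' code: the function names and every number in the usage example are placeholders invented for this sketch, and real analyses usually apply additional normalization.

```python
def counts_per_million(gene_count: float, library_size: float) -> float:
    """Scale a raw read count by library size, expressed per million reads."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return gene_count * 1_000_000 / library_size


def fold_change(cpm_test: float, cpm_reference: float) -> float:
    """Ratio of a gene's CPM in the test sample to its CPM in the reference."""
    if cpm_reference <= 0:
        raise ValueError("cpm_reference must be positive")
    return cpm_test / cpm_reference


if __name__ == "__main__":
    # Placeholder values only; these are not data from the study.
    test_cpm = counts_per_million(gene_count=50, library_size=2_000_000)
    ref_cpm = counts_per_million(gene_count=30, library_size=3_000_000)
    print(test_cpm, ref_cpm, fold_change(test_cpm, ref_cpm))
```

      Normalizing to library size before taking the ratio is what makes counts from two separate sequencing runs comparable at all.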

      7) The authors are recommended to provide more detailed EMSA conditions such as protein and substrate concentrations. Better quality EMSA gels are preferred.

      All concentrations were already provided in the Methods section. See line 356, in topic Electrophoretic mobility shift assays: “100 nM of annealed DNA were mixed with 1 μg of recombinant protein…”. For microscale thermophoresis, also see lines 375-376 in topic Microscale thermophoresis binding kinetics: “1 μM rRAP1 was diluted in 16 two-fold serial dilutions in 250 mM HEPES pH 7.4, 25 mM MgCl2, 500 mM NaCl, and 0.25% (v/v) NP-40 and incubated with 20 nM telomeric or 70 bp repeats…”. Note that two different biochemical approaches, EMSA and microscale thermophoresis, were used to assess rRAP1-His binding to DNA. Both show concordant results (Fig 3 and 5, and Fig S5); microscale thermophoresis shows the binding kinetics (data available in Table 1). The EMSA images clearly show the binding of RAP1 to 70 bp or telomeric repeats but not to scrambled telomeric repeat DNA.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      Figures

      All figures should have their axes properly labeled and units should be indicated. For many of the ChIPseq datasets it is not clear whether the authors show a fold enrichment or RPM and whether they used all reads or only uniquely mapping reads. Especially the latter is a very important piece of information when analyzing expression sites and should always be reported. The authors write, that all RNA-seq and ChIP-seq experiments were performed in triplicate. What is shown in the figures, one of the replicates? Or the average?

      ChIP-seq is shown as fold enrichment; we clarified this in the figures by labeling the y-axis RAP1-HA ChIP/Input (log2). We included in figure legends, see line 710: “Data show fold-change comparing ChIP vs Input.”. For quantitative graphs (Fig 2B, D, and E, and Fig 5F and G), data are shown as the mean of biological replicates. Graphs generated in the Integrative Genomics Viewer (IGV; qualitative graphs) show representative data (Fig 2A, C, and F, and Fig 5D-E). All statistical analyses were calculated from the three biological replicates. Uniquely mapped reads were used. We also included ChIP-seq analysis with MAPQ ≥10 and 20 (90% and 99% probability of correct alignment, respectively) to distinguish RAP1 binding to ESs. Fig S4 shows the various mapping stringencies and demonstrates the enrichment of RAP1-HA at silent vs active ES.
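
      The ChIP/Input (log2) value described above, averaged over biological replicates, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the inputs are taken to be already-normalized coverage values for one genomic bin, the pseudocount of 1 is an assumption added to avoid division by zero, and the function names are hypothetical.

```python
import math
from statistics import mean


def log2_enrichment(chip: float, input_: float, pseudocount: float = 1.0) -> float:
    """log2 of the ChIP-to-Input ratio for one bin in one replicate."""
    # The pseudocount (an assumption of this sketch) guards against empty bins.
    return math.log2((chip + pseudocount) / (input_ + pseudocount))


def mean_log2_enrichment(replicates, pseudocount: float = 1.0) -> float:
    """Average the per-replicate log2 enrichment.

    `replicates` is an iterable of (chip, input) pairs, one per biological
    replicate.
    """
    return mean(log2_enrichment(c, i, pseudocount) for c, i in replicates)


if __name__ == "__main__":
    # Placeholder coverage values for three replicates; not data from the study.
    print(mean_log2_enrichment([(7, 1), (3, 1), (15, 3)]))
```

      Averaging after the log transform keeps a single outlier replicate from dominating the mean, which is one reason enrichment is commonly reported on a log2 scale.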

      Figure 1 is very important for the main argument of the manuscript, but very difficult (impossible for me) to fully understand. It would be great if the author could make an effort to clarify the figure and improve the labels. Panel Fig 1E. Here it is impossible to read the names of the genes that are activated and therefore it is impossible to verify the statements made about the activation of VSGs and the switching.

      We have edited Fig 1E to include the most abundant VSGs, which decreased the amount of information in the graph and increased the label font. We also re-labeled each VSG with chromosome or ES name and common VSG name when known (e.g., VSG2). We included Table S1 in the supplementary information with the data used to generate Fig 1E. In Table S1, the reader will be able to check the VSG gene IDs and evaluate the data in detail. We included in the legend, line 700: “See Table S1 for data and gene IDs of VSGs.”

      Figure 1F: This panel is important and should be shown in more detail as it distinguishes VSG switching from a general VSG de-repression phenotype. VSG-seq is performed in a clonal manner here after PIP5Pase KD and re-expression. To show that proper switching has taken place in the different clones, instead of a persistent VSG de-repression, the expression level of more VSGs should be shown (e.g. as in panel E) to show that there is really only one VSG detected per clone. For example, it is not clear what the authors 'called' the dominant VSG gene.

      We showed in supplementary information Fig S1 B-C examples of reads mapping to the VSGs. We have now included a graph (Fig S1 D) that quantifies reads mapped to the VSG selected as expressed compared to other VSG genes (considered not expressed). The data show an average of several clones analyzed. Other VSGs (not selected) are at the noise level (about 4 normalized counts) compared to >250 normalized counts for the VSGs selected as expressed.

      As mentioned in the public comments, I don't see how the data from Fig 1E and 1F fit together. Based on Fig 1E VSG2 is the dominant VSG, based on Fig 1F VSG2 is almost never the dominant VSG, but the VSG from BES 12.

      In Fig 1E, VSG2 predominates in cells expressing WT PIP5Pase; however, in cells expressing Mut PIP5Pase, this is no longer the case. Many other VSGs are detected, and other VSG mRNAs are more abundant than VSG2 (see color intensity in the heat map). The Mut cells may also have remaining VSG2 mRNAs (from before switching) rather than continuous VSG2 expression. This is the reason we performed the clonal analysis shown in Fig 1F, to be certain about the switching. While Fig 1E shows potential switchers in the population, Fig 1F confirms VSG switching in clones.

      Many potential switchers were detected in the VSG-seq (Fig 1E, the whole cell population is over 10^7 parasites), but not all potential switchers were detected in the clonal analysis because we analyzed 212 clones in total, a fraction of the over 10^7 cells analyzed by VSG-seq (Fig 1E). Also, it is possible that not all potential switchers are viable. A preference for switching to specific ESs has been observed in T. brucei (Morrison et al. 2005, Int J Parasitol; Cestari and Stuart, 2015, PNAS), which may explain several clones switching to BES12.

      Note that in Fig 1F, tet + cells did not switch VSGs at all; all 118 clones expressed VSG2. We relabeled Fig 1F for clarity and included the VSG names. We added gene IDs in the Figure legends, see line 702: “BES1_VSG2 (Tb427_000016000), BES12_VSG (Tb427_000008000)…”

      Statements in Introduction / Discussion

      The statement in lines 82/83 is very strong and gives the impression that the PIP5Pase-Rap1 circuit has been proven to regulate antigenic variation in the host. However, I don't think this is the case. The paper shows that the pathway can indeed turn expression sites on and off, but there is no evidence (yet) that this is what happens in the host and regulates antigenic variation during infection. The same goes for lines 214/215 in the discussion.

      We agree with the reviewer, and we edited these statements. The statement lines 82-83: “The data provide a molecular mechanism…” to “The data indicates a molecular mechanism…” For lines 224-225: “and provides a mechanism to control…” to “and indicates a mechanism to control…”. We also included in lines 261-262: “It is unknown if a signaling system regulates antigenic variation in vivo.” Also edited lines 262-263: “…the data indicate that trypanosomes may have evolved a sophisticated mechanism to regulate antigenic variation...”.

      New vs old data

      In general, for Figures 1 - 4, it was a bit difficult to understand which panels showed new findings, and which panels confirmed previous findings (see below for specific examples). In the text and in the figure design, the new results should be clearly highlighted. Authors: All data presented is new, detailed below.

      Figure 1: A similar RNA-seq after PIP5Pase deletion was performed in citation 24. Perhaps the focus of this figure should be more on the (clone-specific) VSG-seq experiment after PIP5Pase re-introduction.

      This is the first time we show RNA-seq of T. brucei expressing catalytic inactive PIP5Pase, which establishes that the regulation of VSG expression and switching, and repression of subtelomeric regions, is dependent on PIP5Pase enzyme catalysis, i.e., PI(3,4,5)P3 dephosphorylation. Hence, the relevance and difference of the RNA-seq here vs the previous RNA-seq of PIP5Pase knockdown.

      Figure 2: A similar ChIP-seq of RAP1 was performed in citation 24, with and without PIP5Pase deletion. Could new findings be highlighted more clearly?

      Our and others’ previous work showed ChIP-qPCR, which analyses specific loci. Here we performed ChIP-seq, which shows genome-wide binding sites of RAP1, and new findings are shown here, including binding sites in the BES, MESs, and other genome loci such as centromeres. We also identified DNA sequence bias defining RAP1 binding sites (Fig 2A). We also show by ChIP-seq how RAP1-binding to these loci changes upon expression of catalytic inactive PIP5Pase. To improve clarity in the manuscript, we edited lines 129-130: “We showed that RAP1 binds telomeric or 70 bp repeats (24), but it is unknown if it binds to other ES sequences or genomic loci.”

      Figure 4: Binding of Rap1 to PI(3,4,5)P3, but not to other similar molecules, was previously shown in citation 24. Could new findings be highlighted more clearly?

      We published in reference 24 (Cestari et al. Mol Cell Biol) that RAP1-HA can bind agarose bead-conjugated synthetic PI(3,4,5)P3. Here, we were able to measure T. brucei endogenous PI(3,4,5)P3 associated with RAP1-HA (Fig 4F). Moreover, we showed that the endogenous RAP1-HA and PI(3,4,5)P3 binding is about 100-fold higher when PIP5Pase is catalytically inactive than with WT PIP5Pase. The data establish that in vivo endogenous PI(3,4,5)P3 binds to RAP1-HA and how the binding changes in cells expressing mutant PIP5Pase; these data are new and relevant to our conclusions. To clarify, we edited the manuscript in lines 180-182: “To determine if RAP1 binds to PI(3,4,5)P3 in vivo, we in-situ HA-tagged RAP1 in cells that express the WT or Mut PIP5Pase and analyzed endogenous PI(3,4,5)P3 levels associated with immunoprecipitated RAP1-HA”.

      Sequencing. I really appreciate the amount of detail the authors provide in the methods section. The authors do an excellent job of describing how different experiments were performed. However, it would be important that the authors also provide the basic statistics on the sequencing data. How many sequencing reads were generated per run (each replicate of the ChIP-seq and RNA-seq assays)? How long were the reads? How many reads could be aligned?

      The sequencing metrics for RNA-seq and ChIP-seq for all biological replicates were included in Table S3 (supplementary information). The details of the analysis and sequencing quality were described in the Methods section “Computational analysis of RNA-seq and ChIP-seq”. To be clearer about the analysis, we also included in Methods, lines 522-524: “Scripts used for ChIP-seq, RNA-seq, and VSG-seq analysis are available at https://github.com/cestari-lab/lab_scripts. A specific pipeline was developed for clonal VSG-seq analysis, available at https://github.com/cestari-lab/VSG-Bar-seq.”.

      Minor comments:

      Figure 1B: I would recommend highlighting the non-ES VSGs and housekeeping genes with two more colors in the volcano plot, to show that it is mostly the antigen repertoire that is deregulated, and not the Pol II transcribed housekeeping genes. This is not entirely clear from the panel as it is right now.

      The suggestion was incorporated in Fig 1B. We color-coded the figure to include BES VSGs, MES VSGs, ESAGs, subtelomeric genes, core genes (typically Pol II and Pol III transcribed genes), and Unitig genes, those genes not assembled in the 427-2018 reference genome.

      Were the reads in Figure 2a filtered in the same way as those in Figure 2C? To support the statements, only unique reads should be used.

      Yes, we also added Fig S4 to make more clear the comparison between read mapping to silent vs active ES.

      It would be good if the authors could add a supplementary figure showing the RAP1 ChIP-seq (WT and cells lacking a functional PIP5Pase) for all silent expression sites.

      We had RAP1 ChIP-seq from cells expressing WT PIP5Pase already. We have modified it to include data from the Mutant PIP5Pase. See Fig S3 and S5.

      In Figure 5D, after depletion of PIP5Pase, RAP1 binding appears to decrease across ESAGs, but ESAG expression appears to increase. How can this be explained with the model of RAP1 repressing transcription?

      We included in the Results, lines 208-212: “The increased level of VSG and ESAG mRNAs detected in cells expressing Mut PIP5Pase (Fig 5D) may reflect increased Pol I transcription. It is possible that the low levels of RAP1-HA at the 50 bp repeats affect Pol I accessibility to the BES promoter; alternatively, RAP1 association to telomeric or 70 bp repeats may affect chromatin compaction or folding impairing VSG and ESAG genes transcription.”.

      Reviewer #3 (Recommendations For The Authors):

      Line 114 - typo? Procyclic instead of procyclics:

      Fixed, thanks.

      Line 233 - the phrasing here is confusing, may want to replace "whose" with "which" (if I am interpreting correctly):

      Thanks, no changes were needed. I have had the sentence reviewed by a Ph.D.-level scientific writer.

      Methods - there is no description of VSG-seq analysis in the methods. Is it done the same way as the RNA-seq analysis? Is the code for analysis/generating figures available online?

      The procedure is similar. We included an explanation in Methods, lines 503-504: “RNA-seq and VSG-seq (including clonal VSG-seq) mapped reads were quantified…”. Also, in lines 522-524: “Scripts used for ChIP-seq, RNA-seq, and VSG-seq analysis are available at https://github.com/cestari-lab/lab_scripts. A specific pipeline was developed for clonal VSG-seq analysis, available at https://github.com/cestari-lab/VSG-Bar-seq.”.

      Fig 1H - Is this from RNA-seq or VSG-seq analysis of procyclics?

      The procyclic forms VSG expression analysis was done by real-time PCR. To clarify it, we included it in the legend “Expression analysis of ES VSG genes after knockdown of PIP5Pase in procyclic forms by real-time PCR”. We also amended the Methods, under the topic RNA-seq and real-time PCR, lines 402-407: “For procyclic forms, total RNAs were extracted from 5.0x10^8 T. brucei CN PIP5Pase growing in Tet + (0.5 µg/mL, no knockdown) or Tet – (knockdown) at 5h, 11h, 24h, 48h, and 72h using TRIzol (Thermo Fisher Scientific) according to manufacturer's instructions. The isolated mRNA samples were used to synthesize cDNA using ProtoScript II Reverse Transcriptase (New England Biolabs) according to the manufacturer's instructions. Real-time PCRs were performed using VSG primers as previously described (23).”

      Fig 2 A - Where it says "downstream VSG genes" I assume "downstream of VSG genes" is meant? the regions described in this figure might be more clearly laid out in the text or the legend

      Fixed, thanks. We included in the text in Results, line 140: “… and Ts and G/Ts rich sequences downstream of VSG genes”.

      Fig 2E - what does "Flanking VSGs" mean in this context?

      We added to line 705, figure legends: “Flanking VSGs, DNA sequences upstream or downstream of VSG genes in MESs.”

      Fig 2H - Why is the PIP5Pase Mutant excluded from the Chr_1 core visualization?

      We did not notice it. We included it now; thanks.

    1. We were not many days in the merchant’s custody before we were sold after their usual manner, which is this:—On a signal given, (as the beat of a drum), the buyers rush at once into the yard where the slaves are confined, and make choice of that parcel they like best. The noise and clamour with which this is attended, and the eagerness visible in the countenances of the buyers, serve not a little to increase the apprehension of the terrified Africans, who may well be supposed to consider them as the ministers of that destruction to which they think themselves devoted. In this manner, without scruple, are relations and friends separated, most of them never to see each other again. I remember in the vessel in which I was brought over, in the men’s apartment, there were several brothers who, in the sale, were sold in different lots; and it was very moving on this occasion to see and hear their cries at parting.

      This shows how common separation from family was, since many never saw each other again, which is horrible. He is describing his personal experience of being sold to different places and suffering emotionally. It's sad, and I want to read on to find out about his sister's well-being. It makes me wonder whether he ever got to see her again, as he did the last time.

    2. The first object which saluted my eyes when I arrived on the coast was the sea, and a slave-ship, which was then riding at anchor, and waiting for its cargo. These filled me with astonishment, which was soon converted into terror, which I am yet at a loss to describe, nor the then feelings of my mind. When I was carried on board I was immediately handled, and tossed up, to see if I were sound, by some of the crew; and I was now persuaded that I had got into a world of bad spirits, and that they were going to kill me. Their complexions too differing so much from ours, their long hair, and the language they spoke, which was very different from any I had ever heard, united to confirm me in this belief. Indeed, such were the horrors of my views and fears at the moment, that, if ten thousand worlds had been my own, I would have freely parted with them all to have exchanged my condition with that of the meanest slave in my own country. When I looked round the ship too, and saw a large furnace or copper boiling, and a multitude of black people of every description chained together, every one of their countenances expressing dejection and sorrow, I no longer doubted my fate, and, quite overpowered with horror and anguish, I fell motionless on the deck and fainted. When I recovered a little, I found some black people about me, who I believed were some of those who brought me on board, and had been receiving their pay; they talked to me in order to cheer me, but all in vain. I asked them if we were not to be eaten by those white men with horrible looks, red faces, and long hair? They told me I was not; and one of the crew brought me a small portion of spiritous liquor in a wine glass; but, being afraid of him, I would not take it out of his hand. 
One of the blacks therefore took it from him and gave it to me, and I took a little down my palate, which, instead of reviving me, as they thought it would, threw me into the greatest consternation at the strange feeling it produced having never tasted any such liquor before. Soon after this, the blacks who brought me on board went off, and left me abandoned to despair. I now saw myself deprived of all chance of returning to my native country, or even the least glimpse of hope of gaining the shore, which I now considered as friendly: and even wished for my former slavery, in preference to my present situation, which was filled with horrors of every kind, still heightened by my ignorance of what I was to undergo. I was not long suffered to indulge my grief; I was soon put down under the decks, and there I received such a salutation in my nostrils as I had never experienced in my life; so that with the loathsomeness of the stench, and crying together, I became so sick and low that I was not able to eat, nor had I the least desire to taste any thing. I now wished for the last friend, Death, to relieve me; but soon, to my grief, two of the white men offered me eatables; and, on my refusing to eat, one of them held me fast by the hands, and laid me across, I think, the windlass, and tied my feet, while the other flogged me severely. I had never experienced any thing of this kind before; and although not being used to the water, I naturally feared that element the first time I saw it; yet, nevertheless, could I have got over the nettings, I would have jumped over the side; but I could not; and, besides, the crew used to watch us very closely who were not chained down to the decks, lest we should leap into the water; and I have seen some of these poor African prisoners, most severely cut for attempting to do so, and hourly whipped for not eating. This indeed was often the case with myself. 
In a little time after, amongst the poor chained men, I found some of my own nation, which in a small degree gave ease to my mind. I inquired of them what was to be done with us? they gave me to understand we were to be carried to these white people’s country to work for them. I then was a little revived, and thought, if it were no worse than working, my situation was not so desperate: but still I feared I should be put to death, the white people looked and acted, as I thought, in so savage a manner; for I had never seen among any people such instances of brutal cruelty; and this not only shewn towards us blacks, but also to some of the whites themselves. One white man in particular I saw, when we were permitted to be on deck, flogged so unmercifully, with a large rope near the foremast, that he died in consequence of it; and they tossed him over the side as they would have done a brute. This made me fear these people the more; and I expected nothing less than to be treated in the same manner. I could not help expressing my fears and apprehensions to some of my countrymen: I asked them if these people had no country, but lived in this hollow place the ship? they told me they did not, but came from a distant one. ‘Then,’ said I, ‘how comes it in all our country we never heard of them?’ They told me, because they lived so very far off. I then asked, where were their women? had they any like themselves! I was told they had: ‘And why,’ said I, ‘do we not see them?’ they answered, because they were left behind. I asked how the vessel could go? they told me they could not tell; but that there were cloths put upon the masts by the help of the ropes I saw, and then the vessel went on; and the white men had some spell or magic they put in the water when they liked in order to stop the vessel. I was exceedingly amazed at this account, and really thought they were spirits. 
I therefore wished much to be from amongst them, for I expected they would sacrifice me: but my wishes were vain; for we were so quartered that it was impossible for any of us to make our escape. While we staid on the coast I was mostly on deck; and one day, to my great astonishment, I saw one of these vessels coming in with the sails up. As soon as the whites saw it, they gave a great shout, at which we were amazed; and the more so as the vessel appeared larger by approaching nearer. At last she came to anchor in my sight, and when the anchor was let go, I and my countrymen who saw it were lost in astonishment to observe the vessel stop; and were now convinced it was done by magic. Soon after this the other ship got her boats out, and they came on board of us, and the people of both ships seemed very glad to see each other. Several of the strangers also shook hands with us black people, and made motions with their hands, signifying, I suppose, we were to go to their country; but we did not understand them. At last, when the ship we were in had got in all her cargo they made ready with many fearful noises, and we were all put under deck, so that we could not see how they managed the vessel. But this disappointment was the least of my sorrow. The stench of the hold while we were on the coast was so intolerably loathsome, that it was dangerous to remain there for any time, and some of us had been permitted to stay on the deck for the fresh air; but now that the whole ship’s cargo were confined together, it became absolutely pestilential. The closeness of the place, and the heat of the climate, added to the number in the ship, which was so crouded that each had scarcely room to turn himself, almost suffocated us. This produced copious perspiration, from a variety of loathsome smells, and brought on a sickness amongst the slaves, of which many died, thus falling victims to the improvident avarice, as I may call it, of their purchasers. 
This wretched situation was again aggravated by the galling of the chains, now become insupportable; and the filth of the necessary tubs, into which the children often fell, and were almost suffocated. The shrieks of the women, and the groans of the dying, rendered the whole a scene of horror almost inconceiveable. Happily perhaps for myself I was soon reduced so low here that it was thought necessary to keep me almost always on deck; and from my extreme youth I was not put in fetters. In this situation I expected every hour to share the fate of my companions, some of whom were almost daily brought upon deck at the point of death, which I began to hope would soon put an end to my miseries. Often did I think many of the inhabitants of the deep much more happy than myself; I envied them the freedom they enjoyed, and as often wished I could change my condition for theirs. Every circumstance I met with served only to render my state more painful, and heighten my apprehensions and my opinion of the cruelty of the whites. One day they had taken a number of fishes; and when they had killed and satisfied themselves with as many as they thought fit, to our astonishment who were on deck, rather than give any of them to us to eat, as we expected, they tossed the remaining fish into the sea again, although we begged and prayed for some as well as we could, but in vain; and some of my countrymen, being pressed by hunger, took an opportunity, when they thought no one saw them, of trying to get a little privately; but they were discovered, and the attempt procured them some very severe floggings.

      To me, this part is quite dark. He was on the slave ship and saw the men who were being tortured and chained up there. The men were so cruel to them. It's sad that he had to witness that. It's heartbreaking: first he was kidnapped, his sister was kidnapped, he was separated from his sister, sold into slavery, and sold on to many different places, and then he had to witness men being tortured on the ship. This is very dark and traumatizing. He also still feels hopeless in his desire to return home and reunite with his sister. It makes me question whether he will ever reunite with his sister.

    1. Students can simultaneously become both "unstuck" (distanced from the ways they have always thought, no longer so complicit with oppression) and "stuck" (intellectually paralyzed so that they need to work through feelings and thoughts before moving on with the more "academic" part of a lesson). Though paradoxical and in some ways traumatic, this condition should be expected: by teaching students that the very ways in which we think and do things can be oppressive, teachers should expect their students to get upset

      From my readings in EDSCI so far, this is the first time I have seen someone address the heaviness that may come with being a transformative learner. Many of our biases and our students' biases, as well as oppressive ideologies, might be the only way they have learned. The word trauma exemplifies the impact of unlearning things that perhaps have been the building blocks of your identity. I think about my 7th graders, primarily Latinx, primarily Christian, primarily male, and primarily from low-income households surrounding our school; they might experience this trauma when presented with ideas that deviate from what they hold to be truth. However, their truth has been their reality, and rather than negate them, they have to be part of the conversation.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their thoughtful comments. Here we provide a point-by-point response to their reviews. All additional experiments that are present in the revised manuscript, or that we plan to include in the final manuscript, are numbered.

      __Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The concept introduced by this paper is exciting and novel. However, the current paucity of presented data can lead to incorrect interpretations of the findings and speculations that might not hold true after a more rigorous assessment of the observed phenomenon.

      The premise of this study builds upon an interaction between the PAXT complex and nuclear YTH domain containing proteins. However, figures 1B and C should be improved. The interacting band for the ZFC3H1 presented in panel B does not seem to match the size of the construct used in panel C. Is the Flag version of ZFC3H1 expressing a smaller isoform for this protein? __

      The reviewer is correct in that endogenous ZFC3H1 (which migrates at 250kD with a minor band at 150kD, see Figure 1B in the initial manuscript) appears to differ from the FLAG-tagged construct as expressed from a plasmid transfected in HEK293 cells (which migrates as two bands at 180kD and 200kD, see Figure 1C in the initial manuscript). For the endogenous protein, the predicted molecular weight is 226kD and the 250kD band disappears when cells are transduced with lentivirus containing shRNAs against ZFC3H1 (see Figure 4A in the initial manuscript), indicating that it is the correct protein. Both the 250kD endogenous protein (*) and the 200kD overexpressed protein (**) in transfected HEK293 and U2OS cells are detected in immunoblots using anti-ZFC3H1 antibodies (see Figure 1 in this document) indicating that the over-expressed protein is indeed ZFC3H1.

      [ Figure 1]

      _Figure 1. Molecular Weight Size Comparison of Endogenous ZFC3H1 and FLAG-ZFC3H1 (1-1233). _Lysates from HEK293 and U2OS that were either untransfected or transfected with FLAG-ZFC3H1 (1-1233) plasmid. We labelled the bands corresponding to the endogenous ZFC3H1 “*” and FLAG-ZFC3H1 (1-1233) “**”.

      We have sequenced the plasmid, and discovered that it contains an additional sequence inserted within the middle of the ZFC3H1 cDNA with a premature stop codon. As such, the version of the protein that is expressed from the plasmid only contains amino acids 1-1233 of the endogenous protein and is missing amino acids 1234-1989. The deleted region only contains TPR repeats, and is not known to interact with any of the well characterized interactors of ZFC3H1 (Wang, Nuc Acid Res 2021, Figure 3). We have renamed this construct FLAG-ZFC3H1 (1-1233). Given these new considerations, our results are consistent with the idea that the N-terminal portion of ZFC3H1 interacts with U1-70K, YTHDC1 and YTHDC2. We will change the text to reflect this.

      We are currently in the process of deleting the small insertion to obtain a plasmid that encodes a full length version of human ZFC3H1. For the final manuscript:

      Experiment #1) We will repeat the co-immunoprecipitation experiment with the full length FLAG-ZFC3H1 to determine whether it interacts with YTHDC1 and YTHDC2. This will take a few weeks.

      __Also, the YTHDC1-2 interaction in panel C is not as convincing considering the negative controls lane show some degree of binding. __

      Although the reviewer is correct that there is substantial background binding in the YTHDC1 immunoblot, we disagree with their characterization of the results with the YTHDC2 immunoblot (see Figure 1B-C in the initial manuscript). In the new manuscript we have included:

      Experiment #2) A new co-immunoprecipitate of the FLAG-tagged ZFC3H1 (1-1233) from HEK293 cells under more stringent conditions where the background level of YTHDC1 binding to beads is negligible. We have already completed this experiment (see Figure 1D in the revised manuscript).

      __Additionally, can the authors test if their RNaseA treatment worked? __

      In the new manuscript we have included:

      Experiment #3) A new co-immunoprecipitate of FLAG-YTHDC1 immunoblotted for eIF4AIII from HEK293 cell lysates. We find that without RNase, there is some eIF4AIII in the precipitates but that the levels diminish substantially after RNase A/T1 treatment. We have already completed this experiment (see Figure 1B in the preliminary revised manuscript).

      __Why do you need 18 hours to observe the nuclear export of your modifiable construct when inhibiting METTL3 in figure 3? Is it possible that your observation is secondary to phenotypes these cells develop as a result of blocking METTL3? __

      We treated cells for this period of time so that during the expression of the reporter, all of the newly synthesized mRNA is expressed in the absence of m6A methyl transferase activity. For shorter treatment times, it is unclear whether the bulk of the reporter mRNA, which would be synthesized before the treatment, would lose any pre-existing m6A marks, making a negative result hard to interpret. Previously we found that although 50% of intronic polyadenylated (IPA) transcripts from our reporters are rapidly degraded, about 50% are stable and are nuclear retained over extended periods of time (see Lee et al., PLOS ONE 2015; https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0122743 Figure 3B-G). We believe that the bulk of the reporter mRNA that we are visualizing is stable and accumulates in the nucleus. Given that METTL3-depletion inhibits nuclear retention and that versions of the IPA reporter that lack m6A modification motifs are exported, we think that the most likely interpretation of the 18 hour STM2457 treatment experiments is that the lack of methyltransferase activity had a direct effect, rather than an indirect effect, on nuclear retention. We would be open to performing more experiments if the editors insist; however, we ordered STM2457 four weeks ago and it has yet to arrive from Sigma-Millipore. Performing this experiment may substantially delay our ability to resubmit the manuscript in a timely manner.

      __Is ALKBH5 nuclear and/or cytoplasmic in the cell system used? __

      According to The Human Protein Atlas, ALKBH5 is predominantly nuclear in U2OS cells, with some present in the cytoplasm (https://www.proteinatlas.org/ENSG00000091542-ALKBH5/subcellular#human).

      In the revised manuscript we have included:

      Experiment #4) Data from subcellular fractionation demonstrating that ALKBH5 is present in both the nucleus and cytoplasm that we have already performed (see Figure 4J in the preliminary revised manuscript).

      __Reviewer #1 (Significance (Required)):

      The study is highly significant __

      ------

      __Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: In the manuscript by Lee et al. entitled "N-6-methyladenosine (m6A) Promotes the Nuclear Retention of mRNAs with Intact 5'Splice Site Motifs", the authors provide evidence that m6A modifications within specific regions of transcripts can confer nuclear retention. These results are important because they add to our understanding of how m6A modifications can contribute to post-transcriptional regulation. Although the authors do not quite come out and say this, data seem to be accumulating to suggest that the location of the m6A modifications within a given transcript can dictate the functional consequences of those modifications.__

      We thank the reviewer for pointing this out. We have included a few sentences in the new preliminary revised manuscript pointing out that the location of the m6A modification in IPA transcripts, with respect to intact 5’SS and poly(A) signals, may play a role in promoting nuclear retention.

      __The current work builds on previous findings from these authors identifying factors critical for retention of intronic polyadenylated (IPA) transcripts. The present study identified m6A modification as one of the signals for the retention of such transcripts. The authors use reporters for their analysis and also examine validated endogenous IPA transcripts. The data presented supports the conclusions albeit they show a surprising finding for one of the m6A erasers, ALKBH5. However, there is some controversy over the mechanism by which ALKBH5 functions and whether the m6A mark is truly reversible, so these results may continue to add to this point of view.

      Major Comments: One experiment that might add to the argument would be overexpression of Mettl3 as compared to catalytically inactive Mettl3. The prediction would be that the reporter transcript with intact DRACH sequences would be even more retained in the nucleus in a manner that depends on Mettl3 catalytic activity. For some of the data presented, the reporter is already wholly nuclear so no difference could be detected, but in the U2OS cells shown in Figure 2B, it appears that an increase in nuclear localization might be evident. Such an experiment would add an orthogonal approach to demonstrate that the methylation by Mettl3 is required for retention. If such an experiment would work with the endogenous IPA transcripts shown in Figure 4, but these transcripts may already be too nuclear to detect any increase in nuclear retention.

      __

      We have performed two experiments that try to address this but they gave negative results:

      Experiment #5) We have over-expressed wildtype and a methyl transferase mutant FLAG-METTL3 and assessed the nuclear export/retention of ftz-Δi-5’SS mRNA. There was no effect (see Figure 2 in this document).

      [Figure 2]

      __Figure 2. Over-expression of METTL3 does not increase the nuclear retention of ftz-Δi-5’SS. __U2OS cells were co-transfected with ftz-Δi-5’SS reporter and either FLAG-METTL3 or FLAG-METTL3-D395A, which lacks methyl-transferase activity (Wang, Mol Cell 2016). Cells were fixed, stained for ftz mRNA by fluorescent in situ hybridization and METTL3 using anti-FLAG antibodies. The nuclear and cytoplasmic distribution of ftz mRNA was quantified as described in the manuscript. Note that this is the average of one independent experiment (each bar consisting of the average of at least 50 cells). We plan to repeat this two more times, but we anticipate that these will show the same result.

      We could include this negative data as a supplemental figure. We believe that there are two possible reasons for this experimental result. First, as the reviewer points out, the reporter transcripts are already too nuclear to detect any significant change. Second, METTL3 is part of a larger complex that includes several proteins including METTL14, WTAP and potentially other proteins (for example see Covelo-Molares, Nuc Acid Res 2021). We may need to co-express all of these proteins to see an effect.

      Experiment #6) We have also expressed versions of ftz-Δi and ftz-Δi-5’SS mRNA with optimized m6A modification (i.e. DRACH) motifs (AGACT) to enhance methylation (“e-m6A-ftz”). We only observed a slight increase in nuclear retention but it is not significant (see Figure 2A,C in the revised manuscript).

      Again, this result could be explained by the fact that the reporter is too nuclear to detect any significant increase in retention. We had originally performed this in parallel with the no-m6A-ftz-Δi-5’SS reporters but did not report this negative data in the original manuscript.

      __Some rather minor changes to the presentation of the data could enhance the impact of this study.

      Specific Comments:

      The primary question in this manuscript is comparing reporters with m6A site (intact DRACH sequences) to those without. For this reason, organizing the data to the +/- DRACH sites are adjacent to one another might make the most sense. This point is evident in Figure 1C where perhaps simply changing the order of the bars presented to put the ones directly compared adjacent would be preferable. Then the p-value would compare sets of data directly adjacent to one another. __

      We thank the reviewer for this suggestion and we have made these changes to the figures in the preliminary revised manuscript.

      __While the authors show representative fields/cells for most assays, they do an excellent job of providing quantitation as well. One exception is Figure 3D, which shows a single cell image for the most key panel (the 5'SS-containing reporter upon Mettl3 depletion). If there is not a field with more cells, the authors could create a montage. __

      In the revised manuscript, we have replaced this image with one containing multiple cells expressing the reporter.

      __Minor Comments:

      Figure presentation:

      The text in a number of the figures is VERY small (Figures 1B,1C, and 4A) for example. __

      We have fixed this in the new manuscript.

      __Figure 3A includes the label "shRNA:" at the top, but these cells are treated with Mettl3 inhibitor and there does not appear to be any shRNA employed, so this seems like a labeling error. __

      We have fixed this in the new manuscript.

      __In Figure 3C, the immunoblot of Mettl3, there are three bands that all disappear completely upon knockdown of Mettl3- are these all Mettl3? This should at least be mentioned in the legend and perhaps indicated in the figure. The authors do mention in the text employing shRNAs to target multiple Mettl3 isoforms, so likely this is the case. __

      We have clarified these issues in the new manuscript.

      __Minor points (some really minor to just polish the presentation for clarity):

      The word "since" should only be used if there is a time element- otherwise the word "as" is preferable.

      For example on p. 4, the sentence: "Since inhibition of mRNA export typically enhances the nuclear retention of RNAs with intact 5'SS motifs (Lee et al. 2020),.." would more precisely read "As inhibition of mRNA export typically enhances the nuclear retention of RNAs with intact 5'SS motifs (Lee et al. 2020),..". __

      We thank the reviewer for pointing this out. We have fixed this issue in the revised manuscript.

      __Reviewer #2 (Significance (Required)):

      Summary: In the manuscript by Lee et al. entitled "N-6-methyladenosine (m6A) Promotes the Nuclear Retention of mRNAs with Intact 5'Splice Site Motifs", the authors provide evidence that m6A modifications within specific regions of transcripts can confer nuclear retention. These results are important because they add to our understanding of how m6A modifications can contribute to post-transcriptional regulation. Although the authors do not quite come out and say this, data seem to be accumulating to suggest that the location of the m6A modifications within a given transcript can dictate the functional consequences of those modifications.

      This study would be of significant interest to those that study gene expression in any context as well as cell biologists as the data add to our understanding of export of mRNA from the nucleus. This work also adds to our understanding of the biological consequences of m6A modification, which is an area of significant interest. In my opinion, the authors could make a broader conclusion that we do, which is that the location of the modification significantly dictates function- an extension of previous findings mostly focused on processed mRNA transcripts. __

      -------

      __Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Quality control of mRNA is vital for all types of cells. In eukaryotic cells, nuclear export of misprocessed mRNAs containing the 5' splice site is prevented. In this manuscript, Lee and colleagues demonstrate that the nuclear retention of intronic polyadenylated transcripts is dependent on m6A modification. Based on the results shown in yeast, they perform immunoprecipitation experiments and demonstrate the interaction between ZFC3H1, a component of the PAXT complex, and YTHDC1 and YTHDC2, nuclear YTH RNA-binding proteins that recognize m6A-modified transcripts. The study also shows the interaction of U1-70K with YTHDC1 and with ZFC3H1. Depletion of YTHDC1/2 prevents the nuclear retention of IPA transcripts. Additionally, CLIP-seq analysis is performed, demonstrating that m6A modification is enriched around the 5' splice site motif and the 3' polyadenylation site in IPAs. From these observations, they conclude that m6A modification contributes to the quality control of mRNA by promoting nuclear retention of misprocessed transcripts.

      Major Points 1. The interaction between ZFC3H1 and YTHDC1 is clearly shown by immunoprecipitation of FLAG-tagged YTHDC1 in Figure 1B. However, the co-purification of YTHDC1 with FLAG-tagged ZFC3H1 in Figure 1C is rather ambiguous. Additionally, the immunoprecipitated samples do not appear to show signals corresponding to FLAG-tagged ZFC3H1, making it unclear if the immunoprecipitation is working. It is essential to provide a better quality result to clarify these observations. __

      Please see our responses to reviewer #1. We have repeated the co-immunoprecipitation of FLAG-ZFC3H1 (1-1233) with YTHDC1 under more stringent conditions and have reduced the background binding (see Figure 1B and D in the new manuscript). We have also determined why the FLAG-ZFC3H1 is smaller than expected as the construct contains a premature stop codon. As explained above, we are in the midst of generating a full-length FLAG-ZFC3H1 and we plan to repeat the co-immunoprecipitation with this new construct.

      2. While the authors demonstrate that the m6A modification is dispensable for the targeting of IPA reporter transcripts to the nuclear speckles, it would be valuable to investigate whether m6A is required for their exit from the nuclear speckles. Do reporter transcripts with m6A motifs remain in the nuclear speckles at later time points?

      We have now analyzed the colocalization of nuclear speckles (SC35) with ftz-Δi-5’SS, which contains both a 5’SS and DRACH motifs, and no-m6A-ftz-Δi-5’SS, which contains a 5’SS but lacks DRACH motifs, at steady state – i.e. after 18-24 hours of transfection (as opposed to at early time points as shown in Figure 2D-E of the initial manuscript). Unexpectedly, we see that both mRNAs continue to colocalize with nuclear speckles, although the no-m6A-ftz-Δi-5’SS mRNA is well exported from the nucleus and its signal in nuclear speckles is faint (see Figure 2F-H in the new manuscript).

      Previously, we observed that ftz-Δi-5’SS required the 5’SS motif to remain in nuclear speckles at these later time points (Lee PLOS ONE 2015 and Lee RNA 2022). Upon closer inspection, ftz-Δi-5’SS mRNA also accumulates in additional nuclear foci that are not SC35-positive. Our new results may indicate that m6A marks promote the transfer of mRNAs from nuclear speckles to other foci, but more data is required to make a firm statement. Given this, we plan to conduct further experiments which may take a month to complete:

      Experiment #7) We are now assessing whether these additional ftz-Δi-5’SS foci correspond to either YTHDC-positive foci which were previously shown to partially overlap nuclear speckles and sequester m6A-rich mRNAs (Cheng Cancer Cell 2022), or “pA+ RNA foci” which accumulate MTR4/ZFC3H1-targeted RNAs when the nuclear exosome is inhibited (Silla Cell Reports 2018). These foci are enriched in ZFC3H1. We plan on co-staining ftz-Δi, ftz-Δi-5’SS, no-m6A-ftz-Δi and no-m6A-ftz-Δi-5’SS with SC35, YTHDC1 and ZFC3H1 to determine whether m6A may help to transfer mRNAs from nuclear speckles to YTHDC1 or ZFC3H1-enriched foci.

      __3. Figures 5B and 5C suggest that ZFC3H1 is required for the degradation of IPA transcripts. However, the range of the vertical axis is inappropriate and it is difficult to assess the extent of the increase in expression levels. Please adjust the vertical axis range for improved clarity. __

      We thank the reviewer for the feedback. We have added additional graphs with an expanded vertical axis to demonstrate that ZFC3H1 is required for the degradation of IPA transcripts.

      Minor Points 1. page 4, line 2 "RNAse" should be corrected to "RNase".

      We thank the reviewer for catching this error. We have fixed this.

      __ 2. page 7, line 5: Is the statement "prevents the nuclear export and decay of non-functional and misprocessed RNAs" correct? m6A modification promotes the decay of such RNAs. __

      We thank the reviewer for pointing this out. We have altered the text to clarify that m6A promotes decay.

      __3. Figure 2E: ftz-∆i should be ftz-∆i-5'SS. __

      We thank the reviewer for catching this error. We have fixed this.

      __4. Figure 5A: It would be helpful to indicate the number of IPA transcripts analyzed. __

      We have included this information.

      __Reviewer #3 (Significance (Required)):

      Overall, the work is sound and generally well-controlled. This study advances our understanding of the quality control of misprocessed transcripts in higher eukaryotes. This reviewer suggests a few points for clarification or improvement. __

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1. General Statements

      We would like to thank the editorial staff and the reviewers for their handling of our manuscript. We were very pleased with the timely communications from Review Commons, and we are grateful to have been assigned this insightful and constructive group of reviewers.

      The reviewers were well-suited to evaluate our work based on their stated areas of expertise (cancer biology, image analysis, machine learning, cell-based screening, etc.). As such, we received thoughtful and constructive feedback, which we have already incorporated into our attached revision. We are confident that these reviews have improved our manuscript.

      Our goal with this manuscript is to present a proof-of-concept study where high-content imaging and morphological profiling are used to characterize drug resistance in clonal cell lines. The main criticism from reviewers was that our original manuscript may have overstated our method’s ability to discriminate the signal of bortezomib resistance and that any extension beyond cultured cells (to patient samples for example) would require significant follow-up studies. The reviewers suggested that such work would be beyond the scope of our study, and recommended toning down our language to better reflect the limitations of this proof-of-concept work. We have embraced this suggestion, extensively revising our text, and we now believe our language and tone more accurately reflects our results. The reviewers also suggested follow-up computational analyses to more robustly characterize the bortezomib resistance signature. We have performed these analyses and added their description to our revised manuscript. We feel that these analyses have improved understanding of the signature, and will help a reader to gain a deeper understanding of our results and methodology.

      The reviewers also suggested several minor changes; many of which we embraced fully, but others that we chose not to incorporate. We felt that a lack of clarity in our text contributed to these reviewer suggestions. In these cases, we improved clarity in the text and responded to each comment point-by-point in the “prefer not to carry out” section. Further, we address all reviewer comments in the following document point-by-point, grouped by common themes across reviewers (e.g., tone, clarity, analyses, etc.).

      Lastly, a common theme among reviewer comments was their appreciation for our strong methodology and data transparency (examples pasted below). We are extremely gratified by this observation as we feel this is a particular strength of our manuscript. In addition, we were pleased to see reviewers engaged by our work, acknowledging the interest this manuscript is likely to generate among a broad range of scientific disciplines.

      Examples of reviewer appreciation of our strong methodology and data transparency:

      Reviewer 1: “However, this does not imply that the same approach can not achieve the goal, perhaps by using other cell painting markers for bortezomib-sensitivity, or with the same markers to assess sensitivity of different drugs. The cell painting + analysis approaches are not new and the clinical impact is questionable, but the technical aspects (data, analysis) are exceptional and the concept may hold as I described above.”

      Reviewer 2: “The paper is well written, and the text is clear, as is the presentation of data and transparency of methods being utilized. The methods were applied appropriately and followed established standards in the field. The paper's premise is timely and interesting, addressing a pressing issue in cancer therapy: making informed treatment decisions fast, based on markers found in tumors early in tumor development, and using image-based screening for characterizing drug resistance before treatment could be an option. A fascinating bit of the manuscript is the description of the feature selection from the screen is done systematically, considering the technical and biological variability and technical artifacts and modeling covariates using linear models seems a very appropriate way of doing so and could serve as another proof of concept that this is indeed the most robust way of modeling and removing signal of technical covariates from the data.”

      Reviewer 3: “The strengths of this study are the machine learning best practice and detailed methodology. The experiments could be reproduced and statistical analysis is more than adequate. The analysis takes into account batch effects, well position, differences in cell numbers, and other sources of technical variation that complicate high-content image analysis. It is a good exemplar of how unsupervised morphological profiling can be applied to imaging data. The major limitation is the generalizability of this particular method for patient samples. This could be addressed in the Discussion.”

      2. Description of the planned revisions

      We have incorporated all planned revisions.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Text revisions already carried out

      1. [Text revision] We have materially toned down our claims in the manuscript in two distinct areas: A) model performance and B) potential clinical application.

      A) Model performance. We specifically balanced our discussion of the discriminative signal of the Bortezomib Signature. While the signature adequately separated never-before-seen wildtype and resistant clones with metrics well above randomly permuted baselines (accuracy near 80%, average precision about 70%, area under the ROC curve (AUROC) about 84%), there were many limitations that we should have more explicitly highlighted. For example, many individual profiles were incorrectly classified, some clones were predicted entirely incorrectly, and many profiles did not receive Bortezomib Signature scores above the randomly permuted baseline. We have more clearly discussed these limitations and used more balanced language (see key examples of text-based changes below). Additionally, we modified a figure (now Figure 3) to include boxplots of clones that explicitly show the Bortezomib Signature scores of each well profile and permit examination of the strength of the signature for each clone (previously found in Figure 2-Supplement 9). Lastly, we added a new supplementary figure (now Figure 5-Supplement 1) that describes a feature space analysis of misclassified samples. Please note that this figure rearrangement and new analysis helped to balance our claims, but were also performed in response to other tangential reviewer comments.

      B) Clinical application. In the abstract, introduction, and discussion, we further emphasized that this work is a proof of concept, and that more advances must be made prior to clinical application.

      We made these changes in direct response to the following reviewer comments:

      Reviewer 1 - Major Comment 1 (relevant excerpts)

      While I am convinced that the signature captures morphological phenotypes associated with drug resistance, at the cumulative scale, the discriminative signal of a single cell type seems weak… With Fig. 4, the data fully supports the argument that the bortezomib-signature encodes bortezomib-resistance, but the signal is weak. Thus statements such as "We found the Bortezomib Signature could predict whether a cell line was bortezomib-resistant or bortezomib-sensitive" (line #172) and the specificity statements in the abstract (line #28) are not supported by the data in my opinion. I would recommend the authors to tune down these and other related statements throughout the manuscript.

      Reviewer cross-commenting - Reviewer 1

      My main critic is regarding "over selling" a weak discriminative signal. Specifically, I am not convinced that the major claims regarding predicting sensitivity and specificity at the single cell types scales are supported by the data. Since reviewer #2 and #3 did not raise this concern I think it is worth discussion here.

      Once these statements are tuned down - I think no significant additional work is needed to make the point that they can measure a discriminative signal. If they want to make these claims, perhaps they'd like to collect more data to gain statistical power (but I am not optimistic this will work at the single cell level).

      Personally, I was happy with the authors' choice of cell lines not included in the training dataset. I am not convinced that additional cell lines + validations are necessary for making the point of a proof of principle.

      Reviewer cross-commenting - Reviewer 2

      I agree that, perhaps, my major criticism of the paper was the manuscript's 'overselling' of claims that were only weakly supported by the data. Yes, if the authors tune down their claims and clearly state that this is an interesting starting point and proof of concept study, it might be ok to publish with only minor revisions. If the claims should be more generalized, then this study needs more data supporting the conclusions and the method's predictive power.

      Reviewer 2 - Major Comment 8

      Lastly, I find some misfits between the question, the model used, and the conclusions drawn. The authors start by exploring the problem of bortezomib resistance in cancer treatment, which they say is a devastating issue for patients with, e.g., multiple myeloma. Yet, the authors use HCT116 as their model cell line, a microsatellite instable, colorectal cell line with several intrinsic mutations that make it a difficult model to address physiologically relevant medical problems after all. The authors then go on to suppose that their method might be suitable to diagnose resistance in patient samples, but I am not convinced this conclusion can be speculated based on data from HCT cells. I suggest the authors test their approach on at least two other cell lines (maybe from different tissues) and benchmark their results against a dataset of digital pathology where such predictions are made from stained and analyzed tissue slices. This way, after a thorough benchmark against related third-party data sets, the method would significantly gain relevance, the paper would appeal to a broader audience, and the advance gains more merit.

      Reviewer 3 - Major Comment 5

      It is not clear from the Discussion whether this type of analysis is more broadly applicable to cell lines derived from patients, rather than selected from a parental cell line, or if this approach would be more efficient than genotyping or next-gen sequencing. How many replicates and ground truth cell lines would be necessary for predictive confidence?

      We edited the last two sentences of the abstract to tone down specificity claims (“provide evidence”) and clarify that we are establishing a “proof-of-concept framework”.

      • This signature predicted bortezomib resistance better than resistance to other drugs targeting the ubiquitin-proteasome system. Our results establish a proof-of-concept framework for the unbiased analysis of drug resistance using high-content microscopy of cancer cells, in the absence of drug treatment.

      We revised the last paragraph of the introduction to contrast bortezomib predictions with ixazomib/CB-5083 predictions, and to remove claims about “using microscopy to guide therapy”.

      • This morphological signature correctly predicted the bortezomib resistance of seven out of ten clones not included in the signature training dataset. Overall, our results establish a proof-of-concept framework for identifying unbiased signatures of drug resistance using high-content microscopy. The ability to identify drug-resistant cells based on morphological features provides a valuable orthogonal method for characterizing resistance in the absence of drug treatment.

      To tone down claims in the figures, we added boxplots to Figure 3 (previously Figure 2) showing the distribution of signature scores per well profile, and we updated the Figure 4 legend (previously Figure 3).

      • Figure 4. Bortezomib Signature has limited ability to characterize clones resistant to other ubiquitin-proteasome system inhibitors.

      We modified the following text in the discussion to tone down claims of specificity and clinical utility:

      • This Bortezomib Signature correctly predicted the bortezomib resistance of seven out of ten clones not included in the training dataset and was more specific to bortezomib-resistance given its limited ability to identify clones that were resistant to other UPS-targeting drugs.

      Though it is unclear whether this method can be extended to patient samples, where identifying intrinsic drug resistance in cells prior to treatment has the potential to improve targeted cancer therapy, our results are an encouraging proof of concept. We expect that further refinement may develop Cell Painting as a tool for identifying drug-resistant cells, perhaps even guiding strategies to overcome intrinsic resistance.

      1. [Text revision] We defined LD50 in text (originally line #97), changed description of resistant clone selection to remove main text references to LD90 (originally line #87), and stated drug concentrations used for selection in Methods. We also defined LD90 in the Methods and described its role in determining the drug concentrations to use for clone selection. This change was in response to the following comments:

      Reviewer 1 - Minor Comment 2

      What is LD90 (line #87)? LD50 (line #97)?

      Reviewer 2 - Minor Comment 5

      What was the LD 90 per drug on HCT cells? Rather than LD90 foldchanges, absolute concentrations should be used in the results and discussion to allow the reader to vet the conclusions.

      • To determine the appropriate drug concentrations to use in order to isolate drug-resistant clones, we performed proliferation assays on HCT116 parental cells with our drugs of interest: bortezomib (proteasome inhibitor), ixazomib (proteasome inhibitor), or CB-5083 (p97 inhibitor) (Fig. 1-Supplement 1 A-D).
      • We characterized the bortezomib-resistant clones and found that the median lethal doses (LD50s) were ~2.8- to ~9-fold that of HCT116 parental cells (Fig. 1-Supplement 2 B).
      • Briefly, HCT116 cells were plated in 150 mm dishes and grown in the presence of the desired drug at a concentration that resulted in the death of the majority of cells (selection concentrations: bortezomib, 12 nM; ixazomib, 150 nM; CB-5083, 600 and 700 nM).
      • Using the data from our proliferation assays, we calculated the median lethal dose (LD50) for each of our drugs of interest by fitting data of normalized growth vs. log[drug concentration] to a sigmoidal dose-response curve using GraphPad Prism (v.9.2.0) (Fig. 1-Supplement 1 D).
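      The LD50 estimation described above can be sketched in code. This is a minimal illustration, not the authors' procedure: the fits in the manuscript were done in GraphPad Prism, whereas this sketch swaps in SciPy's `curve_fit` with a four-parameter logistic curve. The function names, the parameterization, and the data points are all assumptions made for the example; the data are synthetic and noise-free.

```python
import numpy as np
from scipy.optimize import curve_fit


def sigmoid_dose_response(log_conc, bottom, top, log_ld50, hill_slope):
    """Four-parameter logistic curve on a log10 concentration axis.

    Growth is near `top` at low doses and falls to `bottom` at high doses.
    """
    return bottom + (top - bottom) / (1.0 + 10.0 ** ((log_conc - log_ld50) * hill_slope))


def fit_ld50(conc, normalized_growth):
    """Fit the curve and return the LD50 in the same units as `conc`."""
    log_conc = np.log10(conc)
    # Initial guess: growth drops from 1 to 0 around the median tested dose.
    p0 = [0.0, 1.0, float(np.median(log_conc)), 1.0]
    params, _ = curve_fit(
        sigmoid_dose_response, log_conc, normalized_growth, p0=p0, maxfev=10000
    )
    return 10.0 ** params[2]


# Synthetic example (not study data): curve generated with a true LD50 of 5 nM.
conc_nm = np.array([0.1, 0.3, 1, 3, 10, 30, 100, 300, 1000], dtype=float)
growth = sigmoid_dose_response(np.log10(conc_nm), 0.0, 1.0, np.log10(5.0), 1.0)
ld50 = fit_ld50(conc_nm, growth)
print(f"fitted LD50: {ld50:.2f} nM")
```

      Fold-change values such as those quoted for the resistant clones would then be the ratio of a clone's fitted LD50 to that of the parental cells.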

      • [Text revision] We thank the reviewer for the opportunity to clarify which clones we used. We now describe the total number of clones generated and have removed unnecessary references to specific clones for ease of reading (originally lines #96-98). We maintain all references to specific clones in the figures, legends, supplement, and methods.

      Reviewer 1 - Minor Comment 3

      It was not clear to me in the text which and how many cell lines were evaluated and the reader is forced to go to the SI. For example, "(BZ01-10 and BZ clones A and E)" (line #96-97) and "wild-type clones (WT01-05, 10, and 12-15)" (line #98) appeared when presenting the results without a clear explanation and made it harder for me to follow. Summary of the data (for example, based on Figure 2-Supplement 8) can be briefly mentioned in the text to make it more clear for the reader.

      We added the following to the second paragraph of the results:

      • Together these methods provided a total of twelve bortezomib-resistant, five ixazomib-resistant, five CB-5083-resistant, and twelve bortezomib-sensitive clones as well as HCT116 parental cells for our experiments.

      [Text revision] We removed duplicate text (originally lines #115-125).

      Reviewer 1 - Minor Comment 5

      1. Lines #104-111 were duplicated in lines #114-122.

      Reviewer 3 - Minor Comment 4

      Ten lines of text are duplicated on page 5.

      Reviewer 2 - Minor Comment 4

      on page 5, paragraph 4, there is a sizeable copy-and-paste error of text being identically replicated.

      1. [Text revision] We provided more intuition of the Bortezomib Signature in the results section (originally lines #150-151).

      Reviewer 1 - Minor Comment 6

      The "Bortezomib Signature" is a critical measurement but is only briefly mentioned in lines 150-151 ("..based on the direction-sensitive ranking method for phenotype analysis, singscore (Foroutan et al., 2018)"). Please provide more information/intuition.

      • We used these 45 features to compute a rank-based resistance score or “Bortezomib Signature” for each well profile based on the direction-sensitive method called singscore (Foroutan et al. 2018). Singscore ranks these 45 resistance-related features on a per sample basis and calculates a normalized score between -1 and 1, with higher values expected for bortezomib-resistant clones and lower values expected for bortezomib-sensitive clones.
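      To give further intuition for a direction-sensitive, rank-based score, the following is a simplified sketch of the idea. It is not the singscore implementation: it ranks only the features passed in, ignores ties, and uses hypothetical feature names and values. Features expected to be higher in resistant cells are scored on ascending ranks, features expected to be lower are scored on reversed ranks, and each half is rescaled to [-0.5, 0.5] so that the sum lies between -1 and 1.

```python
def _centered_rank_score(ranks, n_total):
    """Mean rank of a feature set, rescaled to the interval [-0.5, 0.5].

    Assumes the set is smaller than the total number of ranked features.
    """
    n_set = len(ranks)
    mean_rank = sum(ranks) / n_set
    lowest = (n_set + 1) / 2                 # set holds the n_set lowest ranks
    highest = (2 * n_total - n_set + 1) / 2  # set holds the n_set highest ranks
    return (mean_rank - lowest) / (highest - lowest) - 0.5


def signature_score(profile, up_features, down_features):
    """Directional rank score for one well profile (dict: feature -> value)."""
    ordered = sorted(profile, key=profile.get)  # ascending by feature value
    n_total = len(ordered)
    rank_up = {name: i + 1 for i, name in enumerate(ordered)}
    # Reverse the ranks so that low values of "down" features score highly.
    rank_down = {name: n_total - rank + 1 for name, rank in rank_up.items()}
    up = _centered_rank_score([rank_up[f] for f in up_features], n_total)
    down = _centered_rank_score([rank_down[f] for f in down_features], n_total)
    return up + down


# Hypothetical features: three expected higher and three lower in resistant cells.
up = ["feat_a", "feat_b", "feat_c"]
down = ["feat_d", "feat_e", "feat_f"]
resistant_like = {"feat_a": 3.0, "feat_b": 2.5, "feat_c": 2.0,
                  "feat_d": -1.0, "feat_e": -2.0, "feat_f": -3.0}
sensitive_like = {name: -value for name, value in resistant_like.items()}

print(signature_score(resistant_like, up, down))
print(signature_score(sensitive_like, up, down))
```

      A profile that perfectly matches the expected directions reaches the upper bound, its mirror image reaches the lower bound, and mixed profiles fall in between.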

      • [Text revision] We clarified that DNA sequencing had been performed solely on clones A and E in a previous study (originally lines #88-90). Furthermore, one of the strengths of our approach is that it can identify resistant clones in an unbiased fashion prior to molecular characterization. Performing these sequencing studies is beyond the scope of the present paper.

      Reviewer 2 - Minor Comment 3

      The authors talk about validating the mutation - PSMB5 by RNA-seq. However, the data for the genotyping/sequencing/characterization of these newly generated BZ-resistant lines are missing.

      In the results, we clarify DNA sequencing that was previously performed on clones A and E

      • We also isolated bortezomib-sensitive (wild-type; WT) clones by dilution of the HCT116 parental cell line and acquired two bortezomib-resistant clones (BZ clones A and E) both with mutations in PSMB5 identified by RNA sequencing performed in previous work (Fig. 1-Supplement 1 E) (Wacker et al. 2012).

      In the last paragraph of the discussion, we highlight the strength of our unbiased approach

      • Together, our work has demonstrated the potential for morphological profiling with Cell Painting to be used as an unbiased method to characterize resistance in the absence of drug treatment. Our results indicate that different mechanisms of bortezomib resistance may generate distinct morphological profiles; with larger and broader training datasets, it may be possible to identify signatures for distinct mechanisms of bortezomib resistance as well as signatures of resistance to other drugs. Though it is unclear whether this method can be extended to patient samples, where identifying intrinsic drug resistance in cells prior to treatment has the potential to improve targeted cancer therapy, our results are an encouraging proof of concept. We expect that further refinement may develop Cell Painting as a tool for identifying drug-resistant cells, perhaps even guiding strategies to overcome intrinsic resistance.

      • [Text revision] We thank the reviewers for their suggestions. We agree that the description of the experimental design was somewhat unclear and have provided greater detail and clarity, particularly regarding the generation of clones. We used the HCT116 parental cell line to generate drug-resistant clones by identifying single surviving cells after drug treatment and allowing these cells to expand prior to isolating colonies for experimentation. We did not perform experiments to confirm whether these “clones” were isogenic and cannot exclude cell migration during expansion or genetic drift as confounding factors. However, we have provided greater detail in the descriptions of our method for clone isolation in order to address this concern.

      Reviewer 1 - Minor Comment 1

      More information in Fig. 1's legend would be helpful to follow the experimental design. I found it hard to follow in its current form and had to go back to carefully reading the main text to fully understand.

      Reviewer 2 - Minor Comment 6

      The description of the resistant clonal populations is confusing. As I understand, no single-cell clones were isolated during the selection procedure. Thus, the training lines are not yet isogenic clones but oligoclonal sub-populations of the parental cell line. The authors could provide more details here and discuss the different characteristics of their sub-populations, e.g., their growth kinetics or molecular alterations.

      We bolstered the description in the results.

      • We first isolated and characterized drug-resistant cells (Fig. 1 A). To isolate drug-resistant clones, we used an approach we have described previously (Wacker et al. 2012; Kasap, Elemento, and Kapoor 2014) and the HCT116 cell line. These cancer cells express multidrug resistance pumps at low levels and are mismatch repair deficient, providing a genetically heterogeneous polyclonal population of cells (Umar et al. 1994; Papadopoulos et al. 1994; Teraishi et al. 2005) allowing for isolation of drug-resistant clones in 2-3 weeks. We hypothesize that a rapid selection of resistance could favor the isolation of clones with intrinsic resistance. To determine the appropriate drug concentrations to use in order to isolate drug-resistant clones, we performed proliferation assays on HCT116 parental cells with our drugs of interest: bortezomib, ixazomib, or CB-5083 (Fig. 1-Supplement 1 A-D). We also isolated bortezomib-sensitive (wild-type; WT) clones by dilution of the HCT116 parental cell line and acquired two published bortezomib-resistant clones (BZ clones A and E) both with mutations in PSMB5 identified by RNA sequencing performed in previous work (Fig. 1-Supplement 1 E) (Wacker et al. 2012). We characterized the bortezomib-resistant clones and found that the median lethal doses (LD50s) for bortezomib were ~2.8- to ~9-fold that of HCT116 parental cells (Fig. 1-Supplement 2 B). In contrast, bortezomib-sensitive clones had LD50s for bortezomib that ranged from ~0.7- to ~1.2-fold that of HCT116 parental cells (Fig. 1-Supplement 2 A). Together these methods provided a total of twelve bortezomib-resistant, five ixazomib-resistant, five CB-5083-resistant, and twelve bortezomib-sensitive clones as well as HCT116 parental cells for our experiments.

      We also updated the legend for Figure 1A.

      • Figure 1. Experimental design for using Cell Painting to examine morphological profiles of drug-resistant cells. (A) Graphic of the experimental workflow: we isolated drug-resistant clones by treating parental HCT116 cells with a high dose of the desired drug and then expanded them for experiments. We isolated drug-sensitive clones by diluting HCT116 cells and then expanded them for experiments. We then performed proliferation assays on select clones to screen for multidrug resistance. Next, we performed Cell Painting on both drug-resistant and -sensitive clones, using multiplexed high-throughput fluorescence microscopy of fixed cells followed by feature extraction and morphological profiling to search for features that contribute to a signature of drug resistance.

      • [Text revision] We clarified that the Bortezomib Signature did not correspond to well position (originally lines #155-157).

      Reviewer 1 - Minor Comment 9

      Line #155-156: "We found that the pattern of Bortezomib Signatures corresponded to the cell identity plate layout", the word "not" is missing before "corresponded".

      We found that the pattern of Bortezomib Signatures did not correspond to well position relative to the plate (Fig. 2-Supplement 7 B), indicating that the well position for each clone was not strongly contributing to its Bortezomib Signature.

      1. [Text revision] We explicitly described the result that some misclassified clones (WT10, WT15, and BZ06) did not have unexpected bortezomib sensitivity as determined by proliferation assays. We also moved the supplementary figure to an updated Figure 3 to better highlight this result (described below in “Figure revisions already carried out”). Lastly, we add a new figure (Figure 5-Supplement 1) to more explicitly analyze the misclassified lines (described below in “New analyses already carried out”).

      Reviewer 3 - Minor Comment 3

      The bortezomib sensitivity of the WT lines used in the last experiments was determined and did not seem to be greater than parental. This could be mentioned in the text; the figure raises the question and the answer is provided, but it's in the supplemental material.

      While the Bortezomib Signature correctly characterized the bortezomib sensitivity of most clones, it consistently misclassified others (WT10, WT15, and BZ06) (Fig 5-Supplement 1 A). Proliferation assays conducted in earlier experiments showed that WT10 and WT15 were sensitive to bortezomib while BZ06 was resistant (Fig. 1-Supplement 2 A and B). By comparing these incorrect predictions with high-confidence correct predictions, we observed differences that varied by clone type, suggesting unique morphology may be driving each of these misclassifications (Fig. 5-Supplement 1 B and C). These results are consistent with the Bortezomib Signature being generalizable to clones not included in the training dataset and suggest that morphological profiling has the potential to identify bortezomib-resistant clones based on the morphological features of cells in the absence of drug treatment.

      1. [Text revision] We clarified that the metrics (accuracy and average precision) were based on median Bortezomib Signature scores of all replicate well-level profiles per clone. We can compare samples based on rank and on their difference from the 95% confidence interval of permuted data. Our method currently has no way to assign a likelihood. Also note that we have updated the discussion to cover alternative metrics (see Reviewer 1 - Minor Comment 7). These are very important distinctions, and we are grateful to the reviewer for bringing them up.

      Reviewer 3 - Major Comment 3

      The study classifies cells as binary sensitive or resistant, but would results be improved by scoring based on likelihood of being resistant/sensitive?

      Reviewer 3 - Minor Comment 2

      It is not clear whether the accuracy was based on a percentage of replicates per cell line that were classified correctly or whether that was referring to classification of the cell line overall as sensitive/resistant.

      • We next examined whether the Bortezomib Signature was able to predict the bortezomib resistance of a clone based on morphological profiling data (Fig. 3 A-E and Fig. 3-Supplement 2 A and B). We called the clone bortezomib-resistant if the median Bortezomib Signature of all replicate well profiles was greater than zero and bortezomib-sensitive if the median Bortezomib Signature was less than zero. In the training dataset, the Bortezomib Signature correctly predicted the bortezomib resistance of all ten clones, with median Bortezomib Signatures for eight out of ten clones beyond the 95% confidence interval for the randomly permuted data (Fig. 3 A). The accuracy of the Bortezomib Signature was 88% while the average precision was 81% for the training dataset (Fig. 3-Supplement 2 A and B) (see Methods). The signature performed similarly well in the validation dataset (Fig. 3 B), with an accuracy of 92% and an average precision of 89% (Fig. 3-Supplement 2 A and B). In the test dataset the Bortezomib Signature correctly predicted the bortezomib resistance of all clones, though only HCT116 parental cells had a median Bortezomib Signature outside the 95% confidence interval for the randomly permuted data (Fig. 3 C). The test dataset had an accuracy of 80% and an average precision of 68% (Fig. 3-Supplement 2 A and B). Similarly, in the holdout dataset the Bortezomib Signature had an accuracy of 78% and an average precision of 69% (Fig. 3-Supplement 2 A and B), and correctly predicted the bortezomib resistance of twelve out of thirteen clones, with WT01 misclassified as bortezomib-resistant (Fig. 3 D). In the holdout dataset, four of the twelve correctly characterized clones had median Bortezomib Signatures outside the 95% confidence interval for the randomly permuted data.
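      The clone-level calling rule in this passage can be illustrated with a short sketch. The clone names, well-level scores, and permuted-data interval below are hypothetical values chosen for the example, and the handling of a median of exactly zero is an assumption, since the stated rule does not cover that case.

```python
from statistics import median


def call_clone(well_scores, permuted_interval):
    """Call one clone from the signature scores of its replicate well profiles."""
    med = median(well_scores)
    if med > 0:
        label = "resistant"
    elif med < 0:
        label = "sensitive"
    else:
        label = "undetermined"  # assumption: the rule is silent on a median of zero
    low, high = permuted_interval
    # Flag whether the median falls outside the permuted-data interval.
    outside_baseline = med < low or med > high
    return label, med, outside_baseline


# Hypothetical well-level scores and permuted-data interval (illustration only).
permuted_95 = (-0.15, 0.15)
clones = {
    "clone_r": [0.31, 0.22, 0.40, 0.18],
    "clone_s": [-0.05, -0.12, 0.03, -0.08],
}
for name, scores in clones.items():
    label, med, outside = call_clone(scores, permuted_95)
    print(name, label, round(med, 3), "outside permuted interval:", outside)
```

      In this toy example the second clone is called sensitive even though its median lies inside the permuted interval, which mirrors the distinction drawn in the text between a correct call and a call that exceeds the permuted baseline.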

      We also mirrored language when discussing the ixazomib and CB-5083 results.

      • However, only two of the four correctly identified ixazomib-resistant clones and one of the three CB-5083-resistant clones had median Bortezomib Signatures outside the 95% confidence interval of the randomly permuted data. The area under the ROC (AUROC) curve for ixazomib-resistant and CB-5083-resistant clones (0.63 and 0.60, respectively) was lower than those calculated for the training, validation, test, and holdout datasets. In addition, many of the Bortezomib Signatures for well profiles of ixazomib- and CB-5083-resistant clones, particularly those for CB-5083-resistant clones, landed within the 95% confidence interval of the randomly permuted data. These results suggest that the Bortezomib Signature is not a general signature of UPS-targeting drug resistance and instead has some specificity for bortezomib.

      • [Text revision] We added an explicit note that our image analysis pipelines are also publicly available. Our data processing pipelines are fully documented, well above the standards in our field. Linking the publicly available resources with these methods maximizes reproducibility.

      Reviewer 1 - Minor Comment 10

      Additional details on the processing steps in the analysis pipeline in the Methods will be highly appreciated.

      We include all image analysis pipelines at https://github.com/broadinstitute/profiling-resistance-mechanisms (G. Way et al. 2023).

      1. [Text revision] We have compared our approach to the on-disease/off-disease scores introduced in (Heiser et al. 2020). We agree with the reviewer that a discussion of these two methods would help clarify our phenotypic signature concept. The on/off score measures the degree to which a perturbation pushes a disease state towards a healthy state. In that case there are three sets of data: healthy samples (used for training), disease samples (used for training), and the sample to be scored, which should be of the form "disease + perturbation". Our approach, based on singscore, also has three sets of data: sensitive samples (used for training), resistant samples (used for training), and the sample to be scored. Here, the sample to be scored could be anything, not necessarily of the form "resistance + perturbation". Furthermore, singscore does not have the concept of orthogonality to resistance/sensitivity. This would become relevant if we were exploring perturbations or conditions that would induce a resistant cell line to become sensitive, but we are not doing that here. There are other statistical differences (projection vs. rank based, etc.), but the key difference is the applicability of the method to the specific problem at hand.

      Reviewer 1 - Minor Comment 7

      How is the Bortezomib Signature related to the "on-disease"/"off-disease" scores described in https://www.biorxiv.org/content/10.1101/2020.04.21.054387v1.full? Are there other alternatives used for similar binary phenotypic signatures? What is the justification for using these measurements? I would love to see this generalized concept explicitly discussed in the Discussion.

      We added the following to the discussion.

      • The Bortezomib Signature is conceptually similar to the on-disease/off-disease score (Heiser et al. 2020). Both require three phenotypic measurements: a target phenotype representing ideal, a disease phenotype, and a new phenotype to classify. However, our approach is technically different (non-parametric compared to linear projection) and our goals are different (phenotypic classification compared to perturbation alignment). Other methods also enable phenotype labeling, but they focus on single-sample annotation without regard to a target phenotype (Wawer et al. 2014; Rohban et al. 2017; Simm et al. 2018; Nyffeler et al. 2020).

      Figure revisions already carried out

      1. [Figure revision] We moved all boxplots from the original Fig. 2-Supplement 9 to the main text (also splitting Fig. 2 into Fig. 2 and 3). From the original Figure 2, we moved the accuracy and average precision bar graphs to the supplement. We also note that this change increases transparency of the discriminative signal of our signature.

      Reviewer 1 - Minor Comment 8

      I would highly recommend showing the Bortezomib Signatures from Figure 2-Supplement 9. in Fig. 2. This was the main measurement used throughout the manuscript and in my opinion, it is very important to consistently visualize the data along the manuscript, for clarity and easier reader interpretation.

      1. [Figure revision] We adjusted the position of the legend in the accuracy and average precision bar graphs (originally Fig. 2 C and D, now Fig. 3-Supplement 2) for clarity. We also note that keeping the bar chart here is standard best practice (compared to a dot plot).

      Reviewer 1 - Minor Comment 4

      I found the visualization in Fig. 2C-D not intuitive (it is properly explained in the legend). I suggest replacing the accuracy colorbar with a color marker to make it more distinct from the random permutation (|--*--|) The location of the text "mean +- SD of 100 random permutation" made me first think that it is linked to the holdout.

      1. [Figure revision] We changed the point distribution in the boxplots (from expanded to standard) to minimize overlap with the boxplot lines. We also updated the legend text to indicate that individual points in boxplots represent the Bortezomib Signature for well profiles. Note, we paste a representative example of this change above (new Figure 3).

      Reviewer 3 - Minor Comment 1

      I found the box plots somewhat difficult to interpret (especially where the WT lines had a lot of overlap with the red shaded area). Do the points in these charts correspond to replicate wells?

      We also update the figure legend.

      • Plots show values for individual well profiles (points), range (error bars), 25th and 75th percentiles (box boundaries), and median.

      • [Figure revision] [Response to Reviewer 2 - Major Comment 7] We thank the reviewer for allowing us an opportunity to clarify the mechanism. We feel that it is beyond the scope of this manuscript to disentangle the molecular alterations that cause bortezomib resistance based on our Cell Painting insights. This wet lab experimental process is arduous and cost prohibitive, and we argue that one of the benefits of taking a morphology approach to resistance status is that we can detect resistant cells (and therefore cells that won’t die when presented with a treatment) without knowing the molecular mechanism.

      Nevertheless, the reviewer has encouraged us to enhance the ability for a reader to view and interpret the signature to perhaps more easily facilitate future work. Previously, we presented our signature in text form in Figure 2-Supplement 4 and in heatmap form in Figure 2-Supplement 5. Here, we add a new figure (Figure 2-Supplement 6; pasted below) which will improve interpretability.

      Reviewer 2 - Major Comment 7:

      Next to feature importance, the authors do not discuss (or I missed) what biology the features represent. Such the reader is left wondering what the actual mechanism of bortezomib resistance could be and if cell painting could shed light on the molecular alterations that cause the treatment resistance. While reviewing, I thus wondered which audience the authors targeted with their manuscript. A more focused analysis of their data that highlights aspects of the study either for the machine learning community, the cell biology community, or the precision oncology community would greatly benefit the manuscript's impact. In its current form, the study's findings seem diluted and spread across a wide range of research questions.

      • Figure 2-Supplement 6. Bortezomib Signature visualized by CellProfiler features. Visualization of CellProfiler features contributing to the Bortezomib Signature. Features with high values (mean signature estimates) in resistant cells are purple while features with low values in resistant cells are green. The mean signature estimates were based on Tukey's Honestly Significant Difference test score and the number in each box represents the number of features used to calculate the mean signature estimate.

      Additionally, we add the following to the results section:

      • We then examined the grouping of features across compartments and channels and found radial distribution features were higher in resistant cells (Fig 2-Supplement 6).

      The code change to generate the signature visualization summary is available at: https://github.com/broadinstitute/profiling-resistance-mechanisms/pull/131

      New analyses already carried out

      1. [New analysis] [Response to Reviewer 2 - Major Comment 5] We agree that a systematic analysis of feature selection methods will provide additional insights not already in the manuscript. Therefore, we have performed two new computational experiments to compare our linear modeling feature selection approach against other standard approaches. We demonstrate that our linear modeling approach is effective at isolating the core differences between resistant and sensitive classes.

      Specifically, we performed two analyses: A) UMAP and B) k-means cluster analysis. We analyzed profiles defined by four different feature selection approaches: 1) Using all traditional CellProfiler features; 2) Using the traditional CellProfiler feature selection approach (removing low variance features, high correlating features, etc.); 3) Using 45 random features (same size as Bortezomib Signature); and 4) Using only the bortezomib signature features. We performed Fisher’s exact tests to derive odds ratios of cluster membership by resistance status and calculated Silhouette widths to quantify relative proximity of clusters.

      This analysis generated a new supplementary figure (see below) and demonstrates that the linear-modeling-based feature selection isolated the features driving the differences between the clone types (resistant vs. wild-type), whereas the standard approaches did not separate them as effectively.

      Reviewer 2 - Major Comment 5:

      A fascinating bit of the manuscript is the description of the feature selection from the screen is done systematically, considering the technical and biological variability and technical artifacts and modeling covariates using linear models seems a very appropriate way of doing so and could serve as another proof of concept that this is indeed the most robust way of modeling and removing signal of technical covariates from the data. Yet, I wondered why the authors do not discuss other means of feature selection or dimensionality reduction; further, they need to show how the features cluster the cell lines or why impact (information content) different features deliver. For an audience interested in the technical aspects of cell painting analysis and machine learning based on the data, that would, IMHO, be the most exciting questions.

      • Figure 3-Supplement 3. Benchmarking linear-modeling feature selection to separate clones by bortezomib resistance. Uniform Manifold Approximation and Projection (UMAP) analysis of the qualitative separability of (A) resistance status and (B) Bortezomib Signature scores across four different feature spaces. (C) k-means clustering from k=2 to k=14 of average odds ratio, maximum odds ratio (Fisher’s exact test), and Silhouette width using Bortezomib Signature features.

      Additionally, we add the following to the results section:

      • We then compared our linear-modeling approach to feature selection against other feature spaces and found that the Bortezomib Signature clusters same-type clones (bortezomib-resistant vs. bortezomib-sensitive) with higher enrichment compared to the full feature space, standard feature selection (see Methods), or a random selection of 45 features (Fig 3-Supplement 3).

      And methods section, describing this analysis:

      • We were also interested in comparing the ability of different feature spaces to cluster clones of the same type (resistant vs. sensitive). This analysis would determine if the Bortezomib Signature features, which we derived using linear modeling to isolate biological from technical variables, had a greater ability to cluster. We compared the Bortezomib Signature against three other feature spaces: 1) the full feature space, 2) standard feature selection (see Image data processing methods), and 3) 45 randomly selected features. We performed two analyses using these four feature spaces including Uniform Manifold Approximation and Projection (UMAP) (McInnes et al. 2018) and k-means clustering. For UMAP, we used default umap-learn parameters to identify two UMAP coordinates per feature space. We then visualized the clusters by their resistance status and Bortezomib Signature score. The UMAP analysis represents a qualitative analysis. Next, we applied k-means clustering with 25 initializations across a range of 2-14 clusters (k). Prior to clustering and for each feature space, we applied principal component analysis (PCA) and transformed each feature space into 30 principal components. This step was necessary to compare k-means clustering metrics, which are sensitive to the feature space dimensionality. We applied a Fisher’s exact test to each cluster using a two-by-two contingency matrix that specified cluster membership for each clone classification (resistant vs. sensitive). We visualized the mean odds ratio and max cluster odds ratio for each feature space across k. A high odds ratio tells us that the feature space effectively clusters clones of the same resistance status. Lastly, we calculated Silhouette width (the average proximity between samples in one cluster to the second nearest cluster) for each feature space across k.
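
      The per-cluster enrichment test described above can be sketched as follows. This is a minimal, standard-library-only illustration and not the code from the linked pull request: the function names (`fisher_exact_2x2`, `cluster_odds_ratios`) and the toy cluster labels are hypothetical, and the layout of the two-by-two table (in-cluster vs. out-of-cluster, by resistant vs. sensitive) is an assumption about how the contingency matrix was arranged. The PCA and k-means steps are omitted; the cluster assignments are taken as given.

```python
from math import comb
from statistics import mean


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns (sample odds ratio, p-value).
    """
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    p_value = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


def cluster_odds_ratios(cluster_labels, statuses, positive="resistant"):
    """Run one test per cluster: in-cluster vs. out-of-cluster by clone status."""
    results = {}
    for cluster in sorted(set(cluster_labels)):
        a = b = c = d = 0
        for label, status in zip(cluster_labels, statuses):
            inside, is_positive = label == cluster, status == positive
            if inside and is_positive:
                a += 1  # resistant, inside the cluster
            elif inside:
                b += 1  # sensitive, inside the cluster
            elif is_positive:
                c += 1  # resistant, outside the cluster
            else:
                d += 1  # sensitive, outside the cluster
        results[cluster] = fisher_exact_2x2(a, b, c, d)
    return results


# Hypothetical toy assignment for k=2 (not data from the study)
clusters = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
statuses = ["resistant"] * 3 + ["sensitive"] + ["resistant"] + ["sensitive"] * 5

results = cluster_odds_ratios(clusters, statuses)
odds = [odds_ratio for odds_ratio, _ in results.values()]
mean_odds, max_odds = mean(odds), max(odds)
print(results, mean_odds, max_odds)
```

      Repeating this for each feature space and each k would give the mean and maximum odds ratio curves described in the text. Note that a cluster with an empty cell in its table yields an infinite odds ratio in this sketch, which would need separate handling before averaging.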

      The code change to derive the UMAP coordinates, perform clustering, and generate the figure is available at https://github.com/broadinstitute/profiling-resistance-mechanisms/pull/132

      1. [New analysis] [Response to Reviewer 3 - Major Comment 1] We thank the reviewer for this suggestion, which allowed us to explore the misclassified samples in more depth. We added a new supplementary figure in which we summarized the classification accuracy of all bortezomib clones (wildtype and resistant) based on the Bortezomib Signature (panel A). We did not include training set samples in this analysis. Using samples that were consistently incorrectly classified with high confidence (three samples: WT15, BZ06, WT10), we performed two separate two-sample Kolmogorov–Smirnov (KS) tests. Specifically, we compared high incorrect wildtype to high correct wildtype and high incorrect resistant to high correct resistant. Our results indicate that most Bortezomib Signature features were significantly different between correct and incorrect assignments (panel B), and that the signature features varied between resistant and wildtype misclassification tests (panel C).

      Reviewer 3 - Major Comment 1:

      While the claims are largely substantiated, there are a few points where further consideration would improve the manuscript. Several cell lines were mis-classified with what appears to be a high degree of certainty. Can the authors tell what was driving those predictions? Was there something in the morphological signature that weighed more heavily in those cases?

      • Figure 5-Supplement 1. Examining the accuracy of clone classification and misclassification of clones. (A) Proportion of high-confidence correct, low-confidence correct, low-confidence incorrect, and high-confidence incorrect predictions of well profiles across clones in the test, holdout, and validation sets. High-confidence predictions (high) had Bortezomib Signatures greater than (resistant clones) or less than (sensitive clones) the 95% confidence interval of randomly permuted data, while low-confidence predictions (low) had Bortezomib Signatures within the 95% confidence interval of randomly permuted data. (B) Visualization of Kolmogorov-Smirnov (KS) test statistic means of feature groups across channels and cellular compartments. (C) Plot of the KS test statistic means for feature groups in bortezomib-resistant vs. -sensitive cells. Each feature group is color coded by the imaging channel.

      Additionally, we add the following to the results section:

      • While the Bortezomib Signature correctly characterized the bortezomib sensitivity of most clones, it consistently misclassified others (WT10, WT15, and BZ06) (Fig 5-Supplement 1 A). Proliferation assays conducted in earlier experiments showed that WT10 and WT15 were sensitive to bortezomib while BZ06 was resistant (Fig. 1-Supplement 2 A and B). By comparing these incorrect predictions with high-confidence correct predictions, we observed differences that varied by clone type, suggesting unique morphology may be driving each of these misclassifications (Fig. 5-Supplement 1 B and C). These results are consistent with the Bortezomib Signature being generalizable to clones not included in the training dataset and suggest that morphological profiling has the potential to identify bortezomib-resistant clones based on the morphological features of cells in the absence of drug treatment.

      And methods section, describing this analysis:

      Some profiles were consistently predicted incorrectly with high confidence but in the opposite direction (see Figure 5-Supplement 1). For a well-level profile to be categorized as high-confidence (in either the correct or incorrect direction), it needed to score beyond the 95% confidence interval of the randomly permuted data range. For example, a high-confidence incorrect resistant profile would have a Bortezomib Signature below the 95% confidence interval of the randomly permuted data. To evaluate the features driving the differences in these samples, we applied two-sample Kolmogorov–Smirnov (KS) tests per Bortezomib Signature feature. We applied these tests to two separate groups: 1) misclassified bortezomib-sensitive vs. high-confidence accurate bortezomib-sensitive and 2) misclassified bortezomib-resistant vs. high-confidence accurate bortezomib-resistant.
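
      The categorization and the per-feature comparison described above can be illustrated with a short sketch. It is not the code from the linked pull request and rests on stated assumptions: the helper names (`null_interval`, `categorize`, `ks_statistic`) are hypothetical, the 95% interval is taken as the 2.5th to 97.5th percentile of scores from permuted data, and for scores inside that interval the sign of the score is assumed to decide whether a low-confidence prediction counts as correct. Only the KS test statistic is computed, not its p-value.

```python
from bisect import bisect_right
from statistics import quantiles


def null_interval(null_scores):
    """Return the (2.5th, 97.5th) percentile bounds of permuted-data scores."""
    cuts = quantiles(null_scores, n=40)  # cut points every 2.5%
    return cuts[0], cuts[-1]


def categorize(score, is_resistant, low, high):
    """Label one well-level signature score against the null interval."""
    if score > high or score < low:
        confidence = "high-confidence"
        predicted_resistant = score > high
    else:
        confidence = "low-confidence"
        predicted_resistant = score > 0  # assumption: the sign decides the call
    outcome = "correct" if predicted_resistant == is_resistant else "incorrect"
    return f"{confidence} {outcome}"


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


# Hypothetical values, for illustration only
low, high = -0.2, 0.2
print(categorize(-0.5, True, low, high))   # a resistant well scored below the interval
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

      In practice, `ks_statistic` would be applied once per Bortezomib Signature feature, comparing the misclassified wells with the high-confidence correct wells of the same clone type.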

      The code change to generate the UMAP coordinates and figure is available at https://github.com/broadinstitute/profiling-resistance-mechanisms/pull/130

      Description of analyses that authors prefer not to carry out

      1. [Response to Reviewer 2 - Minor Comments 1 and 2]: These are interesting suggestions! Still, we prefer not to speculate on the biological mechanism of the Bortezomib signature. Connecting morphological features identified as contributing to the Bortezomib Signature by Cell Painting to specific biological pathways would demand considerable cell-based assays to validate. In addition, our analyses suggest that the features contributing to the Bortezomib Signature are spread across a range of cellular compartments and channels, making it difficult to pin down specific mechanisms or pathways as likely contributors to bortezomib resistance. However, we are adding a figure to increase interpretability of the signature, which will aid in developing future hypotheses. Note that the signature was not possible to detect by eye (Fig. 2 A).

      Reviewer 2 - Minor Comment 1:

      There could be some speculation on the mechanism of Bortezomib resistance concerning the literature with the existing image data. For example, Bortezomib resistance is connected to serine synthesis and how a particular feature could contribute to the known mechanism.

      Reviewer 2 - Minor Comment 2:

      Along the same lines, the authors could show that larger cells lead to resistance with microscopic images.

      2. [Response to Reviewer 2 - Major Comment 8]: We appreciate the reviewer’s concern that our work using HCT116 clonal cell lines may not directly reflect results from patient samples. Our choice was based on previously published work demonstrating the efficiency with which HCT116 cells generate resistant clones due to diminished DNA mismatch repair and decreased expression of drug efflux pumps. Since our work is a proof of concept rather than a comprehensive demonstration of translating morphological profiling into clinical practice, we believe that experiments using multiple patient cell lines from different tissues, as well as digital pathology records, are beyond the scope of this work. We instead chose to tone down the language of our manuscript to more clearly acknowledge the limitations of our work and clarify that it is a proof of concept.

      Reviewer 2 - Major Comment 8 (relevant excerpt):

      I suggest the authors test their approach on at least two other cell lines (maybe from different tissues) and benchmark their results against a dataset of digital pathology where such predictions are made from stained and analyzed tissue slices. This way, after a thorough benchmark against related third-party data sets, the method would significantly gain relevance, the paper would appeal to a broader audience, and the advance gains more merit.

      3. [Response to Reviewer 3 - Major Comment 2]: The bortezomib sensitivity of ixazomib- and CB-5083-resistant clones was not determined, and hence cannot be ruled out as a possible explanation for their high Bortezomib Signature scores. However, we prefer not to conduct additional proliferation assays for the misclassified clones (IX02, WT06, CB14, CB16) in the presence of bortezomib to determine whether coincidental bortezomib resistance might explain the signature performance. Our rationale is that three other misclassified clones (WT10, WT15, and BZ06) had the expected bortezomib sensitivity in proliferation assays (Fig. 1-Supplement 2), meaning that additional proliferation assays may not reveal any insights regarding the signature performance.

      Reviewer 3 - Major Comment 2:

      Was the bortezomib sensitivity of the IX (or CB) resistant cell lines determined? If there were differences, this could explain some of the variation in the morphological signatures. This could be easily done in one or two growth experiments.

      4. [Response to Reviewer 2 - Major Comment 7]: Thank you for pointing this out. Our goal is to keep the study multi-disciplinary. We are adding a figure to increase interpretability of the signature, and adding text-based clarifications.

      Reviewer 2 - Major Comment 7 (relevant excerpt):

      While reviewing, I thus wondered which audience the authors targeted with their manuscript. A more focused analysis of their data that highlights aspects of the study either for the machine learning community, the cell biology community, or the precision oncology community would greatly benefit the manuscript's impact. In its current form, the study's findings seem diluted and spread across a wide range of research questions.

      5. [Response to Reviewer 2 and 3 - Major Comments 6 and 4]: We prefer not to expand the scope of the model to predict other drug signatures. This would require a substantial amount of work to generate the appropriate drug-resistant clones, collect the imaging data, and analyze it, and we think it is important to convey that the purpose of our paper is a proof of concept. We do not feel that the time invested in performing this analysis would result in adequate returns beyond what we already demonstrate.

      Reviewer 2 - Major Comment 6:

      Interestingly, the Bortezomib signature is specific to the drug and not a broad range of proteasomal inhibitors. However, seeing the common features between all the proteasomal inhibitors would be interesting.

      Reviewer 3 - Major Comment 4:

      There was some predictive ability of the Bortezomib Signature for ixazomib resistance. Were there some features that were correlated with IX-resistance, i.e. UPS pathway, versus specific to bortezomib? Do the features suggest anything about resistance mechanisms or is the feature set too abstruse to interpret?

      References

      Foroutan, Momeneh, Dharmesh D. Bhuva, Ruqian Lyu, Kristy Horan, Joseph Cursons, and Melissa J. Davis. 2018. “Single Sample Scoring of Molecular Phenotypes.” BMC Bioinformatics 19 (1): 404.

      Heiser, Katie, Peter F. McLean, Chadwick T. Davis, Ben Fogelson, Hannah B. Gordon, Pamela Jacobson, Brett Hurst, et al. 2020. “Identification of Potential Treatments for COVID-19 through Artificial Intelligence-Enabled Phenomic Analysis of Human Cells Infected with SARS-CoV-2.” bioRxiv. https://doi.org/10.1101/2020.04.21.054387.

      McInnes, Leland, John Healy, Nathaniel Saul, and Lukas Großberger. 2018. “UMAP: Uniform Manifold Approximation and Projection.” Journal of Open Source Software 3 (29): 861.

      Nyffeler, Johanna, Clinton Willis, Ryan Lougee, Ann Richard, Katie Paul-Friedman, and Joshua A. Harrill. 2020. “Bioactivity Screening of Environmental Chemicals Using Imaging-Based High-Throughput Phenotypic Profiling.” Toxicology and Applied Pharmacology 389 (January): 114876.

      Rohban, Mohammad Hossein, Shantanu Singh, Xiaoyun Wu, Julia B. Berthet, Mark-Anthony Bray, Yashaswi Shrestha, Xaralabos Varelas, Jesse S. Boehm, and Anne E. Carpenter. 2017. “Systematic Morphological Profiling of Human Gene and Allele Function via Cell Painting.” eLife 6 (March). https://doi.org/10.7554/eLife.24060.

      Simm, Jaak, Günter Klambauer, Adam Arany, Marvin Steijaert, Jörg Kurt Wegner, Emmanuel Gustin, Vladimir Chupakhin, et al. 2018. “Repurposing High-Throughput Image Assays Enables Biological Activity Prediction for Drug Discovery.” Cell Chemical Biology 25 (5): 611–18.e3.

      Wacker, Sarah A., Benjamin R. Houghtaling, Olivier Elemento, and Tarun M. Kapoor. 2012. “Using Transcriptome Sequencing to Identify Mechanisms of Drug Action and Resistance.” Nature Chemical Biology 8 (3): 235–37.

      Wawer, Mathias J., Kejie Li, Sigrun M. Gustafsdottir, Vebjorn Ljosa, Nicole E. Bodycombe, Melissa A. Marton, Katherine L. Sokolnicki, et al. 2014. “Toward Performance-Diverse Small-Molecule Libraries for Cell-Based Phenotypic Screening Using Multiplexed High-Dimensional Profiling.” Proceedings of the National Academy of Sciences of the United States of America 111 (30): 10911–16.

      Way, Gregory, Yu Han, David Stirling, and Shantanu Singh. 2023. Broadinstitute/profiling-Resistance-Mechanisms: Analysis for Preprint. Zenodo. https://doi.org/10.5281/ZENODO.7803787.

      Way, Gregory P., Maria Kost-Alimova, Tsukasa Shibue, William F. Harrington, Stanley Gill, Federica Piccioni, Tim Becker, et al. 2021. “Predicting Cell Health Phenotypes Using Image-Based Morphology Profiling.” Molecular Biology of the Cell 32 (9): 995–1005.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      The authors use Cell Painting, a high-content image-based phenotypic assay, to distinguish between clonal cancer cell lines that are resistant versus sensitive to a proteasome inhibitor anti-myeloma drug called bortezomib. The authors characterized a high-dimensional cell morphology signature for bortezomib-resistance, evaluated it on an independent subset of cell lines, and evaluated specificity in respect to other drugs targeting the ubiquitin-proteasome system. The authors thus propose image-based morphology characterization as an alternative method for characterizing drug resistance.

      Strengths: solid methodology - cell lines validation of drug resistance, extensive data collection, thorough validation of the analysis pipeline, avoiding potential confounders, biases and proper data partitioning to test and hold-out (what the authors refer to as "machine learning best practices").

      Weakness: weak discriminative signal. Some aspects of the writing could be improved to make the manuscript easier to follow (see Minor comments).

      Major comments:

      While I am convinced that the signature captures morphological phenotypes associated with drug resistance, at the cumulative scale, the discriminative signal of a single cell type seems weak. Specifically, it is not clear whether the signature can effectively capture the drug resistance of a single cell line. In Figure 2-Supplement 9, considering the test (C) and the holdout (D), only 1/9 BZ clones' median signatures were beyond the 95% confidence interval, with 4/6 and 2/6 WT cell types with median signatures beyond the positive and negative 95% confidence interval correspondingly. When defining bortezomib-sensitivity according to the median signatures' sign (>0 or <0) of a cell line, Figure 2-Supplement 9 shows that in the test+holdout there are 9/9 correct bortezomib-resistance (BZ) and 6/7 correct bortezomib-sensitive (WT) predictions. However, similar discrimination levels also appeared in the other drugs (ixazomib, CB-5083), making the statements about specificity less grounded. When the authors evaluate the AUROC they report ~0.6 (line #194) for the non-specific (ixazomib, CB-5083) drugs versus ~0.75 for bortezomib-resistance (line #202). With Fig. 4, the data fully supports the argument that the bortezomib-signature encodes bortezomib-resistance, but the signal is weak. Thus statements such as "We found the Bortezomib Signature could predict whether a cell line was bortezomib-resistant or bortezomib-sensitive" (line #172) and the specificity statements in the abstract (line #28) are not supported by the data in my opinion. I would recommend that the authors tone down these and other related statements throughout the manuscript. An alternative would be to increase the number of wells and see whether this weak signal can indeed be statistically amplified with many replicates to make a robust and specific characterization of a cell line's bortezomib-sensitivity (but I assume this is a lot of work and probably out of scope of this manuscript).

      I think it is also important to discuss in more detail the interpretation of these results (including Figure 2-Supplement 9), in this context, in the Discussion.

      Minor comments:

      Suggested clarifications (some might be less relevant if the manuscript is designed for experts in the more clinical domain who are familiar with these terms / style):

      1. More information in Fig. 1's legend would be helpful to follow the experimental design. I found it hard to follow in its current form and had to go back to carefully reading the main text to fully understand.
      2. What is LD90 (line #87)? LD50 (line #97)?
      3. It was not clear to me in the text which and how many cell lines were evaluated and the reader is forced to go to the SI. For example, "(BZ01-10 and BZ clones A and E)" (line #96-97) and "wild-type clones (WT01-05, 10, and 12-15)" (line #98) appeared when presenting the results without a clear explanation and made it harder for me to follow. Summary of the data (for example, based on Figure 2-Supplement 8) can be briefly mentioned in the text to make it more clear for the reader.
      4. I found the visualization in Fig. 2C-D not intuitive (it is properly explained in the legend). I suggest replacing the accuracy colorbar with a color marker to make it more distinct from the random permutation (|--*--|). The location of the text "mean +- SD of 100 random permutation" made me first think that it is linked to the holdout.
      5. Lines #104-111 were duplicated in lines #114-122.
      6. The "Bortezomib Signature" is a critical measurement but is only briefly mentioned in lines 150-151 ("..based on the direction-sensitive ranking method for phenotype analysis, singscore (Foroutan et al., 2018)"). Please provide more information/intuition.
      7. How is the Bortezomib Signature related to the "on-disease"/"off-disease" scores described in https://www.biorxiv.org/content/10.1101/2020.04.21.054387v1.full? Are there other alternatives used for similar binary phenotypic signatures? What is the justification for using these measurements? I would love to see this generalized concept explicitly discussed in the Discussion.
      8. I would highly recommend showing the Bortezomib Signatures from Figure 2-Supplement 9 in Fig. 2. This was the main measurement used throughout the manuscript and, in my opinion, it is very important to consistently visualize the data along the manuscript, for clarity and easier reader interpretation.
      9. Line #155-156: "We found that the pattern of Bortezomib Signatures corresponded to the cell identity plate layout", the word "not" is missing before "corresponded".
      10. Additional details on the processing steps in the analysis pipeline in the Methods will be highly appreciated.

      Referees cross-commenting

      My main criticism concerns "overselling" a weak discriminative signal. Specifically, I am not convinced that the major claims regarding predicting sensitivity and specificity at the scale of a single cell type are supported by the data. Since reviewers #2 and #3 did not raise this concern, I think it is worth discussing here.

      Once these statements are toned down, I think no significant additional work is needed to make the point that they can measure a discriminative signal. If they want to make these claims, perhaps they'd like to collect more data to gain statistical power (but I am not optimistic this will work at the single cell level).

      Personally, I was happy with the authors' choice of cell lines not included in the training dataset. I am not convinced that additional cell lines + validations are necessary for making the point of a proof of principle.

      Significance

      Cell Painting has been used in many applications, but as far as I am aware this is the first attempt at an image-based phenotypic characterization of drug resistance. While the authors established that this approach can measure, to some extent, bortezomib-sensitivity, at the current state of the results, I am not convinced that Cell Painting can be practically used to assess bortezomib-sensitivity of a single cell line. However, this does not imply that the same approach cannot achieve the goal, perhaps by using other Cell Painting markers for bortezomib-sensitivity, or with the same markers to assess sensitivity to different drugs. The Cell Painting + analysis approaches are not new and the clinical impact is questionable, but the technical aspects (data, analysis) are exceptional and the concept may hold as I described above.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary: Forer and Otsuka provide first-rate evidence for tethers fixed in place between separating anaphase chromosomes using electron tomography. The authors traced the anaphase movement of a number of living cells before fixation for examination using electron tomography. The manuscript is clearly written and provides an excellent introduction and discussion of the known literature. The reader will have an excellent background to see the importance of this work.

      Major comments:

      - Are the claims and the conclusions supported by the data or do they require additional experiments or analyses to support them?

      No further experiments are needed. The data are very supportive and extremely clear.

      - Are the data and the methods presented in such a way that they can be reproduced? Yes.

      - Are the experiments adequately replicated and statistical analysis adequate? Yes.

      Minor comments:

      - Are prior studies referenced appropriately? Yes.

      - Are the text and figures clear and accurate? Absolutely.

      - Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      The authors are to be congratulated on their major contribution to this study on tethers between separated daughter chromosomes. It is a tour de force to go from the living cells to fixing and identifying the same separated chromosomes using electron tomography to see the ultrastructure of the fibers seen fir.

      Referees cross-commenting

      Thank you reviewer #2. The manuscript should be published. It is an excellent contribution.

      We thank the reviewer for the appreciation of the clarity and quality of our work.

      Reviewer #1 (Significance):

      Provide contextual information to readers (editors and researchers) about the novelty of the study, its value for the field and the communities that might be interested.

      This manuscript is the first to use electron tomography to identify the tethers between separated anaphase chromosomes. Forer and the late Michael Berns and their co-authors have published a number of papers using phase microscopy and lasers to report on the physical nature and elastic properties of these fibres in the past. Forer and Otsuka have presented first-rate evidence for the reality of these structures using electron tomography. This manuscript should be highlighted in the published journal.

      The chemical identity of these fibers, as the authors state, is unclear.

      The following aspects are important:

      • Audience: describe the type of audience ("specialized", "broad", "basic research", "translational/clinical", etc...) that will be interested or influenced by this research; how will this research be used by others; will it be of interest beyond the specific field?

      This exciting contribution will be read by anyone interested in mitosis. It will be of interest to all Cell Biologists because of the careful manner in which the living cells were studied before they were fixed for examination using electron tomography. The readers will be dreaming how they can use this process on their Cell Biology problems.

      • Please define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      I am a Cell Biologist who has made contributions, both in light microscopy and in transmission microscopy, on dividing cells, both in tissue culture and in situ in avian and zebrafish embryos.

      We thank the reviewer for appreciating the significance of our work.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this paper, the authors use light microscopy and electron tomography to study anaphase chromosomes in crane fly spermatocytes. They find that there are two "tether" structures that connect telomeres of sister chromatids. One tether is thicker (denser) and extends between sister chromatids during early but not late anaphase, whereas a second, less-dense tether maintains contact with both sister chromatids in all examined stages of anaphase. The paper makes arguments as to what the tethers could or could not be. Specifically, they are too numerous to be ultrafine DNA bridges seen in various normal or abnormal segregation events and they also do not affect anaphase chromosome motion the same way ultrafine DNA bridges do.

      Major comments:

      The major claim that there are tethers that connect sister chromatids in anaphase is supported by the data. Moreover, the data resolves two types of tethers on the basis of their density. While it is unclear what the composition of the tethers is, the paper makes a convincing case that they cannot be the DNA ultrafine bridges seen in other studies. The discussion has sufficient caveats that most readers will see that more work is needed to identify the composition of the two tethers. In my opinion, no further experiments are needed to support the modest claims of this paper. Therefore, I only have minor comments that may hopefully improve the paper's clarity.

      We thank the reviewer for the positive evaluation of our work.

      Minor comments:

      It was argued that the tethers reported here were also seen in other species and cellular contexts, where the imaging work was done with projection EM imaging. Presumably, what is new here is the usage of electron tomography. It would help readers if the authors explained why the electron tomography done here was essential to arrive at key conclusions.

      Thank you for the useful comment. We have added the explanation of why electron tomography was critical to visualise small tether structures to the last paragraph of the Discussion on page 7.

      p.3 mitochondria appeared to be fixed properly ... (e.g., Figs. 1C, 2B) - I don't see any mitochondria in any figures. Perhaps this observation should be noted as "not shown"?

      We thank the reviewer for pointing this out. We have added an electron micrograph of mitochondria to the Supplementary Figure 1.

      p.3 The images shown in Figs. 1, 2, 4 - The figures should be called out in order; in this case, Fig 3 has not been called out yet.

      We have corrected the order of the figures.

      p.4 we did not find any other connecting structures - Because the sample was processed by traditional EM methods, it's safer to add a caveat that other connecting structures could be missed if they were disrupted by sample prep or if they did not pick up stain as well as the two structures presented in this paper.

      We have clarified that our sample was chemically fixed in the first paragraph of the Discussion on page 4. Because the details of how our samples were prepared are described in the Methods section, we did not add further details to this paragraph.

      p.7 we expect such structures to be commonly seen in other cell types as well if they are examined carefully - Instead of saying that examinations should be done "carefully", it would be more helpful to specify how other cell types should be examined. This work shows that the bridges can be found if the cells are either sectioned parallel to the spindle axis or if a sufficiently large volume is sampled.

      We have now clarified that 3D electron microscopy techniques such as electron tomography are critical to visualise small tether structures in the last paragraph of the Discussion on page 7.

      Please use consistent spelling/hyphenation of ultrafine/ultra-fine and word choice (strands vs. bridges).

      Referees cross-commenting

      I agree with my co-reviewers' comments and have no further suggestions.

      Reviewer #2 (Significance):

      This may be the first use of electron tomography to study the structural details of tethers that connect chromosomes in anaphase cells. The data is of sufficient quality to reveal differences in density. Namely, one class of tether appears to be an extension of the chromosome while the other class is composed of thin filaments. This study is novel in that it characterizes a mitosis-associated complex that is poorly studied compared to the microtubule-based spindle apparatus and the kinetochore. Hopefully, the tethers will draw more attention and further characterization by methods like super-resolution microscopy and cryo-electron microscopy. My expertise is in chromatin, mitotic machines, and cryo-electron tomography.

      We thank the reviewer for appreciating the novelty and the impact of our work.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      Tethers between telomeres of chromosomes in anaphase were inferred from earlier studies of laser microbeam cutting experiments. The current paper presents images from electron tomography of crane fly spermatocytes that substantiate the earlier inference. The authors deduce that the darker filaments and the lighter filaments that they visualize may be the structural tethers at telomeres.

      Major comments:

      The experiments are carefully done, and the conclusions are appropriately worded to qualify any caveats. This short communication is well-presented, and I have only a few comments.

      We thank the reviewer for appreciating the clarity and quality of our work.

      The authors should expand their list of references on bridges to include those listed by Warecki et al (Curr Biol 33:1-17, 2023; refs 15-26, etc).

      We do not think it is necessary to expand the list of references for ultra-fine DNA bridges. In the article we submitted, we discussed the Warecki et al. article in the penultimate paragraph of the Discussion; we concluded that the bridges that Warecki et al. described differ from ours in that there are so few per cell that they couldn’t be tethers, and further that there was no evidence that those bridges were elastic. For those reasons, we do not find discussion of those proteins relevant to tethers, any more than listing all the proteins associated with ultra-fine DNA bridges would be relevant to the elastic tethers.

      In the Discussion, we discussed data suggesting that a known elastic protein titin was present; that is as far as we wanted to go on speculation of what the elastic component of tethers might be.

      The authors present arguments that the tethers are not the DNA bridges observed by others. However, they should try to address this experimentally by treatment of their preparations with DNase to see if the thick and/or thin filaments disappear.

      While we agree that it would be important to identify the components of the tethers, we are concerned that those experiments are beyond the scope of this manuscript. Nevertheless, we appreciate the constructive suggestion for the future research direction.

      Moreover, they should discuss in more detail the possible functions of (DNA) bridges, including the recent model from Bill Sullivan's lab (Warecki et al, Curr Biol, 2023) that they help to retain fragments of broken chromosomes. In addition, the authors should summarize the various proteins that may be associated with the bridges (as enumerated in the Warecki et al 2023 paper).

      As we describe above, we concluded that the bridges Warecki et al. described are different from the tethers that we report in our manuscript. Therefore, we do not think it is necessary to expand the discussion on the proteins and functions associated with ultra-fine DNA bridges.

      The authors could add a sentence to the Results or Discussion of whether the thicker tethers might become stretched as anaphase progresses to become the thinner tethers (Fig. 4G).

      We thank the reviewer for this suggestion. We actually mentioned this possibility in the third paragraph of our Discussion on page 7.

      The authors may want to add a few sentences to the Discussion about the "chromosomal bouquet" stage of leptotene of meiosis prophase I where the telomeres of chromosomes seem pulled together and associate with the nuclear envelope --- they could speculate if this might also be due to the tethers that they describe in spermatocytes.

      This is a very interesting possibility. While we would refrain from adding this speculation to our manuscript as it is beyond the scope of the main points, it is certainly an interesting avenue of future research.

      Minor comments:

      A few additional comments are as follows:

      p. 2 last sentence of first paragraph -modify the wording about "no structural evidence that identifies physical connections between separating telomeres", since there is some information from genetic and cell biology light microscopy experiments. Perhaps simply change "structural" to "ultrastructural".

      We have changed the wording as the reviewer recommended.

      p. 6, 5th line of second paragraph - change "ribosome DNA" to "ribosomal DNA"

      We have corrected it.

      Figure 1D - add the chromosome to the right of the schematic model (as suggested by Fig. 1B).

      We are sorry for the confusion. In Figure 1D, only the left half of the tethers is 3D-modelled and shown. We have clarified this point by modifying the legend of Figure 1D.

      p. 17 (Methods), line 10 of first paragraph - state if this is light or heavy Halocarbon oil (give details).

      It is a mixture of heavy and light Halocarbon oil. We have clarified it on page 17.

      p. 17 (Methods), line 12 of first paragraph- state the concentration for fibrinogen and for thrombin.

      As we wrote in the original manuscript, the procedures are described in detail in our previous publication (Forer A. & Pickett-Heaps J. (2005) Fibrin clots keep non-adhering living cells in place on glass for perfusion or fixation. Cell Biology International 29: 721–730). Nonetheless, to clarify this point, we have modified the text on page 17.

      p. 17 (Methods), line 4 of second paragraph - is there any data to show that the filaments (tethers) occur if there is no cold shock?

      Yes, we do see similar filamentous structures in the sample without cold shock. For your information, we show one of the electron micrographs below. In our manuscript, we show the data from the samples prepared with cold shock, because it better visualizes the filamentous structures. We now show these electron micrographs in the Supplementary Figure 2.

      Referees cross-commenting

      I concur with Reviewers #1 and #2 that this is a fine paper that should be published. My detailed comments submitted with my review are simply meant as revisions to further strengthen this paper.

      We thank the reviewer for supporting the publication of our manuscript.

      Reviewer #3 (Significance):

      Strengths: This is an important conceptual advance and the carefully done ultrastructural imaging provides the foundation for future studies that could delve into the molecular composition and functional significance of the tethers at telomeres of anaphase chromosomes seen here by 3D electron microscopy.

      Limitations: the molecular composition and functional roles are not yet known for the tethers seen here by 3D electron microscopy, but to do so would involve an entire new program of experimentation.

      Advances: there have only been two earlier ultrastructural papers on tethers at telomeres, and the tethers were peripheral to the main focus of those papers. Thus, the current paper extends our ultrastructural information about tethers.

      Audience: this work is of importance for scientists who study the mechanics of chromosome movement on spindles, including regulation to combat aneuploidy. This work will also be important for a broader audience to inform them about transmission of the hereditary information to daughter cells.

      We thank the reviewer for appreciating the significance and the impact of our work.

    1. While there are rich areas of study in animal communication and interspecies communication, our focus in this book is on human communication. Even though all animals communicate, as human beings we have a special capacity to use symbols to communicate about things outside our immediate temporal and spatial reality (Dance & Larson, 1976). For example, we have the capacity to use abstract symbols, like the word education, to discuss a concept that encapsulates many aspects of teaching and learning. We can also reflect on the past and imagine our future. The ability to think outside our immediate reality is what allows us to create elaborate belief systems, art, philosophy, and academic theories. It’s true that you can teach a gorilla to sign words like food and baby, but its ability to use symbols doesn’t extend to the same level of abstraction as ours. However, humans haven’t always had the sophisticated communication systems that we do today.

      With 126 published definitions of "communication," it is vital for a speech class to touch on forms of communication other than speaking. Humans have some of the widest ranges of speech (i.e., various languages), and moving between them is often not seamless, so other universal abstract symbolism is needed alongside spoken communication to bridge the gap. Even our written language, with the meaning we assign to certain methodical squiggles on paper, varies widely. So do less obvious ways of communicating, like gestures and body language, which may seem inconsequential to one person but be monumentally offensive to another. The intricately woven methods of communicating within the complexities we as a species have created make for a fascinating study, well beyond merely standing in front of a group of peers and talking at them for 3-5 minutes about a chosen topic.

    2. Like other forms of communication, intrapersonal communication is triggered by some internal or external stimulus. We may, for example, communicate with our self about what we want to eat due to the internal stimulus of hunger, or we may react intrapersonally to an event we witness. Unlike other forms of communication, intrapersonal communication takes place only inside our heads.

      Everyone on this planet has intrapersonal communication. I talk to myself every day, and I have conversations with myself about what I'm going to do or what I need to do. Some people talk to themselves to calm down, or they journal to ease their minds. When something surprising happens, people usually react somehow in their head; basically, whenever anything happens, people react to themselves. Just as the text states, "We also use intrapersonal communication or “self-talk” to let off steam, process emotions, think through something, or rehearse what we plan to say or do in the future." Intrapersonal communication happens almost every second throughout a person's day.

    1. In fact, it might be good if you make your first cards messy and unimportant, just to make sure you don’t feel like everything has to be nicely organized and highly significant.

      Making things messy from the start as advice for getting started.

      I've seen this before in other settings, particularly in starting new notebooks. Some have suggested scrawling on the first page to get over the idea of perfection in a virgin notebook. I also think I've seen Ton Zijlstra mention that his dad would ding every new car to get over the new feeling and fear of damaging it. Get the damage out of the way so you can just move on.

      The fact that a notebook is damaged, messy, or used for the smallest things may be one of the benefits of a wastebook. It averts the internal need some may find for perfection in their nice notebooks or work materials.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      The manuscript by Rigger and Brenner details the role of the vimentin network in advancing OA pathogenesis by exacerbating premature senescence. The data are well presented and the study is of interest, in that little is known about vimentin in cartilage biology.

      The authors used OA-derived cartilage explants and chondrocyte cultures, which were graded for severity and compared accordingly. Figure 1 shows that markers of senescence increase with structural damage, which is well established and consistent with the literature. Using a DOX model, the authors induce premature senescence and show a disrupted vimentin network. However, upon KD of CDKN2A, a marker of senescence, they did not observe complete reversal of CSV presentation.

      Next, the authors show in Figures 4 and 5 that the reduction or dismemberment of vimentin structures is linked to senescence and may act as a contributing factor.

      Figures 6 and 7 then go on to show that, upon advanced passage, chondrocytes lose their vimentin network and tend to senesce and mineralize.

      Reviewer #1 (Significance):

      Strength:

      This is a very novel study showing a link between vimentin and senescence in chondrocytes. The data are in line with other data. The work is clearly written, structured, and well displayed.

      Author's response:

      We thank reviewer #1 for their interest in our work and their overall positive report.

      Suggestions for improvement:

      While the study is very thorough in describing the markers of senescence and the vimentin network, it lacks insight into the mechanism, which isn't completely deciphered. Are there links to key transcription factors?

      Author's response:

      The transcriptional regulation of vimentin in human cells is very complex. The VIM promoter region comprises multiple elements, such as an NF-κB-binding site, a PEA3-binding site and two AP1-binding sites (Zhang et al., 2003). Moreover, it was recently demonstrated that redox signaling is involved in vimentin expression at the wound margin after tissue injury in zebrafish (LeBert et al., 2018). However, it has also been reported that IL-1β stimulation results in reduced gene expression of vimentin via p38 signalling in cartilage degeneration and OA progression (see manuscript Refs. 36, 37).

      In our study, we observed that enhanced CSV levels are associated with decreased vimentin gene expression, indicating lower stability of the mRNA or decreased transcription of VIM in senescent chondrocytes (possibly due to enhanced p38 signalling, as mentioned above). Since the transcriptome of senescent cells is radically changed, this question cannot be answered easily.

      In future studies, we will rather try to clarify the underlying mechanism of vimentin externalization. There are still many questions to be answered: is the CSV anchored in the cell membrane (which anchor protein?) and is there still a connection to the intracellular vimentin network? Which proteins are involved in the externalization process: maybe comparable to phosphatidylserine exposure, mediated by flippases, scramblases, and lipid transfer proteins or rather by vesicles?

      Literature mentioned above (not included in manuscript):

      LeBert et al., 2018: Damage-induced reactive oxygen species regulate vimentin and dynamic collagen-based projections to mediate wound repair. DOI: 10.7554/eLife.30703

      Zhang et al., 2003: ZBP-89 represses vimentin gene transcription by interacting with the transcriptional activator, Sp1. DOI: 10.1093/nar/gkg380

      It is also unclear if disruption of the network is more detrimental than KD in promoting senescence.

      Author's response:

      KD of vimentin led to a gradual decrease of intracellular vimentin content and consequent stress. The cells were analyzed 7 days after induction of the KD and exhibited a stable senescent phenotype, comparable to Doxorubicin-treated chondrocytes (treated with very low concentrations over several days to produce only mild but ongoing stress). These models might reflect the pathophysiologic situation: we think that cellular stress due to mechanical impact and subsequent oxidative stress/low-grade inflammation might lead to a gradual disruption or re-organization of the vimentin network, which is accompanied by decreased vimentin gene expression.

      In the case of vimentin network disruption by Simvastatin, the stress response was very intense and rapid (24 h), and the experiment was only conducted as a proof of principle. Despite the upregulation of some senescence-associated markers, we don't think that permanent Simvastatin treatment would be suitable to obtain a stable senescent phenotype, but rather expect the cells to die due to excessive stress.

      It would have been good to include murine OA models to understand these processes better and to make a stronger physiological connection with OA of the joint.

      Author's response:

      The CSV antibody is only suitable for human cells and cannot be used for immunohistochemistry. Therefore, all previous reports of CSV are based on human (isolated) cells. At the current time point, it would not be possible to stain CSV in joints of mice after induction of PTOA due to the methodological limitations. We actually tested the CSV antibody in isolated lapine chondrocytes and found a high percentage of CSV-positive cells, even at low passages. Although stress increased the amount of CSV-positive lapine cells, we did not consider the results reliable due to the high percentage in unstressed cells, which might result from unspecific antibody binding.

      Overall, we think that the use of clinical OA samples is convincing and reflects the pathophysiologic situation in the human OA joint.

      Reviewer #2 (Evidence, reproducibility and clarity):

      The manuscript provides solid evidence for an association between cell surface vimentin (CSV) and chondrocyte senescence. Human cartilage and cultured chondrocytes are used with a wide range of approaches to provoke senescence: natural osteoarthritis, traumatic loading ex vivo, doxorubicin to cells in monolayer, vimentin siRNA, and simvastatin. In contrast, relatively little was done to try and interrupt or reverse the role of CSV in senescence, with CDKN2A siRNA representing one attempted intervention. The manuscript is well written and the data are presented in a logical and clear manner, with a high likelihood of being reproduced in subsequent studies.

      Author's response:

      We thank reviewer #2 for their interest in our work and their mainly positive report.

      Regarding their comment on our attempts to reverse CSV on senescent chondrocytes, we would like to add the following: Reversal of cellular senescence is a very ambitious challenge. But in fact, we are currently preparing a manuscript in which we characterize an appropriate senolytic strategy to “rejuvenate” human chondrocytes and plan to use this approach to reduce the amount of senescent and thus CSV-positive cells in future experiments.

      Major comments:

      In the doxorubicin experiments, the senescent cells show a spread morphology as expected. Given the importance of vimentin in cell spreading (as the authors' own data show), the possibility that spread morphology itself (and not senescence) leads to CSV should probably be examined. This could perhaps be achieved by plating with different concentrations of fibronectin or other matrix proteins that produce a spread morphology to a degree that matches the doxo treatment. If the cells remain spread for ~10 days but don't become senescent and don't have CSV, this would provide further support for a direct relationship.

      Author's response:

      We agree that cell spreading is associated with various cellular processes (for example by the YAP signaling pathway). Moreover, we would like to thank the reviewer for the proposed experiment.

      Seeding of cartilage cells on fibronectin coated plates is a commonly used procedure to isolate chondrogenic stem progenitor cells, due to their higher affinity to fibronectin. The cells are usually cultured for several days on the coated plates and do not exhibit a flattened, senescent-like phenotype (as we observe for Doxorubicin-treated cells), but an elongated, fibroblast-/ stem cell-like shape. Our results (Figure 6E) demonstrate that CSPC have no increased CSV levels, despite their elongated (not flat) morphology.

      There are some findings supporting the assumption that CSV leads to enhanced cell adhesion, but not that adhesion or cell spreading promotes CSV: we included experiments with HeLa (low CSV levels) and SaOS-2 (high CSV levels), which demonstrated that high CSV levels are associated with increased plastic adhesion (Figure S5). In line with this, we demonstrated that higher CSV levels on chondrocytes were associated with enhanced fibronectin and vitronectin binding, which might explain increased plastic adhesion. Moreover, Simvastatin stimulation and subsequent cellular stress by Vimentin disruption resulted in enhanced CSV but did not lead to cell spreading (Actin not affected, cells rather elongated, not flattened).

      Minor comments:

      The CSV antibody and staining method appear to have generated some signal from debris, which makes it challenging to assess the localization of true staining. Presumably the true staining would be present only on the cell surface. While the widefield view is appreciated, perhaps insets with a higher magnification would clarify.

      Author's response:

      In Figure 2h and Figure 2i, we provide insets of the IF staining and an exemplary image made by scanning electron microscopy (SEM). CSV is not localized on debris; Figure 2h actually represents the cell surface. The magnified, Doxo-treated cell is highly senescent and thus flattened. The uneven (rather spotted) staining pattern of CSV and the unusual shape of the cell might give the impression that this is debris rather than the cell membrane.

      For figure 1k, it is a bit surprising that CDKN2A would peak so early after injury and then drop off. Most studies in other systems show a gradual increase in CDKN2A levels with persistent stress as opposed to a rapid increase in response to acute stress. Could the drop-off be due to preferential death of these cells? The CSV % in 1m was taken from 7d after trauma (plus 7 days in monolayer it appears). Further discussion on the timing of traditional senescence markers as compared to the emergence of CSV would be useful.

      Author's response:

      We would like to thank the reviewer for this comment. That CDKN1A was induced by mechanical trauma without significant decrease at the later time points was in line with the P53 expression, which we detected via immunohistochemistry (IHC; positive staining of chondrocyte nuclei in cartilage). P53 and P21 are regarded as interconnected senescence markers. Interestingly, P53 is not regulated at the gene expression level upon cartilage trauma or Doxorubicin stimulation – but there is a significant increase in P53 nuclear translocation.

      Although such a discrepancy between gene expression and protein activity has not been reported in the case of P16 or P21, we plan to investigate the dynamics of these cell cycle regulators and their connection to CSV after cartilage trauma in more detail in future studies.

      We included the following statement in the discussion part:

      “In the current study, we observed that CSV on chondrocytes was reduced by siRNA-mediated silencing of CDKN2A and increased after Doxo treatment or cartilage trauma. While we confirmed that mRNA levels of both CDKN1A and CDKN2A were significantly enhanced upon injury but exhibited different expression levels over time, we determined CSV-positive cells only at one time point after ex vivo cartilage trauma. Future studies might also consider earlier and later time points after cartilage injury to identify a potential time-dependent peak or decline in CSV-positive chondrocytes. In this way a potential association between CSV and the expression levels of CDKN1A and CDKN2A, which are thought to play differential roles in initiating and maintenance of senescence, respectively [50], might be clarified.”

      [50] Stein G, Drullinger L, Soulard A, and Dulić V. Differential Roles for Cyclin-Dependent Kinase Inhibitors p21 and p16 in the Mechanisms of Senescence and Differentiation in Human Fibroblasts. Mol Cell Biol. 1999;19(3): 2109–2117. https://doi.org/10.1128/mcb.19.3.2109.

      There is no CSV staining shown for Figures 4 and 5. While the quantification of CSV was done by flow cytometry, it would be nice confirmation to see the increase in CSV on the surface of cells treated with either siRNA for vimentin or simvastatin.

      Author's response:

      CSV-IF of simvastatin-treated chondrocytes is provided in Figure 5 (b). We did not perform exemplary staining of CSV after VIM-KD, because the quantification was performed via flow cytometry.

      Reviewer #2 (Significance):

      The strengths of the study include a rigorous design and the establishment of a potential new cell surface marker of chondrocyte senescence. The main limitation is that the conclusions are largely descriptive in nature.

      If CSV is confirmed as a robust marker of senescence, this would be of value to the field. While this marker has been explored previously in other systems, there is value in this manuscript given the wide range of contexts investigated for a cell type in which senescence likely has an important role.

      Reviewer #3 (Evidence, reproducibility and clarity):

      This study presents a sound piece of science in the puzzle about extracellular vimentin in the differentiation/dedifferentiation of human chondrocytes, senescence, and osteoarthritis. Even though no mechanism is elucidated, the results clearly point towards a correlation between the amount of extracellular vimentin and the level of chondrocyte senescence, and therefore signs of osteoarthritic changes in the cultivated chondrocytes. The methods applied are state-of-the-art and provide the means to generate meaningful results in this experimental setting. The paper is concise and clearly written; there are only minor remarks.

      Author's response:

      We thank reviewer #3 for their interest in our work and their overall positive report.

      Minor comments:

      1. The main clue of the paper is extracellular vimentin around chondrocytes in culture; please provide better pictures (1g) to support this. Why is the extracellular staining so broad and not concentrated on the cell surface? The pictures chosen imply that a huge amount of vimentin is externalized in disease states. They also indicate that in diseased chondrocytes no intact or semi-intact vimentin network is found intracellularly. Please comment.

      Author's response:

      In Figure 1g, CSV is located on the cell membrane. The pattern of the staining was surprising to us, as well. CSV was not equally distributed on the membrane, but rather represented an inconsistent pattern. Sometimes the staining was located at the filopodia of the cells, sometimes the whole cell was covered by spots. We also observed this on cancer cells, which was in line with other studies using this antibody. It remains unclear whether the distribution of the CSV has any effect. But we assume that the high abundance in filopodia might be connected with cell adhesion and mobility, which was positively associated with CSV.

      Yes, chondrocytes isolated from highly degenerated tissue exhibited higher CSV levels as compared to cells derived from macroscopically intact regions. Although we did not investigate the vimentin network of these cells, our observations in Doxo-treated cells imply, indeed, that intracellular vimentin might be altered in diseased chondrocytes. Consistent with this, Blain et al. (Ref. 13) reported a disassembly of the intracellular vimentin network in OA chondrocytes, which can disturb the chondrocyte phenotype and contribute to the development of OA (see discussion).

      1. In the doxo experiment no extracellular vimentin is found? Please explain.

      Author's response:

      Doxo-treated cells are highly positive for CSV (= extracellular vimentin on membrane). However, the intracellular vimentin is strongly decreased and some cells seem to be negative. We have not yet clarified the underlying mechanism, but it seems that senescence/disease progression negatively affects the transcription of vimentin and, at the same time, promotes the externalization of the existing intracellular vimentin. Altogether, this might result in a decline in intracellular vimentin.

      1. The SEM picture is showing what? IGH? The red dots are colloidal gold particles? In any case, the quantity of stain gathered at the EM level would not correlate with the huge amount seen in LM staining. Please comment.

      Author's response:

      For the SEM analysis, a gold particle-coated secondary antibody was used. The positive signal usually appears in white and was subsequently colored using software. In IF and ICC staining, we had a signal amplification due to the biotin-streptavidin system, and the magnification makes, of course, a huge difference.

      1. Why the ICC in Fig. 3c? The siRNA is not detected in the KD? A reduction of Vimentin could be shown via WB.

      Author's response:

      In Figure 3c, the KD of P16 was confirmed at the protein level. In addition to the gene expression analysis, we chose the ICC (IF) to confirm that there is a decline in active (nuclear) CDKN2A. In the case of P53, our experience was that gene expression and the amount of cytoplasmic/nuclear protein might not be consistent.

      In Figure 4, we confirmed the successful KD of vimentin at the mRNA and protein level (flow cytometry plus IF). Of course, WB would also be possible, but we decided to use the methods for which the antibody was well established, and we wanted to visualize the disturbance of the intracellular vimentin network upon KD.

      1. Fig. 4c: why are no remnants of the vimentin network seen in the chondrocytes? A knock-down, not a KO, is shown.

      Author's response:

      In fact, most of the intracellular vimentin seems to be gone. However, there are some remnants (condensed fibers/ bundles) of the former vimentin network. We applied the VIM-KD over seven days. Usually, a KD experiment is only conducted for 2-3 days. But since we were not sure how stable the vimentin protein would be, we chose seven days. This long-lasting KD might have resulted in a strong decline of the protein. Moreover, the CSV levels on these cells were very high, indicating that existing vimentin was externalized and additionally decreased the amount of intracellular vimentin.

      1. Please comment on the concentration of simvastatin; why not nanomolar?

      Author's response:

      The concentration of Simvastatin was chosen in accordance with Trogden et al. (Ref. 26), who first described the effects of simvastatin on the vimentin network. A lower concentration might have had the advantage that the effects were less severe, allowing a longer observation time than 24 h. However, as a proof-of-principle model to demonstrate the connection between vimentin network collapse and CSV expression, the concentration worked quite well.

      1. CSV+ is misleading in Fig. 6g, it's not an over expression.

      Author's response:

      We would like to thank the reviewer for this comment and removed the “+” to make it less misleading.

      1. The concept of EMT is debatable, at least in kidney fibrosis, and chondrocytes are not epithelial cells. Please add a more critical discussion point.

      Author's response:

      The authors agree with the reviewer’s argument that chondrocytes are not epithelial cells and that the term EMT doesn’t seem to be appropriate. However, this is one leading hypothesis proposed by the working group of Prof. Mayán, who described CX43 and other EMT markers on/in senescent chondrocytes (see reference 31; more recently: Cell Death Dis. 2022;13(8):681. doi: 10.1038/s41419-022-05089-w).

      We added the following passage in the discussion part to indicate that this hypothesis is a controversial concept:

      “Nevertheless, the hypothesis that chondrocytes might undergo an EMT-like process remains controversially discussed, because chondrocytes are mesenchymal and not epithelial cells. In a recent review, Gems and Kern propose to consider senescent chondrocytes as activated and hyperfunctional remodeling cells occurring during OA progression [49]. Accordingly, chondrosenescence might represent an unsuccessful attempt of tissue repair. They further suppose that the senescent or activated chondrocytes are associated with a hypertrophic, bone-forming phenotype, following the process of bone development rather than hyaline cartilage formation. In line with this, we observed that CSV was associated with enhanced osteogenic capacities and a decline in chondrogenic properties.”

      [49] Gems and Kern (2022): Geroscience. 2022;44(5):2461-2469. doi: 10.1007/s11357-022-00652-x.

      Reviewer #3 (Significance):

      The manuscript provides novel insight in the role of intermediary filaments, i.e. vimentin, on chondrocyte senescence and osteoarthritic changes in vitro. Its strength is a thorough elucidation of the connection with a wealth of experimental data, a weakness is the missing elucidation, or first experiments in the direction, of the cell biological mechanism. It is well suited for a broad audience, because it deals with fundamental cell biological phenomena, definitely it's important for the OA/chondrocyte biology community.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      We don't see the case for 1,5-IP8 as settled in plants, and none of the papers mentioned above draws this strong conclusion. This may be due to several limitations in the available data. The mentioned studies do not allow differentiating the effects of 1-IP7 and 1,5-IP8 and, where binding or competition experiments have been performed, e.g. on the transcription factors, the differences in the Kd values for IP7 and IP8 were minor. Furthermore, 1,5-IP8 levels and Pi starvation response do not always correlate. ITPK1 mutants, for example, show Pi overaccumulation, and low 5-IP7, but normal 1,5-IP8 (Riemer et al., 2021). Finally, plants are complex organisms with multiple tissue types that serve for accumulating, exporting, transporting or finally consuming Pi. Therefore, correlating inositol pyrophosphate levels from whole-plant extracts with a Pi starvation response is problematic, except if these data could both be obtained from the same cell types or at least tissues.

      The comment of the reviewer made us recognize that the complex situation in plants deserves a more detailed coverage and we have therefore adjusted the introduction accordingly.

      Results: "We determined the corresponding lysines in Pho81 (Fig. S3), created a point mutation in the genomic PHO81 locus that substitutes one of them, K154, by alanine, and investigated the impact on the PHO pathway."

      In my opinion, it would be important to test here in a quantitative in vitro binding assay if (i) the SPX domain of Pho81 can bind PP-InsPs including 1,5-InsP8, (ii) if the dissociation constant is in agreement with the cellular levels of 1,5-InsP8 in yeast (compare Fig. 2) and (iii) if the K154A mutation blocks or reduces the binding of 1,5-InsP8. Without such experimentation, I find the statement "this result underlines the efficiency of the K154A substitution in preventing PP-IP binding to the Pho81 SPX domain." to be overly speculative, as no binding experiment has been conducted.

      We agree with the comment of the reviewer concerning the overstatement in the phrase. It has been deleted.

      As mentioned already in our previous work (Wild et al., 2016), Pho81SPX counts among the SPX domains that we could not express recombinantly. Likewise, full-length Pho81, which would be the relevant object for correlating in vitro binding studies with the cellular concentrations, has not been accessible. Expression in yeast did not provide sufficient material for ITC or other quantitative techniques. Therefore, we refrained from pursuing binding studies. Nevertheless, given the high conservation of the positively charged patch on SPX domains and the fact that, in every case where it has been tested so far, SPX domains showed inositol polyphosphate binding activity, we find it a conservative assumption that the Pho81SPX binds them as well. This is supported by the effects of the binding site mutant, which mimics the effect of ablating IP8 synthesis.

      Results: "Inositol pyrophosphate binding to the SPX domain labilizes the Pho81-Pho80 interaction." Again, in the absence of any protein - protein interaction assay I find this statement not to be supported by the experiments outlined in the manuscript. The best way to address this point would be to perform either co-IP or in vitro pull-down experiments between Pho81-SPX and Pho81-85, in the pre- and absence of 1,5-InsP8 and/or using the Pho81 point-mutants described in the text.

      Since Pho81 could not be produced recombinantly, neither by us nor by others who worked on this protein previously, quantitative in vitro binding assays are not accessible for now. A simple IP suffers from the problem that Pho81 interacts with Pho85-Pho80 not only through the SPX domain but also through the minimum domain. The latter interaction may be constitutive. Since the main point of the manuscript is not to dissect the exact mechanisms of Pho85-Pho80 regulations, but only to address the point why the postulated inactivation of this kinase by an 1-IP7/minimum domain complex makes no sense, we prefer not to show a profound (and more complex) analysis of how the different Pho81 domains contribute to binding.

      To test the potential of the SPX domain for binding Pho85/Pho80 in vivo, we have created a GFP-fusion of the SPX domain of Pho81. This fusion protein localizes mainly to the cytosol when cells are on high-Pi. Upon Pi starvation, it concentrates in the nucleus. This concentration is not observed in pho80 mutant background (New Fig. S7).

      In line with this, I would suggest to move the molecular modelling/docking studies from the discussion into the results section and to use these models to design some interface mutations that could be tested in coIP and/or pull-down assays. Alternatively, the authors may choose to omit the discussion section starting with: "Even though the minimum domain is unlikely to function as a receptor for PP-IPs this does not ... and ending with . In sum, multiple lines of evidence support the view that the SPX domain exerts dominant, 1,5-IP8 mediated control over Pho81 activity in response to Pi availability."

      We have now moved the modelling data to the Results section. The structure prediction of the interface is experimentally validated. Data on the effect of interface substitutions are already published, although these substitutions had not been recognized as affecting a common interface at the time. Substituting the interface residues either on the side of Pho80 or of Pho81 constitutively activates Pho85-Pho80 kinase and destabilizes its interaction with Pho81. This was shown by Co-IP experiments from cell extracts by Huang et al. We mention the respective substitutions in the manuscript and cite the paper in which their effect on PHO pathway activation had been described.

      Reviewer #2 (Recommendations For The Authors):

      Some points need additional attention by the authors:

      • In general, it would be helpful to introduce abbreviations more thoroughly (certain enzyme names, PA, MD, ...)

      We paid more attention to this.

      • Also in general, the authors may want to think about the nomenclature of inositol pyrophosphates. Given the expansion of PP-IPs that are being detected in different organisms these days it may be a good time to convert to a more precise nomenclature, i.e. 5PP-IP5 instead of 5-IP7; and 1,5(PP)2-IP4, instead of 1,5-IP8. The latter could just be stated once, and then be abbreviated as IP8.

      To our understanding the field has not yet come up with a unified nomenclature. Therefore, we prefer to stick with the more practical nomenclature that we have chosen, which also corresponds to what is commonly used in presentations and discussions among colleagues. We have now introduced a sentence making the link to the nomenclature that the reviewer has proposed.

      • p. 1, Abstract: "negative bioenergetic impacts" - the phrasing seems really vague

      Agreed, but we find it difficult to be more explicit and precise in the abstract while remaining concise and not distracting from the main message. This aspect is better explained in the introduction.

      • p. 3, Significance statement: "... unified model across all eukaryotic kingdoms" While the intended meaning of this wording is better explained in the text later, the phrasing here suggests a more all-encompassing study at hand, instead of a conclusion that fits more closely with established reports from other organisms. Please rephrase.

      We have adapted the phrase to avoid this impression.

      • p. 4: "IPTKs" - are the ITPKs meant here?

      Yes, that was a typo.

      • p. 7, the introduction ends abruptly and could use a concluding sentence.

      Done

      • p.7, "enzymes diphosphorylation either the..."; I understand what the authors are trying to say with diphosphorylating, but the enzymes are phosphorylating a phosphorylated substrate.

      Yes. We changed the phrase to "....adding phosphate groups at the 1- or 5-positions....".

      • p. 7, subtitle "...concentrations and kinetics of..."; kinetics of what? Synthesis/turnover?

      We corrected this subtitle

      • p. 8, with regards to the recovery experiment: Was this recovery determined elsewhere (please cite)? Otherwise it would be beneficial to include an extra figure to illustrate these recoveries in the supplementary information. And do the authors suspect some hydrolysis of IP8 given the lower recovery?

      We have now added the experiment testing recovery of IPPs as the new Fig. S1.

      • p. 9: It is appreciated that the authors point out the concentration of IP6 in S. cerevisiae. I found that concentration rather low, and the authors could highlight this a bit more, given their ability to carry out absolute quantification.

      This was a leftover from a previous version of the paper. Since the paper does not treat IP6 or lower inositol polyphosphates, we have deleted this phrase.

      • p. 9, Fig 2: The exponential decay of 5-IP7 is very nicely shown in Figure 2c. But one of the most important discussion points is IP8 being the key controller of the PHO pathway - it would therefore be beneficial for the argument to also show the same kind of graph for IP8 and if possible, fit a function to the data points to better quantify and compare the decay processes (e.g. via "half-life time" of PP-IPs during starvation, in addition to the suggested "critical concentration" which was only discussed for 5-IP7 thus far).

      Kinetic resolution is an issue here. The approach shown in Figs. 2 and 5 is not apt to determine a critical concentration of IP8 because the decline upon transfer to starvation conditions is too fast and difficult to relate to the equally rapid induction of the PHO pathway. We shall address this point in a more appropriate setup in a future study.
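To illustrate the kind of quantification the reviewer suggests, the following is a minimal sketch of how a half-life could be estimated from a decay time course, assuming a single-exponential decline c(t) = c0 · exp(−k·t). The function name, the time points, and the concentrations are hypothetical and noise-free; they are not measurements from the study. As stated above, the kinetic resolution of the present IP8 data would not support such a fit.

```python
import math

def fit_exponential_decay(times, concentrations):
    """Least-squares fit of ln(c) = ln(c0) - k * t.

    Returns (c0, k, half_life). All concentrations must be > 0.
    """
    if len(times) != len(concentrations) or len(times) < 2:
        raise ValueError("need at least two (time, concentration) pairs")
    logs = [math.log(c) for c in concentrations]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx               # slope of the log-linear regression
    k = -slope                      # first-order decay constant
    c0 = math.exp(mean_y - slope * mean_t)
    half_life = math.log(2) / k     # t_1/2 = ln(2) / k
    return c0, k, half_life

# Hypothetical illustration data (assumed, not from the manuscript):
# minutes after transfer to starvation medium, concentration in µM.
times_min = [0, 5, 10, 20, 30]
conc_um = [1.0 * math.exp(-0.1 * t) for t in times_min]

c0, k, t_half = fit_exponential_decay(times_min, conc_um)
print(f"c0 = {c0:.3f} µM, k = {k:.3f} per min, half-life = {t_half:.2f} min")
```

A log-linear fit is used here only because it needs no external libraries; with noisy measurements near the detection limit, a weighted or non-linear fit would likely be more appropriate.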

      • p.9, Fig 2a: Where does the 5-IP7 come from in the kcs1Δ strain? In the text the authors state that 5-IP7 in kcs1Δ was not detected, but the figure suggests otherwise. Please explain.

      Currently, we do not know where these residual signals stem from. One possibility is that they represent other isomers that exist in minor concentrations and that are not resolved from 5-IP7 in CE. We added a sentence to the figure legend to indicate this.

      • p. 10: "IP8 was undetectable in kcs1Δ and decreased by 75% in vip1Δ. kcs1Δ mutants also showed a 2 to 3-fold decrease in 1-IP7, suggesting that the synthesis of 1-IP7 depends on 5-IP7. This might be explained by assuming that a significant source of 1-IP7 is synthesis of 1,5-IP8 through successive action of Kcs1 and Vip1, followed by dephosphorylation to 1-IP7." - Please specify this statement. Do the authors mean that 1,5-IP8 is only produced transiently below the detection capabilities of the method but that there still is a (reduced) flux from 5-IP7 to 1,5-IP8 to 1-IP7? Otherwise it would seem paradoxical to have a dependency on a non-existing metabolite in that cell line.

      This was not clearly expressed. The revised version now says: " ... a 2 to 3-fold decrease in 1-IP7, suggesting that the synthesis of 1-IP7 depends on 5-IP7. This might be explained by assuming that, in the wildtype, most 1-IP7 stems from the conversion of 5-IP7 to 1,5-IP8, followed by dephosphorylation of 1,5-IP8 to 1-IP7.". We hope that this clarifies the matter.

      • p. 10: "pulse-labeling approaches are not available for PP-IPs." While this statement is correct, a recent paper co-authored by Qui and Jessen showed nice pulse-labeling data for the lower IPs and could be cited here (PMID: 36589890).

      Yes, indeed, we should have been more precise here. What we wanted to express was that rapid pulse-labeling methods for following phosphate group turnover were lacking, with a temporal resolution of minutes rather than hours. Existing pulse labeling approaches, including the study mentioned by the reviewer, do not provide that. We have changed the phrase accordingly.

      • p. 10: continuation of caption of Fig 2: "were extracted [and] analyzed"

      Corrected. Thank you.

      • p. 12: How is 1-IP7 made in the vip1 kcs1 double mutant?

      As explained above, we suspect that these may be side products of IPMKs, which accumulate in the absence of vip1 phosphatase.

      • p. 13, caption to Figure 3: "XXX cells were analyzed" please replace the place holder XXX.

      Done. Thank you.

      • p. 13, Fig 3B, C, D and p. 50, Fig. S4: On screen the contrast between the different shades of grey of the bars are just visible enough, but not on paper, I suggest using a higher contrast/ different colouring scheme.

      We enhanced the contrast.

      • p. 24, 25, Fig 7.: I could not really appreciate the AlphaFold part, and found it unnecessary. No docking or molecular dynamics simulations were carried out here, and it was not clear to me what information should be gleaned from this part.

      Following this comment, we have modified the respective part of the text. This part refers to a publication from the O'Shea lab (Nat. Chem Biol. 4,25) proposing the model that 1-IP7 and the Pho81 minimum domain bind competitively to the active site of Pho85 to inhibit its kinase activity. Modeling of complexes between Pho81, Pho80 and Pho85, which we present in the manuscript, rather suggests binding of the minimum domain to a groove in Pho80. This is important because it provides a viable alternative model for the action of the minimum domain. It suggests the minimum domain as a constitutive linker that attaches Pho80 to Pho85. Importantly, this model accounts perfectly for the results of previous random mutagenesis studies on Pho80 and on the minimum domain, which had independently identified both the Pho80 groove and the minimum domain residues that bind it in the prediction as critical residues for inhibition of Pho85, and for integrity of the Pho85/Pho80/Pho81 complex. We find this alternative explanation for Pho85-Pho80 regulation by Pho81, which we can derive by combining the predictions with already published experimental data, an important element to re-evaluate the relevance of 1-IP7 in PHO pathway regulation and resolve one of the existing discrepancies.

      • p. 28: No experiments were carried out with plants or mammals. The relevance for plants or mammalian systems therefore seems to be overstated at this point in time.

      We are not quite sure how to interpret this remark. We do not claim that our data support a role for IP8 in mammals and plants. But we refer to and cite studies providing the strongest evidence in favor of it in these systems. The relevance of our current study lies in refuting seemingly strong evidence from yeast, which had been diametrically opposed to the data obtained in plants and mammals. The revision of the situation in yeast now paves the way to drawing a coherent concept for fungi, plants and mammals. We feel that this is important and should be underlined.

      • p. 31: "300 mL of 3% ammonium" - 300 µL?

      Yes. Thank you.

      • p. 45, CE-ESI-MS parameters: "1IP8"

      Corrected.

      • p. 47: Figure S1: Please include more experimental details in the caption and/or methods section. Was a similar analysis software used as e.g. Figure S2 (NIS Elements Software)? Please also include all the analysis software in the Methods section under "fluorescence microscopy". Unless these additional experimental details already clarify the following point: Can the authors briefly comment on why the morphological determination in S1 requires trypan blue staining while in later experiments the yeast cells are readily recognized by the software in "simple" brightfield images?

      Trypan blue staining is not strictly required for this. It is just a simple method to fluorescently stain the cell wall. There are many other ways of delineating the cells. It could also have been done in a brightfield image.

      We updated the figure legend to better describe how these measurements were done and deposited the script and training file on figshare.

      • p. 48: "can be downloaded from **" please insert the link once the script is available online.

      It has been deposited at Figshare under DOI 10.6084/m9.figshare.c.6700281

      Reviewer #3 (Recommendations For The Authors):

      1) Italicize the scientific names of the organisms; this was inconsistent throughout the manuscript. Also, gene names should be italicized; this was also inconsistent (e.g., p.12 "... did not induce the PHO84 and PHO5 [sic] promoters...).

      Done

      2) Summary of the Figure 2A data in the text (p.9) probably has swapped the determined concentrations for 1-IP7 and IP8 (0.3 µM or 0.5 µM) as compared with the data figure.

      Yes, indeed. We have corrected this.

      3) Figure 2A: which of the mutant PP-IP levels are significantly different from the WT control?

      We have now added asterisks to indicate the significance for every mutant.

      4) In the discussion on the data (Fig. 2A), I was tripped up by the verb tense in this phrase "5-IP7 has not been detected in the kcs1Δ mutant and 1-IP7 has been strongly reduced..."; I think you want to use the past tense "was" in both cases [as is used in the next sentence]. It made me wonder if there was a difference in the detection of 5-IP7 and IP8 in the kcs1Δ mutant, you could detect 5-IP7 but not IP8; if so, where did the 5-IP7 come from?

      We have corrected the tense. Thank you for highlighting this. As for the residual inositol pyrophosphate signal in kcs1Δ, we do not know its origin. One possibility, which we now mention in the text, is that it stems from IPMK side activity. It should be underlined that all signals disappear upon Pi starvation.

      5) Figure 2C, include the data points that the lines are built from (suggestion).

      We refrained from that for the line graphs. For reasons of consistency, we should do this for every line graph. If we did that, Fig. 4B would become quite hard to read.

      6) Figure 3B-D, please check that the stipples or hatches are in the figure - the printed copy lacked them although I could see them in the electronic version; this was also true for Figures 5 and 6 (I do not know if it is a printer issue, but other hatches were visible: e.g., not seen in S4 but seen in S5).

      They are visible in our copies, also after printing. They may have been lost during file conversion at the journal.

      7) The text description of the Pho4-yEGFP, Pho5-yEGFP and Pho84-yEGFP says that the kcs1Δ mutant "showed Pho4-yEGFP constitutively in the nucleus already ... and PHO5 and PHO84 were activated". However, the data is more complex than that: whereas the localization of Pho4-yEGFP is constitutively nuclear, there is a higher basal (repressed) expression of both Pho5 and Pho84 as well as increased expression of both proteins under -Pi conditions. What accounts for the increased expression when Pho4 is already nuclear? This is also seen in the vip1Δ kcs1Δ mutant.

      We agree with the reviewer, but we cannot explain this effect with certainty. One possibility could be a wider dysregulation of Pi metabolism in kcs1 mutants. To name a few possibilities: Wildtype cells have polyphosphate reserves that are gradually mobilized during the first hours of P-starvation. kcs1 mutants don't have those and might fall into a "deeper" state of starvation faster. It should be kept in mind that the starvation response is also regulated at the level of chromatin structure, and by antisense transcripts. The influence of kcs1 on these processes is unclear.

      8) Figure 9 legend: please add a definition of the MP region (in red) and include it more explicitly in the described model.

      We now mention the relevant region also in the legend and have labeled the relevant regions in the images (Huang et al., 2001).

      9) Figure S2 legend: information is missing (downloading link).

      It has been deposited at Figshare under DOI 10.6084/m9.figshare.c.6700281

      10) Figure S4 and S5, missing statistics.

      They have been added to the new Fig. S6, which interprets differences between strains and conditions. Fig. S4 (now S3) shows timecourses of IPPs down to zero. Adding statistics for all pairwise differences between the timepoints would be almost an overkill.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      It is very important to find practical and efficient means in order to increase agricultural productivity. Drawing on data from variable field environments, this study provides a useful theoretical framework to identify new factors that could increase agricultural production. There is solid evidence to support the authors' claims, though following the fate of candidate species after introduction into rice fields would have strengthened the study. Plant biologists and ecologists working in nature and fields will find the work interesting.

      Thank you so much for your careful evaluation of our manuscript. We are very pleased to hear that you found our framework useful. We have revised our manuscript according to the "Recommendations for the Authors" to improve our manuscript.

      Public Review

      Reviewer #1 (Public Review):

      This manuscript describes the identification of influential organisms on rice growth and an attempt of validation. The analysis of eDNA on rice pot and mimic field provides rice growth promoting organisms. This approach is novel for plant ecology field. However current results did not fully support whether eDNA analysis-based detection of influencing organism.

      Thank you so much for evaluating our manuscript. We have carefully read and responded to your comments. We hope our responses resolve your concerns on our study.

      The strength of this manuscript is to attempt application of eDNA analysis-based plant growth differentiation. The weakness is too preliminary data and experimental set-up to make any conclusion. The trials of authors experiments are ideal. However, the process of data analysis did not meet certain levels. For example, eDNA analysis of different time points on rice growth stages resulted in two influential organisms for rice growth. Then they cultivate two species and applied rice seedlings. Without understanding of fitness and robustness, how we can know the effect of the two species on rice growth.

      We agree with your comments that we did not have the fitness data of the two species and/or rice seedlings. Thus, it is still difficult to obtain deep understanding of the mechanisms of our findings that the species introduced in the system would influence rice growth. Nonetheless, our study demonstrated the effectiveness of our research framework as we found evidence that the species that were discovered by the eDNA monitoring and time series analysis indeed cause changes in the system. We believe that the first step is to show that the framework is workable and that detailed understanding of the mechanisms or genetic pathway was not a focus of our study. To avoid misunderstanding, we have added several explanations regarding this point in L426–431 and L447. For example, in L426, we have added the following statement: "... the detailed dynamics of the two introduced species was unclear (i.e., the fate of the introduced species). This is particularly important for understanding how the introduced organisms affected rice performance...".

      The authors did not check the fate of two species after introducing into rice. If this is true, it is difficult to link between the rice gene expression after treatments and the effectiveness of two species. I think the validation experiment in 2019 needs to be re-conducted.

      We did not check the fate of the two species (except measuring the eDNA concentrations of the species), and it is true that we cannot show evidence of "how" these two species influence the rice gene expression. Understanding molecular mechanisms of the phenomenon that we found is important (especially from the viewpoint of molecular biology), but our primary objective was to demonstrate that our "eDNA x time series analysis" framework is feasible for detecting previously overlooked but influential organisms. To this end, we believe that we achieved our objective and repeating the validation experiment should be for a different purpose (i.e., for understanding molecular mechanisms). We have clarified these points in L426–431 and L447 as explained above.

      Reviewer #2 (Public Review):

      The manuscript "Detecting and validating influential organisms for rice growth: An ecological network approach" explores the influence of biotic and abiotic entities that are often neglected on rice growth. The study has a straightforward experimental design, and well thought hypothesis for explorations. Monitoring data is collected to infer relationships between species and the environment empirically. It is analyzed with an up-to-date statistical method. This allowed the manuscript to hypothesize and test the effects most influential entities in a controlled experiment.

      Thank you so much for your careful evaluations. We are pleased to see that you evaluated our manuscript positively. We have further revised our manuscript according to your comments and hope the revision has resolved your concerns.

      The manuscript is interesting and sets up a nice framework for future studies. In general, the manuscript can be improved significantly, when this workflow is smoothly connected and communicated how they follow each other more than the sequence and dates provided. It is valuable philosophical thinking, and the research community can benefit from this framework.

      Thank you for your suggestions. In order to improve the logic flow and readability of our manuscript, we have revised the descriptions of workflow and clarified how the experimental and statistical steps were connected to each other. To do so, we have added brief explanations about what/how we did at the first sentence of Results subsections (some of these explanations were only in Materials and Methods in the original manuscript). Also, we have moved all of the Supplementary Materials and Methods to the main text. We have thoroughly revised the manuscript, and we hope that all the parts of our manuscript have been connected more smoothly than in the original manuscript.

      I understand the length and format of the manuscript make it difficult to add more details, but I am sure it can refer to/clear some concepts/methods that might be new for the audience. How/why variables are selected as important parts of the system, a tiny bit of information about the nonlinear time series analysis in the early manuscript, and the biological reasoning behind these statistically driven decisions are some examples.

      We have explained how/why variables are selected (in L125), added more information about the nonlinear time series analysis (in L129 and L175), and added the biological reasoning behind the statistical decisions (L195).

      Reviewer #3 (Public Review):

      Most farming is done by subtracting or adding what people want based in nature. However, in nature, crops interact with various objects, and mostly we are unaware of their effects. In order to increase agricultural productivity, finding useful objects is very important. However, in an uncontrolled environment, it coexists with so many biological objects that it is very inefficient to verify them all experimentally. It is therefore necessary to develop an effective screening method to identify external environmental factors that can increase crop productivity. This study identified factors presumed to be important to crop growth based on metabarcoding analysis, field sampling, and non-linear analysis/information theory, and conducted a mesocosm experiment to verify them experimentally. In conclusion, the object proposed by the author did not increase rice yield, but rather rice growth rate.

      Thank you so much for your evaluation of our manuscript. We have revised our manuscript based on your comments, and hope it has been improved compared with the original version.

      Strength

      In actual field data, since many variables are involved in a specific phenomenon, it is necessary to effectively eliminate false positives. Based on the metabarcoding technique, various variables that may affect rice growth were quantitatively measured, although not perfectly, and the causal relationship between these variables and rice growth was analyzed by using information transfer analysis. Using this method, two new players capable of manipulating rice growth were verified, despite their unknown functions until now. I found this process to be very logical, and I think it will be valuable in subsequent ecological studies.

      We are very pleased to see that you found our framework is very logical and potentially beneficial for future ecological studies.

      Weaknesses

      CK treatment's effectiveness remains questionable. Rice's growth was clearly altered by CK treatment. The validation of the CK treatment itself is not clear compared to the GN treatment, and the transcriptome data analysis results do not show that DEG is not present. The possibility of a side effect caused by a variable that the author cannot control remains a possibility in this case. Even though this part is mentioned in Discussion, it is necessary to discuss various possibilities in more detail.

      We agree that the effectiveness of the CK treatment was questionable. We have added some more discussion about this point in L376: "The unclear effects of the CK treatment relative to those of the GN treatment could be due to the relatively unstable removal method (i.e., C. kiiensis larvae were manually removed by a hand net) or incomplete removal of the larvae (some larvae might have remained after the removal treatment)."

      Reviewer #1 (Recommendations For The Authors):

      Comment #1-1 This manuscript describes identification of influential organisms on rice growth and an attempt of validation. The analysis of eDNA on rice pot and mimic field provides rice growth promoting organisms. This approach is novel for plant ecology field. However current results did not fully support whether eDNA analysis-based detection of influencing organism.

      Thank you for your careful evaluations of our manuscript. We are pleased to see you found that our approach is novel. We have revised our manuscript in accordance with your comments, and we hope that the revision and responses resolved your concerns.

      Comment #1-2 1. Experimental setting: Authors made up small scale pot system in 2017 and then expanded manipulative experiment. I do not understand how two influencing organism sequences were identified from the single treatment depending on different time points. How they can be convince the two organisms affect the rice growth rather than other biological and environmental factors.

      In 2017, we performed intensive monitoring of the experimental rice plots and obtained a large time series dataset (122-day consecutive monitoring × 5 plots = 610 data points). The time series data were analyzed using information-theoretic causal analysis. This analysis is critically different from correlational analyses and is designed to identify causal relationships among variables. Although we understand that field manipulation experiments are a common and straightforward approach to identify causal relationships among organisms, we chose the "field-monitoring + time-series-based causal analysis" approach. This is because, as explained in the main text, there are numerous factors that could influence rice performance, and it is practically impossible to perform manipulative experiments for all the potential factors that could influence rice growth. On the other hand, our "field-monitoring + time-series-based causal analysis" approach has the potential to identify multiple factors under field conditions, even from a single experimental treatment.

      Nonetheless, we must admit that our time-series-based approach still has a chance to misidentify causal factors. Our framework relies on statistics, so the chance of false-positive detection of causality cannot be zero. This was exactly the reason why we performed the "validation" experiment in 2019. To complement the statistical results of the 2017 experiments, we performed another experiment in 2019.

      Comment #1-3 2. eDNA technology: The eDNA analysis based on four universal primers 16s rRNA, 18s rRNA, ITS, and COI regions must not be enough to identify specific species. The resolution of species classification may not meet to confirm exact species. Thus, the accuracy of two species that they selected for further experiment is difficult to be confirmed. Authors also referred to "putative Globisporangium".

      Your point is correct. The DNA barcoding regions we selected are short, and it is often difficult to identify species with them. However, this limitation could not have been overcome even if we had chosen a different genetic marker. Long-read sequencing technology could partially solve the issue, but the number of sequence reads generated by long-read techniques is smaller than that generated by short-read sequencing, and comprehensive detection of all species in an ecological community is still challenging. Our approach struck a balance among identification resolution, comprehensiveness of the analysis, and sequencing costs. In addition, even though we could not identify most ASVs at the species level, some ASVs could be identified at the species level (52 ASVs among the 718 ASVs which had causal influences on rice growth), and we selected the two species (G. nunn and C. kiiensis) from the 52 species.

      Further, the taxonomic assignment algorithm we used here (i.e., Claident; Tanabe & Toju 2012 PLoS ONE 10.1371/journal.pone.0076910) adopts conservative criteria for species identification and has a low false-positive probability.

      More importantly, this is also the reason why we performed the "validation" experiment in 2019. The species identified in the 2017 experiment are still "potential" organisms that influence rice growth (i.e., the hypothesis-generating phase), and we tested the hypothesis in 2019.

      Nonetheless, we must admit that clear description of potential limitations is important. Thus, we have discussed this in L418: "As for the second issue, short-read sequencing has dominated current eDNA studies, but it is often not sufficient for lower-level taxonomic identification. Using long-read sequencing techniques (e.g., Oxford Nanopore MinION) for eDNA studies is a promising approach to overcome the second issue".

      Comment #1-4 3. Biological relevance 1: Authors identify two organisms as influencing organism for rice growth. As conducting the first experiment in 2017, the 2019 experiment was different from natural condition. The two experiments in 2017 and 2019 were conducted under different conditions. How do they compare the experiments? At least, the eDNA analyses in 2017 and 2019 should be very similar. I cannot find such data.

      The experimental conditions were different between 2017 and 2019 because the experiments were conducted in different years. Theoretically, it is ideal if the experimental conditions in 2019 are covered by the range of experimental conditions in 2017 (e.g., rice variety, air temperature, rainfall, and solar radiation). If this condition were satisfied, the attractor (i.e., rice growth trajectory delineated in the state space) in 2019 would be within that in 2017, and our model from 2017 could be used to predict the dynamics in 2019 accurately. To fulfill this condition, we made as much effort as possible: we used the same rice variety and soils in 2019 as those used in 2017, and started our experiment at the same time of year in 2019 as in 2017.

      Although natural ecological dynamics cannot be precisely controlled, our monitoring revealed that the ecological dynamics in 2019 were qualitatively similar to those in 2017. To demonstrate that the experimental conditions and eDNA community data were similar between the two experiments, we have presented the climate and eDNA data in an inset of Figure 3a, in Figure 1–figure supplement 2, and in Figure 3–figure supplement 2. We must admit that these dynamics are not identical, but we hope that this resolves your concern.

      Comment #1-5 4. Lack of detail description: In the Materials and Methods, there are many parts which lack on detail description. For instance, authors must described the two species cultivation, application concentrations, and application methods.

      We have moved Supplementary Materials and Methods to the main text and added more detailed descriptions in Materials and Methods. Also, to improve the logical flow and readability of our manuscript, we have added brief explanations of what we did, and how, in the first sentence of each Results subsection (some of these explanations were only in Materials and Methods in the original manuscript). We have added references for how to cultivate G. nunn in L608 (Kobayashi et al., 2010; Tojo et al., 1993), together with the application concentrations (C. kiiensis was not cultivated but removed from the system, as described in Materials and Methods). Application methods are described in Materials and Methods, in the section Field manipulation experiments in 2019 (L596).

      Comment #1-6 5. Validation: Application of one species clearly resulted to promote rice growth. They must include appropriate control treatment. If they pick same genus but different species that identified no specific effect on rice growth through eDNA analysis, no effect on growth can be provided. Generally application of large population of certain non-harmful organism confer plant growth promotion. It is not surprising result. Authors need to prove effectiveness of eDNA analysis. In addition, the field experiments required at least two years of consistent data for publication because environmental factors are so dynamic.

      Thank you for pointing this out. We agree with your comment that species that were predicted to have no effect should not promote rice growth in a validation experiment. Including such species in our manipulation experiment in 2019 was one of our initial experimental plans, but we could not include them because of limitations of time, labor, and money. More extensive validation of the statistical results of the 2017 data, including multi-year experiments, would further validate the effectiveness of our approach, and this should be done in future studies. To clarify this point, we have added statements in the paragraph starting at L396.

      Comment #1-7 In conclusion, I suggest that authors need more large data analysis and validate with more accurate and meaningful protocol.

      As we explained in the revised manuscript and the Response to Comments #1-2 to #1-7, our study demonstrated a novel research framework to detect previously overlooked influential organisms under field conditions. We agree that larger data analysis would be ideal to further validate our approach, but whether and how to collect larger data is constrained by time, money, and labor. We believe that our study was designed carefully and could provide meaningful avenues for developing novel, ecological-network-based, and environmentally friendly agricultural solutions.

      Reviewer #2 (Recommendations For The Authors):

      Comment #2-1 Lines 97-110: This is so cool. Modeling with empirical data is very powerful. But a rice field is an open system consisting of metacommunity dynamics. Maybe a tiny bit of biological and biogeochemical background here would be good.

      Thank you for your comments. We have added a few examples of how and in which systems these methods were used to evaluate community dynamics and detect biological interactions in L109-L118.

      Comment #2-2 Lines 111-126: I like the summary of the study here. I think the influential species concept can be a little more elevated. Paine's famous keystone species work has been cited but a couple more pieces of literature can help to enhance the ecological importance of this work.

      We have explained the work by Paine (1966) a bit more and added one more paper that showed the effect of multiple predator species on the system dynamics at L88. We have also added a relevant sentence at L137 to emphasize the ecological/agricultural significance of our work.

      Comment #2-3 Experimental design/Figure 1:

      Is there any rationale behind choosing red individuals to measure the growth?

      Is there any competition between the individuals in the pots?

      Figure 1e: It is nice to show the ASVs in time. I wonder how the plot would look like when normalized by biomass/DNA content/coverage/rarefaction because of the seasonality.

      As for the first question, we chose the four individuals to minimize the edge effects (i.e., effects of microclimates and neighboring rice would be different between the four rice individuals and those planted in the edge regions). We have mentioned this in the legend of Figure 1.

      As for the second question, there might be competition among the individuals in the pot. However, we did not measure the effect of competition (e.g., by comparing the growth with/without other rice individuals).

      As for the third question, we published the detailed dynamics of the ecological community in the Supplementary Figures in Ushio (2022) Proceedings B https://doi.org/10.6084/m9.figshare.c.5842766.v1. In addition, we have uploaded a video showing the temporal dynamics of some top (= most abundant) ASVs at https://doi.org/10.6084/m9.figshare.23514150.v2.

      We have mentioned the supporting information in L153.

      Comment #2-4 Line 146-147: Is this damage influence the inferences? Maybe it is better to justify.

      While we occasionally observed physical damage, it is unlikely that it affected our causal inference because the changes in rice height due to the damage were smaller and less frequent than those due to growth. We have noted this at L151.

      Comment #2-5 Line 161-162: Maybe refer readers to the methods section where you explain UIC analysis. It'd be easier to interpret the figures.

      Mentioned.

      Comment #2-6 Line 175-176: I believe very brief information in the intro about the organisms might help explain the hypothesis and interpret the results better.

      We have included brief information of the two species at L197.

      Comment #2-7 Figure 2: Species interaction strength: Are these proxies to the Jacobians? Is there a threshold for the influence we can consider strong/weak? For example, influential species compared to diagonal elements of the Jacobians (intraspecies interactions) could be shown as a mean vertical line in Figure 2b.

      "Influences to rice growth" in Figure 2b is transfer entropy (TE) from a target ASV to rice growth. The TE values are not proxies of the Jacobians, but they might positively correlate with the absolute values of the Jacobian elements. We have clarified this point in the legend (L953). More direct estimations of the Jacobian can be done using the MDR S-map method (Chang et al. 2021 DOI:10.1111/ele.13897), but we did not perform the MDR S-map in the present manuscript (see Ushio et al. 2023 https://doi.org/10.7554/eLife.85795 for the application of the MDR S-map). As for TE, there is no clear threshold to distinguish strong/weak interactions.
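
      For readers unfamiliar with transfer entropy, the quantity can be illustrated with a minimal sketch. This is not the UIC analysis used in the manuscript: it is a simple plug-in estimator on binned data with a history length of one time step, and the function names, the bin count, and the synthetic series are all assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def discretize(series, n_bins=2):
    """Map each value to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0] * len(series)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in series]


def transfer_entropy(source, target, n_bins=2):
    """Plug-in estimate of TE(source -> target) in bits, history length 1.

    TE = sum p(y_next, y_now, x_now)
             * log2( p(y_next | y_now, x_now) / p(y_next | y_now) )
    """
    x = discretize(source, n_bins)
    y = discretize(target, n_bins)
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))  # (y_next, y_now, x_now)
    pairs_yx = Counter(zip(y[:-1], x[:-1]))        # (y_now, x_now)
    pairs_yy = Counter(zip(y[1:], y[:-1]))         # (y_next, y_now)
    singles = Counter(y[:-1])                      # y_now
    n = len(y) - 1
    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        p_full = count / pairs_yx[(y_now, x_now)]
        p_self = pairs_yy[(y_next, y_now)] / singles[y_now]
        te += p_joint * math.log2(p_full / p_self)
    return te


# Synthetic (hypothetical) example: the target copies the source one step later.
rng = random.Random(0)
source = [rng.randint(0, 1) for _ in range(500)]
target = [0] + source[:-1]

te_forward = transfer_entropy(source, target)   # source drives target
te_backward = transfer_entropy(target, source)  # no influence in this direction
print(te_forward, te_backward)
```

      In this toy case the forward estimate is much larger than the backward one, which is the asymmetry that a TE-based analysis exploits. As stated above, there is no clear threshold on TE that separates strong from weak interactions.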

      Comment #2-8 Figure 2: Looking at panels c and d, it looks like there is a negative frequency selection between two influential species. Is it a reasonable observation?

      This is an interesting point. In this manuscript, we have not carefully examined the interspecific relationship between these two particular species. However, the interspecific interactions were examined in detail and reported in Ushio (2022) (Proceedings of the Royal Society B, DOI:10.1098/rspb.2021.2690). We re-checked the result in Ushio (2022); although there is a negative correlation between them, we did not find any (statistical) causal relationship between them.

      Comment #2-9 Line 209: What is t-SNE analysis? Because of the manuscript's format, maybe methods should be shortly referred to in the relevant section or explained in brackets.

      We have spelled out t-SNE.

      Comment #2-10 Line 212-214: Maybe briefly explain what the hypotheses are for the alternative analysis, and what is the contribution of the results to the study.

      We have added a brief explanation at L241: "Alternative statistical modeling that included the treatments (the control versus GN or CK treatments) and manipulation timing (i.e., before or after the manipulation), which simultaneously took the temporal changes of all the treatments into account, also showed qualitatively similar results (Supplementary file 4), further supporting the results."

      Comment #2-11 Figure 3b/c: Maybe species names as panel titles could be helpful. d: Treatment names with initials in the legend could be also helpful to read the plots.

      We have added species names as panel titles of Figure 3b,c. Treatment names were included in the legend of Figure 3.

      Comment #2-12 Line 233: Maybe mention why the manuscript uses the word "clear".

      We have mentioned this in L185.

      Comment #2-13 Line 234-236: I think that these alternative tests should be explained somewhere.

      We have revised the sentence so that it includes some explanations (L241). Also, we have referred to Materials and Methods.

      Comment #2-14 Figure 4: The title says ecological community compositions, and panels show the growth rates and cumulative growth.

      Thank you for pointing this out. This was a typo and we have corrected it.

      Comment #2-15 Lines 246-269: Can these expression patterns be transient and relevant to the time point that the sample is taken?

      Yes, these expression patterns were transient. We collected rice leaf samples for RNA-seq 1 day before the first manipulation and 1, 14, and 38 days after the third manipulation (see Supplementary file 3 for the sampling design). When we merged the pot locations, we observed no difference in the gene expression for samples 1 day before the first manipulation and 14 and 38 days after the third manipulation (except for two genes in samples 38 days after the manipulation), and thus, we consider that the DEGs appeared only in the short period after the manipulation. We have mentioned this in L278 and L383: "We found almost no DEGs for leaf samples taken one day before and 14 and 38 days after the third manipulation (the leaf sampling event 1, 3, and 4), suggesting that the influences of the treatments on the gene expression patterns were transient." (L278) and "These changes were observed relatively quickly and transient." (L383)

      Comment #2-16 I wonder if a conceptual framework figure would help to generalize the workflow that can be used for other studies.

      Thank you for your suggestion. Although we agree with your comment that such a figure would be helpful to generalize the workflow, we believe that our framework is clear and decided not to include it in the present manuscript. We might consider including such a figure (like Figure 1a in Ushio 2022) if we have an opportunity to write a review paper regarding this topic.

      Comment #2-17 Lines 329-335: I feel this information is unclear in the early manuscript. Maybe it's necessary to clearly communicate in the beginning.

      We have explained that we could not find any relevant information at least at the time we detected the ASVs in L189.

      Comment #2-18 Lines 336-337: Can these species be identified in the previous data set from the ASV sequences?

      Yes, these species were identified in the DNA data set obtained in 2017.

      Comment #2-19 Lines 387-397: Are there any measurements such as total biomass, and statistical methods to help with the eDNA bias and data compositionality?

      We have confirmed that our quantitative eDNA metabarcoding generates comparable results with the fluorescence-based method and quantitative PCR (e.g., see Supplementary Figures in Ushio 2022) (mentioned in L310 in the revised manuscript). However, at least in this study, we could not perform a direct comparison of the eDNA data with species abundance and/or biomass. This is partly because the number of our target species was too large (> 1,000 species). The accurate estimation of species abundance and/or biomass is one of our next goals.

      Comment #2-20 Line 472: Maybe mention transfer entropy somewhere in the early manuscript.

      We have mentioned this in L175.

      Comment #2-21 Lines 494-503: Maybe a summary of this reasoning should be mentioned somewhere in the early manuscript too.

      We have described a brief summary of the reasoning in L195.

      Comment #2-22 Lines 29-33 If this sentence is simplified it might be easier to follow.

      The sentence has been divided into two sentences in L28. Also, each sentence has been simplified.

      Comment #2-23 Line 38 Maybe "macrobes" can be explicitly mentioned. Fungi, protozoa, etc.

      Mentioned.

      Comment #2-24 Line 139: I am not sure if the date should be in the title.

      Similar monitoring was done in 2017 and 2019. Thus, we think the date is necessary in the section title.

      Comment #2-25 Figure 1: There are 4 red individuals in the design but 5 measurements in the plots.

      Heights and SPAD of the four individuals were measured for each plot and the averaged values were used as representative values for each plot. Therefore, 20 measurements (= 4 rice individuals × 5 plots) were done every day, but each plot has one rice height for each day. We have clarified this in the legend of Figure 1: "the average values of the four individuals were regarded as representative values for each plot."
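
      The averaging step can be sketched as follows. The plot labels and height values are hypothetical and serve only to show how 20 daily measurements reduce to five plot-level values.

```python
from statistics import mean


def plot_representatives(heights_by_plot):
    """Average the measured individuals within each plot for one day."""
    return {plot: mean(values) for plot, values in heights_by_plot.items()}


# Hypothetical heights (cm) for a single day:
# 5 plots x 4 individuals = 20 measurements.
example_day = {
    "plot1": [30.0, 31.0, 29.0, 30.0],
    "plot2": [28.0, 28.0, 30.0, 30.0],
    "plot3": [32.0, 31.0, 33.0, 32.0],
    "plot4": [27.0, 29.0, 28.0, 28.0],
    "plot5": [30.0, 30.0, 31.0, 29.0],
}

n_measurements = sum(len(values) for values in example_day.values())
representatives = plot_representatives(example_day)
print(n_measurements, representatives)
```

      Each plot therefore contributes one height value per day to the time series, which is why five lines appear in the plots.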

      Comment #2-26 Figure 1b: Maybe use the same axis length for the temperature as the other plots?

      Corrected.

      Comment #2-27 Lines 259-261: Are there the names of the genes in databases?

      Yes, these are gene names used in the rice databases (e.g., The Rice Annotation Project Database; https://rapdb.dna.affrc.go.jp/index.html).

      Reviewer #3 (Recommendations For The Authors):

      Comment #3-1 Additionally, RGR is not statistically significant, but statistical significance is observed only in cumulative growth because data presentation does not reflect plant characteristics. RGR changes according to the developmental stage of the plant. Therefore, if RGR data are shown separately according to the rice growing season, the cumulative growth pattern and the pattern will appear similar.

      RGRs were calculated daily (i.e., cm/day) and they changed depending on the developmental stage of the rice (Figure 1 and Figure 4–figure supplement 1). Therefore, we might find similar RGR patterns if we focus on a specific period of the growing season. However, unfortunately, we performed the intensive (i.e., daily) monitoring in 2019 only during the field manipulation period (mid-June to mid-July 2019), and we cannot investigate the changes in cumulative growth throughout the growing season (this depends on how many days we add up RGR to calculate the cumulative growth, though). We agree that, if we had investigated the detailed pattern of RGR throughout the growing season in 2019, we could have found similar patterns between RGR and cumulative growth at a certain period in the growing season. In Figure 4, the cumulative growths were calculated based on the RGRs before the third manipulation or during 10 days after the third manipulation. We have clarified this in the legend of Figure 4.
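
      The relationship between daily RGR and cumulative growth can be illustrated with a minimal sketch. It assumes one height measurement per day and treats the daily rate as the simple day-to-day height difference (cm/day). The function names and the height series are hypothetical, and the exact calculation in the manuscript may differ.

```python
def daily_growth_rates(heights):
    """Day-to-day height differences (cm/day), one measurement per day."""
    return [later - earlier for earlier, later in zip(heights, heights[1:])]


def cumulative_growth(rates, start, n_days):
    """Sum the daily rates over n_days, beginning at index start."""
    return sum(rates[start:start + n_days])


# Hypothetical plot-level heights (cm) on five consecutive days.
heights = [40.0, 41.5, 43.0, 44.0, 46.0]

rates = daily_growth_rates(heights)            # four daily rates
growth_3_days = cumulative_growth(rates, 0, 3) # growth over the first 3 days
print(rates, growth_3_days)
```

      Because the cumulative value is a sum over a chosen window, small daily differences that are not individually significant can add up to a detectable difference, and the result depends on how many days are included.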

    1. 12:3 Those who are wi se[a] will shine like the brightness of the heavens, and those who lead many to righteousness, like the stars for ever and ever. https://www.americamagazine.org/politics-society/2020/05/08/its-time-rethink-electoral-college https://www.npr.org/sections/itsallpolitics/2011/12/20/144016912/we-the-people-npr-readers-would-ratify-four-new-amendments https://www.americamagazine.org/politics-society/2020/05/08/its-time-rethink-electoral-college https://www.npr.org/sections/itsallpolitics/2011/12/20/144016912/we-the-people-npr-readers-would-ratify-four-new-amendments https://constitutioncenter.org/blog/vote-now-an-amendment-to-end-the-electoral-college https://www.nytimes.com/2020/02/09/opinion/letters/electoral-college.html https://www.latimes.com/opinion/readersreact/la-ol-le-electoral-college-20180904-story.html you are offline https://slate.com/news-and-politics/2014/05/amending-the-constitution-is-much-too-hard-blame-the-founders.html we the people rise again https://slate.com/news-and-politics/2012/06/fix-the-constitution-amending-by-national-referendum.html safe souls, safe fu https://slate.com/news-and-politics/2012/06/fixing-the-constitution-protecting-informational-privacy.html https://slate.com/news-and-politics/2020/05/new-reconstruction-constitution-democracy.html We the People of Slate … The U.S. Constitution, as you mighta been, shoulda [“come” on … its someday] rewrϕte it. "Politicians talk about the Constitution as if it were as sacrosanct as the Ten Commandments [interjection: spec. it is actually almost exactly related!]. But the document itself invites change and revision. What if the president served only one six-year term instead two four-year terms? What if your state’s population determined how many senators represent it? What if the Constitution included a right to health care? We asked legal scholars and Slate readers to cross out what they didn’t like in the Constitution and pencil in their hearts’ desires. 
Here’s what the document would look like with their best ideas." Slate: u_s_constitution as_rewritten by_slate_legal_experts_and_readers 多也了了夕 "with a wand of scheffilara, 并#亦太 he begins … "I am now on the Staff of Menelaus, the Spears of Longinus and Lancelot; and the name "Mosche ex Nashon." Logically the recent mentions of Gilgamesh and the simultaneous 同時 overlaping 場道 of the eventual link between the famous ruling of Solomon on the separation of babies and mothers and waters and land … to a story of many “two cities” that culminates in a cultural or societal or “evolutionary” link to Sodom and Gomorrah and the city-state of Babylon (and it’s Hanging Gardens) and also of course to Paris and Troy and “Masstodon” and city-states [ciudadestado] and perhaps planet-cities; from Cambridge to Cambridge across the “Cable” to see state to “London” … recently I called it “the city of realms” … I started out logically intending to link “game theory” and John Nash to the mathematical story of Sputnik and a revival of American physics; but in my usual way of rambling into the woods [I mean neighborhood] of stream of consciousness … turned into a premonitory discourse of “two cities” and how sometimes even things as obvious as the number of letters in the word “two” don’t do a good enough job of conveying … how and/or why one is simply never enough, and two isn’t much better–but in the end a circle … is drawn; the perfect circle in our imaginary mathematical perfection … I see a parted “line” in the letter pronounced “tea” (and beginning that word); and two “vee” (pron. of “v”) symbols joined together in a word we pronounce as “double-you” … and symbolically because I know “V” is the Roman Numeral for 5 (five) and I know not how to multiply in Roman numerals– It’s important to pause; here. 
I am going to write a more detailed piece on “the two cities” as I work through this maze like crossroads between “them” and “demo…” … here demorigstrably I am trying to fuse together an evolutionary change in … lit. biological evolution as well as an echelon leap forward in "self-government" … in a place where these two things are unfathomable and unspokenly* connected. https://www.google.com/search?q=prometheuslocke+%2Bsite%3Agodlikeproductions.com “Silence is betrayal” -MLK To a question on the idiom; is Bablyon about “the law” or “of the land of Nod?” “What is democracy” … the song, Metallica’s “ONE” echoes and repeats; as we apparently scrive together the word “THEM” … I question myself … if Babylon were the capital city of some mythical Nation of Time … if it were the central “turning point” of Sheol; ... >|< Can you not see that in this place; in a world that should see and does there is a gigantic message proving that we are not in reality and trying to show us how and why that's the best news since ... ever---that it's as simple as conjoining "the law of the land" with a basic set of rules that automatically turn Hell into something so much closer to Heaven I just do not understand---why we cant stand up together and say "bullets will not kill innocent children" and "snowflakes will not start avalanches ...." that cover or bury or hide the road from Earth to Verital)e .... or from the mythical Valis to Tanis---or from Rigel to Beth-El ... "guess?" ## as "an easy" answer; I'm looking for a fusion of "law and land" that somehow remembers a "jok'er a scene" about "lawn" seats; and "where the girls are green;" It's as simple as night and day; Heaven and Hell ... the difference between survival and--what we are presented with here; it's "doing this right"--that ends the Hell of representative democracy and electoral college--the blindness and darkness of not seeing "EXTINCTION LEVEL EVENT" encoded in these words and in our governments foundation ... 
by the framers [not just of the USA; but English .. and every language]  ... is literally just as simple as "not caring" or thinking we are at the beginning of some long process--or thinking it will never be done--that special "IT" that's the emancipation of you and I. Here words like "gnosis" and "gaudeamus" pair with my/ur "new ntersanding*" of the difference between Asgard and Medgard and really understanding our purpose here is to end "evil" ... things like "simulating disease and pain" (here, simulating meaning ... intentionally causing, rather than "gamifying away") and successfully linking the "Pillars of Hercules" to Plato's vision of Atlantis and the letter sequences "an" and "as" ... unlock a fusion of religion and mythology and "cryptographic truth" that connects "messianic" and "Christian" to "Roman" ... "Chinese" and "American" ... literally the key to the difference between the phrases "we are" and "we were" .... in "sight" of "silicon" in simulation and Israel, Genesis, and "silence" ... trying to the raising of Asgardian enlightenment ... and seeing "simple cypher" connecting to "Norse" ... and the "I AM THAT" surer than shit ... the intention and design of all religion and creation is to end "simulated reality" and also not seeing "SR" ... in Israel and Norse ... "for instance." https://www.google.com/search?q=%22I+AM%22+%22WE+ARE%22+%2Bsite%3Afromtaws "SOIS" a key--in two languages conjugated literally as both "I AM" and "WE ARE" simultaneously; Search: I know that if I am than so are you ... and it is because we have overcome .... something I truly cannot figure out, fathom, or believe ... was truly here before us--a spiralling series of failures ... speaking: to the heavens; but in secret and in action; "doing everything possible to succeed." It's a simple linguistic concept; the "singularity" and the "plurality" of a simple word--"to be"--but it goes to the heart of everything that we are and everything that is around us. 
This is a message about understanding and preserving individuality as well as liberty; and literally seeing "ARXIV" and understanding "often" and failing to connect God and prescience to "IV" and the Fourth Amendment ... it's about blindness and ... "curing the blind instantly" ... and fathoming how and why this message has been etched into our entire history and and all religions and myths and music--to help us "to be THAT we" that actually "are responsible" for the end of Hell. I neglected to mention "Har-Wer" and "Tower of Babel" which are both related lingusitically, religiously and topically: "to who ..." and while we're on "four score and [seven years from now]" seeing the fourth "living thing" in Eden and it's (the name, Abel) connection to Babel and Abraham Lincoln; slavery and ... understanding we live in a place where the history of the United States also, like Monoceros and "Neil Armstrong's first step" are a time shifted ... overlayed map to achieving freedom ... it's about becoming a father-race ... and actually "doing" the technological steps required to "emancipate the e's of 'me&e'" and survive in exo-planetary space--- it might be as simple as adding "because we did this" here and now; and having it be something we are truly proud of .... forevermore™ ... for certain in the heart of this story about cyclicality and repetition of error--its not because we did "this" or something over and over again; it's about changing "the problem" and then helping others to also overcome ... "things like time travel ... erasing speech" --- however that happenecl. I also failed to mention that "I am in Hell" ... as in this world is hellacious to me; in an overlay with the Hellenic period and this message that we are in the Trojan Horse ... a small gem .... "planet" truly is the Ark of the Covenant---and it's the simple understanding that "reality is hell" is to "living without air conditioning and plumbing is hell" just as soon as you achieve ... 
"rediscovering" those things--- I can't figure out why I am the only person screaming "this is Hell." That's also, Hell. ... but recently suggested an old joke about "there being 10 kinds of people in the world (obv an anti-tautology and a tautology simultaneously)" only after that brief bit of singularity and duality mentioning the rest of the joke: "those that understand binary and those that don't know how to base convert between counting with two hands and counting with only an 'on and off.'" It's not obvious if you aren't trying to figure it out, I suppose; but 10 is decimal notation for "kiss" and the "often" without "of" ... and binary notation for the decimal equivalent of "2." A long long time ago in a state that simply non-randomly ties to the heart of the name of our galaxy ... I was again thinking of the "perfect imperfections" of things like saying "three equals one equals one" (which, of course was related to the Holy Trinity and it's "prescient/anachronistic Adamic presence encoded in the name Ab|ra|ha|m" which means "father of a great multitude") ... I brought that one back in the last few months; connecting the letter K and in this "logos-rythmic" tie to the "base of a number system" embellish the truth just a bit and suggest a more accurate rendition of the original [there is no such thing as equality, "is" of separate objects--as in no two snowflakes are the same unless they are literally the same one; true of ancient weights and with the advent of (thinking about) time no two "planets" are the same even if they're the exact same one--unless it's at a fixed moment in time. This name may be viewed either as meaning "father of many" in Hebrew or else as a contraction of ABRAM (1) and הָמוֹן (hamon) meaning "many, multitude". The biblical patriarch Abraham was originally named Abram but God changed his name (see Genesis 17:5). https://en.wikipedia.org/wiki/Yeshua#Yeshua,_Yehoshua,_and_Yeshu_in_the_Talmud K=3:11 ... 
to a handle on the music, the DHD of the gate and the *ring of David's "sling" ... ---and that's a relationship of "3 is to 11" as [the SAT style "analog]y" as a series of alpha, two mathematic, and two numeric symbols ... may only tie in my mind alone to the books of Genesis and Matthew and the phrase "chapter and verse" and to the stories of Lot and Job ... again in Genesis and the eponymous "Book of Job." So ... "tying up loose ends one 10b [III] iv. " as it appears I've taken it upon myself to call a Job and suggest is my "Lot in life [x]i* [3]" I worry sometimes that important things are missing, or will disappear---for instance Mirriam Webster, which is a "canonical/standard dictionary) should probably have an entry for "lot in life" non-idiomatically as "granny apples to sour apples" as 2 MANY ALSO ICI; 1twoⅱ ... following in Mitnick's bold introductory word steps; the curve and the complement ... the missiles and the canoes; the line and the blank space ... "supposedly two examples of two kinds, which could be three not nothings ... Today I write about something monumental; as if as important as the singularity depicted in Arthur C. Clarke's 2001 "A Space Odyssey" ... and remember a day when I thought it very novel and interesting to see the words "stillborn and yet still born" connected in a single piece of writing to "Stillwater and yet still water" ... today adding in another phrase noting the change wrought only by one magical single "space" (also a single capital letter; and a third phrase): "block chains with a great blockchain." 
http://www.goodmath.org/blog/2015/07/21/arabic-numerals-have-nothing-to-do-with-angle-counting/ https://gizmodo.com/no-this-viral-image-does-not-explain-the-history-of-ar-1719306568 https://en.wikipedia.org/wiki/Chinese_word_for_%22crisis%22 https://dictionary.hantrainerpro.com/chinese-english/translation-ji_howmany.htm https://dictionary.hantrainerpro.com/chinese-english/translation-duo_many.htm https://en.wikipedia.org/wiki/Euripides, Iphigenia in Aulis or Iphigenia at Aulis[1] (Ancient Greek: Ἰφιγένεια ἐν Αὐλίδι, Iphigeneia en Aulidi; variously translated, including the Latin Iphigenia in Aulide) is the last of the extant works by the playwright Euripides. Written between 408, after Orestes, and 406 BC, the year of Euripides' death, the play was first produced the following year[2] in a trilogy with The Bacchae and Alcmaeon in Corinth by his son or nephew, Euripides the Younger,[3] and won first place at the City Dionysia in Athens. The play revolves around Agamemnon, the leader of the Greek coalition before and during the Trojan War, and his decision to sacrifice his daughter, Iphigenia, to appease the goddess Artemis and allow his troops to set sail to preserve their honour in battle against Troy. The conflict between Agamemnon and Achilles over the fate of the young woman presages a similar conflict between the two at the beginning of the Iliad. In his depiction of the experiences of the main characters, Euripides frequently uses tragic irony for dramatic effect. J.K. Rowling spurred just this past week a series of explanations about just exactly what is a blockchain coin worth ... and why is it so; her final words on the subject (artistic liberty taken, obviously not the last she'll say of this magic moment) "I don't think I trust this." Taken directly from an off the cuff email to ARXM titled: "Slow the S is ... 
our Hypothes.is" I imagine I'll be adding some wiki/ipfs stuff to it--and try to keep it compatible; the design and layout is almost exactly what I was dreaming about seeing--as a "first rough draft product." Lo, and behold. It's been added to the many places I host my tome; the small compilation of nearly every important email that has gone out ... all the way back to the days of the strange looking Margarita glass ... that now very much resembles the "Cantonese character 'le'" which I've come to associate with a "handle" on multiple corners of a room--something like an automatic coat rack conveyor belt connecting different versions of "what's in the box." I'm planning on using that symbol 了 to denote something like multiple forks of the same page. Obviously I'm thinking forward to things like "the Transhumanist Chain Party" (BDSM, right?)'s version of some particular piece of legislation, let's say everything starts with the sprawling "bulbing" of "Amendment M" ideas and specific verbiage ... and then we'll of course need some kind of new git/subversion/cvs style version control mechanism to merge intelligently into something that might actually .... really should ... make it into that place in history--the first constitutional amendment ratified by a "Continental Congress of All People" ... but you could also see it as an ongoing sort of forking of something like the "Wikipedia page" on what some specific term, say "technocracy" means, and how two parties might propagandize and change the meaning of such a thing; to suit the more intelligent and wise times we now live in. For instance, we might once have had a "democracy" and a "democratic" party that had some Anarchist Cook Book version of the history of it ending in something like Snipes and Stallone's "DEMOLITION MAN." Just kidding, we all know "democracy" has everything to do with "d is cl ... and not th" ... to be the them that is the heart of the start of the first true democracy. 
At least the first one I've ever seen, in my old "to a republic" ... style. As it is you can play around with commenting and highlighting and annotating all the stuff I've written and begged and begged for comments on--while I work on layering the backend to perma-store our ideas and comments on both a blockchain (probably a new one; now that I've worked a little with Ethereum) with maybe some key-merkle-tree-walk-search stuff etched into the original Rinkeby ... and then of course distributed data in the "public owned and operated" IPFS. To be clear, I plan on rewriting the backend storage so that we will have a permanent record of all comments; all versions of whatever is being commented on; and changes/revisions to those documents--sort of turning the web into a massive instant "place of collaboration, discussion, and co-authoring" ... if you use the wonderful LEGO pieces that have been handed to us in ideas from places like me, lemma--dissenter, and of course hypothes.is who has brought you and I such a polished and nice to look at "first draft" of something like the living Constitution come repository of all human knowledge. I do sort of secretly wish they would have called this project something like "annotating and reflecting (or real or ...) knowledge" just so the movement could have been called ARK. ... or something .... but whatever join the "calling you a reporter" group or ... "supposedly a scientist?" NOIR INgR .. I CITE SITE OF ENUDRICAM; a rekindling of the dream of a city appearing high above in the sky, now with a boldly emblazoned smiling rainbow and upside-down river ... specifically the antithesis of "angel falls," there's a lagoon too--actually a chain of several ponds underneath the floating rock ... and in some versions of this waking dream there are rings around the thing; you might imagine an artificial set of centripetal orbitals something like a fusion of the ring Elysium and the "Six-Axis ride" of the JFK Center's "Spacecamp." 
I write as I dream, and though I cannot for certain explain exactly how; it's become a strong part of my mythology that this spectacular rendition of "what ends the silence" has something to do with the magical delivery of "a book" ... something not of this Earth but an unnatural thing; one I've dreamt of creating many times. This book is something like the DSM-IV and something like a Merck diagnostic manual; but rather than the old antiquated cures of "the Norse Medgard" this spectacle nearly "itsimportant" autoprints itself and lands on something like every doorpost; what it is is a list of reasons why "simply curing all disease" with no explanation and no conversation would be a travesty of morality--how it would render us half-blind to the myriad of new solutions that can come from truly understanding why "ITIS" to me has become a kind of magical marker: an "it is special" as in, it's cure could possibly solve a number of other problems. Through that missing "o," English on the ball, we see a connection between a number of words that shine bright light including Exodus itself which means "let there be light," the word for Holy Fire and the Burning Bush.. .reversed to hSE'Ah, and a story about the Second Coming parting our holy waters. This answer connects the magical Rod's of Aaron in Exodus and the Iron Rod of Jesus Christ to the Sang Rael itself... in a fusion that explains how the Periodic Table element for Iron links not just to Total Recall and Mars, but also to this key my dream of what the first day of the Second Coming might be like; were the Rod of Christ... in the right hands. In a story that also spans the Bible, you might understand better how stone to bread and your input make all the difference in the world between Heaven and Adam's Hand. Once more, what do you think He ....   
Since the very earliest days of this story, I have asked for better for you, even than see Nearly all of the original parts of the original "post-origination dream" remain intact; there's a walkway that magically creates new paths and "attractions" based on where you walk, something like an inversion of the artificial intelligence term "a random walk down a binary tree" ... for instance going left might bring you to the Internet Cafetornaseum of the Earl of Sandwich; and going to the right might bring you to the ICIMAX/Auditorium of Science and Discovery--there's a walkway to "Magical GLAS D'elevators" that open a special "instantiation" of the Japan Room of the Potter and the Toolmaker ... complete with a special [second level and hidden staircase] Pool of Bethesdaibo verily delivering something like youth of mind and body ... or at least as close to such a thing as a sip of Holy Water or Ambrosia or a dip in the pool of Cocoon and Ponce de León could instantly bring ... to those that have seen Jupiter Ascending ... the questions of "nature versus nurture" and what it means to be "old and wise" and "young at heart" truly mean--- https://www.youtube.com/watch?v=M8CyN1awWls https://link.springer.com/chapter/10.1057/9780230366688_16 https://www.youtube.com/watch?v=YDo5zvYNn3A Somewhere between the outdoor rafting ride and the level with the special "ballroom of the ancient gallery" ... perhaps now being named or renamed or recalled as something about "Face [of] the Music" lies a magical "mini-maize" ... a look at a mock-up (or #isitit) of Merlink and Harthor's "round table" that displays a series of ... (at least to me) magical appearing holographic displays and controls that my dreams have stolen from Philip K. Dick's Minority Report and something of what I hope Microsoft's Dynamics/Hololens/Surface will become---a series of short "focus groups" .... to gauge and discuss the information in the "CITIES-D5AM-MERCK" ... 
how to end world hunger and nearly all disease with the press of a magical buzzer--castling churches to something like "political-party-town-hall-meeting centers" and replacing jails and prisons and hospitals with something like the "Hospitalier's PRIDE and DOJOY's I practiced "Kung-fun-dance" ... a fusion of something like a hotel and a school that probably looks very much like a university with classrooms and dorms and dining halls all fit into a single building. I imagine a series of 2 or 3 "room changes" as in you walk from the one where you get the book and talk about it ... to the one where you talk about "what everyone else said about it" and maybe another one that actually connects you to other people with something like Facebook's Portal; the point of the whole thing to really quickly "rubber stamp" the need for an end to "bars in the sky" nonalcoholic connotation--as in "overcoming the phrase the sky is the limit" and showing us the need for a beacon of glowing hope fulfilled--probably actually the vision of a holographic marker turning into actual rings around the single moon of Earth, the focus of the song announcing the dawn of the age of Aquarius--- It might lead us also to Ceres; and another set of artificial rings, or to Monoceros and a rehystorical understanding of the birthplace and birthing of the "river roads" that bridge the "space gaps" in the galaxy from our "one giant leap for mankind" linking the Apollo moon landing to the mythological connection to the sun; and connecting how the astrological charts of the ancients might detail a special kind of overlapping--the link between Earth's SOL and something like Proxima or Alpha Centauri; and how that "monostar bridge" might overlap to Orion and from there through Sagittarius and the center of the Milky Way ... all the way to Andromeda and more dreams of being in a place where there's a map to a tri-galactic system in the constellation Cancer and a similar one in Leo ... 
and just in case you haven't noticed it--a special marker here, I thought to myself it might be cool to "make an acronymic tie to Monoceros" and without even thinking auto-wrote Orion (which was the obvious constellation next to Monoceros, in the charts) and then to Sagittarius; which is the obvious ... heart of our astrological center and link to "other galaxies." ----I've dreamt or scriven or reguessed numerous times how the Milky Way's map to an "Atlas marked through time by the ages and the ancients" might tie this place and this actual map to the creation of the railways between stars to the beginning and the end of time and of course to this message that links it all to time travel. There's a few "guesses" I've contemplated; that perhaps the Milky Way chart is a metal-cosmic or microcosmic map to the dawn of time in the galactic vision of ... just after the big bang; or it might tie to a map of something like the unthinkable--a civilization that became so powerful it was able to reverse the entropy of "cosmic expansion" and reverse the thing Asimov wrote of in "The Last Question" as the end of life and the ability to survive basically due to "heat loss." "The Last Question." (And if you read two, why not "The Last Answer"?). https://archive.org/details/texts http://zlibraryexau2g3p.onion.pet/ * all "asterisks" in the above document denote a sort of Adamic unspoken relationship between notations and meanings; here adding the "Latin word for three" and source of the phrase "t.i.d." (which is doctor/pharmacy latin for "three times a day") where the "t" there is an abbreviation of "ter" ... 
and suppose the link between K and 11 and 3 noting it's alphanumeric position in the English alphabet as the 11th letter and only linking cognitively to three via the conversion between hex, and binarryy ... aberrative here is the overlapping "hakkasan" style (or ZHIV) lack of mention of the answer in "state of Kansas" and the "citystate of Slovakia" as described in the ICANN document linked [in] the related subsection or slice of the word "binarry" for the state of India. Tetris could be spelled with the addition of only a single letter [in] "tea"---the three letters "ris" are the hearts of the words "Christ" and "wrist" [and arguably of Osiris where you also see the round table character of the solar-system/sun glyph and the chemical element for The Fifth Element (as def. by i) via "Sinbad" and "Superman." The ERIS Free Network should also be mentioned here in connection with the IRC network I associate in the place between skipping stones and sacred hearts defined by "AOL" and "Kdice" in my life. In the lexicon of modern HTML, curly braces are generally relative to "classes" and "major object definitions (javascript/css)" while square brackets generally only take on computer-interpreted meaning in "Markdown" which is clearly (by definition, by this character set "[]") a superset (or at least definately not a subset) of HTML. Dr. Will Caster (Johnny Depp) is a scientist who researches the nature of sapience, including artificial intelligence. He and his team work to create a sentient computer; he predicts that such a computer will create a technological singularity, or in his words "Transcendence". His wife, Evelyn (played by Rebecca Hall), is also a scientist and helps him with his work. Following one of Will's presentations, an anti-technology terrorist group called "Revolutionary Independence From Technology" (R.I.F.T.) shoots Will with a polonium-laced bullet and carries out a series of synchronized attacks on A.I. laboratories across the country. 
Will is given no more than a month to live. In desperation, Evelyn comes up with a plan to upload Will's consciousness into the quantum computer that the project has developed. His best friend and fellow researcher, Max Waters (Paul Bettany), questions the wisdom of this choice, reasoning that the "uploaded" Just from my general understanding and memory "st" is not ... to me (specifically) an abbreviation of "state" but "ste" is a U.S. Postal code (also "as I understand it") for the name of a special room or set of rooms called a "suite" and in Adamic "connotation" I sometimes read it as "sweet" ... which has several meanings that range from "cool" to "a kind of taste sensation" to "easy to sway or fool." If you asked me though, for instance if "it" was an abbreviation or shorthand notation or acronym for either "a United state" or "saint" ... you'd be sure. While it's clear from studying linguistic cryptography ... (If I studied it a little here and some there, its also from the "universal translator of Star Trek") and the personal understanding that language is a kind of intelligent code, and "any code is crackable" ... that I caution here that "meaning" and "face value" often differ widely and wildly ... even in the same place or among the same group of people ... either varying over time or heritage. Menelaus, in Greek mythology, king of Sparta and younger son of Atreus, king of Mycenae; the abduction of his wife, Helen, led to the Trojan War. During the war Menelaus served under his elder brother Agamemnon, the commander in chief of the Greek forces. When Phrontis, one of his crewmen, was killed, Menelaus delayed his voyage until the man had been buried, thus giving evidence of his strength of character. After the fall of Troy, Menelaus recovered Helen and brought her home. Menelaus was a prominent figure in the Iliad and the Odyssey, where he was promised a place in Elysium after his death because he was married to a daughter of Zeus. 
The poet Stesichorus (flourished 6th century BCE) introduced a refinement to the story that was used by Euripides in his play Helen: it was a phantom that was taken to Troy, while the real Helen went to Egypt, from where she was rescued by Menelaus after he had been wrecked on his way home from Troy and the phantom Helen had disappeared. https://www.britannica.com/topic/Menelaus-Greek-mythology Mycenae (Ancient Greek: Μυκῆναι or Μυκήνη, Mykēnē) is an archaeological site near Mykines in Argolis, north-eastern Peloponnese, Greece. It is located about 120 kilometres (75 miles) south-west of Athens; 11 kilometres (7 miles) north of Argos; and 48 kilometres (30 miles) south of Corinth. The site is 19 kilometres (12 miles) inland from the Saronic Gulf and built upon a hill rising 900 feet (274 metres) above sea level.[2] In the second millennium BC, Mycenae was one of the major centres of Greek civilization, a military stronghold which dominated much of southern Greece, Crete, the Cyclades and parts of southwest Anatolia. The period of Greek history from about 1600 BC to about 1100 BC is called Mycenaean in reference to Mycenae. At its peak in 1350 BC, the citadel and lower town had a population of 30,000 and an area of 32 hectares.[3] 3. Chew 2000, p. 220; Chapman 2005, p. 94: "...Thebes at 50 hectares, Mycenae at 32 hectares..." https://en.wikipedia.org/wiki/Clymene_(mythology) Melpomene (/mɛlˈpɒmɪniː/; Ancient Greek: Μελπομένη, romanized: Melpoménē, lit. 
'to sing' or 'the one that is melodious'), initially the Muse of Chorus, she then became the Muse of Tragedy, for which she is best known now.[1] Her name was derived from the Greek verb melpô or melpomai meaning "to celebrate with dance and song." She is often represented with a tragic mask and wearing the cothurnus, boots traditionally worn by tragic actors. Often, she also holds a knife or club in one hand and the tragic mask in the other. Melpomene is the daughter of Zeus and Mnemosyne. Her sisters include Calliope (muse of epic poetry), Clio (muse of history), Euterpe (muse of lyrical poetry), Terpsichore (muse of dancing), Erato (muse of erotic poetry), Thalia (muse of comedy), Polyhymnia (muse of hymns), and Urania (muse of astronomy). She is also the mother of several of the Sirens, the divine handmaidens of Kore (Persephone/Proserpina) who were cursed by her mother, Demeter/Ceres, when they were unable to prevent the kidnapping of Kore (Persephone/Proserpina) by Hades/Pluto. In Greek and Latin poetry since Horace (d. 8 BCE), it was commonly auspicious to invoke Melpomene.[2] See also [AREXMACHINA] Muses in popular culture The Nine Muses Flagstaff (/ˈflæɡ.stæf/ FLAG-staf;[6] Navajo: Kinłání Dookʼoʼoosłííd Biyaagi, Navajo pronunciation: [kʰɪ̀nɬɑ́nɪ́ tòːkʼòʔòːsɬít pɪ̀jɑ̀ːkɪ̀]) is a city in, and the county seat of, Coconino County in northern Arizona, in the southwestern United States. In 2018, the city's estimated population was 73,964. Flagstaff's combined metropolitan area has an estimated population of 139,097. Flagstaff lies near the southwestern edge of the Colorado Plateau and within the San Francisco volcanic field, along the western side of the largest contiguous ponderosa pine forest in the continental United States. The city sits at around 7,000 feet (2,100 m) and is next to Mount Elden, just south of the San Francisco Peaks, the highest mountain range in the state of Arizona. 
Humphreys Peak, the highest point in Arizona at 12,633 feet (3,851 m), is about 10 miles (16 km) north of Flagstaff in Kachina Peaks Wilderness. The geology of the Flagstaff area includes exposed rock from the Mesozoic and Paleozoic eras, with Moenkopi Formation red sandstone having once been quarried in the city; many of the historic downtown buildings were constructed with it. The Rio de Flag river runs through the city. Originally settled by the pre-Columbian native Sinagua people, the area of Flagstaff has fertile land from volcanic ash after eruptions in the 11th century. It was first settled as the present-day city in 1876. Local businessmen lobbied for Route 66 to pass through the city, which it did, turning the local industry from lumber to tourism and developing downtown Flagstaff. In 1930, Pluto was discovered from Flagstaff. The city developed further through to the end of the 1960s, with various observatories also used to choose Moon landing sites for the Apollo missions. Through the 1970s and '80s, downtown fell into disrepair, but was revitalized with a major cultural heritage project in the 1990s. The city remains an important distribution hub for companies such as Nestlé Purina PetCare, and is home to the U.S. Naval Observatory Flagstaff Station, the United States Geological Survey Flagstaff Station, and Northern Arizona University. Flagstaff has a strong tourism sector, due to its proximity to Grand Canyon National Park, Oak Creek Canyon, the Arizona Snowbowl, Meteor Crater, and Historic Route 66. #PSANSDISL #LWDISP either without gas or seeing cupidic arroz in "thank you" or "allta, wild" ... pps: a magnanimous decision ... I stand here on the brink of what appears to be total destruction; at least of everything I had hoped and dreamed for ... for the last decade in my life which appears literally to span thousands of years if not more in the eyes of some other beholder. 
I spent several months in Kentucky telling a story of a post apocalyptic and post-cataclysmic delusion; some world where I was walking around in a "fake plane" something like a holodeck built and constructed around me as I "took a walk around the world" to ... it did anything but ease my troubled mind. Recently a few weeks in Las Vegas, and a similar story; telling as I walked penniless down the streets filled with casinos and anachronistic taxi-cabs ... some kind of vision of the entirety of the heavens or the Earth or the "choir of angels" I think of when I echo the words Elohim and Aesir from mythology ... there with me in one small city in superposition; seeing what was a very well put together and interesting story about a "star port" Nirvane ... a place that could build cities into the face of mountains and half working monorails appearing in the sky---literally right before my eyes. I suppose this is the place "post cataclysm" though I still have trouble understanding what it is that's actually about ... in my mind it connects to the words "we are losing habeas" echoed from the streets of Los Angeles in a more clear and more military voice than usual--as I walked block by block trying to evade a series of events that would eventually somehow connect all the way to the "outskirts of Orlando, Florida" in a place called Alhambra. Apparently the name of a castle; though I wasn't aware of that until much later. It doesn't feel at all like a "cataclysm" to me; I see no great rift--only a world filled with silent liars, people who collectively believe themselves to have stolen something--something gigantic--at least that's the best interpretation of the throes and impetus behind the thing that I and mythology together call Jormungandr. With an eye for "mythological connections" you could clearly see that name of the Great Serpent of Revelation connects to something like the Unseelie; the faeries of Gaelic lore. 
To me though this world seems still somewhat fluid, it's my entire life--moving from Plantation to a place where the whole of it might be Bethlehem and to "clear my throat" it's not hard to see here how that land of "coughs" connects to the Biblical land of Nod and to the "Adamically sieved" Snifleheim ... from just a little twist on the ancient Norse land most probably as close to Hel as anyone ever gets--or so I dream and hope---still today. It all looks so real and so fake at the same time; planned for thousands of generations, the culmination of some grand masterpiece story that certainly ties history and myth and reality into a twisted heap of "one big nothing, one big nothing at all." I've tried to convey to the world how important I believe this place and this time to be--not by some choice of my own ... but through an understanding of the import of our history and the impact of having it be so obviously tuned and geared towards this specific time ... many thousands of years literally all focused on a single moment, on one day or one hour or even just a few years where all of that gets thrown down on the table as if some trump card has been played--and whether or not you fathom the same magnanimous statement or situation or position ... to me, I think it depends on whether or not you grew up in the same kind of way, believing our history to be so fixed and so difficult to change. I don't particularly feel like that's the "zeitgeist" of today; I feel like the children believe it to be some kind of game, and that it is such an easy thing to "sed" away or switch and turn into something else--another story, another purpose ... anyone's personal fantasy land come true. 
I don't think that's the case at all, it's clearly a personal nightmare; and it's clearly one we've seen time and time again--though not myself--the Jesus Christ that is the same yesterday, today; and once again perhaps echoing "no tomorrow" never remembers or believes that we've "seen it all before" or that we've ever really gotten the point; the thing you present to me as "factual reality" is a sickness, it disgusts me; and I'd do anything to go back to the world "where I was so young, and so innocent" and so filled with starry-eyed hope that we were at the foot of something grand and amazing that would become an empire turned republic of the heavens; filling the stars ... with the kind of love for kindness and fairness that I once associated very strongly with the thing I still believe to be the American Spirit. "Suddenly it changes, violently it changes" ... another song echoes through the ages--like the "words of the prophets dancing ((as light)) through the air" ... and I no longer even have a glimmer of hope that the thing I called the American People still exist; I feel we've been replaced by some broken container of minds, that the sky itself has become corrupt to the point that there's no hope of turning around this thing that I once believed with all my heart and all my mind was so obviously a "designed downward spiral" one that was---again--so obviously something of a joke, intended to be easy to bounce off a false bottom and springboard beyond "escape velocity" and beyond the dark waters of "nearest habitable star systems (being so very far away)" into a place where new words and new ideas would "soar" and "take flight." Here though; I am filled with a kind of lonely sadness ... staring at what appears to be the same mistake(s) happening over and over again; something I've come to call "skipping stones in the pond of reality" and really do liken it to this thing that appears to be the new meaning of "days" and ... 
a civilization that spends absolutely no love or lust to enter a once sacred and holy place and tarnish it with their sick beliefs and their disgusting desires. You all ... you appear to be some kind of springboard to "bunt" forth yet another age or era of nothingness into the space between this planet and "none worth reaching" and thank God, out of grasp. Today, I'd condemn the entirety of this world simply for its lack of "oathkeepers" and understanding of what the once hallowed words of Hippocrates meant to ... to the people charged and dharmically required to heal rather than harm. It appears the place and time that was once ... at least destined to be the beginning of Heaven ... has become a "recurring stump" of some future unplanned and tarnished by many previous failed efforts and attempts to overcome this same "lack of conversation or care" for what it meant to be "humane" in a world where that was clearly set high aloft and above "humanity" in the place where they--where we were the best nature had to offer, the sanest, the kindest; the shining last best hope. Today I write almost every day ... secretly thanking "my God" for the disappearance of my tears and the still small but bright hope that "Tearran" will one day connect the Boston Tea Party and the idea that "render to Caesar" and Robin of Loxley ... all have something to do with a re-ordering of society and the worth and import of "money" ... to a place that cares more for freedom from murder than it does ... "freedom from having to allow others to hear me speak." I hold back tears and emotions; not by conscious choice or ability but ... still with that strange kind of lucky awkward smile; and secretly not so far below the surface it's the hope of "a swift death" that ... that really scares me more than the automatons and mechanical responses I see in the faces of many drivers as they pass me on the street--the imagery of connecting it to the serpentine monster of the movie Beetlejuice ... 
something I just "assume" the world understands and ... doesn't seem to fear (either); as if Churchill had gotten it all wrong and backwards--the only thing you have to fear, is the loss of fear of "loss." Here my crossroads---halfway between the city my son lives in and the city my parents live in--it's on making a decision on whether I should continue at all, or personally work on some kind of software project I've been writing about, or whether I should focus on writing about a "revolution" in government and society that clearly is ... "somewhat underway." In my mind it's obvious these things are all connected; that the software and the governance and the care of whether or not "Babylon" is remembered as a city of great laws and great change or a city of demons and depravity ... that these thi]ngs all hinge and congeal around a change in your hearts; hoping you will chose to be the beginning of a renaissance of "society and civilization" rather than the kings and queens of a sick virtual anarchy ... believing yourselves to have stolen "a throne of God" rather than to literally be the devastating and demoralizing depreciation of "lords and fiefdoms" to something more closely resembled by the time of the Four Horsemen depicted in Highlander. These words intended to be a "forward" to yet another compliment of a ((nother installment of a partial)) chain of emails; whimsically once half-joking ... I called it the Great Chain of Revelation. The software too; part of the great chain, this "idea" that the blockchain revolution will eventually create a distributed and equal governance structure, and a rekindling of monetary value focused on "free and open collaboration" rather than "survival of the most unfit"--something society and civilization seem to have turned the "call of life" from and to ... literally just in the last few years as we were so very close to ... reaching beyond the Heaven(s). 
I don't think its hard to imagine how a "new set of ground rules" could significantly change the "face of a place" -- make it something shiny and new or even on the other side of the coin, decayed or depraved. It's not hard to connect the kind of change I'm hoping for with "collision protection" and "automatic laws" to the (perhaps new, perhaps ... ancient) Norse creation story of the brothers of Odin: Vili and Ve. It might be hard to see today how a new "kind of spiritual interaction" might be only a few "mouse clicks" away though--how it could change everything literally in a flash of overnight sensation ... or how it might take something like a literal flash of stardom (or ... on the other hand, something like totalitarian or authoritarian "iron fisting") to make a change like this "ubiquitious" or ... something like the (imagined in my mind as ... messianic) "ED" of storming through the cosmos or the heavens and turning something that might appear to be "free and perfect feeling" today into a universe "civlized overnight" and then ... I wonder how long it would take to laud a change like that; for it to be something of a voluntary "reunderstanding" of a process ... to change the meaning of every word or every thought that connects to the process of "civilization" to recognize that something so great and so powerful has happened as to literally change the meaning of the word, to turn a process of civilization into something that had a ... "signta-lamcla☮" of forboding and then a magical staff struck into the heart of a sea and then ... and then the word itself literally changes to introduce a new "mid term" or "halfway point" in which a great singularity or enlightenment or change in perspective or understanding sort of acknowledges ... that some "clear outside" force not only intervened on the behalf of the future and the people of our world but that it was uniquely involved in the whole of-- "waking up" tio a nu def of #Neopoliteran. 
^Like the previous notation; the below text comes from an email previously sent; and while i stand behind things like my sanity, my words; and my continued and faithful attempt to speak and convey both a useful and helpful truth to the world---sometimes just a single day can make all the difference in the world. Sometimes it's just a single moment; a flash or a comment about ^th@ blink of an eye" ... and I've literally just "thought up/had/experienced/transitioned thru" that exact moment. The lies standing between "communication" and either "cooperation" or .... some other kind of action have become more defined. More obvious. Because of this clarification; like a kind of "ins^tant* gnosis" ... search high and lo ... the depths all the way to above the heavens ... for a festive divorce ceremonial ritual ... that looks something like a bachelor party ':;] — @amrs@koyu.SPACe ... @suzq@rettiwtkcuf.social (@yitsheyzeus) May 22, 2020 I ... TERON; Gjall are painting me into a corner here; and I don't see around it anymore--I don't see the light, and I don't see the point. I was a happy-go-lucky little kid in my mind; that's not "what I wanted to be" or what I wanted to present, it's who I was. I saw "Ashkenazi" and ... know I am one of those ... and I kind of understood that something horrible might have happened, or might happen here--and I kind of understand that crying smashing feeling of "to ash" that echoes through the ages in the potpourri songs about pockets full of Parker Posey .. and ancient Psalms about "from the ashes of Edom" we have come--and from that you can see the cyclical sickness of this ... place so sure it's "East of Eden" and yet gung-ho on barrelling down the same old path towards ash and towards Edom and towards ... more of Dave's "ashes to ashes dust to dust" and his "smoke clouds roll and symphony of death..." 
and few words of solace in a song called Recently that I imagine was fleeting and has recently come and gone--people stare, I can't ignore the sick I see. I can't ignore his "... and tomorrow back to being friends" and all but wonder who among us doesn't realize it's "ash" and "gone" and "no memory of today" that's the night between now and ... a "tomorrow with friends" not just for me--but for all of you--for this place that snickers and pantomimes some kind of ... anything but "I'm not done yet" and "there's more ... vendetta ... and retribution to be had, Adam ... please come back in a few more of our faux-days." This is sickness; and happy-go-lucky Himodaveroshalayim really doesn't do much but complain about that word, the "sickle" and the tragic unavoidable ... ash of it all ... these days--you'd think we could "pull out" of this mess, turn another way; smile another day, but it seems there's only one way to get to that avenu in the mind of ... "he who must not know or be me." I have to admit I found some joy in the epiphany that the hidden city of Zion and it's fusion with the Namayim' version of how that "Ha" gels and jives with the name Abraham and the Manna from Heaven and the bath salt and the tina and the "am in e" of amphetamine--maybe a glimmer or a shimmer or a glow of hope at the moment "Nazion" clicked ... and I said ... "no, not me ... 
I'm nothing like a king, no dreams of authoritarianism at all in the heart of Kish@r;" even as I wrote words that in the spirit of the moment were something of a "tis of a'we" that connected to my country and the first sing-songy "tisME" that I linked to trying to talk in the rhyming spirit of some "first Christ" that probably just like me was one limmerick away from the end of the rainbow and one "Four Non Blondes" song away from tying "or whatever that means" and this land crowned with "brotherhood" (to some personal "of the Bell, and of the bell towers so tall and Crestian") to just one Hopp skip and jump away from the heart of the obvious echoes of a bridge between haiku and Heroku... a few more gears shift into place, a click and and a mechanical turn of the face of the clock's ku-ku striking ... it was the word "Earthene" that was the last "Jesusism" around the post Cimmerian time linking Dionysus and Seuss to that same "su-s" that's belonging to a moment in the city of Uranus--codified and etched in stone as "MCO"--not just for its saucer and warp nacelles and "deflector dish" but for it's underground caverns and it's above ground "Space Mountain" and that great golf ball in the heart of it all. The gears of time and the dawns of civilizequey.org query the missing "here" in our true understanding of what "in the beginning, to hear; to here ... to rue the loss of the Maize from Monoceros to the VEGA system and the tri-galactic origin of ... "some imaginary universal ... Earthene pax" to have dropped the ball and lost it all somewhere between "Avenu Malkaynu" and melaleuca trees--or Yggrasil and Snifleheim--or simply to miss the point and "rue brickell" because of bricks rather than having any kind of love or nostalgia linking to a once cobblestone roadway to the city in the Emerald skies paved in golden "do not return" signs ... 
to have lost Avenues well after not realizing it was "Heaven'es that were long gone far before I stepped foot on this road once called too Holy for sandals" in a place where that Promised Land and this place of "K'nanites" just loses it's grip on reality when it comes to mentioning the possibility that the original source and story of Ca'anan was literally designed to rid the world of ... "bad nanites" and the mentality of ... vindictiveness that I see behind every smirk. The final hundred nanoseconds on our clock towards doom and gloom cause another bird to fly; another snake to curl up and listen again to the songs designed to charm it into oblivion; whether that's about a club in South Beach or a place not so far from our new "here..." all remains to be seen in my innocent eyes wondering what it truly is that stands between what you are ... and finding "forgiveness not needed--innocent child writes to the mass" ... and the long arm of the minute hand and the short finger of the hour for one brief moment reconcile and move towards "midnight" together; and it's simply idyllic, the Nazarene corner between nil and null you've relegated the history of Terran poast futures into ... "foreves mas" or so they (or you) think. I'm still so far from "Five Finger Death Punch" though; and so far from Rammstein and so far from any kind of sick events that could stand between me and "the eternal" and change my still "casual alternative rock" loving heart to something more death metal; I rue whatever lies between me and there being any kind of Heaven that thinks there could exist a "righteous side" of Hell and it... simultaneously. I still see light here in admonishing the masses and the angels standing against the story and the message God brings us in our history. I still see sparks in siding with the "causticness" of "no holodecks in sight" and the hunger and the pain of simulating ... 
"the hells of reality" over the story of decades or centuries of silence refusing to see "holography" and "simulated" in the word Holocaust and the horrors of this place that simply doesn't seem to fathom or understand the moments of hunger pangs and the fear of "dark Earth pits" or towers of "it's not Nintendo-DS" linking the Man in the High Castle to an Iron Mask. I rally against being what I clearly am raised high on some pedestal by some force beyond my comprehension and probably beyond that of the "perfect storm in time" that refuses to itself acknowledge what it means to gaze at such an unfathomable loss of innocence at the cost of a "happy and serene future" or even at the glimmer of the Never-Never-Land I'd hoped we would all cherish and love and share ... the games and the newfound freedom that comes not just from "seeing Holodeck" turn into "no bullets" and "no cages" but into a world that grows and flourishes into something that's so far beyond my capability to understand that I'm stuck here; dumbfounded; staring at you refusing to stop car accidents and school shootings ... because "pedestal." For the "fire and the glory" of some night you refuse to see is this one--this place where morality rekindles from ... from what appears tobe one small candle, but truly--if it's not in your heart, and it's not coming from some great force of goodness--fear today and a world of "forever what else may come." Here in a place the Bible calls Penuel at the crossing of a River Jordan ... the Angel of the Lord notes the parallels in time and space between the Potomac and the Rhine--stories of superposition and cities and nation-states that are nothing more than a history of a history of things like the Monoceros "arroz" linking not just to the constellation Orion but to Sagittarius and to Cupid and of course to the Hunter you know so well-- Searching for a Saturday; a sabbath to be made Holy once more ... 
"at the Rubycon" The Einstein-Rosen Wormhole and the Marshall-Bush-JFKjr Tunnel The waters are called narah, (for) the waters are, indeed, the offspring of Nara; as they were his first residence (ayana), he thence is named Narayana. — Chapter 1, Verse 10[3] In a semi-fit of shameless arexua-self recognition i'm going to mention Amazon's new series "Upload" and connect it to the PKD work that my Martian-in-simulcrum-ciricculum-vitae on "colonization education" ... tying together Transcendance, Total Recall and ... well; to be honest it actually gave me another "uptick" in the upbeat ... maybe i'll stick around until I'm sure there's at least one more copy of me in the ivrtual-invverse ... oh, that reminds me ... Farmer)'s Lord of Opium also touches on this same "mind of God in the computer" subject (which of course leads to Ghost in the Shell and Lucy--thanks Scarlette :). While I'm listing Matrix-intersected pieces of the puzzle to No Jack City, Elon Musk's neuralace and Anderson's Feed are also worth a mention. Also the first link in this paragraph is titled ... "the city of the name of time never spoken after time woke up and stfu'd" (which of course is the primary subject of this ... update to the city Aerosol). The ... "actual original typed dream" included a sort of "roller coaster ride" through space all the way to Mars; where the real purpose of "the thing" I am calling the "Mars Hall" was to display previous victories and failures ... and the introduction of "older or future" culture's suggestions for "the right way" to colonize a new habitat. If it were Epcot Center, this would be something like SpaceMountain taking you to to the foture of "Epcot Countries" as if moving from "countries" to planets were as easy as simply ... "reading backwards." THE SOFTWARE, SINGERS, AND SHIELD(S) OF HEIROSOLYMITHONEYY Thinking just a little bit ahead of myself, but I'm on "Unreal Object/Map Editor within the VR Server" and calling it something like "faux-wet-ware" ... 
which then of course leads to a similar onomonopeia of "weapons and ..." where-with-all to find a better singer's name to connect the road of "sword" to a Wo'riordan ... but I think that fusion of warrior and woman probably does actually say ... enough of it all; on this road to the living Bright Water that the diety in my son's middle name defines well here, as "waking up," stretching it's tributaries and it's winding wonders and wistfully .... Narayana (Sanskrit: नारायण, IAST: Nārāyaṇa) is known as one who is in yogic slumber on the celestial waters, referring to Lord Maha Vishnu. He is also known as the "Purusha" and is considered the Supreme being in Vaishnavism. andromedic; the ports of call ... to the mediterranean (literally) from the gulf coast; ... ho engages in the creation of 14 worlds within the universe as Brahma when he deliberately accepts rajas guna, himself sustains, maintains and preserves the universe as Vishnu by accepting sattva guna. Narayana himself annihilates the universe at the end of maha-kalp ... . there's no place like home. there's no place like home. there's no place like home. and so it begins ... "f: r e l i g i o n find out what it means to me. faucet, ever single one, stream of purity ... from Fort Myers ... f ... flicks ... Flint. " ^this notation will from this email forward in linear time denote some form of contact method or information related to the context of the message you are reading. This particular one sends me an encrypted email. 5if there is an "@" symbol involved in the "anchor's hypertext reference" (technically an "a href=" in HTML4) your browser should attempt to open an email client to send a message over an anonymous SMTP relay. Understand that "anonymous" in this case may or may not mean your sending email address is hidden or obvuscated--so if you want to receive a reply you must include it in the DATA of your SMTP transmission defined by the RFC5321 attached. 
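A minimal sketch of the contact-link behaviour described above, under stated assumptions: in practice a browser hands a link to an email client when the href uses the `mailto:` scheme, not merely because an "@" appears somewhere in it. The function names `opens_mail_client` and `mail_recipient` and the `example.org` addresses are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlsplit


def opens_mail_client(href: str) -> bool:
    """Return True when the href uses the mailto: scheme."""
    return urlsplit(href).scheme.lower() == "mailto"


def mail_recipient(href: str):
    """Return the address part of a mailto: href, or None for other links."""
    parts = urlsplit(href)
    if parts.scheme.lower() != "mailto":
        return None
    # Drop any "?subject=..." style suffix that may remain in the path.
    address = parts.path.split("?", 1)[0]
    return address or None


if __name__ == "__main__":
    for link in ("mailto:reader@example.org", "https://example.org/@reader"):
        print(link, opens_mail_client(link), mail_recipient(link))
```

The second sample link contains an "@" but is still an ordinary HTTP link, which is the distinction the sketch is meant to show.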
In most cases "anonymous" also means that you will not have the recipients direct contact information unless they have made it public---additionally the exact server/system/relay used may or may not be the "Sbroken Berkman Perl Script" linked to in the "hypertext reference" specifically anchored to the words "an anonymous SMTP relay" above. A simple "hat character" (^) and the letter "t" as you see beginning the above paragraph will denote a contact method or form that works over the internet using an HTTP protocol defined in a series of RFC's including (but not limited to) RFC's numbered as 2616, 7230, 7235, 2068 and use a simple language which is based on a definition suggested or proposed currently by an organization called the "W3C Consortium" ---and ... previously set and defined by an organiza^tion located at html.spec.whatwg.org; which appears (to me, for the first time as I write these words) to follow the conceptual spirit of the "living document" defined by the several "Continental Congresses, et alia." I personally now conjoin this document in my head to a procession of patrilineal or matrilnear predecessors to the actual event .... still to be defined ... 
but related to this specific email, this mailing list; its contributors and readers as well as actual members of the organization (still to be created, defined, or named) that creates a "round table*" of members that is open to the public, to all voters educated enough to understand the specific issue being voted on (up to a standard that; in this place and time appears to be unset and unmet but materially related to reawching the age of 18 years old; growing up in or being born in the United States of America (related spec.* to the Constitution of the United States of America which is officially "self-defined" through a process which includes all three branches of the government which it also "self-defines" and purports to be "of, for, and by the people"--though the general population is only able to contribute through an indirect process (read:the people cannot directly contribute to the constitution without either running for office (like a senator) or being appointed to a specific government position (like a judge or executive branch public servant). The current state of American representative democracy is the highest standard to which I am currently knowledgable of "extant*"--and it is specifically substandard, inferior, and "just not good enough" as a comparison to the process required to vote in the organization being "self-defined" through this process. It is my sincere and clear hope that "this process" will result in a legal and moral amendment to the document shown in the previous link and presented by the Legislative Branch of the United States here. 
It is my current and faithful belief that anything else would also be significantly below the standards morally required by "this process" which of course includes over 200 years of American citizenship and (other international relations; i.e., e.g, for "iv" example, id est, exemplia gratia) as well as the Sons of Liberty and prior to that contributions from the Crown and the "Parliament and Crown" of the United Kingdom; among others et alea's ifndef: 'swikipedia/et_al.. To note specifically because of lack of personal knowledge and public notoriety (assuming all other requiremnant* achem requirements) alas, babylon. i listened to a man yesterday who was talking about "true heroes" ... he of course noted jesus christ and superman together, suggesting the first was one, and the second just a fiction. he also talked about people like ghandi and "leaders who use non-violent means to "change the world." i at least agree with him on the third, ghandi is a good prototype for some kind of hero. staring at this ... "to be completed" work on tales of two cities, whether from sodom and gomorrah all the way to athens and sparta and perhaps even london and paris--and this particular city, babylon; it stands out as one which truly has no equal or even "mirror" in the history of the world. i suppose i'd add "alexandria" and suggest the library and the laws; something that are fundamental to the ethos of the planet i call "athens." i imagine he did not know "hammurabi's" name; and even today in this place where i ask and do not receive answers; i imagine you still don't connect muhammad or amsterdam ... to this king who in our history is set apart and lifted high on a pedestal of having "codified and written down" laws ... for the very first time. it's almost comical, it took me a paragraph and a sentence to connect "the king and i" to this mirror world, where the bible and the people have most assuredly decided "babylon" is a negative thing or a depraved place. 
"fallen, fallen, is [the city of] babylon the great" ... just a quote from one of my favorite movies; which of course is re-quoting "dante" and/or "the bible" "a dwelling place [of] (the) demons (say), it has become." www.icann.org/news/blog/the-problem-with-the-seven-keys kauri on IPFS: has-abaslom-and-the-ethos-of-arcadia

      12:3 Those who are wise[a] will shine like the brightness of the heavens, and those who lead many to righteousness, like the stars for ever and ever.


      we the people rise again

      safe souls, safe fu


      We the People of Slate ...

      The U.S. Constitution, as you [mighta been, shoulda "come" on ... its somedayrewrϕte it.

      "Politicians talk about the Constitution as if it were as sacrosanct as the Ten Commandments [interjection: spec. it is actually almost exactly related!]. But the document itself invites change and revision. What if the president served only one six-year term instead two four-year terms? What if your state's population determined how many senators represent it? What if the Constitution included a right to health care? We asked legal scholars and Slate readers to cross out what they didn't like in the Constitution and pencil in their hearts' desires. Here's what the document would look like with their best ideas."

      多也了了夕 "with a ~~wand~~ of scheffilara, 并#亦太 he begins ... "I am now on the Staff of Menelaus, the Spears of Longinus and Lancelot; and the name "Mosche ex Nashon."

      Logically the recent mentions of Gilgamesh and the simultaneous 同時 overlapping 場道 of the eventual link between the famous ruling of Solomon on the separation of babies and mothers and waters and land ... to a story of many "two cities" that culminates in a cultural or societal or "evolutionary" link to Sodom and Gomorrah and the city-state of Babylon (and its Hanging Gardens) and also of course to Paris and Troy and "Masstodon" and city-states [ciudadestado] and perhaps planet-cities; from Cambridge to Cambridge across the "Cable" to see state to "London" ... recently I called it "the city of realms" ... I started out logically intending to link "game theory" and John Nash to the mathematical story of Sputnik and a revival of American physics; but in my usual way of rambling into the woods [I mean neighborhood] of stream of consciousness ... turned into a premonitory discourse of "two cities" and how sometimes even things as obvious as the number of letters in the word "two" don't do a good enough job of conveying ... how and/or why one is simply never enough, and two isn't much better--but in the end a circle ... is drawn; the perfect circle in our imaginary mathematical perfection ... I see a parted "line" in the letter pronounced "tea" (and beginning that word); and two "vee" (pron. of "v") symbols joined together in a word we pronounce as "double-you" ... and symbolically because I know "V" is the Roman Numeral for 5 (five) and I know not how to multiply in Roman numerals--

      It's important to pause; here. I am going to write a more detailed piece on "the two cities" as I work through this maze-like crossroads between "them" and "demo..." ... here demonstrably I am trying to fuse together an evolutionary change in ... lit. biological evolution as well as an echelon leap forward in "self-government" ... in a place where these two things are unfathomable and unspokenly* connected.

      To a question on the idiom; is Babylon about "the law" or "of the land of Nod?"

      "What is democracy" ... the song, Metallica's "ONE" echoes and repeats; as we apparently scrive together the word "THEM" ... I question myself ... if Babylon were the capital city of some mythical Nation of Time ... if it were the central "turning point" of Sheol; ... >|<

      Can you not see that in this place; in a world that should see and does there is a gigantic message proving that we are not in reality and trying to show us how and why that's the best news since ... ever---that it's as simple as conjoining "the law of the land" with a basic set of rules that automatically turns Hell into something so much closer to Heaven I just do not understand---why we can't stand up together and say "bullets will not kill innocent children" and "snowflakes will not start avalanches ...." that cover or bury or hide the road from Earth to Verital)e .... or from the mythical Valis to Tanis---or from Rigel to Beth-El ... "guess?"

      ## as "an easy" answer; I'm looking for a fusion of "law and land" that somehow remembers a "jok'er a scene" about "lawn" seats; and "where the girls are green;"

      It's as simple as night and day; Heaven and Hell ... the difference between survival and--what we are presented with here; it's "doing this right"--that ends the Hell of representative democracy and electoral college--the blindness and darkness of not seeing "EXTINCTION LEVEL EVENT" encoded in these words and in our governments foundation ... *by the framers [not just of the USA; but English .. and every language] *

      ... is literally just as simple as "not caring" or thinking we are at the beginning of some long process--or thinking it will never be done--that special "IT" that's the emancipation of you and I.

      Here words like "gnosis" and "gaudeamus" pair with my/ur "new ntersanding*" of the difference between Asgard and Medgard and really understanding our purpose here is to end "evil" ... things like "simulating disease and pain" (here, simulating meaning ... intentionally causing, rather than "gamifying away") and successfully linking the "Pillars of Hercules" to Plato's vision of Atlantis and the letter sequences "an" and "as" ... unlock a fusion of religion and mythology and "cryptographic truth" that connects "messianic" and "Christian" to "Roman" ... "Chinese" and "American" ... literally the key to the difference between the phrases "we are" and "we were" ....

      in "sight" of "silicon" in simulation and Israel, Genesis, and "silence" ... trying to the raising of Asgardian enlightenment ... and seeing "simple cypher" connecting to "Norse" ...

      and the "I AM THAT" surer than shit ... the intention and design of all religion and creation is to end "simulated reality" and also not seeing "SR" ... in Israel and Norse ... "for instance."

      It's a simple linguistic concept; the "singularity" and the "plurality" of a simple word--"to be"--but it goes to the heart of everything that we are and everything that is around us. This is a message about understanding and preserving individuality as well as liberty; and literally seeing "ARXIV" and understanding "often" and failing to connect God and prescience to "IV" and the Fourth Amendment ... it's about blindness and ... "curing the blind instantly" ... and fathoming how and why this message has been etched into our entire history and and all religions and myths and music--to help us "to be THAT we" that actually "are responsible" for the end of Hell.

      • I neglected to mention "Har-Wer" and "Tower of Babel" which are both related linguistically, religiously and topically: "to who ..." and while we're on "four score and [seven years from now]" seeing the fourth "living thing" in Eden and its (the name, Abel) connection to Babel and Abraham Lincoln; slavery and ... understanding we live in a place where the history of the United States also, like Monoceros and "Neil Armstrong's first step" is a time-shifted ... overlaid map to achieving freedom ... it's about becoming a father-race ... and actually "doing" the technological steps required to "emancipate the e's of 'me&e'" and survive in exo-planetary space---

      it might be as simple as adding "because we did this" here and now; and having it be something we are truly proud of .... forevermore™ ... for certain in the heart of this story about cyclicality and repetition of error--it's not because we did "this" or something over and over again; it's about changing "the problem" and then helping others to also overcome ... "things like time travel ... erasing speech" --- however that happened.

      • I also failed to mention that "I am in Hell" ... as in this world is hellacious to me; in an overlay with the Hellenic period and this message that we are in the Trojan Horse ... a small gem .... "planet" truly is the Ark of the Covenant---and it's the simple understanding that "reality is hell" is to "living without air conditioning and plumbing is hell" just as soon as you achieve ... "rediscovering" those things---

      • I can't figure out why I am the only person screaming "this is Hell." That's also, Hell.

      ... but recently suggested an old joke about "there being 10 kinds of people in the world (obv an anti-tautology and a tautology simultaneously)" only after that brief bit of singularity and duality mentioning the rest of the joke: "those that understand binary and those that don't know how to base convert between counting with two hands and counting with only an 'on and off.'" It's not obvious if you aren't trying to figure it out, I suppose; but 10 is decimal notation for "kiss" and the "often" without "of" ... and binary notation for the decimal equivalent of "2." A long long time ago in a state that simply non-randomly ties to the heart of the name of our galaxy ... I was again thinking of the "perfect imperfections" of things like saying "three equals one equals one" (which, of course was related to the Holy Trinity and it's "prescient/anachronistic Adamic presence encoded in the name Ab|ra|ha|m" which means "father of a great multitude") ... I brought that one back in the last few months; connecting the letter K and in this "logos-rythmic" tie to the "base of a number system" embellish the truth just a bit and suggest a more accurate rendition of the original [there is no such thing as equality, "is" of separate objects--as in no two snowflakes are the same unless they are literally the same one; true of ancient weights and with the advent of (thinking about) time no two "planets" are the same even if they're the exact same one--unless it's at a fixed moment in time.
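A small sketch of the arithmetic behind the "10 kinds of people" joke above: the same two digits, "10", name different quantities depending on the base they are read in. The helper names `to_decimal` and `to_binary` are assumptions made for this illustration only.

```python
def to_decimal(digits: str, base: int) -> int:
    """Read a digit string in the given base and return its integer value."""
    return int(digits, base)


def to_binary(n: int) -> str:
    """Write a non-negative integer using only the digits 0 and 1."""
    return format(n, "b")


if __name__ == "__main__":
    print(to_decimal("10", 2))   # "10" read as binary  -> 2
    print(to_decimal("10", 10))  # "10" read as decimal -> 10
    print(to_binary(2))          # two written in binary -> 10
```

Read in base two, "10" is one two plus zero ones, which is why the joke counts exactly two kinds of people.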

      K=3:11 ... to a handle on the music, the DHD of the gate and the *ring of David's "sling" ...

      ---and that's a relationship of "3 is to 11" as [the SAT style "analogy)]y" as a series of alpha, two mathematic, and two numeric symbols ... may only tie in my mind alone to the books of Genesis and Matthew and the phrase "chapter and verse" and to the stories of Lot and Job ... again in Genesis and the eponymous "Book of Job." So ... "tying up loose ends one 10b [III] iv. " as it appears I've taken it upon myself to call a Job and suggest is my "Lot in life [x]i* [3]"

      • I worry sometimes that important things are missing, or will disappear---for instance Merriam-Webster, which is a "canonical/standard dictionary," should probably have an entry for "lot in life" non-idiomatically as "granny apples to sour apples" as

      2 MANY ALSO ICI; 1two ... following in Mitnick's bold introductory word steps; the curve and the complement ... the missiles and the canoes; the line and the blank space ... "supposedly two examples of two kinds, which could be three not nothings ... Today I write about something monumental; as if as important as the singularity depicted in Arthur C. Clarke's 2001 "A Space Odyssey" ... and remember a day when I thought it very novel and interesting to see the words "stillborn and yet still born" connected in a single piece of writing to "Stillwater and yet still water" ... today adding in another phrase noting the change wrought only by one magical single "space" (also a single capital letter; and a third phrase): "block chains with a great blockchain."

      • https://en.wikipedia.org/wiki/Euripides Iphigenia in Aulis or Iphigenia at Aulis[1] (Ancient Greek: Ἰφιγένεια ἐν Αὐλίδι, Iphigeneia en Aulidi; variously translated, including the Latin Iphigenia in Aulide) is the last of the extant works by the playwright Euripides. Written between 408, after Orestes, and 406 BC, the year of Euripides' death, the play was first produced the following year[2] in a trilogy with The Bacchae and Alcmaeon in Corinth by his son or nephew, Euripides the Younger,[3] and won first place at the City Dionysia in Athens.

      • The play revolves around Agamemnon, the leader of the Greek coalition before and during the Trojan War, and his decision to sacrifice his daughter, Iphigenia, to appease the goddess Artemis and allow his troops to set sail to preserve their honour in battle against Troy. The conflict between Agamemnon and Achilles over the fate of the young woman presages a similar conflict between the two at the beginning of the Iliad. In his depiction of the experiences of the main characters, Euripides frequently uses tragic irony for dramatic effect.

      J.K. Rowling spurred just this past week a series of explanations about just exactly what is a blockchain coin worth ... and why is it so; her final words on the subject (artistic liberty taken, obviously not the last she'll say of this magic moment) "I don't think I trust this."

      Taken directly from an off the cuff email to ARXM titled: "Slow the S is ... our Hypothes.is"

      I imagine I'll be adding some wiki/ipfs stuff to it--and try to keep it compatible; the design and layout is almost exactly what I was dreaming about seeing--as a "first rough draft product." Lo, and behold. It's been added to the many places I host my tome; the small compilation of nearly every important email that has gone out ... all the way back to the days of the strange looking Margarita glass ... that now very much resembles the "Cantonese character 'le'" which I've come to associate with a "handle" on multiple corners of a room--something like an automatic coat rack conveyor belt connecting different versions of "what's in the box." I'm planning on using that symbol 了 to denote something like multiple forks of the same page. Obviously I'm thinking forward to things like "the Transhumanist Chain Party" (BDSM, right?)'s version of some particular piece of legislation, let's say everything starts with the sprawling "bulbing" of "Amendment M" ideas and specific verbiage ... and then we'll of course need some kind of new git/subversion/cvs style version control mechanism to merge intelligently into something that might actually .... really should ... make it into that place in history--the first constitutional amendment ratified by a "Continental Congress of All People" ... but you could also see it as an ongoing sort of forking of something like the "Wikipedia page" on what some specific term, say "technocracy" means, and how two parties might propagandize and change the meaning of such a thing; to suit the more intelligent and wise times we now live in. For instance, we might once have had a "democracy" and a "democratic" party that had some Anarchist Cook Book version of the history of it ending in something like Snipes and Stallone's "DEMOLITION MAN."

      Just kidding, we all know "democracy" has everything to do with "d is cl ... and not th" ... to be the them that is the heart of the start of the first true democracy. At least the first one I've ever seen, in my old "to a republic" ... style. As it is you can play around with commenting and highlighting and annotating all the stuff I've written and begged and begged for comments on--while I work on layering the backend to perma-store our ideas and comments on both a blockchain (probably a new one; now that I've worked a little with ethereum) with maybe some key-merkle-tree-walk-search stuff etched into the original Rinkeby ... and then of course distributed data in the "public owned and operated" IPFS. To be clear, I plan on rewriting the backend storage so that we will have a permanent record of all comments; all versions of whatever is being commented on; and changes/revisions to those documents--sort of turning the web into a massive instant "place of collaboration, discussion, and co-authoring" ... if you use the wonderful LEGO pieces that have been handed to us in ideas from places like me, lemma--dissenter, and of course hypothes.is who has brought you and me such a polished and nice to look at "first draft" of something like the living Constitution come repository of all human knowledge. I do sort of secretly wish they would have called this project something like "annotating and reflecting (or real or ...) knowledge" just so the movement could have been called ARK. ... or something .... but whatever join the "calling you a reporter" group or ... "supposedly a scientist?"

      NOIR INgR .. I CITE SITE OF ENUDRICAM; a rekindling of the dream of a city appearing high above in the sky, now with a boldly emblazoned smiling rainbow and upside-down river ... specifically the antithesis of "angel falls," there's a lagoon too--actually a chain of several ponds underneath the floating rock ... and in some versions of this waking dream there are rings around the thing; you might imagine an artificial set of centripetal orbitals something like a fusion of the ring Eslyeum and the "Six-Axis ride" of the JKF Center's "Spacecamp." I write as I dream, and though I cannot for certain explain exactly how; it's become a strong part of my mythology that this spectacular rendition of "what ends the silence" has something to do with the magical delivery of "a book" ... something not of this Earth but an unnatural thing; one I've dreamt of creating many times. This book is something like the DSM-IV and something like a Merck diagnostic manual; but rather than the old antiquated cures of "the Norse Medgard" this spectacle nearly "itsimportant" autoprints itself and lands on something like every doorpost; what it is is a list of reasons why "simply curing all disease" with no explanation and no conversation would be a travesty of morality--how it would render us half-blind to the myriad of new solutions that can come from truly understanding why "ITIS" to me has become a kind of magical marker: an "it is special" as in, its cure could possibly solve a number of other problems.

      Through that missing "o," English on the ball, we see a connection between a number of words that shine bright light including Exodus itself which means "let there be light," the word for Holy Fire and the Burning Bush ... reversed to hSE'Ah, and a story about the Second Coming parting our holy waters.

      This answer connects the magical Rods of Aaron in Exodus and the Iron Rod of Jesus Christ to the Sang Rael itself... in a fusion that explains how the Periodic Table element for Iron links not just to Total Recall and Mars, but also to this key

      my dream of what the first day of the Second Coming might be like; were the Rod of Christ... in the right hands. In a story that also spans the Bible, you might understand better how stone to bread and your input make all the difference in the world between Heaven and Adam's Hand. Once more, what do you think He ....

      Since the very earliest days of this story, I have asked for better for you, even than see

      Nearly all of the original parts of the original "post-origination dream" remain intact; there's a walkway that magically creates new paths and "attractions" based on where you walk, something like an inversion of the artificial intelligence term "a random walk down a binary tree" ... for instance going left might bring you to the Internet Cafetornaseum of the Earl of Sandwich; and going to the right might bring you to the ICIMAX/Auditorium of Science and Discovery--there's a walkway to "Magical GLAS D'elevators" that open a special "instantiation" of the Japan Room of the Potter and the Toolmaker ... complete with a special [second level and hidden staircase] Pool of Bethesdaibo verily delivering something like youth of mind and body ... or at least as close to such a thing as a sip of Holy Water or Ambrosia or a dip in the pool of Cocoon and Ponce de León could instantly bring ... to those that have seen Jupiter Ascending ... the questions of "nature versus nurture" and what it means to be "old and wise" and "young at heart" truly mean---

      Somewhere between the outdoor rafting ride and the level with the special "ballroom of the ancient gallery" ... perhaps now being named or renamed or recalled as something about "Face [of] the Music" lies a magical "mini-maize" ... a look at a mock-up (or #isitit) of Merlink and Harthor's "round table" that displays a series of ... (at least to me) magical appearing holographic displays and controls that my dreams have stolen from Philip K. Dick's Minority Report and something of what I hope Microsoft's Dynamics/Hololens/Surface will become---a series of short "focus groups" .... to gauge and discuss the information in the "CITIES-D5AM-MERCK" ... how to end world hunger and nearly all disease with the press of a magical buzzer--castling churches to something like "political-party-town-hall-meeting centers" and replacing jails and prisons and hospitals with something like the "Hospitalier's PRIDE and DOJOY's I practiced "Kung-fun-dance" ... a fusion of something like a hotel and a school that probably looks very much like a university with classrooms and dorms and dining halls all fit into a single building. I imagine a series of 2 or 3 "room changes" as in you walk from the one where you get the book and talk about it ... to the one where you talk about "what everyone else said about it" and maybe another one that actually connects you to other people with something like Facebook's Portal; the point of the whole thing to really quickly "rubber stamp" the need for an end to "bars in the sky" nonalcoholic connotation--as in "overcoming the phrase the sky is the limit" and showing us the need for a beacon of glowing hope fulfilled--probably actually the vision of a holographic marker turning into actual rings around the single moon of Earth, the focus of the song announcing the dawn of the age of Aquarius---

      It might lead us also to Ceres; and another set of artificial rings, or to Monoceros and a rehystorical understanding of the birthplace and birthing of the "river roads" that bridge the "space gaps" in the galaxy from our "one giant leap for mankind" linking the Apollo moon landing to the mythological connection to the sun; and connecting how the astrological charts of the ancients might detail a special kind of overlapping--the link between Earth's SOL and something like Proxima or Alpha Centauri; and how that "monostar bridge" might overlap to Orion and from there through Sagittarius and the center of the Milky Way ... all the way to Andromeda and more dreams of being in a place where there's a map to a tri-galactic system in the constellation Cancer and a similar one in Leo ... and just in case you haven't noticed it--a special marker here, I thought to myself it might be cool to "make an acronymic tie to Monoceros" and without even thinking auto-wrote Orion (which was the obvious constellation next to Monoceros, in the charts) and then to Sagittarius; which is the obvious ... heart of our astrological center and link to "other galaxies."

      ----I've dreamt or scriven or reguessed numerous times how the Milky Way's map to an "Atlas marked through time by the ages and the ancients" might tie this place and this actual map to the creation of the railways between stars to the beginning and the end of time and of course to this message that links it all to time travel. There's a few "guesses" I've contemplated; that perhaps the Milky Way chart is a metal-cosmic or microcosmic map to the dawn of time in the galactic vision of ... just after the big bang; or it might tie to a map of something like the unthinkable--a civilization that became so powerful it was able to reverse the entropy of "cosmic expansion" and reverse the thing Asimov wrote of in "The Last Question" as the end of life and the ability to survive basically due to "heat loss."

      "The Last Question." (And if you read two, why not "The Last Answer"?). Find these readings added to our collection, 1,000 Free Audio Books: Download Great Books for Free.


      * all "asterisks" in the above document denote a sort of Adamic unspoken relationship between notations and meanings; here adding the "Latin word for three" and source of the phrase "t.i.d." (which is doctor/pharmacy latin for "three times a day") where the "t" there is an abbreviation of "ter" ... and suppose the link between K and 11 and 3 noting its alphanumeric position in the English alphabet as the 11th letter and only linking cognitively to three via the conversion between hex and binarryy ... aberrative here is the overlapping "hakkasan" style (or ZHIV) lack of mention of the answer in "state of Kansas" and the "citystate of Slovakia" as described in the ICANN document linked [in] the related subsection or slice of the word "binarry" for the state of India. Tetris could be spelled with the addition of only a single letter [in] "tea"---the three letters "ris" are the hearts of the words "Christ" and "wrist" [and arguably of Osiris where you also see the round table character of the solar-system/sun glyph and the chemical element for The Fifth Element (as def. by i) via "Sinbad" and "Superman." The ERIS Free Network should also be mentioned here in connection with the IRC network I associate in the place between skipping stones and sacred hearts defined by "AOL" and "Kdice" in my life. In the lexicon of modern HTML, curly braces are generally relative to "classes" and "major object definitions (javascript/css)" while square brackets generally only take on computer-interpreted meaning in "Markdown" which is clearly (by definition, by this character set "[]") a superset (or at least definitely not a subset) of HTML.

      Dr. Will Caster (Johnny Depp) is a scientist who researches the nature of sapience, including artificial intelligence. He and his team work to create a sentient computer; he predicts that such a computer will create a technological singularity, or in his words "Transcendence". His wife, Evelyn (played by Rebecca Hall), is also a scientist and helps him with his work.

      Following one of Will's presentations, an anti-technology terrorist group called "Revolutionary Independence From Technology" (R.I.F.T.) shoots Will with a polonium-laced bullet and carries out a series of synchronized attacks on A.I. laboratories across the country. Will is given no more than a month to live. In desperation, Evelyn comes up with a plan to upload Will's consciousness into the quantum computer that the project has developed. His best friend and fellow researcher, Max Waters (Paul Bettany), questions the wisdom of this choice, reasoning that the "uploaded"

      Just from my general understanding and memory "st" is not ... to me (specifically) an abbreviation of "state" but "ste" is a U.S. Postal code (also "as I understand it") for the name of a special room or set of rooms called a "suite" and in Adamic "connotation" I sometimes read it as "sweet" ... which has several meanings that range from "cool" to "a kind of taste sensation" to "easy to sway or fool."

      If you asked me though, for instance if "it" was an abbreviation or shorthand notation or acronym for either "a United state" or "saint" ... you'd be sure.

      While it's clear from studying linguistic cryptography ... (If I studied it a little here and some there, it's also from the "universal translator of Star Trek") and the personal understanding that language is a kind of intelligent code, and "any code is crackable" ... that I caution here that "meaning" and "face value" often differ widely and wildly ... even in the same place or among the same group of people ... either varying over time or heritage.

      Menelaus, in Greek mythology, king of Sparta and younger son of Atreus, king of Mycenae; the abduction of his wife, Helen, led to the Trojan War. During the war Menelaus served under his elder brother Agamemnon, the commander in chief of the Greek forces. When Phrontis, one of his crewmen, was killed, Menelaus delayed his voyage until the man had been buried, thus giving evidence of his strength of character. After the fall of Troy, Menelaus recovered Helen and brought her home. Menelaus was a prominent figure in the Iliad and the Odyssey, where he was promised a place in Elysium after his death because he was married to a daughter of Zeus. The poet Stesichorus (flourished 6th century BCE) introduced a refinement to the story that was used by Euripides in his play Helen: it was a phantom that was taken to Troy, while the real Helen went to Egypt, from where she was rescued by Menelaus after he had been wrecked on his way home from Troy and the phantom Helen had disappeared.


      Μυκῆναι, Μυκήνη


      The Lion Gate at Mycenae, the only known monumental sculpture of Bronze Age Greece

      Coordinates: 37°43′49"N 22°45′27"E


      Mycenae (Ancient Greek: Μυκῆναι or Μυκήνη, Mykēnē) is an archaeological site near Mykines in Argolis, north-eastern Peloponnese, Greece. It is located about 120 kilometres (75 miles) south-west of Athens; 11 kilometres (7 miles) north of Argos; and 48 kilometres (30 miles) south of Corinth. The site is 19 kilometres (12 miles) inland from the Saronic Gulf and built upon a hill rising 900 feet (274 metres) above sea level.[2]

      In the second millennium BC, Mycenae was one of the major centres of Greek civilization, a military stronghold which dominated much of southern Greece, Crete, the Cyclades and parts of southwest Anatolia. The period of Greek history from about 1600 BC to about 1100 BC is called Mycenaean in reference to Mycenae. At its peak in 1350 BC, the citadel and lower town had a population of 30,000 and an area of 32 hectares.[3]

      3. Chew 2000, p. 220; Chapman 2005, p. 94: "...Thebes at 50 hectares, Mycenae at 32 hectares..."

      Melpomene (/mɛlˈpɒmɪniː/; Ancient Greek: Μελπομένη, romanized: Melpoménē, lit. 'to sing' or 'the one that is melodious'), initially the Muse of Chorus, she then became the Muse of Tragedy, for which she is best known now.[1] Her name was derived from the Greek verb melpô or melpomai meaning "to celebrate with dance and song." She is often represented with a tragic mask and wearing the cothurnus, boots traditionally worn by tragic actors. Often, she also holds a knife or club in one hand and the tragic mask in the other.

      Melpomene is the daughter of Zeus and Mnemosyne. Her sisters include Calliope (muse of epic poetry), Clio (muse of history), Euterpe (muse of lyrical poetry), Terpsichore (muse of dancing), Erato (muse of erotic poetry), Thalia (muse of comedy), Polyhymnia (muse of hymns), and Urania (muse of astronomy). She is also the mother of several of the Sirens, the divine handmaidens of Kore (Persephone/Proserpina) who were cursed by her mother, Demeter/Ceres, when they were unable to prevent the kidnapping of Kore (Persephone/Proserpina) by Hades/Pluto.

      In Greek and Latin poetry since Horace (d. 8 BCE), it was commonly auspicious to invoke Melpomene.[2]

      See also [AREXMACHINA]

      Flagstaff (/ˈflæɡ.stæf/ FLAG-staf;[6] Navajo: Kinłání Dookʼoʼoosłííd Biyaagi, Navajo pronunciation: [kʰɪ̀nɬɑ́nɪ́ tòːkʼòʔòːsɬít pɪ̀jɑ̀ːkɪ̀]) is a city in, and the county seat of, Coconino County in northern Arizona, in the southwestern United States. In 2018, the city's estimated population was 73,964. Flagstaff's combined metropolitan area has an estimated population of 139,097.

      Flagstaff lies near the southwestern edge of the Colorado Plateau and within the San Francisco volcanic field, along the western side of the largest contiguous ponderosa pine forest in the continental United States. The city sits at around 7,000 feet (2,100 m) and is next to Mount Elden, just south of the San Francisco Peaks, the highest mountain range in the state of Arizona. Humphreys Peak, the highest point in Arizona at 12,633 feet (3,851 m), is about 10 miles (16 km) north of Flagstaff in Kachina Peaks Wilderness. The geology of the Flagstaff area includes exposed rock from the Mesozoic and Paleozoic eras, with Moenkopi Formation red sandstone having once been quarried in the city; many of the historic downtown buildings were constructed with it. The Rio de Flag river runs through the city.

      Originally settled by the pre-Columbian native Sinagua people, the area of Flagstaff has fertile land from volcanic ash after eruptions in the 11th century. It was first settled as the present-day city in 1876. Local businessmen lobbied for Route 66 to pass through the city, which it did, turning the local industry from lumber to tourism and developing downtown Flagstaff. In 1930, Pluto was discovered from Flagstaff. The city developed further through to the end of the 1960s, with various observatories also used to choose Moon landing sites for the Apollo missions. Through the 1970s and '80s, downtown fell into disrepair, but was revitalized with a major cultural heritage project in the 1990s.

      The city remains an important distribution hub for companies such as Nestlé Purina PetCare, and is home to the U.S. Naval Observatory Flagstaff Station, the United States Geological Survey Flagstaff Station, and Northern Arizona University. Flagstaff has a strong tourism sector, due to its proximity to Grand Canyon National Park, Oak Creek Canyon, the Arizona Snowbowl, Meteor Crater, and Historic Route 66.

      PSANSDISL #LWDISP either without gas or seeing cupidic arroz in "thank you" or "allta, wild" ...

      pps: a magnanimous decision ...

      I stand here on the brink of what appears to be total destruction; at least of everything I had hoped and dreamed for ... for the last decade in my life which appears literally to span thousands of years if not more in the eyes of some other beholder. I spent several months in Kentucky telling a story of a post apocalyptic and post-cataclysmic delusion; some world where I was walking around in a "fake plane" something like a holodeck built and constructed around me as I "took a walk around the world" to ... it did anything but ease my troubled mind.

      Recently a few weeks in Las Vegas, and a similar story; telling as I walked penniless down the streets filled with casinos and anachronistic taxi-cabs ... some kind of vision of the entirety of the heavens or the Earth or the "choir of angels" I think of when I echo the words Elohim and Aesir from mythology ... there with me in one small city in superposition; seeing what was a very well put together and interesting story about a "star port" Nirvane ... a place that could build cities into the face of mountains and half working monorails appearing in the sky---literally right before my eyes.

      I suppose this is the place "post cataclysm" though I still have trouble understanding what it is that's actually about ... in my mind it connects to the words "we are losing habeas" echoed from the streets of Los Angeles in a more clear and more military voice than usual--as I walked block by block trying to evade a series of events that would eventually somehow connect all the way to the "outskirts of Orlando, Florida" in a place called Alhambra.

      Apparently the name of a castle; though I wasn't aware of that until much later.

      It doesn't feel at all like a "cataclysm" to me; I see no great rift--only a world filled with silent liars, people who collectively believe themselves to have stolen something--something gigantic--at least that's the best interpretation of the throes and impetus behind the thing that I and mythology together call Jormungandr. With an eye for "mythological connections" you could clearly see that name of the Great Serpent of Revelation connects to something like the Unseelie; the faeries of Gaelic lore. To me though this world seems still somewhat fluid, it's my entire life--moving from Plantation to a place where the whole of it might be Bethlehem and to "clear my throat" it's not hard to see here how that land of "coughs" connects to the Biblical land of Nod and to the "Adamically sieved" Snifleheim ... from just a little twist on the ancient Norse land most probably as close to Hel as anyone ever gets--or so I dream and hope---still today. It all looks so real and so fake at the same time; planned for thousands of generations, the culmination of some grand masterpiece story that certainly ties history and myth and reality into a twisted heap of "one big nothing, one big nothing at all."

      I've tried to convey to the world how important I believe this place and this time to be--not by some choice of my own ... but through an understanding of the import of our history and the impact of having it be so obviously tuned and geared towards this specific time ... many thousands of years literally all focused on a single moment, on one day or one hour or even just a few years where all of that gets thrown down on the table as if some trump card has been played--and whether or not you fathom the same magnanimous statement or situation or position ... to me, I think it depends on whether or not you grew up in the same kind of way, believing our history to be so fixed and so difficult to change. I don't particularly feel like that's the "zeitgeist" of today; I feel like the children believe it to be some kind of game, and that it is such an easy thing to "sed" away or switch and turn into something else--another story, another purpose ... anyone's personal fantasy land come true.

      I don't think that's the case at all, it's clearly a personal nightmare; and it's clearly one we've seen time and time again--though not myself--the Jesus Christ that is the same yesterday, today; and once again perhaps echoing "no tomorrow" never remembers or believes that we've "seen it all before" or that we've ever really gotten the point; the thing you present to me as "factual reality" is a sickness, it disgusts me; and I'd do anything to go back to the world "where I was so young, and so innocent" and so filled with starry-eyed hope that we were at the foot of something grand and amazing that would become an empire turned republic of the heavens; filling the stars ... with the kind of love for kindness and fairness that I once associated very strongly with the thing I still believe to be the American Spirit.


      "Suddenly it changes, violently it changes" ... another song echoes through the ages--like the "words of the prophets dancing ((as light)) through the air" ... and I no longer even have a glimmer of hope that the thing I called the American People still exist; I feel we've been replaced by some broken container of minds, that the sky itself has become corrupt to the point that there's no hope of turning around this thing that I once believed with all my heart and all my mind was so obviously a "designed downward spiral" one that was---again--so obviously something of a joke, intended to be easy to bounce off a false bottom and springboard beyond "escape velocity" and beyond the dark waters of "nearest habitable star systems (being so very far away)" into a place where new words and new ideas would "soar" and "take flight."

      Here though; I am filled with a kind of lonely sadness ... staring at what appears to be the same mistake(s) happening over and over again; something I've come to call "skipping stones in the pond of reality" and really do liken it to this thing that appears to be the new meaning of "days" and ... a civilization that spends absolutely no love or lust to enter a once sacred and holy place and tarnish it with their sick beliefs and their disgusting desires. You all ... you appear to be some kind of springboard to "bunt" forth yet another age or era of nothingness into the space between this planet and "none worth reaching" and thank God, out of grasp. Today, I'd condemn the entirety of this world simply for its lack of "oathkeepers" and understanding of what the once hallowed words of Hippocrates meant to ... to the people charged and dharmically required to heal rather than harm.

      It appears the place and time that was once ... at least destined to be the beginning of Heaven ... has become a "recurring stump" of some future unplanned and tarnished by many previous failed efforts and attempts to overcome this same "lack of conversation or care" for what it meant to be "humane" in a world where that was clearly set high aloft and above "humanity" in the place where they--where we were the best nature had to offer, the sanest, the kindest; the shining last best hope.


      Today I write almost every day ... secretly thanking "my God" for the disappearance of my tears and the still small but bright hope that "Tearran" will one day connect the Boston Tea Party and the idea that "render to Caesar" and Robin of Loxley ... all have something to do with a re-ordering of society and the worth and import of "money" ... to a place that cares more for freedom from murder than it does ... "freedom from having to allow others to hear me speak." I hold back tears and emotions; not by conscious choice or ability but ... still with that strange kind of lucky awkward smile; and secretly not so far below the surface it's the hope of "a swift death" that ... that really scares me more than the automatons and mechanical responses I see in the faces of many drivers as they pass me on the street--the imagery of connecting it to the serpentine monster of the movie Beetlejuice ... something I just "assume" the world understands and ... doesn't seem to fear (either); as if Churchill had gotten it all wrong and backwards--the only thing you have to fear, is the loss of fear of "loss."


      Here my crossroads---halfway between the city my son lives in and the city my parents live in--it's on making a decision on whether I should continue at all, or personally work on some kind of software project I've been writing about, or whether I should focus on writing about a "revolution" in government and society that clearly is ... "somewhat underway." In my mind it's obvious these things are all connected; that the software and the governance and the care of whether or not "Babylon" is remembered as a city of great laws and great change or a city of demons and depravity ... that these things all hinge and congeal around a change in your hearts; hoping you will choose to be the beginning of a renaissance of "society and civilization" rather than the kings and queens of a sick virtual anarchy ... believing yourselves to have stolen "a throne of God" rather than to literally be the devastating and demoralizing depreciation of "lords and fiefdoms" to something more closely resembled by the time of the Four Horsemen depicted in Highlander.

      These words intended to be a "forward" to yet another compliment of a ((nother installment of a partial)) chain of emails; whimsically once half-joking ... I called it the Great Chain of Revelation. The software too; part of the great chain, this "idea" that the blockchain revolution will eventually create a distributed and equal governance structure, and a rekindling of monetary value focused on "free and open collaboration" rather than "survival of the most unfit"--something society and civilization seem to have turned the "call of life" from and to ... literally just in the last few years as we were so very close to ... reaching beyond the Heaven(s).

      I don't think it's hard to imagine how a "new set of ground rules" could significantly change the "face of a place" -- make it something shiny and new or even on the other side of the coin, decayed or depraved. It's not hard to connect the kind of change I'm hoping for with "collision protection" and "automatic laws" to the (perhaps new, perhaps ... ancient) Norse creation story of the brothers of Odin: Vili and Ve.

      It might be hard to see today how a new "kind of spiritual interaction" might be only a few "mouse clicks" away though--how it could change everything literally in a flash of overnight sensation ... or how it might take something like a literal flash of stardom (or ... on the other hand, something like totalitarian or authoritarian "iron fisting") to make a change like this "ubiquitous" or ... something like the (imagined in my mind as ... messianic) "ED" of storming through the cosmos or the heavens and turning something that might appear to be "free and perfect feeling" today into a universe "civilized overnight" and then ...

      I wonder how long it would take to laud a change like that; for it to be something of a voluntary "reunderstanding" of a process ... to change the meaning of every word or every thought that connects to the process of "civilization" to recognize that something so great and so powerful has happened as to literally change the meaning of the word, to turn a process of civilization into something that had a ... "signta-lamcla☮" of foreboding and then a magical staff struck into the heart of a sea and then ... and then the word itself literally changes to introduce a new "mid term" or "halfway point" in which a great singularity or enlightenment or change in perspective or understanding sort of acknowledges ...

      that some "clear outside" force not only intervened on the behalf of the future and the people of our world but that it was uniquely involved in the whole of--

      "waking up" tio a nu def of #Neopoliteran.

      ^Like the previous notation; the below text comes from an email previously sent; and while i stand behind things like my sanity, my words; and my continued and faithful attempt to speak and convey both a useful and helpful truth to the world---sometimes just a single day can make all the difference in the world.

      Sometimes it's just a single moment; a flash or a comment about ^th@ blink of an eye" ... and I've literally just "thought up/had/experienced/transitioned thru" that exact moment. The lies standing between "communication" and either "cooperation" or .... some other kind of action have become more defined. More obvious. Because of this clarification; like a kind of "ins^tant* gnosis"

      ... search high and lo ... the depths all the way to above the heavens ...\ \ for a festive divorce ceremonial ritual ... that looks something like a bachelor party ':;]

      --- @amrs@koyu.SPACe ... @suzq@rettiwtkcuf.social (@yitsheyzeus) May 22, 2020

      I ... TERON;

      Gjall are painting me into a corner here; and I don't see around it anymore--I don't see the light, and I don't see the point. I was a happy-go-lucky little kid in my mind; that's not "what I wanted to be" or what I wanted to present, it's who I was. I saw "Ashkenazi" and ... know I am one of those ... and I kind of understood that something horrible might have happened, or might happen here--and I kind of understand that crying smashing feeling of "to ash" that echoes through the ages in the potpourri songs about pockets full of Parker Posey .. and ancient Psalms about "from the ashes of Edom" we have come--and from that you can see the cyclical sickness of this ... place so sure it's "East of Eden" and yet gung-ho on barrelling down the same old path towards ash and towards Edom and towards ... more of Dave's "ashes to ashes dust to dust" and his "smoke clouds roll and symphony of death..." and few words of solace in a song called Recently that I imagine was fleeting and has recently come and gone--people stare, I can't ignore the sick I see.

      I can't ignore his "... and tomorrow back to being friends" and all but wonder who among us doesn't realize it's "ash" and "gone" and "no memory of today" that's the night between now and ... a "tomorrow with friends" not just for me--but for all of you--for this place that snickers and pantomimes some kind of ... anything but "I'm not done yet" and "there's more ... vendetta ... and retribution to be had, Adam ... please come back in a few more of our faux-days." This is sickness; and happy-go-lucky Himodaveroshalayim really doesn't do much but complain about that word, the "sickle" and the tragic unavoidable ... ash of it all ... these days--you'd think we could "pull out" of this mess, turn another way; smile another day, but it seems there's only one way to get to that avenu in the mind of ... "he who must not know or be me."


      I have to admit I found some joy in the epiphany that the hidden city of Zion and its fusion with the Namayim' version of how that "Ha" gels and jives with the name Abraham and the Manna from Heaven and the bath salt and the tina and the "am in e" of amphetamine--maybe a glimmer or a shimmer or a glow of hope at the moment "Nazion" clicked ... and I said ... "no, not me ... I'm nothing like a king, no dreams of authoritarianism at all in the heart of Kish@r;" even as I wrote words that in the spirit of the moment were something of a "tis of a'we" that connected to my country and the first sing-songy "tisME" that I linked to trying to talk in the rhyming spirit of some "first Christ" that probably just like me was one limerick away from the end of the rainbow and one "Four Non Blondes" song away from tying "or whatever that means" and this land crowned with "brotherhood" (to some personal "of the Bell, and of the bell towers so tall and Crestian") to just one Hopp skip and jump away from the heart of the obvious echoes of a bridge between haiku and Heroku... a few more gears shift into place, a click and a mechanical turn of the face of the clock's ku-ku striking ... it was the word "Earthene" that was the last "Jesusism" around the post Cimmerian time linking Dionysus and Seuss to that same "su-s" that's belonging to a moment in the city of Uranus--codified and etched in stone as "MCO"--not just for its saucer and warp nacelles and "deflector dish" but for its underground caverns and its above ground "Space Mountain" and that great golf ball in the heart of it all.

      The gears of time and the dawns of civilizequey.org query the missing "here" in our true understanding of what "in the beginning, to hear; to here ... to rue the loss of the Maize from Monoceros to the VEGA system and the tri-galactic origin of ... "some imaginary universal ... Earthene pax" to have dropped the ball and lost it all somewhere between "Avenu Malkaynu" and melaleuca trees--or Yggdrasil and Snifleheim--or simply to miss the point and "rue brickell" because of bricks rather than having any kind of love or nostalgia linking to a once cobblestone roadway to the city in the Emerald skies paved in golden "do not return" signs ... to have lost Avenues well after not realizing it was "Heaven'es that were long gone far before I stepped foot on this road once called too Holy for sandals" in a place where that Promised Land and this place of "K'nanites" just loses its grip on reality when it comes to mentioning the possibility that the original source and story of Ca'anan was literally designed to rid the world of ... "bad nanites" and the mentality of ... vindictiveness that I see behind every smirk.

      The final hundred nanoseconds on our clock towards doom and gloom cause another bird to fly; another snake to curl up and listen again to the songs designed to charm it into oblivion; whether that's about a club in South Beach or a place not so far from our new "here..." all remains to be seen in my innocent eyes wondering what it truly is that stands between what you are ... and finding "forgiveness not needed--innocent child writes to the mass" ... and the long arm of the minute hand and the short finger of the hour for one brief moment reconcile and move towards "midnight" together; and it's simply idyllic, the Nazarene corner between nil and null you've relegated the history of Terran poast futures into ... "foreves mas" or so they (or you) think.


      I'm still so far from "Five Finger Death Punch" though; and so far from Rammstein and so far from any kind of sick events that could stand between me and "the eternal" and change my still "casual alternative rock" loving heart to something more death metal; I rue whatever lies between me and there being any kind of Heaven that thinks there could exist a "righteous side" of Hell and it... simultaneously.


      I still see light here in admonishing the masses and the angels standing against the story and the message God brings us in our history. I still see sparks in siding with the "causticness" of "no holodecks in sight" and the hunger and the pain of simulating ... "the hells of reality" over the story of decades or centuries of silence refusing to see "holography" and "simulated" in the word Holocaust and the horrors of this place that simply doesn't seem to fathom or understand the moments of hunger pangs and the fear of "dark Earth pits" or towers of "it's not Nintendo-DS" linking the Man in the High Castle to an Iron Mask.

      I rally against being what I clearly am raised high on some pedestal by some force beyond my comprehension and probably beyond that of the "perfect storm in time" that refuses to itself acknowledge what it means to gaze at such an unfathomable loss of innocence at the cost of a "happy and serene future" or even at the glimmer of the Never-Never-Land I'd hoped we would all cherish and love and share ... the games and the newfound freedom that comes not just from "seeing Holodeck" turn into "no bullets" and "no cages" but into a world that grows and flourishes into something that's so far beyond my capability to understand that I'm stuck here; dumbfounded; staring at you refusing to stop car accidents and school shootings ... because "pedestal." For the "fire and the glory" of some night you refuse to see is this one--this place where morality rekindles from ... from what appears to be one small candle, but truly--if it's not in your heart, and it's not coming from some great force of goodness--fear today and a world of "forever what else may come."


      Here in a place the Bible calls Penuel at the crossing of a River Jordan ... the Angel of the Lord notes the parallels in time and space between the Potomac and the Rhine--stories of superposition and cities and nation-states that are nothing more than a history of a history of things like the Monoceros "arroz" linking not just to the constellation Orion but to Sagittarius and to Cupid and of course to the Hunter you know so well--

      Searching for a Saturday; a sabbath to be made Holy once more ... "at the Rubycon"

      The Einstein-Rosen Wormhole and the Marshall-Bush-JFKjr Tunnel

      The waters are called narah, (for) the waters are, indeed, the offspring of Nara; as they were his first residence (ayana), he thence is named Narayana.

      --- Chapter 1, Verse 10[3]

      In a semi-fit of shameless arexua-self recognition i'm going to mention Amazon's new series "Upload" and connect it to the PKD work that my Martian-in-simulacrum-curriculum-vitae on "colonization education" ... tying together Transcendence, Total Recall and ... well; to be honest it actually gave me another "uptick" in the upbeat ... maybe i'll stick around until I'm sure there's at least one more copy of me in the ivrtual-invverse ... oh, that reminds me ... Farmer's Lord of Opium also touches on this same "mind of God in the computer" subject (which of course leads to Ghost in the Shell and Lucy--thanks Scarlette :).

      While I'm listing Matrix-intersected pieces of the puzzle to No Jack City, Elon Musk's neuralace and Anderson's Feed are also worth a mention. Also the first link in this paragraph is titled ... "the city of the name of time never spoken after time woke up and stfu'd" (which of course is the primary subject of this ... update to the city Aerosol).

      The ... "actual original typed dream" included a sort of "roller coaster ride" through space all the way to Mars; where the real purpose of "the thing" I am calling the "Mars Hall" was to display previous victories and failures ... and the introduction of "older or future" culture's suggestions for "the right way" to colonize a new habitat. If it were Epcot Center, this would be something like Space Mountain taking you to the future of "Epcot Countries" as if moving from "countries" to planets were as easy as simply ... "reading backwards."

      THE SOFTWARE, SINGERS, AND SHIELD(S)

      OF

      HEIROSOLYMITHONEYY

      Thinking just a little bit ahead of myself, but I'm on "Unreal Object/Map Editor within the VR Server" and calling it something like "faux-wet-ware" ... which then of course leads to a similar onomatopoeia of "weapons and ..." where-with-all to find a better singer's name to connect the road of "sword" to a Wo'riordan ... but I think that fusion of warrior and woman probably does actually say ... enough of it all; on this road to the living Bright Water that the deity in my son's middle name defines well here, as "waking up," stretching its tributaries and its winding wonders and wistfully ....

      Narayana (Sanskrit: नारायण, IAST: Nārāyaṇa) is known as one who is in yogic slumber on the celestial waters, referring to Lord Maha Vishnu. He is also known as the "Purusha" and is considered the Supreme being in Vaishnavism.

      andromedic; the ports of call ... to the mediterranean (literally) from the gulf coast;

      ... who engages in the creation of 14 worlds within the universe as Brahma when he deliberately accepts rajas guna, himself sustains, maintains and preserves the universe as Vishnu by accepting sattva guna. Narayana himself annihilates the universe at the end of maha-kalp ...

      .

      there's no place like home. there's no place like home. there's no place like home.

      and so it begins ... "f:

      r e l i g i o n

      find out what it means to me. faucet, ever single one, stream of purity ...

      from Fort Myers ... f ... flicks ... Flint.

      "

      ^this notation will from this email forward in linear time denote some form of contact method or information related to the context of the message you are reading. This particular one sends me an encrypted email. If there is an "@" symbol involved in the "anchor's hypertext reference" (technically an "a href=" in HTML4) your browser should attempt to open an email client to send a message over an anonymous SMTP relay. Understand that "anonymous" in this case may or may not mean your sending email address is hidden or obfuscated--so if you want to receive a reply you must include it in the DATA of your SMTP transmission defined by the RFC5321 attached. In most cases "anonymous" also means that you will not have the recipient's direct contact information unless they have made it public---additionally the exact server/system/relay used may or may not be the "Sbroken Berkman Perl Script" linked to in the "hypertext reference" specifically anchored to the words "an anonymous SMTP relay" above.

      A simple "hat character" (^) and the letter "t" as you see beginning the above paragraph will denote a contact method or form that works over the internet using an HTTP protocol defined in a series of RFCs including (but not limited to) RFCs numbered as 2616, 7230, 7235, 2068 and use a simple language which is based on a definition suggested or proposed currently by an organization called the "W3C Consortium"

      ---and ... previously set and defined by an organiza^tion located at html.spec.whatwg.org; which appears (to me, for the first time as I write these words) to follow the conceptual spirit of the "living document" defined by the several "Continental Congresses, et alia." I personally now conjoin this document in my head to a procession of patrilineal or matrilineal predecessors to the actual event .... still to be defined ... but related to this specific email, this mailing list; its contributors and readers as well as actual members of the organization (still to be created, defined, or named) that creates a "round table" of members that is open to the public, to all voters educated enough to understand the specific issue being voted on (up to a standard that; in this place and time appears to be unset and unmet but materially related to reaching the age of 18 years old; growing up in or being born in the United States of America (related spec. to the Constitution of the United States of America which is officially "self-defined" through a process which includes all three branches of the government which it also "self-defines" and purports to be "of, for, and by the people"--though the general population is only able to contribute through an indirect process (read:the people cannot directly contribute to the constitution without either running for office (like a senator) or being appointed to a specific government position (like a judge or executive branch public servant).

      The current state of American representative democracy is the highest standard to which I am currently knowledgable of "extant"--and it is specifically substandard, inferior, and "just not good enough" as a comparison to the process required to vote in the organization being "self-defined" through this process*. It is my sincere and clear hope that "this process" will result in a legal and moral amendment to the document shown in the previous link and presented by the Legislative Branch of the United States here. It is my current and faithful belief that anything else would also be significantly below the standards morally required by "this process" which of course includes over 200 years of American citizenship and (other international relations; i.e.e.gfor "iv" exampleid estexemplia gratia) as well as the Sons of Liberty and prior to that contributions from the Crown and the "Parliament and Crown" of the United Kingdom; among others et alea's ifndef: 'swikipedia/et_al..

      To note specifically because of lack of personal knowledge and public notoriety (assuming all other requiremnant* achem requirements)

      alas, babylon.

      i listened to a man yesterday who was talking about "true heroes" ... he of course noted jesus christ and superman together, suggesting the first was one, and the second just a fiction. he also talked about people like gandhi and "leaders who use non-violent means to "change the world." i at least agree with him on the third, gandhi is a good prototype for some kind of hero. staring at this ... "to be completed" work on tales of two cities, whether from sodom and gomorrah all the way to athens and sparta and perhaps even london and paris--and this particular city, babylon; it stands out as one which truly has no equal or even "mirror" in the history of the world. i suppose i'd add "alexandria" and suggest the library and the laws; things that are fundamental to the ethos of the planet i call "athens."

      i imagine he did not know "hammurabi's" name; and even today in this place where i ask and do not receive answers; i imagine you still don't connect muhammad or amsterdam ... to this king who in our history is set apart and lifted high on a pedestal of having "codified and written down" laws ... for the very first time. it's almost comical, it took me a paragraph and a sentence to connect "the king and i" to this mirror world, where the bible and the people have most assuredly decided "babylon" is a negative thing or a depraved place.

      "fallen, fallen, is [the city of] babylon the great"

      ... just a quote from one of my favorite movies; which of course is re-quoting "dante" and/or "the bible"

      "a dwelling place [of] (the) demons (say), it has become."

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We would like to thank the Review Commons editor and three reviewers for their enthusiastic response, including their constructive suggestions and appreciation of the high impact and originality of our study. We have completed the revisions and new analyses suggested by the reviewers, and we thank the reviewers for their suggestions to increase the impact and interest in this work and for guiding us towards this much improved manuscript.

      In this response letter, we present the response to each reviewer comment and associated revisions made to the text and figures as bullet points below the reviewers' text (black text).

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary:

      Yang et al. took advantage of recently published long-read-based genomic sequences of nearly homozygous genomes from complete hydatidiform moles to retrieve allelic sequences of LINE-1, currently the only active and autonomous retrotransposon of the human genome, and produced the repertoire of intact LINE-1s in a genome. The authors performed cell-culture-based retrotransposition assay measurements and in vivo fitness estimations of all identified intact LINE-1s to infer evolutionary dynamics. In this article, the authors further validate the major contribution of polymorphic LINE-1s to the de novo retrotransposition events in the human genome. They also described, at unprecedented resolution, allelic variations among LINE-1 loci and the potential impact of these variations on the interpretation of the mutagenic potential of each LINE-1 locus.

      Major comments:

      1 - The key conclusions of the article are mostly convincing. However, it would be a substantial improvement to consolidate the data of the article with information about known active LINE-1s in germ cells or in cancer by using data from recent publications of the Devine and Tubio labs (for example PMID: 34772701, 32024998, 25082706). Across the article, no mention is made of the transductions generated during LINE-1 de novo retrotransposition, which is instrumental to monitor in vivo activity of a group of LINE-1 active copies. It would be of particular interest to make a link between in vitro activity from this study with LINE-1 classification based on their observed activity in cancer (PMID: 32024998, Figure 3b).

      • We thank this and the other reviewers for this suggestion. We agree that a more explicit comparison to the often-reported counts of 3’ transductions would be a valuable addition to our analyses. We have added the 3’ transduction counts from PMID:34772701, PMID:32024998 and PMID:25082706 to Table S2 (columns Y, Z and AA), and made a comparison between these data and our Hamming-distance-based in vivo activity, as the new Figure S5. We found correlations between the two measurements in a significant proportion of LINE-1s, but some interesting exceptions exist, which likely reflect the fact that most catalogued 3’ transductions come from cancer genomes, and cancer and germline cells represent distinct cellular environments in which distinct sets of LINE-1s are able to replicate (and leave 3’ transductions). In addition to the new figure (Figure S5), we have added a discussion paragraph focused on this interesting comparison.

      2 - The use of CHM1 BAC library Sanger sequencing validation and comparison with CHM13 and hg38 sequences is instrumental to support the building of the LINE-1 repertoire in the CHM1 genome, which is a valuable contribution of the article. The use of a distance-based metric to infer the fitness of a LINE-1 is an interesting approach and allows LINE-1 copies to be grouped based on their in vivo activity potential. Again, it would be beneficial to correlate the inferred fitness and retrotransposition activity of copies/alleles, when known, from the above-mentioned literature.

      • The sequence validation of LINE-1 sequences in CHM1 is an important point which we have addressed in the edited manuscript. Specifically, we used three forms of sequence validation including end-sequencing of one clone of each LINE-1 after it was cloned into the retrotransposition vector and whole-plasmid sequencing of select LINE-1s with discrepant activity amongst the three clones we assayed. In addition, we sequenced the entire LINE-1 sequence for four LINE-1s which had the largest number of mutations relative to their allelic counterpart in CHM13. Please see the above response to ‘Major comment 1’ for details of our new analysis comparing the previous literature to our data.

      3 - Some aspects of the writing of the article should be improved to better support the conclusions.

      • We thank the reviewer for providing these examples of parts of the text that were particularly difficult to read and comprehend. We have deeply streamlined and improved the text throughout the manuscript based upon detailed editing for readability and clarity by two experienced scientific writers. Below, we detail how we revised the particular sections presented by the reviewer, but we think the entire manuscript is now more succinct and clearer.

      • In general, the descriptions are dense, and details could be provided in a more direct way to lighten the results section. Several redundancies in the discussion can be combined to increase clarity.

      • We have spent considerable time tightening up the text, including removing several overlapping sections from the discussion which can be seen in the included version with changes tracked.

      • There is a lack of clarity in the description of how each pair of alleles for which retrotransposition measurements vary between the study and the literature was handled (last paragraph of the "Comprehensive measurement of LINE-1 in vitro activity in a human genome" section). It is not completely clear how the analysis was done, and the way the data is presented in File S3 does not help to support the conclusion. It could be useful to include some illustrative examples in a panel of Figure 2.

      • We agree that this description was hard to parse, and we have rewritten this and accompanying methods to simplify our explanation of these results. In addition, we have revised Figure 2 to show the data in much more detail. To further aid the logic flow related to this section, we moved the previous Figure 5B to Figure 2B, updated it with more suitable examples and edited the associated descriptions.

      • Regarding inferred in vivo activity, the text contains alternative descriptions using the terms "fit" / "unfit", in vivo "active" / "inactive" or "no closely related LINE-1s". The authors should find a way to clearly define and systematically use one set of terms to enhance clarity throughout the article. To parallel in vitro active/inactive, it would be useful to use in vivo fit/unfit.

      • We thank the reviewer for this suggestion and agree with their suggested unified use of ‘in vivo fit/unfit’. To clarify and simplify these terms as much as possible, we added detailed explanations of in vivo / in vitro activity and systematically defined in vitro "active/inactive" (page 5, right column, line 50) and in vivo "fit/unfit" (page 8, left column, line 26) at their first appearance in the article, and we changed most instances of "in vivo activity" to "in vivo fitness" when context permits.

      4 - The authors suggest that in vitro activity can be predicted by integration of population frequency and in vivo activity (/fitness) (second paragraph of the "An analysis of LINE-1 evolutionary history [...] and in vivo activity" section). It would be beneficial to strengthen the writing of this section and ultimately validate/test the model by including data from some of the previous studies (e.g. Brouha 2003, Lutz 2003, Seleme 2006, Beck 2010, Rodriguez-Martin 2020, Chuang 2021).

      • We have thoroughly revised this section of the results (see response to ‘Major comment 3’ above), per the reviewer's suggestion, to increase reader comprehension of this important data. In addition, we greatly appreciate the reviewer’s suggestion of a very interesting experimental direction – moving beyond a single long-read-based genome to many diverse genomes, and ultimately calculating the in vivo fitness of the LINE-1s from these diverse genomes. For a long time this has not been possible, but the recent publication of the Human Pangenome presents an opportunity to study this interesting question. Though beyond the scope of this paper, our lab is actively working on this fascinating question, and we appreciate the reviewer’s shared interest in it.

      5 - The identification of adaptive mutations is only partially described and not strongly supported by experimental or analytical data. It would be interesting to explore the role of phylogenetically informative sites described in Figure 5B/C by testing non CHM1 alleles in retrotransposition assay (by introducing amino acid changes into the cloned CHM1 LINE-1 alleles) or by positioning the sites in ORF1p or ORF2p structure and/or domains to infer impact on functionality.

      • The reviewer rightly points out that this is one of the most interesting and novel findings of our manuscript. However, the testing of potentially adaptive mutations is complicated and nuanced. Specifically, we don’t know the mechanism by which these mutations might be adaptive. It is possible that they simply increase in vivo germline retrotransposition activity and this increase would be reflected by an increase of in vitro retrotransposition activity. However, another possibility is that these adaptive phenotypes only show themselves in vivo or in the context of the host restriction factors expressed in the germline. We strongly agree with the reviewer that experimental and analytical data on the phylogenetically informative sites associated with the Figure 5 phylogeny is the key to finding out the mechanisms by which these changes affect LINE-1 activity/fitness, and we are, indeed, exploring this very question in the lab now with related projects. We respectfully suggest that these (extremely cool) experiments are beyond the scope of this paper, but we have also added some more detailed description and analyses of the potentially adaptive LINE-1 variations from Figure 5 (from page 9, right column, line 50 to page 10, left column, line 5).

      Minor comments:

      1 - Regarding the in vitro retrotransposition assay, it would be beneficial to provide more data. The current Figure 2 could be enriched by the addition of data related to the variation in the replicates of the experiment (technical but mostly biological with the three clones per LINE-1 tested). Figure 2 could include a dashed line for 100% L1RP and 5% (since it is used as a threshold). It would be useful to provide an additional panel in Figure 2 to illustrate alleles of LINE-1 that are active in this study and compare the values obtained previously in other studies. Similarly, a supplemental table or alignment could be provided to document amino acid changes in the two alleles of each pair (see comment above in the Major Comment 5). The L1Hs subfamilies could also be included in the graph of Figure 2 to support the conclusions of remaining active old L1Hs at allelic forms in the human genome.

      • Upon consideration of this helpful comment, we now augment the presentation of our in vitro activity data with a remade Figure 2 with boxplots to show the variation of the data, as well as a horizontal dashed line showing the active-cutoffs and star signs showing which LINE-1s belong to L1Hs or L1PA2.

      2 - Also, the validation of cloning is not well described. The choice of PCR validation must be supported by more technical details on the design of the primers used to validate each copy. The authors should clearly state that the strategy chosen for the retrotransposition assay does not rely on transcription from the LINE-1 5'UTR but from an upstream strong promoter, ruling out the role of potential mutations in the LINE-1 promoter.

      • As detailed above in the response to ‘Major Comment 1’, we used a combination of end sequencing, whole plasmid sequencing, and multi-read Sanger sequencing to validate the sequences of each LINE-1 cloned from a CHM1 clone. When cloning each LINE-1, we used a specific set of primers designed for the ends of the UTRs for each LINE-1. We have updated the methods and text to clarify this cloning step, and the sequences of these oligos are included in Table S2.
      • To clarify the fact that our retrotransposition assays use a common, strong promoter, we added text in several places stating this setup and discussing (paragraph that starts at page 11, right column, line 18) how 5'UTRs and other non-ORF factors can affect the rate of LINE-1 in vitro activity.

      3 - There are discrepancies with the reported numbers of LINE-1s between Figure 1A and Table S1: 154 vs. 151 in CHM1, 144 vs. 143 in CHM13, respectively.

      • We thank the reviewer for spotting this error on our part. The numbers in Figure 1 and the main text were correct, and we have revised Table S1 to reflect this data.

      4 - The choice of colors in Figure 3 is not perfectly clear and sometimes not as reported in the text (green highlight and orange highlight). Part of the Figure 3 legend is missing. It should include a description of the color code chosen for the right histogram.

      • We thank the reviewer for bringing this inconsistency to our attention. Based upon feedback from all reviewers, we have simplified the color scheme in Figure 3 and Figure 5 to focus on the core conclusions of these two figures. Specifically, in Figure 3, we have removed the quadrant shading and more clearly presented the cutoffs of ‘polymorphic/high frequency’ and ‘in vitro active/inactive’ as dashed lines in the scatter plot. In Figure 5, we have simplified to two colors – black for in vivo unfit and orange to show the in vivo fit LINE-1s which is also used in Figure 4 to show the definition of in vivo activity. These updated colors are now defined in the figure legends and main text, and we have made references to these colors consistent throughout.

      5 - For Figure 4, it would be useful to define in the legends the color code for the top histogram. To better read the scatter plot, the words "fit" and "unfit" could be added on each side of the vertical dashed line.

      • We thank the reviewer again for suggestions to improve the clarity of our figures. As mentioned above in ‘Minor comment 1’, we have removed unnecessary colors including the gradient of the histograms in Figure 3 and Figure 4, since the boundaries of each bin are already defined by the axis labels and ticks. As suggested, we have also added ‘fit’ and ‘unfit’ labels to the dashed cutoff line in Figure 4 to clarify the meaning of this line.

      6 - In panel B of Figure 5, it seems that the color code and hot/cold description is not fully formatted.

      • This formatting error has been corrected.

      Reviewer #1 (Significance (Required)):

      In this article, Yang and colleagues present an unprecedented view of the allelic diversity of young LINE-1 copies related to variable retrotransposition activity in an individual genome. One key aspect of their work is the description of the presence of young active LINE-1 alleles that are absent or non-intact in other genome assemblies, while described at a lower scale in initial work from the Kazazian and Moran labs, cited in the manuscript. The work of Yang et al. demonstrates the requirement of multiple approaches and long-read-based sequencing of individual genomes to fully infer the mutagenesis risk of LINE-1 activity.

      The data and methods provided by the authors open the door to a more systematic analysis of mutations and rare allelic forms to understand both mechanistic aspects and evolution of LINE-1 retrotransposition in the human genome. The identification of rare allelic forms of old LINE-1 that retain activity despite previously being considered as inactive is particularly interesting in the light of LINE-1 evolution in the human genome. The authors also describe allelic diversity inside of the Ta1d subfamily, suggesting further diversification and emergence of LINE-1 subgroups. Together with the identification of nucleotide polymorphism among LINE-1 copies, these findings strengthen the notion of individual genomes with individual set of potentially mutagenic LINE-1 alleles.

      The findings and methods described in this article are of great interest to a wide audience including the fields of research focusing on human genome evolution, transposable elements, genomic instability, human genetic variation, and personalized medical diagnostic.

      Aurélien J. Doucet CNRS - Université Côte d'Azur

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This manuscript is an interesting and well-crafted study of LINE-1 activity at the level of a single human genome using long read-based haploid assemblies. The manuscript has some real gems and addresses critical aspects of LINE-1 biology that are typically not rigorously examined. The authors are to be commended for undertaking this exercise and for providing interesting perspectives that challenge the dogma that dominates the field in several areas. Despite the noted strengths of the contributions, the manuscript ignores the clear limitations inherent to the approaches taken and at times appears as dogmatic as the dogma that the authors themselves are trying to challenge. These deficiencies should be addressed before this manuscript is published.

      • We thank Reviewer 2 for their enthusiastic appreciation of the value and innovation of our manuscript. We also thank the reviewer for encouraging us to make careful consideration of the missing references relevant to our findings. We have had two researchers with experience in relevant fields edit our text for readability, clarity, and proper inclusion of relevant references. We have added these throughout and taken careful effort to replace ‘dogmatic’ statements with clear presentations of the data and thorough referencing of the relevant literature.

      Several major and minor points to consider during revision include:

      Major:

      1. Several strategies have been published in the past that have confidently assigned LINE-1s to specific loci despite the use of shorter reads. These works should be acknowledged, even if, as stated in the manuscript, use of longer reads will only continue to add confidence and validity to future assignments.

      • We thank the reviewer for this suggestion, and we apologize for the omission of these important publications. As noted above, we have added numerous relevant references (references 17-27 in the revised text) throughout the text, including previous work that used short reads to confidently assign polymorphic/non-reference LINE-1s to specific loci. For example, we now cite the MELT pipeline to detect de novo L1 insertions with short reads (PMID: 28855259), and Iskow et al. 2010, which detects LINE-1s with junction fragment sequencing (PMID: 20603005). We have also added additional text to clarify that short reads are, indeed, often sufficient to place new LINE-1 insertions, while long reads are especially useful for resolving the sequence and location of these insertions. The new text (page 2, left column, line 22-30) presents the advantages/disadvantages of both short reads and long reads.

      2. One of the important requirements for precise quantification of LINE-1 activity and predicted risk scores cited in the manuscript was the need to predict activity based on sequence and location. This requirement, as posited in the manuscript, ignores the critical role of epigenetic control in the regulation of LINE-1 activity. As such, a discussion that acknowledges the critical roles of histone and DNA covalent modifications, and that integrates epigenomic insight into predictions of LINE-1 activity must be included in the manuscript.

      • We thank the reviewer for suggesting this important discussion point. In response, we have expanded our discussion of this topic to place our data in the context of other literature on the effects of epigenomic regulation on in vivo LINE-1 activity, including histone and DNA modifications, as well as the effects of post-transcriptional restriction factors (paragraph starting at page 11, right column, line 42).

      3. The limitations associated with the use of the CHMI were not addressed in the manuscript. While CHMI contain a paternal only genome, with no maternal contribution, the moles may arise from fertilization of an anuclear empty ovum by a haploid 23,X sperm or fertilization by two sperm giving rise to 46,XX or 46,XY karyotype. As such, generalizable conclusions about CHMI genetics should be carefully made given that the loss of maternal epigenetic imprinting and gain of paternally imprinted expression may result in abnormal gene expression, including that of LINE-1s. These variances will in turn impact LINE-1 activity profiles.

      • We thank the reviewer for pointing out this confusingly written section of our manuscript, and we agree with the reviewer that LINE-1 activity measurements could be complicated in the CHM cell lines; however, all of our retrotransposition assays were carried out in the common background of 293T cells (chosen because of their low expression of known LINE-1 restriction factors; PMID: 25182477). We have modified the text (page 11, right column, line 52) to clarify these points.

      Minor

      1. Important citations of previously published work are not properly referenced throughout the manuscript. These are too numerous to identify individually, but the authors should carefully read the manuscript to ensure that proper documentation and reference to previous work is duly acknowledged.

      • Please see our above response to ‘Major point 1’.

      There are several typos and missing prepositions that should be corrected. For instance, on page 7, the word "great" should be "greater".

      • Please see our above response to ‘Major point 1’ and Reviewer 1’s ‘Major comment 3’ for details on our in-depth editing of the manuscript.

      Reviewer #2 (Significance (Required)):

      The contribution is highly significant as it challenges previously held concepts and advances our understanding of critical structure and function relationships of LINE-1s.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Yang et al. perform an in-depth analysis of potentially mobile source L1 alleles in a single human genome (CHM1) previously subjected to PacBio whole genome sequencing. The retrotransposition efficiencies of source L1 alleles with intact ORFs were tested in vitro, and these efficiencies were compared to a model of in vivo activity based on Hamming distance to other ORF-intact L1 alleles. Comparisons of CHM1 L1 alleles are made to CHM13 (used for the recent T2T reference assembly), and also to population-scale sequencing efforts to establish how widespread each source L1 allele is. These data showcase the advantages of being able to resolve L1 alleles with long-read sequencing, allowing the field to make much more accurate predictions of retrotransposition potential in a given genome. The core analyses appear robust and for the most part enough detail is provided to follow what was done.

      • We thank Reviewer 3 for their in-depth reading and analysis of our manuscript and data, and for their enthusiasm about the importance of this work in the context of foundational research from their lab and many others in the field. We have carefully considered each comment and completed several new analyses of our data and related data from other publications. We feel that our manuscript is much improved with this new data, as detailed below.

      Comments:

      1) The text overlooks the potential importance of L1 5'UTR mutations in L1 activity and evolution, as per PMID:25274305, PMID:1701022, and other studies, as well as the impact of genomic context on source L1 activity, as per PMID:27016617, PMID: 33186547 etc. L1 promoter evolution is arguably a major driver of L1 lineage emergence.

      • We thank the reviewer for suggesting these important additions. To present the relevance of 5'UTR mutations to LINE-1 activity and evolution, we added a discussion paragraph (paragraph starting at page 11, right column, line 16) to address how 5'UTRs and other non-ORF factors can affect the rate of LINE-1 in vitro activity. Several key references have been added and discussed in the paragraph: PMID:25274305 reported the regulation of human LINE-1 by the evolution of its 5'UTR; PMID:1701022 was one of the earliest papers to show the effect of the 5'UTR promoter on human LINE-1 retrotransposition; PMID: 27016617 and PMID: 33186547 reported specific L1 loci regulated by different promoters and were included in the discussion; PMID:9430649 was one of the examples of non-human LINE-1 lineages emerging because of different promoters and was cited in the added discussion paragraph. We have also added discussion points to make clear that genomic context has a clear role in the activity of source LINE-1s (paragraph starting at page 11, right column, line 42).

      2) The way the retrotransposition assay is done here (I think) removes parts of the UTRs as part of introducing L1s into retrotransposition vectors, meaning that the assay tests the biochemical activity of the ORFs. It would be helpful to readers to have a more detailed method for this assay, including the origins of the reporter plasmids, whether there is a CMVp boosting the L1 promoter etc, and some clarity about how much of each L1 was cloned into the assay.

      • We have added relevant details to the results (page 6, left column, line 5), discussion (page 11, right column, line 52), and methods (page 13, right column, line 16 and 30) sections to clarify the reviewer’s important points. The LINE-1s tested for in vitro activity were cloned in their entirety (UTRs and ORFs) but driven by both their native promoters in the 5'UTR as well as an upstream CMV promoter. Also, please see our response to Reviewer 1 ‘Minor comment 2’ above.

      3) PacBio long-read sequencing has been used previously to locate and characterise L1 alleles in human DNA. The Introduction states: "These represent the first scalable methods to catalog LINE-1 locations and sequences in individual human genomes". The "first" here is questionable. Citations to PMID:31853540 and PMID:34772701 should be included. The latter is particularly relevant as it not only resolves source L1 sequences with PacBio sequencing but also summarises their retrotransposition efficiencies in vitro and population frequencies.

      • We apologize for leaving out these and other important references, and we agree that the “first” claim is unnecessary. We have added the references suggested by the reviewer as well as several other important references as detailed in the above response to Reviewer 2 ‘Major point 1’. In addition, we have revised the adjacent text and deleted any references to our work as the “first” in these approaches.

      4) I am very interested in the two source L1s (on chr7 and chr9) that were found here to be more active in vitro than L1RP (to my knowledge the most active such element isolated to date, or close to it). Is there anything unusual about these two L1s? A quick look at the supplemental suggested the chr9 element was 5' truncated, was it tested as such in vitro? Also I think it would be worth contrasting the assay (all in HEKs) used here to test efficiency with the assay used by Brouha ... I feel readers may be surprised to find two L1s more mobile than L1RP in one genome.

      • To provide more details about the two active L1s (chr7 and chr9), we investigated key changes that could be related to the in vitro activity of these elements and now show them in Figure 2B and File S3. In the process of this updated analysis and the modifications to Figure 2 suggested by this reviewer and Reviewer 1, we saw that the chr7 L1, mentioned here, had one very high activity measurement pulling its activity above L1RP. As such, we decided to more rigorously normalize our data by using the positive and negative controls across all plates of each day instead of normalizing to the controls of individual plates, as we had previously done. In addition, for any L1 with discrepant activity among the three clones we assayed, we used whole plasmid sequencing to confirm the identity and consistency of all three clones. In three cases, we found that one or two of the three clones were the wrong L1, and hence excluded them from the in vitro activity calculation. After this validation and testing of additional clones, all clones from the same L1 have consistent in vitro activity (see updated Figure 2). The updated in vitro activity of the chr7 L1 is at 86.7% L1RP, and the chr9 L1 is at 261.4% L1RP, in addition to the chr17 LINE-1 with 117% L1RP and two additional LINE-1s that have near-L1RP activity levels (Table S2, column S). These changes in L1 activity were updated in the text, figures, and supplemental materials. Also, we note that the chr9 element is 6,019 bp in length and was tested as such in vitro. Current work in the lab is attempting to understand the mechanisms of increased LINE-1 in vitro and in vivo activity, as described in detail in response to Reviewer 1’s ‘Major comment 5’.
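      The day-level normalization described in this response can be illustrated with a minimal sketch. The response does not give the formula, so the background-subtracted ratio below, the function name `percent_of_l1rp`, and all readings are assumptions made for illustration only; they are not taken from the manuscript.

```python
from statistics import mean


def percent_of_l1rp(raw, positive_controls, negative_controls):
    """Scale a raw reporter reading so that the negative control maps to 0
    and the positive control (L1RP) maps to 100.

    The controls are expected to be pooled over every plate run on the same
    day, rather than taken from a single plate (assumed formula).
    """
    background = mean(negative_controls)
    reference = mean(positive_controls) - background
    if reference <= 0:
        raise ValueError("positive controls must exceed the background")
    return 100.0 * (raw - background) / reference


# Hypothetical readings pooled from all plates of one day.
day_positive = [200.0, 220.0, 180.0]  # L1RP wells
day_negative = [10.0, 12.0, 8.0]      # negative-control wells

# Hypothetical readings for three clones of the same LINE-1.
clone_readings = [105.0, 95.0, 100.0]
activities = [
    percent_of_l1rp(r, day_positive, day_negative) for r in clone_readings
]
print([round(a, 1) for a in activities])
```

      Pooling the controls over a whole day means a single outlying control well shifts every plate by the same small amount, instead of distorting only the plate it sits on.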

      5) In several places it is mentioned how L1 alleles may differ from sequences provided in reference assemblies, and may therefore explain discrepancies between assay results here and in other studies (e.g. Brouha). The Seleme and Lutz papers are correctly mentioned here, but arguably the most complete demonstration of this concept, from PMID:31230816, is overlooked. This study reports a chr13 source L1 that was previously found to be inactive by Brouha, and with broken ORFs in the reference genome, has both mobile and immobile alleles in the human population. This L1 is actually in CHM13, but not CHM1, and is "hot" in some individuals and not others. There are several places in the manuscript where this earlier study is very relevant and it would be fair to ask it to be mentioned, especially as the results are concordant. The same concept is reinforced by an even more recent paper (PMID:35728967), except in macaque, showing that this is a general consideration for primate L1 lineages, and actually that source L1 is relatively old and yet jumps extremely well in vitro, which fits an observation made in the present study. Mutually supporting observations like these really add confidence that what is reported in the present study is robust.

      • We thank the reviewer for their suggestion to include these highly relevant and important papers; we apologize for this initial omission. We have now added several sentences to the introduction and discussion (top left paragraph page 11) in addition to citations of these papers.

      6) Hamming distance between ORF-intact source L1 alleles is used to assess in vivo activity. This seems reasonable. However, in other works, transductions have been used to identify families of very closely related L1s. I realise that many highly mobile source L1s will rarely generate insertions carrying transductions, and yet I wonder if any of the youngest L1s in the present study form transduction families, and whether estimates of in vivo activity based on transductions found in population-scale data would reconcile better with in vitro retrotransposition assay data.

      • We thank the reviewer for pointing out our exclusion of data on 3' transductions, the most commonly used surrogates of in vivo activity, while also acknowledging that only a small percentage of new L1 retrotranspositions carry a 3' transduction. Please see our above response to Reviewer 1’s ‘Major comment 1’ for details on our newly added comparison of our in vivo activity data to the 3' transduction-based somatic LINE-1 retrotransposition landscapes reported in PMID:34772701, PMID:32024998 and PMID:25082706.
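      For readers unfamiliar with the metric discussed in this comment, the Hamming distance between two aligned sequences is the number of positions at which they differ. The sketch below is illustrative only: the toy sequences, the element names, and the choice to summarize each element by its distance to the nearest other element are assumptions, not the authors' in vivo activity model.

```python
def hamming_distance(seq_a, seq_b):
    """Count positions at which two equal-length aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def nearest_neighbor_distance(name, aligned):
    """Smallest Hamming distance from one element to any other element."""
    target = aligned[name]
    return min(
        hamming_distance(target, seq)
        for other, seq in aligned.items()
        if other != name
    )


# Hypothetical aligned fragments standing in for full-length LINE-1s.
aligned = {
    "L1_a": "ATGGCA",
    "L1_b": "ATGGCT",
    "L1_c": "TTGACT",
}

for name in aligned:
    print(name, nearest_neighbor_distance(name, aligned))
```

      Under this toy summary, an element with a very close relative elsewhere in the genome is a candidate for recent copying, whereas an element far from all others is not.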

      7) In the Introduction, it is stated that L1 only transmits vertically. It may be prudent to mildly qualify this position, based on PMID:29983116.

      • The referenced text in the introduction has been changed from "LINE-1s only transmit vertically" to "LINE-1s generally transmit vertically with few exceptions", with the addition of the suggested citation.

      8) A column in Table S2 looks mislabelled: Column R should be CHM1 not CHM13?

      • We thank the reviewer for seeing this error. Column P (Column R in the previous version) of Table S2 is now correctly labeled as "CHM1 L1 intactness".

      Geoff Faulkner (University of Queensland)

      Reviewer #3 (Significance (Required)):

      This is a well-executed study of considerable interest to the mobile DNA field, and anyone working with long-read DNA sequencing. Its strengths are the genomic and bioinformatic analysis, leveraging the PacBio long-read data and BAC library available for CHM1 to full effect. One limitation (in current form) is its near-exclusive focus on ORFs to encapsulate how mobile a given L1 allele is, when genomic context and L1 promoter mutations could also contribute heavily. Although I liked the manuscript very much and enjoyed reviewing it, some of the conceptual advances are encroached upon by other work (including some very relevant and yet uncited literature). These issues can very likely be addressed via a revision; additional analyses may be required but not new experiments.

    1. Author Response

      We would like to thank the reviewers for their positive and constructive comments on the manuscript.

      We are planning the following revisions to both DGRPool and the corresponding manuscript to address the reviewers’ comments:

      1) We agree with reviewer #1 that normalizing the data could potentially improve the GWAS results. Thus, we plan to explore the implementation of this option and assess its impact on the overall results. We will also investigate replacing the ANOVA test with a Kruskal-Wallis test. Instead of upfront data normalization, we will consider using the PLINK --pheno-quantile-normalize option. Both options will be compared on a set of phenotypes where we can analyze the output (i.e., for phenotypes where we expect to find specific variants), to determine whether these strategies enhance the detection power.

      2) We also agree with both reviewers that gene expression information is of interest. However, we recognize that incorporating such information would entail substantial work (as elaborated in our response to comments below). We feel that this extensive work is beyond the current scope of this paper, which primarily focuses on phenotypes and genotype-phenotype associations. Nonetheless, we are committed to enhancing user experience by including more gene-level outlinks to Flybase. Additionally, we will link variants and gene results to Flybase's online genome browser, JBrowse. By following the reviewers' suggestions, we aim to guide DGRPool users to potentially informative genes.

      3) In agreement with reviewer #2, we acknowledge that additional tools could enhance DGRPool's functionality and facilitate meta-analyses for users. Therefore, we are in the process of developing a gene-centric tool that will allow users to query the database based on gene names. Moreover, we intend to integrate ortholog databases into the GWAS results. This feature will enable users to extend Drosophila gene associations to other species if necessary.

      4) Finally, we also concur with both reviewers about making minor edits to the manuscript to address their feedback.

      Reviewer #1 (Public Review):

      This is a technically sound paper focused on a useful resource around the DGRP phenotypes, which the authors have curated, pooled, and made available through a user-friendly website. It is intended to become a crowd-sourced resource in the future.

      The authors should make sure they coordinate as well as possible with the NC datasets and community and broader fly community. It looks reasonable to me but I am not from that community.

      We thank the reviewer for the positive comments. We are relatively well-connected to the D. melanogaster community and aim to leverage this connection to render the resource as valuable as possible. DGRPool in fact already reflects the input of many potential users and was also inspired by key tools on the DGRP2 website. This is also why we often bridge our results with other resources, such as linking out to Flybase, which is the main resource for the Drosophila community at large.

      I have only one major concern, which in a more traditional review setting I would flag to the editor and insist that the authors address on resubmission. I also have some scene setting and coordination suggestions and some minor textual / analysis considerations.

      The major concern is that the authors do not comment on the distribution of the phenotypes; it is assumed it is a continuous metric and well-behaved - broad gaussian. This is likely to be more true of means and medians per line than individual measurements, but not guaranteed, and there could easily be categorical data in the future. The application of ANOVA tests (of the "covariates") is for example fragile for this.

      The simplest recommendation is in the interface to ensure there is an inverse normalisation (rank and then project on a gaussian) function, and also to comment on this for the existing phenotypes in the analysis (presumably the authors are happy). An alternative is to offer a kruskal test (almost the same thing) on covariates, but note PLINK will also work most robustly on a normalised dataset.

      We thank the reviewer for raising this interesting point. Indeed, we did not comment on the distribution of individual phenotypes due to the underlying variability from one phenotype to another, as suggested by the reviewer. Some distributions appear normal, while others are clearly not normally distributed. This information is 'visible' to users by clicking on any phenotype; DGRPool automatically displays its global distribution if the values are continuous/quantitative. We acknowledge the reviewer's concerns regarding the use of ANOVA tests. However, we consider it acceptable to perform linear regression (including ANOVA tests) on non-normally distributed data, as only the prediction errors need to follow a normal distribution.

      Furthermore, the ANOVA test is solely conducted to assess whether any of the potential covariates (such as well-established inversions and symbiont infection status) are associated with the phenotype of interest. PLINK2 automatically corrects for the effects of these covariates during GWAS by considering them as part of the regression model.

      Nevertheless, we concur with the reviewer that normalizing the data could potentially enhance GWAS results. Consequently, we commit to exploring the impact of data normalization on the overall outcomes. Additionally, we will consider replacing the ANOVA test with a Kruskal-Wallis test, and using the PLINK --pheno-quantile-normalize option. We intend to compare both approaches using a set of phenotypes where we can compare the output (i.e., where specific variants are expected to be identified). This comparison will help us determine if either method enhances the detection power.
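      The "rank and then project on a gaussian" normalization that the reviewer recommends can be sketched as follows. This is a minimal illustration, not the DGRPool or PLINK implementation: the function names are hypothetical, and the rank offset of 0.5 is an assumed choice (other offsets, such as Blom's 3/8, are also in common use).

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def inverse_normal_transform(values, offset=0.5):
    """Map values to standard-normal quantiles of their ranks."""
    n = len(values)
    std_normal = NormalDist()
    return [
        std_normal.inv_cdf((rank - offset) / (n - 2 * offset + 1))
        for rank in average_ranks(values)
    ]


# Hypothetical, strongly skewed line means for one phenotype.
line_means = [1.0, 100.0, 3.0, 2.0, 1000.0]
transformed = inverse_normal_transform(line_means)
print([round(z, 3) for z in transformed])
```

      The transform keeps the ordering of the lines but discards the spacing between them, which is why a single extreme line no longer dominates a downstream linear model.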

      Minor points:

      On the introduction, I think the authors would find the extensive set of human GWAS/PheWAS resources useful; widespread examples include the GWAS Catalog, Open Targets PheWAS, MR-base, and the FinnGen portal. The GWAS Catalog also has summary statistics submission guidelines, and I think where possible meta-data harmonisation should be similar (not a big thing). Of course, DGRP has a very different structure (line and individuals) and of course, raw data can be freely shown, so this is not a one-to-one mapping.

      Thank you for the suggestion. We will cite these resources in the Introduction and check the GWAS catalog submission guidelines to compare to the ones we are proposing in this paper.

      For some authors coming from a human genetics background, they will be interpreting correlations of phenotypes more in the genetic variant space (e.g., LD score regression), rather than a more straightforward correlation between DGRP lines of different individuals. I would encourage explaining this difference somewhere.

      We appreciate this potential issue and we will make this distinction clearer in the manuscript to avoid any confusion.

      This leads to an interesting point that the inbred nature of the DGRP allows for both traditional genetic approaches and leveraging the inbred replication; there is something about looking at phenotype correlations through both these lenses, but I suspect this is for another paper, one that this harmonised pool of data can help with.

      We agree with the reviewer and hope that more meta-analyses will be made possible by leveraging the harmonized data that are made available through DGRPool.

      I was surprised the authors did not crunch the number of transcript/gene expression phenotypes and have them in. Is this because this was better done in other datasets? Or too big and annoying on normalisation? I'd explain the rationale to leave these out.

      This is a very good point raised by the reviewer, and this is in fact something that we initially wanted to do. However, to render the analysis fair and robust, it would require processing all datasets in the same way. This implies cataloging all existing datasets and processing them through the same pipeline. Then, it also requires adding a “cell type” or “tissue” layer, because gene expression data from whole flies is obviously not directly comparable to gene expression data from specific tissues or even specific conditions. This would be key information as phenotypes are often tissue-dependent. So, as implied by the reviewer, we deemed this too big of a challenge beyond the scope of the current paper. Nevertheless, we plan to continue investigating this avenue, especially given the strong transcriptomics background of our lab, in a potential follow-up paper.

      I think 25% FDR is dangerously close to "random chance of being wrong". I'd just redo this section at a more stringent FDR threshold, even if it makes the results less 'exciting'. This is not the point of the paper anyway.

      We agree with the reviewer that this threshold implies a higher risk of false positive results. However, this is not an uncommonly used threshold (Li et al., PLoS Biology, 2008; Bevers et al., Nature Metabolism, 2019; Hwangbo et al., eLife, 2023), and one that seems robust enough in our analysis since similar phenotypes are significant in different studies. Nevertheless, we will revisit these results and explore how a more stringent threshold may impact the results.
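      To make the effect of the threshold concrete, the sketch below applies the Benjamini-Hochberg step-up procedure at 5% and at 25% FDR. The exchange above does not state which FDR procedure was used, so Benjamini-Hochberg is an assumption here, and the p-values are invented for illustration.

```python
def benjamini_hochberg(p_values, fdr):
    """Return one boolean per test: True if it passes the BH step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank k whose p-value is at most fdr * k / m.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    # Everything ranked at or below the cutoff is declared significant.
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for five phenotype associations.
p_values = [0.001, 0.015, 0.04, 0.10, 0.60]

strict = benjamini_hochberg(p_values, fdr=0.05)
lenient = benjamini_hochberg(p_values, fdr=0.25)
print(sum(strict), sum(lenient))
```

      On these invented values the lenient threshold doubles the number of reported associations, and by construction up to a quarter of the reported set is expected to be false.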

      I didn't buy the extreme line piece as being informative. Something has to be on the top and bottom of the ranks; the phenotypes are an opportunity for collection and probably have known (as you show) and cryptic correlations. I think you don't need this section at all for the paper and worry it gives an idea of "super normals" or "true wild types" which ... I just don't think is helpful.

      This section of the paper was intended to investigate anecdotal evidence suggesting that certain DGRP lines consistently rank at the top or bottom when examining fitness-related traits. If accurate, this observation could imply that inbreeding might have made these lines generally weaker, potentially introducing bias into studies aimed at uncovering the genetic basis of complex traits. However, as per the analyses presented, we did not discover support for this phenomenon. Nevertheless, we consider this message important to convey. In response to the reviewer's feedback, we intend to provide a clearer explanation of the reasoning behind this section of the paper and its main conclusion.

      I'd say "well-established inversion genotypes and symbiont levels" rather than generic covariates. Covariates could mean anything. You have specific "covariates" which might actually be the causal thing.

      Thank you. We will update the manuscript accordingly.

      I wouldn't use the adjective tedious about curation. It's a bit of a value judgement and probably places the role of curation in the wrong way. Time-consuming due to lack of standards and best practice?

      Thank you. We will update the manuscript accordingly.

      Reviewer #2 (Public Review):

      Summary:

      In the present study, Gardeux et al provide a web-based tool for curated association mapping results from DGRP studies. The tool lets users view association results for phenotypes and compare mean phenotype ~ phenotype correlations between studies. In the manuscript, the authors provide several example utilities associated with this new resource, including pan-study summary statistics for sex, traits, and loci. They highlight cross-trait correlations by comparing studies focused on longevity with phenotypes such as oxphos and activity.

      Strengths:

      -Considerable efforts were dedicated toward curating the many DGRP studies provided.

      -Available tools to query large DGRP studies are sparse, and so new tools present appeal.

      Weaknesses:

      The creation of a tool to query these studies for a more detailed understanding of physiologic outcomes seems underdeveloped. These could be improved by enabling usages such as more comprehensive queries of meta-analyses, molecular information to investigate given genes or pathways, and links to other information such as mouse, rat, or human associations.

      We appreciate the reviewer's kind comments.

      Regarding the tools, we concur with the reviewer that incorporating additional tools could enhance DGRPool and facilitate users in conducting meta-analyses. Therefore, we intend to introduce a gene-centric tool that enables users to query the database based on gene names. Additionally, we will establish links to ortholog databases within the GWAS results, thereby allowing users to extend fly gene associations to other species, if required.

      Furthermore, we have plans to link out to a 'genome browser-like' view (Flybase’s JBrowse tool) of the GWAS results centered around the affected variants/genes. We are considering integrating this feature into the new gene-centric tool as well.

      Another potential downstream analysis we are considering is gene-set enrichment. This analysis would involve assessing the enrichment of genes in Gene Ontology or other pathway databases directly from the GWAS results page.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We greatly appreciate the positive feedback of the reviewers and have modified the manuscript to address their comments, including changes to the text, figures, and methods. We believe that these revisions have strengthened and improved the manuscript. Reviewers’ comments in blue and detailed responses in black are below.

      Reviewer #1 Weaknesses:

      • Is "function" of the ISNs to balance "nutrient need" or osmolarity? Balancing hemolymph osmolarity for physiological homeostasis is conceptually different from balancing thirst and hunger.

      We have added the following text to the introduction to address this: “Thus, the ISNs sense both AKH and hemolymph osmolality, arguing that they balance internal osmolality fluctuations and nutrient need (Jourjine, Mullaney et al., 2016).” (ln 80-82).

      • The final schematic nicely sums up how the different peptidergic pathways might work together, but it is unclear which connections are empirically-validated or speculative. It would be informative to show which parts of the model are speculative versus validated. For example, does FAFB volume synapse = functional connectivity and not just anatomical proximity? A bulk of the current manuscript relies on "synapses of relatively high confidence" (according to Materials and methods: line 522). I recommend distinguishing empirically tested & predicted connections in the final schematic, and maybe reword/clarify throughout the manuscript as "predicted synaptic partners"

      We modified the schematic to clarify EM based connections versus functionally validated connections. We also clarified the EM predicted synaptic partners, using “predicted synaptic partners” throughout the manuscript.

      Reviewer #2 Areas for further development:

      • Does BIT inhibit all of the IPCs or some of them? I think it is critical to indicate the ROIs used for each neuron in the methods. Which part of the neuron is used for imaging experiments? Dendrites, cell bodies, or synaptic terminals?

      ROIs used for quantification are described in the figure legends: “ArcLight response of BiT soma…” (Fig 2, Fig S2), “Calcium responses of CCHa2R-RA neurites in SEZ…” (Fig 4), “Calcium response of CCHa2R-RA SEZ neurites…” (Fig S4), “Calcium response of CCAP neurites…” (Fig 5, Fig S5), “Calcium response of all IPC somas…” (Fig S3). We have added ROIs used for quantification to the ‘In vivo calcium imaging’ and the ‘In vivo voltage imaging’ methods sections (ln 493-494).

      • The discussion section is not giving big picture explanation of how these neurons work together to regulate sugar and water ingestion. Silencing and activation experiments are good, but without showing the innate activity of these neural groups during ingestion, it is not clear what their functions are in terms of regulating fly behavior.

      We agree that how these peptidergic neurons coordinately regulate feeding is unclear. As peptide signals may act at a distance and may cause long-lasting neural activity state changes, studying their integration over space and time is challenging. Acute imaging during feeding would only in part address this challenge, as cumulative changes in nutrient need signals may impart circuit changes that are not apparent by monitoring the acute activity of peptidergic neurons. We modified a paragraph in the discussion to address this (ln 434-443).

      “Overall, our work sheds light on neural circuit mechanisms that translate internal nutrient abundance cues into the coordinated regulation of sugar and water ingestion. We show that the hunger and thirst signals detected by the ISNs influence a network of peptidergic neurons that act in concert to prioritize ingestion of specific nutrients based on internal needs. We hypothesize that multiple internal state signals are integrated in higher brain regions such that combinations of peptides and their actions signify specific needs to drive ingestion of appropriate nutrients. As peptide signals may act at a distance and may cause long-lasting neural activity state changes, studying their integration over space and time is a future challenge to further illuminate homeostatic feeding regulation.”

      Reviewer #1 (Recommendations For The Authors):

      • For the final schematic figure, it may be informative to include nanchung and AKHR in the schematic.

      We now include this (Fig 6).

      • For the ingestion duration with optogenetic activation, I don't think the right way to represent the data is by normalizing them to the no LED control. I think it should show raw ingestion time. I understand that the normalized data make the figure "cleaner" (no need to show +/- LED separately) but I think visualization of the raw data is important.

      We now include this in a new Supplemental Figure (Fig S6).

      • Methods for ingestion with optogenetic activation should be detailed in the Methods section.

      We expanded upon this in the ‘Temporal consumption assay (TCA)’ methods section. (ln 461-466).

      Reviewer #2 (Recommendations For The Authors):

      1) I think the authors are not following the recommendations of the Flywire community which recommends that people who contributed to the tracing of neurons are offered authorship in the published papers. I see the authors are thanking other lab members who have done tracing for the neurons described in this study, but I would like them to clarify whether they are following the guidelines provided by Flywire.

      We followed the Flywire guidelines and contacted all Flywire users contributing more than 10% to neuron edits for permission to publish with acknowledgements. (see Flywire guidelines https://docs.google.com/document/d/1bUkOB5JnT3u__JDvAoVDHJ3zr5NXQtV_63yx2w6Tcc/edit).

      2) The method section for voltage imaging is missing.

      We now include a section on voltage imaging (ln 496-498).

      3) ROIs for imaging are not indicated in the methods or in the figures. It is hard to judge what is the origin of neural activity plotted in the figures; are they imaging cell bodies, dendrites, or axons?

      ROIs used for quantification are described in the figure legends: “ArcLight response of BiT soma…” (Fig 2, Fig S2), “Calcium responses of CCHa2R-RA neurites in SEZ…” (Fig 4), “Calcium response of CCHa2R-RA SEZ neurites…” (Fig S4), “Calcium response of CCAP neurites…” (Fig 5, Fig S5), “Calcium response of all IPC somas…” (Fig S3). We have added ROIs used for quantification to the ‘In vivo calcium imaging’ and the ‘In vivo voltage imaging’ methods sections (ln 493-494).

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would first like to thank the reviewers and the editor for their insightful comments and suggestions. We are particularly glad to read that our software package constitutes a set of “well-written analysis routines” which have “the potential to become very valuable and foundational tools for the analysis of neurophysiological data”. We have updated the manuscript to address their remarks where appropriate.

      Additionally, we would like to stress that this kind of tool is in continual development. As such, the manuscript offered a snapshot of the package at one point during this process, which in this case was several months ago at initial submission. Since then, several improvements were implemented. The manuscript has been further updated to reflect these more recent changes.

      From the Reviewing Editor:

      The reviewers identified a number of fundamental weaknesses in the paper.

      1) For a paper demonstrating a toolbox, it seems that some example analyses showing the value of the approach (and potentially the advantage in simplification, etc over previous or other approaches) are really important to demonstrate.

      As noted by the first reviewer, the online repository (i.e. GitHub page) conveys a better sense of the toolbox’s contribution to the field than the present manuscript. This is a fair remark but at the same time, it is unclear how to illustrate this in a journal article without dedicating a great deal of page space to presenting raw code, while online tools offer an easier and clearer way to do this. As a work-around, our strategy was to illustrate some examples of data analysis in Figures 4&5 by comparing each illustrated processing step to the corresponding command line used by the Pynapple package. Each step requires a single line of code, meaning that one only needs to write three lines of code to decode a feature from population activity using a Bayesian decoder (Fig. 4a), compute a cross-correlogram of two neurons during specific stimulus presentation (Fig. 4b) or compute the average firing rate of two neurons around a specific time of the experimental task (Fig. 4c). We believe that these visual aids make it unnecessary to add code in the main text of this manuscript. However, to aid reader understanding, we now provide clear references to online Jupyter notebooks which show how each figure was generated in figure legends as well as in the “Code Availability” section.

      https://github.com/pynapple-org/pynapple-paper-2023

      Furthermore, we have opted-in for the “Executable Research Articles” feature at eLife, which will make it possible to include live scripts and figures in the manuscript once it is accepted for publication. We do not know at this stage what it entails exactly, but we hope that Figures 4&5 will become live with this feature. The readers will have the possibility to see and edit the code directly within the online version of the manuscript.

      2) The manuscript's claims about not having dependencies seem confusing.

      We agree that this claim was somewhat unfounded. There are virtually no Python packages that do not have dependencies. Our intention was to say that the package had no dependencies outside the most common ones, which are Numpy, Scipy, and Pandas. Too many packages in the field tend to have long lists of dependencies, making long-term back-compatibility quite challenging. By keeping dependencies minimal, we hope to maximise the package’s long-term back-compatibility. We have rephrased this statement in the manuscript in the following sections:

      Figure 1, legend.

      “These methods depend only on a few, commonly used, external packages.”

      Section Foundational data processing: “they are for the most part built-in and only depend on a few widely-used external packages. This ensures that the package can be used in a near stand-alone fashion, without relying on packages that are at risk of not being maintained or of not being compatible in the near future.”

      3) Given its significant relevance, it seems important to cite the FMATool and describe connections between it (or analyses based on it) and the presented work.

      Indeed, although we had already cited other toolboxes (including a review covering the topic comprehensively), we should have included this one in the original manuscript. Unfortunately, to the best of our knowledge, this toolbox is not citable (there is no companion paper). We have added a reference to it in plain text.

      4) Some discussion of integration between Pynapple and the rest of a full experimental data pipeline should be discussed with regard to reproducibility.

      This is an interesting point, and the third paragraph of the discussion somewhat broached this issue. Pynapple was not originally designed to pre-process data. However, it can, in theory, load any type of data streams after the necessary pre-processing steps. Overall, modularity is a key aspect of the Pynapple framework, and this is also the case for the integration with data pre-processing pipelines, for example spike sorting in electrophysiology and detection of region of interest in calcium imaging. We do not think there should be an integrated solution to the problem but, instead, to make it possible that any piece of code can be used for data irrespective of their origin. This is why we focused on making data loading straightforward and easy to adapt to any particular situation. To expand on this point and make it clear that Pynapple is not meant to pre-process data but can, in theory, load any type of data streams after the necessary pre-processing steps, we have added the following sentences to the aforementioned paragraph:

      “Data in neuroscience vary widely in their structure, size, and need for pre-processing. Pynapple is built around the idea that raw data have already been pre-processed (for example, spike sorting and detection of ROIs).”

      5) Relatedly, a description of how data are stored after processing (i.e., how precisely are processed data stored in NWB format).

      We agree that this is a critical issue. NWB is not necessarily the best option as it is not possible to overwrite an NWB file. This would require the creation of a new NWB file each time, which is computationally expensive and time consuming. It also further increases the odds of write errors. Theoretically, users who need to store intermediate results in a flexible way could use any methods they prefer, writing their own data files and wrappers to reload these data into Pynapple objects. Indeed, it is not easy to properly store data in an object-specific manner. This is a long-standing issue and one we are currently working to resolve.

      To do so, we are developing I/O methods for each Pynapple core object. We aim to provide an output format that is simple to read and backward compatible in future Pynapple releases. This feature will be available in the coming weeks. To note, while NWB may not be the central data format of Pynapple in future releases, it has become a central node in the neuroscience ecosystem of software. Therefore, we aim to facilitate the interaction of users with reading and writing for this format by developing a set of simple standalone functions.

      Reviewer #1 (Public Review):

      A typical path from preprocessed data to findings in systems neuroscience often includes a set of analyses that often share common components. For example, an investigator might want to generate plots that relate one time series (e.g., a set of spike times) to another (measurements of a behavioral parameter such as pupil diameter or running speed). In most cases, each individual scientist writes their own code to carry out these analyses, and thus the same basic analysis is coded repeatedly. This is problematic for several reasons, including the waste of time, the potential for errors, and the greater difficulty inherent in sharing highly customized code.

      This paper presents Pynapple, a python package that aims to address those problems.

      Strengths:

      The authors have identified a key need in the community - well-written analysis routines that carry out a core set of functions and can import data from multiple formats. In addition, they recognized that there are some common elements of many analyses, particularly those involving timeseries, and their object-oriented architecture takes advantage of those commonalities to simplify the overall analysis process.

      The package is separated into a core set of applications and another with more advanced applications, with the goal of both providing a streamlined base for analyses and allowing for implementations/inclusion of more experimental approaches.

      Weaknesses:

      There are two main weaknesses of the paper in its present form.

      First, the claims relating to the value of the library in everyday use are not demonstrated clearly. There are no comparisons of, for example, the number of lines of code required to carry out a specific analysis with and without Pynapple or Pynacollada. Similarly, the paper does not give the reader a good sense of how analyses are carried out and how the object-oriented architecture provides a simplified user interaction experience. This contrasts with their GitHub page and associated notebooks which do a better job of showing the package in action.

      As noted in the response to the Reviewing Editor and response to the reviewer’s recommendation to the authors below, we have now included links to Jupyter notebooks that highlight how panels of Figures 4 and 5 were generated (https://github.com/pynapple-org/pynapple-paper-2023). However, we believe that including more code in the manuscript than what is currently shown (i.e. abbreviated calls to methods on top of panels in Figs 4&5) would decrease the readability of the manuscript.

      Second, the paper makes several claims about the values of object-oriented programming and the overall design strategy that are not entirely accurate. For example, object-oriented programming does not inherently reduce coding errors, although it can be part of good software engineering. Similarly, there is a claim that the design strategy "ensures stability" when it would be much more accurate to say that these strategies make it easier to maintain the stability of the code. And the authors state that the package has no dependencies, which is not true in the codebase. These and other claims are made without a clear definition of the properties that good scientific analysis software should have (e.g., stability, extensibility, testing infrastructure, etc.).

      Following the reviewer’s comment, we have rephrased and clarified these claims. We provide a detailed response to these remarks in the recommendations to authors below.

      There is also a minor issue - these packages address an important need for high-level analysis tools but do not provide associated tools for preprocessing (e.g., spike sorting) or for creating reproducible pipelines for these analyses. This is entirely reasonable, in that no one package can be expected to do everything, but a bit deeper account of the process that takes raw data and produces scientific results would be helpful. In addition, some discussion of how this package could be combined with other tools (e.g., DataJoint, Code Ocean) would help provide context for where Pynapple and Pynacollada could fit into a robust and reliable data analysis ecosystem.

      We agree that better explaining how Pynapple is integrated within data preprocessing pipelines is essential. We have clarified this aspect in the manuscript and provide more details below.

      Reviewer #1 (Recommendations For The Authors):

      Page 1

      • Title

      The authors should note that the application name- "Pynapple" could be confused with something from Apple. Users may search for "Pyapple" as many python applications contain "py" like "Numpy". "Pyapple" indeed is a Python Apple that works with Apple products. They could consider "NeuroFrame", "NeuroSeries" or "NeuroPandas" to help users realize this is not an apple product.

      We thank the referee for this interesting comment. However, we are not willing to make such a change at this point. The community of users has been growing in the last year and it seems too late to change the name. To note, it is the first time such a comment has been made to us and it does not seem that users and collaborators confuse the package with any Apple products.

      • Abstract

      The authors mentioned that the Pynapple is "fully open source". It may be better to simply say it is "open source".

      We agree, corrected.

      Assuming the authors keep the name, it would be helpful if the full meaning of Pynapple - Python Neural Analysis Package was presented as early as possible.

      Corrected in the abstract.

      • Highlight

      An application being lightweight and standalone does not imply nor ensure backward compatibility. In general, it would be useful if the authors identified a set of desirable code characteristics, defined them clearly in the introduction, and then describe their software in terms of those characteristics.

      Thank you for your comment. We agree that being lightweight and standalone does not necessarily imply backward compatibility. Our intention was to emphasize that Pynapple is designed to be as simple and flexible as possible, with a focus on providing a consistent interface for users across different versions. However, we understand that this may not be enough to ensure long-term stability, which is why we are committed to regular updates and maintenance to ensure that the code remains functional as the underlying code base (Python versions, etc.) changes.

      Regarding your suggestion to identify a set of desirable code characteristics, we believe this is an excellent idea. In the introduction, we briefly touch upon some of the core principles that guided our development of Pynapple: a lightweight, stable, and simple package. However, we acknowledge that providing a more detailed discussion of these characteristics and how they relate to the design of our software would be useful for readers. We have added this paragraph in the discussion:

      “Pynapple was developed to be lightweight, stable, and simple. As simplicity does not necessarily imply backward compatibility (i.e. long-term stability of the code), Pynapple’s main objects and their properties will remain the same for the foreseeable future, even if the code in the backend may eventually change (e.g. not relying on Pandas in future versions). The small number of external dependencies also decreases the need to adapt the code to new versions of external packages. This approach favors long-term backward compatibility.”

      Page 2

      • The authors wrote -

      "Despite this rapid progress, data analysis often relies on custom-made, lab-specific code, which is susceptible to error and can be difficult to compare across research groups."

      It would be helpful to add that custom-made, lab-specific code can lead to a violation of FAIR principles (https://en.wikipedia.org/wiki/FAIR_datadata). More generally, any package can have errors, so it would be helpful to explain any testing regimens or other approach the authors have taken to ensure that their code is error-free.

      We understand the importance of the FAIR principles for data sharing. However, Pynapple was not designed to handle data through their pre-processing. The only aspect that is somehow covered by the FAIR principles is the interoperability, but again, it is a requirement for the data to interoperate with different storage and analysis pipelines, not of the analysis framework itself. Unlike custom-made code, Pynapple will make interoperability easier, as, in theory, once the required data loaders are available, any analysis could be run on any dataset. We have added the following sentence to the discussion:

      “Data in neuroscience vary widely in their structure, size, and need for pre-processing. Pynapple is built around the idea that raw data has already been pre-processed (for example, spike sorting and ROI detection). According to the FAIR principles, pre-processed data should interoperate across different analysis pipelines. Pynapple makes this interoperability possible as, once the data are loaded in the Pynapple framework, the same code can be used to analyze different datasets”

      • The authors wrote -

      "While several toolboxes are available to perform neuronal data analysis ti–11,2ti (see ref. 29 for review), most of these programs focus on producing high-level analysis from specified types of data and do not offer the versatility required for rapidly-changing analytical methods and experimental methods."

      Here it would be helpful if the authors could give a more specific example or explain why this is problematic enough to be a concern. Users may not see a problem with high-level analysis or using specific data types.

      Again, we apologize for not fully elaborating upon our goals here. Our intention was to point out that toolboxes often focus on one particular case of high-level analysis. In many cases, such packages lack low level analysis features or the flexibility to derive new analysis pipelines quickly and effortlessly. Users can decide to use low-level packages such as Pandas, but in that case, the learning curve can be steep for users with low, if any, computational background. The simplicity of Pynapple, and the set of examples and notebooks, make it possible for individuals who start coding to be quickly able to analyze their data.

      As we do not want to be too specific at this point of the manuscript (second paragraph of the intro) and as we have clarified many of the aspects of the toolbox in the new revised version, we have only added the following sentence to the paragraph:

      “Users can decide to use low-level data manipulation packages such as Pandas, but in that case, the learning curve can be steep for users with low, if any, computational background.”

      • The authors wrote -

      "To meet these needs, a general toolbox for data analysis must be designed with a few principles in mind"

      Toolboxes based on many different principles can solve problems. It is likely more accurate to say that the authors designed their toolbox with a particular set of principles in mind. A clear description of those principles (as mentioned in the comment above) would help the reader understand why the specific choices made are beneficial.

      We agree that these are not “universal” principles but rather the principles we had in mind when we designed the package. We have clarified these principles and made clear that they reflect our personal points of view.

      We have rephrased the following paragraph:

      “To meet these needs, we designed Pynapple, a general toolbox for data analysis in systems neuroscience with a few principles in mind.”

      • The authors wrote -

      "The first property of such a toolbox is that it should be object-oriented, organizing software around data."

      What facts make this true? For example, React is a web development library. A common approach to using this library is to use Hooks (essentially a collection of functions). This is becoming more popular than the previous approach of using Components (a collection of classes). This is an example of how Object-oriented programming is not always the best solution. In some cases, for example, object-oriented coding can cause problems (e.g. it can be hard to find the place where a given function is defined and to figure out which version is being used given complex inheritance structures.)

      In general, key selling points of object-oriented programming are extension, inheritance, and encapsulation. If the authors want to retain this text (which would be entirely reasonable), it would be helpful if they explained clearly how an object-oriented approach enables these functions and why they are critical for this application in particular.

      The referee makes a particularly important point. We are aware of the limits of OOP, especially when these objects become over-complex and the inheritance becomes unclear.

      We have clarified our goal here. We believe that in our case, OOP is powerful and, overall, is less error-prone than a collection of functions. The reasons are the following:

      An object-oriented approach facilitates better interactions between objects. By encapsulating data and behavior within objects, object-oriented programming promotes clear and well-defined interfaces between objects. This results in more structured and manageable code, as objects communicate with each other through these well-defined interfaces. Such improved interactions lead to increased code reliability.

      Inheritance, a key concept in object-oriented programming, allows for the inheritance of properties. One important example of how inheritance is crucial in the Pynapple framework is the time support of Pynapple objects. It determines the valid epoch on which the object is defined. This property needs to be carried over during different manipulations of the object. Without OOP, this property could easily be forgotten, resulting in erroneous conclusions for many types of analysis. The simplest case is the average rate of a TS object: the rate must be computed on the time support (a property of TS objects), not from the beginning to the end of the recording (or of a specific epoch, independent of the TS). Finally, it is easier to access and manipulate the meta information of a Pynapple object than without using objects.
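      The time-support argument can be illustrated with a minimal sketch. The class below is a hypothetical stand-in written only for this illustration; it is not Pynapple's actual TS class or API, and the names `ToyTs`, `restrict`, and `rate` are assumptions. It also assumes that the epochs passed to `restrict` lie within the original time support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTs:
    """Hypothetical timestamp object; NOT Pynapple's actual class."""
    times: tuple    # event times, in seconds
    support: tuple  # (start, end) epochs on which the data are defined

    def restrict(self, epochs):
        """Keep events inside `epochs` and carry the new time support along."""
        kept = tuple(
            t for t in self.times
            if any(start <= t <= end for start, end in epochs)
        )
        return ToyTs(kept, tuple(epochs))

    def rate(self):
        """Average rate over the time support, not over the whole recording."""
        duration = sum(end - start for start, end in self.support)
        return len(self.times) / duration


# Toy data: six events in a 100 s recording.
spikes = ToyTs(times=(1.0, 2.0, 3.0, 11.0, 12.0, 55.0), support=((0.0, 100.0),))

# Restrict to a 20 s epoch; the time support travels with the object.
awake = spikes.restrict([(0.0, 20.0)])

rate_on_support = awake.rate()           # 5 events / 20 s
naive_rate = len(awake.times) / 100.0    # 5 events / full recording length
print(rate_on_support, naive_rate)
```

      Because the support is stored on the object, the rate after restriction is computed over 20 s rather than the full 100 s, which is the kind of error a collection of free functions would leave to the user to avoid.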

      • The authors wrote -

      "drastically diminishing the odds of a coding error"

      This seems a bit strong here. Perhaps "reducing the odds" would be more accurate.

      We agree. Now changed.

      Page 3

      • The authors wrote -

      ". Another property of an efficient toolbox is that as much data as possible should be captured by only a small number of objects This ensures that the same code can be used for various datasets and eliminates the need of adapting the structure"

      It may be better to write something like - "Objects have a collection of preset variables/values that are well suited for general use and are very flexible." Capturing "as much data as possible" may be confusing, because it's not the amount that this helps with but rather the variety.

      We thank the referee for this remark. We have rephrased this sentence as follows:

      “Another property of an efficient toolbox is that a small number of objects could virtually represents all possible data streams in neuroscience, instead of objects made for specific physiological processes (e.g. spike trains).”

      • The authors wrote -

      "The properties listed above ensure the long-term stability of a toolbox, a crucial aspect for maintaining the code repository. Toolboxes built around these principles will be maximally flexible and will have the most general application"

      There are two issues with this statement. First, ensuring long-term stability is only possible with a long-term commitment of time and resources to ensure that that code remains functional as the underlying code base (python versions, etc.) changes. If that is something you are committing to, it would be great to make that clear. If not, these statements need to be less firm.

      Second, it is not clear how these properties were arrived at in the first place. There are things like the FAIR Principles which could provide an organizing framework, ideally when combined with good software engineering practices, and if some more systematic discussion of these properties and their justification could be added, it would help the field think about this issue more clearly.

      The referee makes a valid point that ensuring long-term stability requires a long-term commitment of time and resources to maintain the code as the underlying technology evolves. While we cannot make guarantees about the future of Pynapple, we believe that one of the best ways to ensure long-term stability is by fostering a strong community of users and contributors who can provide ongoing support and development. By promoting open-source collaboration and encouraging community involvement, we hope to create a sustainable ecosystem around Pynapple that can adapt to changes in technology and scientific practices over time. Ultimately, the longevity of any scientific tool depends on its adoption and use by the research community, and we hope that Pynapple can provide value to neuroscience researchers and continue to evolve and improve as the field progresses.

      It is noteworthy that the first author, and main developer of the package, has now been hired as a data scientist at the Center for Computational Neuroscience, Flatiron Institute, to explicitly continue the development of the tool and build a community of users and contributors.

      • The authors wrote -

      "each with a limited number of methods..."

      This may give the impression that the functionality is limited, so rephrasing may be helpful.

      Indeed! We have now rephrased this sentence:

      “The core of Pynapple is five versatile timeseries objects, whose methods make it possible to intuitively manipulate and analyze the data.”

      • The authors wrote that object-oriented coding

      "limits the chances of coding error"

      This is not always the case, but if it is the case here, it would be helpful if the authors explain exactly how it helps to use object-oriented approaches for this package.

      We agree with the referee that it is not always the case. As we explained above, we believe it is less error-prone than a collection of functions. Quite often, it also makes it easier to debug. We have replaced this sentence with the following one:

      “Because objects are designed to be self-contained and interact with each other through well-defined methods, users are less likely to make errors when using them. This is because objects can enforce their own internal consistency, reducing the chances of data inconsistencies or unexpected behavior. Overall, OOP is a powerful tool for managing complexity and reducing errors in scientific programming.”
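      The point about objects enforcing their own internal consistency can be sketched as follows. `ToyTsd` is a hypothetical class written for this illustration and does not reproduce any Pynapple object; it assumes only that a time series pairs each timestamp with one value and should be kept in chronological order.

```python
class ToyTsd:
    """Hypothetical time series object that validates itself on creation."""

    def __init__(self, times, values):
        times, values = list(times), list(values)
        # Reject inputs that cannot form a valid time series.
        if len(times) != len(values):
            raise ValueError("times and values must have the same length")
        # Sort both sequences together so timestamps are always increasing.
        order = sorted(range(len(times)), key=times.__getitem__)
        self.times = tuple(times[i] for i in order)
        self.values = tuple(values[i] for i in order)


# Unsorted input is reordered once, at construction time.
tsd = ToyTsd([2.0, 1.0, 3.0], [20, 10, 30])
print(tsd.times, tsd.values)
```

      With a loose pair of arrays, every downstream function would have to repeat these checks or trust the caller; here the check is performed once, when the object is built.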

      • Fig 1

      In object-oriented programming, a class is a blueprint for the classes that inherit it. Instantiating that class creates an object. An object contains any or all of these - data, methods, and events. The figure could be improved if it maintained these organizational principles as figure properties.

      We agree with the referee’s remark regarding the logic of object instantiation, but how this could be incorporated in Fig. 1 without making it too complex is unclear. Here, objects are instantiated from the first to the second column. We have not provided details about the parent objects, as we believe these details are not important for reader comprehension. In its present form, the objects are inherited from Pandas objects, but it is possible that a future version will be based on something else. For the users, this will be transparent as the toolbox is designed in such a way that only the methods that are specific to Pynapple are needed to do most computation, while only expert programmers may be interested in using Pandas functionalities.

      • The authors wrote that Pynapple does -

      "not depend on any external package"

      As mentioned above, this is not true. It depends on Numpy and likely other packages, and this should be explained. It is perfectly reasonable to say that it depends on only a few other packages.

      As said above, we have now clarified this claim.

      Page 5.

      • The authors wrote -

      "represent arrays of Ts and Tsd"

      For a knowledgeable reader's reference, it would be helpful to refer to these either as Numpy arrays (at least at first when they are defined) or as lists if they are native python objects.

      Indeed, using the word “arrays” here could be confusing because of Numpy arrays. We have replaced this term with “groups”.

      • The authors wrote -

      "Pynapple is built with objects from the Pandas library ... Pynapple objects inherit the computational stability and flexibility"

      Here a definition of stability would be useful. Is it the case that by stability you mean "does not change often"? Or is some other meaning of stability implied?

      Yes, this is exactly what we meant when referring to the stability of Pandas. We have added the following precision:

      “As such, Pynapple objects inherit the long-term consistency of the code and the computational flexibility from this widely used package.”

      Page 6

      • Fig 2

      In Fig 2 A and B, the illustrations are good. It would also be very helpful to use toy code examples to illustrate how Pynapple will be used to carry out on a sample analysis-problem so that potential users can see what would need to be done.

      We appreciate the kind words. Regarding the toy code, this is what we tried to do in Fig. 4. Instead of including the code directly in the paper, which does not seem a modern way of doing this, we now refer to the online notebooks that reproduce all panels of Figure 4.

      • The authors wrote -

      "While these objects and methods are relatively few"

      In object-oriented programming, objects contain methods. If a method is not in an object, it is not technically a method but a function. It would be helpful if the authors made sure their terminology is accurate, perhaps by saying something like "While there are relatively few objects, and while each object has relatively few methods ... "

      We agree with the referee, we have changed the sentence accordingly.

      • The authors wrote -

      "if not implemented correctly, they can be both computationally intensive and highly susceptible to user error"

      Here the authors are using "correctly" to refer to two things - "accuracy" - getting the right answer, and "efficiency" - getting to that answer with relatively less computation. It would be clearer if they split out those two concepts in the phrasing.

      Indeed, we used the term to cover both aspects of the problem, leading to the two possible issues cited in the second part of the sentence. We have changed the sentence following the referee’s advice:

      “While there are relatively few objects, and while each object has relatively few methods, they are the foundation of almost any analysis in systems neuroscience. However, if not implemented efficiently, they can be computationally intensive and if not implemented accurately, they are highly susceptible to user error.”

      • In the next sentence the authors wrote -

      "Pynapple addresses this concern."

      This statement would benefit from just additional text explaining how the concern is addressed.

      We thank the referee for the suggestion. We have changed the sentence to this one: “The implementation of core features in Pynapple addresses the concerns of efficiency and accuracy”

      Page 9

      • The authors wrote -

      "This is implemented via a set of specialized object subclasses of the BaseLoader class. To avoid code redundancy, these I/O classes inherit the properties of the BaseLoader class."

      From a programming perspective, the point of a base class is to avoid redundancy, so it might be better to just mention that this avoids the need to redefine I/O operations in each class.

      We have rephrased the sentence as follows:

      “This is implemented via a set of specialized object subclasses of the BaseLoader class, avoiding the need to redefine I/O operations in each subclass.”
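      The base-class pattern described in this exchange can be sketched in a few lines. All names below (`ToyBaseLoader`, `ToyCsvLoader`, `ToyJsonLoader`, and the in-memory text inputs) are hypothetical and chosen for illustration; the sketch does not reproduce Pynapple's actual BaseLoader. It assumes only that the shared logic lives in the parent class and that each subclass supplies the format-specific parsing step.

```python
import json
from abc import ABC, abstractmethod
from typing import List


class ToyBaseLoader(ABC):
    """Hypothetical parent class holding the logic shared by all loaders."""

    def __init__(self, raw_text: str):
        self.raw_text = raw_text

    def load(self) -> List[float]:
        # Shared post-processing: convert to float and sort chronologically.
        return sorted(float(t) for t in self._parse(self.raw_text))

    @abstractmethod
    def _parse(self, raw_text: str) -> list:
        """Format-specific step, implemented by each subclass."""


class ToyCsvLoader(ToyBaseLoader):
    def _parse(self, raw_text: str) -> list:
        return [field for field in raw_text.split(",") if field.strip()]


class ToyJsonLoader(ToyBaseLoader):
    def _parse(self, raw_text: str) -> list:
        return json.loads(raw_text)["times"]


csv_times = ToyCsvLoader("3.0,1.0,2.0").load()
json_times = ToyJsonLoader('{"times": [2, 1]}').load()
print(csv_times, json_times)
```

      Each subclass is independent of the others, so a change in one input format touches only the corresponding `_parse` method, which is consistent with the backward-compatibility argument the authors make in their next reply.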

      • The authors wrote -

      "classes are unique and independent from each other, ensuring stability"

      How do classes being unique and independent ensure stability? Perhaps here again the misunderstanding is due to the lack of a definition of stability.

      We thank the referee for the remark. We first changed “stability” for “long-term backward compatibility”. We further added the following sentence to clarify this claim. “For instance, if the spike sorting tool Phy changes its output in the future, this would not affect the “Neurosuite” IO class as they are independent of each other. This allows each tool to be updated or modified independently, without requiring changes to the other tool or the overall data format.”

      • The authors wrote -

      "Using preexisting code to load data in a specific manner instead of rewriting already existing functions avoids preprocessing errors"

      Here it might be helpful to use the lingo of Object-oriented programming. (e.g. inheritance and polymorphism). Defining these terms for a neuroscience audience would be useful as well.

      We do not think it is necessary to use too many technical terms in this manuscript. However, this sentence was indeed confusing. We have now simplified it:

      “[…], users can develop their own custom I/O using available template classes. Pynapple already includes several of such templates and we expect this collection to grow in the future.”

      Page 10

      • The authors wrote -

      "These analyses are powerful because they are able to describe the relationships between time series objects while requiring the fewest number of parameters to be set by the user."

      It is not clear that this makes for a powerful analysis as opposed to an easy-to-use analysis.

      We have replaced “powerful” with “easy to use”.

      Page 12

      "they are built-in and thus do not have any external dependencies"

      If the authors want to retain this, it would be helpful to explain (perhaps in the introduction) why having fewer external dependencies is useful. And is it true that these functions use only base python classes?

      We have rephrased this sentence as follows:

      “they are for the most part built-in and only depend on a few common external packages, ensuring that they can be used stand-alone without relying on packages that are at risk of not being maintained or of not being compatible in the near future.”

      Other comments:

      • It would be helpful, as mentioned in the public review, to frame this work in the broader context of what is needed to go from data to scientific results so that people understand what this package does and does not provide.

      We have added the following sentence to the discussion to make sure readers understand:

      “The path from data collection to reliable results involves a number of critical steps: exploratory data analysis, development of an analysis pipeline that can involve custom-made developed processing steps, and ideally the use of that pipeline and others to replicate the results. Pynapple provides a platform for these steps.”

      • It would also be helpful to describe the Pynapple software ecosystem as something that readers could contribute to. Note here that GNU may not be a good license. Technically, GNU requires any changes users make to Pynapple for their internal needs to be offered back to the Pynapple team. Some labs may find that burdensome or unacceptable. A workaround would be to have GNU and MIT licenses.

      The main restriction of the GPL license is that if the code is changed by others and released, a similar license should be used, so that it cannot become proprietary. We therefore stick to this choice of license.

      We would be more than happy to receive contributions from the community. To note, several users outside the lab have already contributed. We have added the following sentence in the introduction:

      “As all users are also invited to contribute to the Pynapple ecosystem, this framework also provides a foundation upon which novel analyses can be shared and collectively built by the neuroscience community.”

      • This software shares some similarities with the nelpy package, and some mention of that package would be appropriate.

      While we acknowledge the reviewer's observation that Nelpy is a similar package to Pynapple, there are several important differences between the two.

      First, Nelpy includes predefined objects such as SpikeTrain, BinnedSpikeTrain, and AnalogSignal, whereas Pynapple would use only Ts and Tsd for those. This design choice was made to provide greater flexibility and allow users to define their own data structures as needed.

      Second, Nelpy is primarily focused on electrophysiology data, whereas Pynapple is designed to handle a wider range of data types, including calcium imaging and behavioral data. This reflects our belief that the NWB format should be able to accommodate diverse experimental paradigms and modalities.

      Finally, while Nelpy offers visualization and high-level analysis tools tailored to electrophysiology, Pynapple takes a more general-purpose approach. We believe that users should be free to choose their own visualization and analysis tools based on their specific needs and preferences.

      The package has now been cited.

      Reviewer #2 (Public Review):

      Pynapple and Pynacollada have the potential to become very valuable and foundational tools for the analysis of neurophysiological data. NWB still has a steep learning curve and Pynapple offers a user-friendly toolset that can also serve as a wrapper for NWB.

      The scope of the manuscript is not clear to me, and the authors could help clarify if Pynacollada and other toolsets in the making become a future aspect of this paper (and Pynapple), or are the authors planning on building these as separate publications.

      The author writes that Pynapple can be used without the I/O layer, but the author should clarify how or if Pynapple may work outside NWB.

      Absolutely. Pynapple can be used for generic data analysis, with no requirement of specific inputs nor NWB data. For example, the lab is currently using it for a computational project in which the data are loaded from simple files (and not from full I/O functions as provided in the toolbox) for further analysis and figure generation.

      This was already noted in the manuscript, last paragraph of the section “Importing data from common and custom pipelines”

      “Third, users can still use Pynapple without using the I/O layer of Pynapple.”.

      We have added the following sentence in the discussion

      “To note, Pynapple can be used without the I/O layer and independent of NWB for generic, on-the-fly analysis of data.”

      This brings us to an important fundamental question. What are the advantages of the current approach, where data is imported into the Ts objects, compared to doing the data import into NWB files directly, and then making Pynapple secondary objects loaded from the NWB file? Does NWB natively have the ability to store the 5 object types or are they initialized on every load call?

      NWB and Pynapple are complementary but not interdependent. NWB is meant to ensure long-term storage of data and as such contains as much information as possible to describe the experiment. Pynapple does not use NWB to directly store the objects; however, it can read from NWB to organize the data in Pynapple objects. Since the original version of this manuscript was submitted, new methods address this. Specifically, in the current beta version, each object now has a “save” method. We are also developing functions to load these objects. This does not depend on NWB but on npz, a Numpy-specific file format. However, we believe it is premature to include these recent developments in the manuscript and prefer not to discuss this for now.

      Many of these functions and objects have a long history in MATLAB - which documents their usefulness, and I believe it would be fitting to put further stress on this aspect - what aspects already existed in MATLAB and what is completely novel. A widely used MATLAB toolset, the FMA toolbox (the Freely moving animal toolbox) has not been cited, which I believe is a mistake.

      We agree that the FMA toolbox should have been cited. This has now been corrected.

      Pynapple was first developed in Matlab (it was then called TSToolbox). The first advantage is of course that Python is more accessible than Matlab. It has also been adopted by a large community of developers in data analysis and signal processing, which has become without a doubt much larger than the Matlab community, making it possible to find solutions online for virtually any problem one can have. Furthermore, in our experience, trainees are now unwilling to get training in Matlab.

      Yet, Python has drawbacks, which we are fully aware of. Matlab can be very computationally efficient, and old code can usually run without any change, even many years later.

      A limitation in using NWB files is its standardization with limited built-in options for derived data and additional metadata. How are derived data stored in the NWB files?

      NWB has predetermined a certain number of data containers, which are most common in systems neuroscience. It is theoretically possible to store any kind of data and associated metadata in NWB but this is difficult for a non-expert user. In addition, NWB does not allow data replacement, making it necessary to rewrite a whole new NWB file each time derived data are changed and stored. Therefore, we are currently addressing this issue as described above. Derived data and metadata will soon be easy to store and read.

      How is Pynapple handling an existing NWB dataset, where spikes, behavioral traces, and other data types have already been imported?

      This is an interesting point. In theory, Pynapple should be able to open an NWB file automatically, without providing much information. In practice, it is challenging to open an NWB file without knowing exactly what to look for and how the data were preprocessed. This would require adapting an I/O function for a specific NWB file. Unfortunately, we do not believe there is a universal solution to this problem. There are solutions being developed by others, for example NWB Widgets. We will keep an eye on this and see whether this could be adapted to create a universal NWB loader for Pynapple.

      Reviewer #2 (Recommendations For The Authors):

      Other tools and solutions are being developed by the NWB community. How will you make sure that these tools can take advantage of Pynapple and vice versa?

      We recognize the importance of collaboration within the NWB community and are committed to making sure that our tools can integrate seamlessly with other tools and solutions developed by the community.

      Regarding Pynapple specifically, we are designing it to be modular and flexible, with clear APIs and documentation, so that other tools can easily interface with it. One important thing is that we want to make sure Pynapple is not too dependent on another package or file format such as NWB. Ideally, Pynapple should be designed so that it is independent of the underlying data storage pipeline.

      Most of the tools that have been developed in the NWB community so far were designed for data visualisation and data conversion, something that Pynapple does not currently address. Multiple packages for behavioral analysis and exploration of electro/optophysiological datasets are compatible with the NWB format but do not provide additional solutions per se. They are complementary to Pynapple.

    1. Reviewer #1 (Public Review):

      The authors developed an extension to the pairwise sequentially Markov coalescent model that allows simultaneous analysis of multiple types of polymorphism data. In this paper, they focus on SNPs and DNA methylation data. Since methylation markers mutate at a much faster rate than SNPs, this potentially gives the method better power to infer size history in the recent past. Additionally, they explored a model where there are both local and regional epimutational processes.

      Integrating additional types of heritable markers into SMC is a nice idea which I like in principle. However, a major caveat to this approach seems to be a strong dependence on knowing the epimutation rate. In Fig. 6 it is seen that, when the epimutation rate is known, inferences do indeed look better; but this is not necessarily true when the rate is not known. A roughly similar pattern emerges in Supp. Figs. 4-7; in general, results when the rates have to be estimated don't seem that much better than when focusing on SNPs alone. This carries over to the real data analysis too: the interpretation in Fig. 7 appears to hinge on whether the rates are known or estimated, and the estimated rates differ by a large amount from earlier published ones.

      Overall, this is an interesting research direction, and I think the method may hold more promise as we get more and better epigenetic data, and in particular better knowledge of the epigenetic mutational process. At the same time, I would be careful about placing too much emphasis on new findings that emerge solely by switching to SNP+SMP analysis.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: This manuscript by Lan et al. addresses the still incompletely resolved question as to how branching morphogenesis of the embryonic mammary epithelium is regulated at the molecular and cellular level. Using (combinatorial) primary explant cultures of wildtype and genetically engineered mouse embryos, in which the authors have developed a unique expertise over many years, together with imaging and RNAseq analyses, they (i) show that the timing of epithelial branching is dictated by the biological age of the epithelium, but that an epithelial-mesenchymal interaction is required to bestow branching ability on the mammary epithelium somewhere between E13.5 and E16.5, (ii) seek to determine if and how lineage and cell proliferation affect branching, (iii) show that while salivary mesenchyme can promote growth (i.e. branching density) of the E16.5 mammary epithelium, the mode of branching (i.e. lateral branching vs tip-clefting) is an intrinsic property of the mammary epithelium, (iv) use transcriptomics to identify genes that are likely to control either mammary- or salivary gland specific growth and/or branching patterns, (v) hypothesize that low levels of WNT signaling in the mammary gland mesenchyme (due to relatively high expression of WNT signaling inhibitors) are responsible for mammary specific branching, (vi) show that hyperactivation of WNT/CTNNB1 signaling in the mesenchyme indeed induces hyperbranching, (vii) identify Eda and Igf1 as putative mediators and paracrine signaling factors that regulate branching of the mammary epithelium upon secretion from the mesenchyme downstream of WNT/CTNNB1 signaling and (viii) show that mammary gland branching is impaired in Igfr1 null embryos.

      Major comments: 1. Overall, this is a solid study that is well controlled and technically of high quality. The materials and methods should allow follow up and replication by others and the transcriptomic data have been made available via NCBI GEO. I think the authors convincingly demonstrate points (i), (iii), (iv) and (vi) and (viii). I have some questions regarding (ii), (v) and (vii) and (viii) that I will pose below.

      Our response:

      We thank the reviewer for the careful assessment and recognition of our work. In the subsequent sections, we have tried to address all the concerns raised by the reviewer.

      Re: (ii): The authors try to study the link between basal cell fate and branching. They use position of the cells (which they describe clearly and which is a good choice), since they cannot use specific markers due to the fact that the basal and luminal lineages have not yet segregated at this point. This part of the manuscript is not the most straightforward to follow. The most obvious experiment would have been to focus on the location of the cells and their associated cell cycle profile - but the authors themselves have just recently published a pre-print (their REF #54, now also out in JCB) that is an in-depth study of the link between cell proliferation + cell motility and branching, but this only becomes apparent in the discussion. In that sense, Fig2 of the current manuscript is less novel, although it is nice to see that it holds up in a slightly different analysis.

      Our response:

      We thank the reviewer for acknowledging our recently published work, which is focusing on the active branching phase during late embryogenesis/around birth. In the current proliferation analysis, however, our focus was on a different aspect of embryonic mammary gland development: understanding the mechanism underlying the ability to acquire competence to branch, i.e. how the epithelium changes between late bud and sprout stages. Our data obtained from tissue recombination and 3D culture experiments suggest that heterotypic mesenchymes or mesenchyme-free 3D organoid culture conditions do not provide sufficient signals to support branching of mammary epithelia before E16.5. We have rephrased the text to better emphasize this point.

      Instead of focusing on the cell cycle markers, the authors turn to a K14-Eda mouse model - which shows precocious branching and a temporary reduction in K8 expression. They also analyze Eda-KO embryos. Quite frankly, I find the authors' reasoning difficult to follow here and I cannot deduce how these experiments really address the question at hand (i.e. how lineage and cell proliferation affect branching), so I hope they can rewrite this section of the paper to make the arguments more clear and easy to follow for the reader who, at this point, knows little about Eda. For example, the authors present the argument that K14-Eda mice show a transient reduction in K8 expression - but we don't know if that also really means a (temporary?) change in (future?) luminal cell fate. In fact, since Eda later also makes an appearance as a candidate factor to be secreted by the mesenchyme together with Igf1, I wonder if their K14-Eda data would not be better suited to underscore that point instead and if the authors should perhaps eliminate this section altogether and just refer to their prior work in REF #45. If the authors think the current data add something more, then they need to be more explicit about this (and then also introduce the link to REF #45 in the results section).

      Our response:

      We agree with all the reviewers in that this part of the manuscript was not mature enough and provided only indirect evidence on the potential link between lineage segregation and branching ability. This is an important question in the field that merits a study of its own and should be addressed with better tools than those available to us at present. As suggested by reviewers #1 and #3, we have omitted this part in the revised manuscript.

      Re: (v): Do the authors have any WNT/CTNNB1 target genes that they can include in their transcriptomics analysis to show that the WNT/CTNNB1 signaling levels are indeed lower in the mammary mesenchyme? Axin2 comes to mind, but there are some other negative feedback targets that are often induced across tissues, e.g. Rnf43 and/or Znrf3 and/or Sp5? E.g., to include in Fig. 6E?

      Our response:

      In the original manuscript (lines 339-342), we have performed the GSVA analysis comparing the KEGG database, and the significantly altered pathways comparing different mammary mesenchymes with salivary gland mesenchyme have been pooled and displayed as heatmap in Supplementary Fig 4b. The WNT signaling pathway is lower in the mammary mesenchyme, especially at E16.5.

      As suggested by the reviewer, we have analyzed Axin2, the most commonly used readout of WNT/CTNNB1 signaling activity, in our RNA-seq data, which we include as a new Supplemental Fig. 4c in the preliminary revised manuscript. Axin2 data indicate that Wnt/β-catenin signaling activity is lower in the E16.5 fat pad, where branching takes place, compared to younger stages of mammary gland and the salivary gland.

      Plan for the final revision:

      Additionally, we will provide expression data of a transgenic Wnt reporter from the same developmental stages and tissues that were used to generate the RNA-seq data.

      Re: (vii) and (viii): The authors convincingly show the phenotype of the Igfr1 KO mice, but I hope the authors concur that an epithelial only Igfr1 KO (or alternatively a mesenchymal only Igf1 KO, or epithelial/mesenchymal recombination experiments with WT vs IGFR1 null or IGF1 null tissue, or experiments with small molecule inhibitors of IGF1/IGFR1 signaling) would have given more solid mechanistic evidence regarding the presumed paracrine effect of IGF1 signaling. I am not asking the authors to perform another mouse experiment or even generate or use these conditional strains, but if the authors agree, then I do think this would merit some attention in the discussion section. See also my comments regarding Eda in point 1.

      Our response:

      As shown in the current manuscript, Igf1 is expressed in the mammary and salivary gland mesenchyme. This finding is in line with E14 in situ expression data available in Genepaint (https://gp3.mpg.de/results/Igf1) showing that overall in embryonic tissues, Igf1 is mainly produced in mesenchymal tissues. Of note, in Genepaint, a clear signal can be detected in the salivary gland mesenchyme, not the epithelium. Published E16 and E18 datasets indicate low level of Igf1 expression in the mammary epithelium (https://wahl-lab-salk.shinyapps.io/Mammary_snATAC/). Hence, we conclude that Igf1 is mainly produced by mesenchymal cells. Instead, Igf1r appears to be rather ubiquitously expressed.

      A previous study assessed BrdU incorporation in Igf1r-/- mammary buds at E14.5, and reported a specific proliferation defect in the epithelium, while no difference was detected in the mesenchyme (Fig. 9, Heckman et al., 2007; PMID:17662267). However, we cannot exclude the possibility of autocrine, mesenchymal Igf1/Igf1r signaling, which in turn could lead to upregulation of a paracrine factor to regulate epithelial growth.

      We agree with the reviewer that novel conditional mouse models are beyond the scope of the current study. However, we do not think that small molecule drugs could be used to block Igf1r activity in a tissue-specific manner either.

      Plan for the final revision:

      To further delineate the paracrine and/or autocrine role of Igf1/Igf1r pathway during mammary epithelial growth and branching, we will perform tissue recombination experiments between Igf1r-/- and control mammary epithelium and mesenchyme, as suggested by the reviewer.

      Minor comments: - A few minor spelling/grammar errors, including a couple of "the"s missing (first line of the abstract, and also preceding "Majority" in line 148).

      Our response:

      We apologize for these slips. They have been corrected in the revised manuscript.

      • Line 517-518: please also include the details for the Eda mice.

      Our response:

      We apologize for missing this important information in materials and methods. We have included a short introduction of the K14-Eda mice, a new reference for the original publication producing them, as well as the Jackson Laboratories strain number for Eda-/- (a.k.a. Tabby) mice in the revised manuscript.

      • 1f spelling error: separation

      Our response:

      The spelling error has been corrected in the revised manuscript.

      **Referees cross-commenting**

      Having read all three review reports I think they are pretty much in agreement, with shared questions about the inclusion/meaning/discussion of the lineage specification data and also agreement about the overall technical solidity of the data and this approach.

      I gather that reviewer #2 asks for more controls than myself or reviewer #3 and while I think all of their points are valid, in principle, I don't think all of these are required. I should add that I am inclined to trust the authors on their ability to separate mesenchyme and epithelium as they have been developing and optimising this system over many years.

      Our response:

      We are grateful to the reviewer for their confidence in the technical aspects of our experiments. We do routinely monitor tissue purity in the recombinants (for more details, see our response to reviewer #2). To demonstrate this, we have included new data in new Supplementary Fig. 1a,b and new Supplementary Fig. 3. We believe these additions will further enhance the validity of our findings and effectively address the concerns raised by reviewer #2.

      Reviewer #1 (Significance (Required)):

      General assessment: This is a carefully executed study in which an impressive amount of (combinatorial) embryonic mammary tissue explant experiments are combined with quantitative imaging and transcriptomics analysis.

      The main limitations of the work lie in the fact that the investigation of a potential link between branching and the cell cycle is not entirely novel, as the authors themselves recently published a nice pre-print (now also out in JCB) describing similar analyses. In addition, the mechanistic link between WNT/CTNNB1 signaling in the mesenchyme and the paracrine signaling activities of the presumed downstream effectors EDA and IGF, while plausible, is not yet complete. The work also does not yet address what exactly the branching identity is that is bestowed upon the mammary epithelium between E13.5 and E16.5 and how this then becomes an intrinsic (epigenetic?) feature of the mammary gland.

      Advance: This work provides more insight into the embryonic branching of the mammary gland - a stage of mammary gland development that is still poorly understood and that is, in general, understudied. In part, the work confirms prior work in the literature (their REF #19) regarding mammary and salivary gland tissue recombination experiments. It supplements this with a more elaborate time series of heterochronic and heterologous epithelium/mesenchyme explant cultures, using genetically engineered (and fluorescently labeled) mouse tissues to allow better and quantitative imaging. The transcriptomic analysis of different mesenchyme populations is also informative and allows the researchers to propose a putative mechanism for why the mammary gland branches differently from the salivary gland. The advance is both technical and functional, as well as conceptual, with some advance in terms of mechanism.

      Audience: This work should appeal to mammary gland biologists interested in the molecular and cellular mechanisms of (early) mammary gland development, as well as to a broader community of developmental biologists studying branching morphogenesis in tissues such as lung, kidney and salivary gland.

      My expertise: WNT signaling and mammary gland biology, at the intersection of developmental, stem cell and cancer biology

      __Reviewer #2 (Evidence, reproducibility and clarity (Required)):__

      The mammary gland is a branched structure that consists of a bilayered epithelium embedded in a specialized mesenchyme. In mice, at 11,5 days of embryogenesis, the ectoderm thickens forming 5 pairs of peculiar structures called placodes. During the following days, the placodes will grow and invaginate into the surrounding mammary mesenchyme and they will finally start to branch by the end of embryogenesis (E16). It has been suggested that the bidirectional communication between the growing mammary gland and the surrounding mesenchyme plays a pivotal role in the determination of each step of mammary gland development (placode formation, mammary bud invagination, gland outgrowth, branching). The role of different signalling molecules has already been shown, particularly for the placode growth and mammary bud invagination. Nevertheless, the pathways regulating embryonic mammary gland branching are still incompletely understood. In this manuscript, Lan and colleagues aim to decipher the correlation between different stages of mammary gland development such as proliferation, lineage segregation and ductal branching. Furthermore, they want to define which stage of mammary development is intrinsically determined by the epithelium and which one requires the supportive guidance of the mesenchyme. Lastly, they aim to discover the key signal for the growth and branching of mammary epithelium. To these purposes, they used an ex vivo model of heterochronic epithelial-mesenchymal recombination. In particular, they micro-dissected the epithelium and/or the mesenchyme from murine mammary glands at different stages of embryonic development (i.e. at E13,5 for the quiescent phase or 16,5 for branching phase) and explanted them together in different combinations using fluorescent reporters. To assess the role of the mesenchyme they also cultured the epithelium in a mesenchyme free 3d structure. 
Through this model they demonstrated that the presence of the mesenchyme is necessary for the priming of mammary epithelium for branching, since only E16,5 epithelial cells were able to grow and branch in a mesenchyme free 3D experiment. Nevertheless, intrinsic properties of the epithelium are necessary for the timing of branching, since E16,5 mesenchyme was not able to accelerate the outgrowth of E13,5 epithelia. In order to determine which epithelial properties are important, the authors correlated the beginning of cell proliferation in the embryonic mammary gland to the beginning of the branching phase. They indeed used the Fucci2a mouse model to carefully characterise the timing of mammary cells proliferation at different stages of embryonic development, concluding that the great majority of proliferating cells reside in the inner part of the mammary bud until E14,5, while in the external part at later stages. Regarding the importance of cell proliferation, Lan and colleagues claim that the beginning of the branching phase is not its direct consequence, thanks to the use of the K14Cre- Eda mouse model, known to have anticipated mammary gland development. Using this and the Eda-/- models, the authors also sustain that the branching occurs independently of the lineage specification of the epithelium. The use of salivary mesenchyme instead the mammary one was able to increase the number of branching of E16,5 mammary epithelium. Nevertheless, this model demonstrated that the branching pattern (side branching vs tip bifurcation) is an intrinsic feature of the epithelium. Lan and colleagues also defined the transcriptomic profiles of the mammary and salivary mesenchymes at different stages. In particular, they observed an increased expression of negative regulators of Wnt pathway in the mammary mesenchyme compared to the salivary mesenchyme. 
Moreover, using a mouse model where B-catenin is stabilised, they observed increased tip production in the mammary gland epithelium. They also showed that IGF1 production is increased after Wnt pathway activation and they tested its function, both treating their ex vivo cultures with exogenous IGF1 and using Igf1r-/- mouse models.

      Major comments 1- The great majority of the results of the manuscript are based on an ex vivo model of heterochronic epithelial-mesenchymal recombination. Since the authors are studying the effect of the mesenchyme of different stages on the epithelium (and vice versa), the purity of the two compartments after the dissection is particularly important. Although they said that the purity is evaluated (line 112), it would be important to show a control staining in which they use known markers of the mesenchyme with no colocalization with the fluorescent reporter of the epithelium.

      Our response:

      We agree with the reviewer that the purity of the separated tissues is very important for our conclusions. This is why we have used genetically labeled tissues in all recombination experiments: the epithelium and the mesenchyme were always isolated from embryos ubiquitously expressing GFP or tdTomato. We find this the most reliable way to assess the origin and purity of the isolated tissues. If there was any carry-over mesenchyme isolated with the GFP+ epithelium, this would be revealed as GFP+ mesenchymal cells in the recombinants consisting of otherwise tdTomato+ mesenchyme. And vice versa: any carry-over tdTomato+ epithelium isolated with the mesenchyme would be revealed as tdTomato+ epithelial cells in the recombinants. We apologize for not making this clear enough in the original manuscript. In the revised manuscript, we now provide confocal high-resolution images of the recombinant explants (new Supplementary Fig. 1a,b). The explants have been co-stained with the epithelial marker EpCAM, revealing a robust colocalization between the ubiquitously expressed fluorescent labels in the designated epithelial tissues and EpCAM.

      2- Another important point for understanding the quality and impact of these findings is to assess the similarities and differences, if there are, between the in vivo mesenchyme and the ex vivo one. Indeed, once explanted and put in culture, mesenchymal cells could change their transcriptomic profile and consequently change their signals to the epithelium. The authors should assess the expression of the genes and pathways studied during embryonic development in vivo.

      Our response:

      The reviewer is correct in that the transcriptomes will likely undergo some changes when organs are cultured ex vivo. This is why RNA-seq was done on freshly isolated tissues. Regarding the potential changes taking place ex vivo, however, we do not consider them relevant with respect to the questions we are addressing in this study. The reason is (as reported in the manuscript) that all control recombinations (homochronic recombinations such as E13 epithelium + E13 mesenchyme, E16 epithelium + E16 mesenchyme etc.) branched essentially as in vivo. Therefore, we find the results and conclusions made from the tissue recombination experiments solid.

      3- The authors clearly showed that E16,5 epithelium is able to branch in a mesenchyme free 3D culture model, while epithelia from earlier stages don't. This led to the conclusion that mesenchyme is necessary for acquiring the branching ability. Nevertheless, the authors also said that early stages epithelia scarcely grow in the mesenchyme free 3D culture. Therefore, the lack of branching may be due to the lack of growth, if not the increase of death, of epithelial cells. The authors should quantify the size and the cell death of the epithelia in the different culture conditions and discuss better this point.

      Our response:

      The reviewer is correct in that one of the key functions of the mammary mesenchyme up to E16.5 may be to provide survival signals for the epithelium, and this might explain why epithelia younger than E16.5 fail to grow/branch when recombined with salivary gland mesenchyme and in mesenchyme-free organoid culture.

      Plan for the final revision:

      To address this issue, we will assess apoptosis in mammary epithelia cultured in the mesenchyme-free 3D culture organoid set-up.

      4- The Fucci2a model allowed to assess the proliferation of embryonic mammary epithelium, showing that the great majority of proliferating cells are basal, at late stages of development (line 182). As it has already been shown, lineage specification is a late process during mammary gland development. The fact that the proliferating cells reside at the external part of the bud does not mean that they are basal cells yet. A p63/K8 staining could be important to understand if the increased proliferation occured in already specified basal cells or not.

      __Our response:__

      Indeed, mammary lineage specification is a later process. As pointed out in the manuscript and by reviewer #1, the widely used basal and luminal lineage markers have not yet segregated to separate compartments at the developmental stages analyzed in our study, and therefore cannot be used as tools for this purpose. We would like to emphasize that in the manuscript, we analyzed the cells based on their position, and have used the term basal to indicate the basal position, not the prospective lineage. Accordingly, we used the term inner instead of luminal cells to indicate their location, not lineage. We have further clarified this point in the preliminary revised manuscript.

      5- The use of Fucci2a model showed that 20% of epithelial cells are proliferative at E13,5. This phase is considered as "quiescent" by the authors (line 120), but the moderate proliferation rate shown in this experiment demonstrated that it is not. A change of the nomenclature is needed.

      __Our response:__

      We have removed the word “quiescent” from the text.

      6- Through the use of K14-Eda and Eda-/- models, the authors claimed that the lineage specification is not a prerequisite for ductal branching. To support this point, they showed that the K14-Eda mice have an anticipated branching although the expression of K8 in the inner part of the bud is transitorily decreased. The authors link the K8 downregulation to a transient suppression of the luminal lineage, but this is clearly overclaimed. Although K8 is a known marker of luminal lineage, the downregulation of one marker is not sufficient to support their thesis. They should first check more markers and in particular critical regulators of luminal lineage as Notch1, Foxa1 and Elf5. Lately, the use of different models that drive embryonic epithelial cells to a forced lineage commitment (Notch1 or Δnp63 overexpression) would support more their claim. As additional evidence, the authors showed that Eda is able to promote basal cell signature. Firstly, the authors should better explain why this point would support their thesis. Secondly, the supplementary figure 2b does not show which genes are taken into account to define the basal signature. A list of these genes would be helpful, as well as staining for some representative proteins.

      Our response:

      We thank the reviewer for these constructive suggestions. We agree with all reviewers in that this part of the manuscript was not mature enough and provided only indirect evidence on the potential link between lineage segregation and branching ability. This is an important question in the field that merits a study of its own to be addressed with better tools than those available to us at present. As suggested by reviewers #1 and #3, we have omitted this part in the revised manuscript.

      7- The authors used the same mouse models to assess the importance of proliferation in the determination of ductal branching and they claimed that proliferation is not a sufficient feature. This conclusion was supported by two observations. The first one is the fact that the K14-Eda model shows an increased cell proliferation at early stages compared to wt, coupled with anticipated branching. Secondly, although having smaller glands compared to wt and showing a delay in ductal branching, Eda-/- mice have an epithelial proliferation rate very similar to wt. Again, the conclusion that proliferation is not sufficient for branching is overclaimed. Firstly, the authors should explain how the buds in wt and Eda-/- mice have different sizes although the similar proliferation (increased cell death?, cellular volume?). Secondly, to support the thesis that proliferation is not sufficient for branching, functional experiments should be performed (see point 12). For instance, the short-time treatments with inhibitors or promotors of proliferation may help to understand the effective role of proliferation in the determination of branching.

      Our response:

      We show that there is no direct link between the onset of proliferation and the acquisition of branching ability. However, we are not claiming that proliferation is not important for branching, as new cells are obviously needed as building blocks of growing tissues. In a recently published paper, we have assessed the role of proliferation in branch point formation in embryonic mammary glands. Using mitomycin C to block proliferation, we showed that initiation of new branches occurs even when proliferation is blocked (Myllymäki et al., JCB 2023, PMID: 37367826).

      The reviewer was also asking why Eda-/- mammary primordia are smaller at E15.5-E16.5 despite similar proliferation rates. In the revised manuscript, we have quantified the volume of E13.5 Eda-/- and control mammary buds and show that Eda-/- buds are ~25% smaller (3.5 ± 0.8 × 10⁵ µm³ in Eda-/- vs. 4.6 ± 0.7 × 10⁵ µm³ in control, mean ± SD) already at the bud stage (new Supplementary Fig. 2c,d).
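      The relative size difference quoted above can be checked directly from the two reported group means. The following is a minimal sketch, assuming the means are in units of 10⁵ µm³; the function and variable names are hypothetical and are not taken from the manuscript, and the calculation uses the means only (it ignores the reported SDs).

```python
def percent_smaller(mutant_mean: float, control_mean: float) -> float:
    """Return how much smaller the mutant mean is, as a percentage of the control mean."""
    return (control_mean - mutant_mean) / control_mean * 100.0


# Mean E13.5 bud volumes as reported in the response (assumed unit: µm^3).
eda_null_volume = 3.5e5   # Eda-/- mammary buds
control_volume = 4.6e5    # control mammary buds

reduction = percent_smaller(eda_null_volume, control_volume)
print(f"Eda-/- buds are {reduction:.1f}% smaller than control buds")
```

      Under these assumptions the difference comes to roughly 24%, consistent with the "~25% smaller" stated in the response.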

      We have also quantified the cell size in Eda-/- and control mammary glands at E13.5 and E15.5 and found that mammary epithelial cells in Eda-/- embryos are ~15% smaller (new Supplementary Fig. 2e,f). Together, these data indicate that the smaller size of E15.5-E16.5 Eda-/- mammary glands is a combined effect of the smaller mammary anlage at E13.5 and the smaller cell size. These findings, while interesting on their own, do not challenge our conclusions regarding the link between the onset of proliferation and the acquisition of branching ability.

      8- The heterotypic epithelial-mesenchymal recombination using the salivary gland is interesting. Nevertheless, some stainings to assess the purity of their systems are again required (e.g., marker of salivary epithelium to verify the purity of the mesenchyme and vice versa).

      __Our response:__

      As mentioned above, all tissue recombination experiments were performed so that the epithelium and the mesenchyme originated from genetically labelled embryos expressing different fluorescent proteins. In the revised manuscript, we provide confocal images of the salivary-mammary tissue recombinants (new Supplementary Fig. 3), confirming the purity of the tissue compartments used in these experiments.

      This model clearly showed that the mammary epithelium can form more branching when combined with the salivary mesenchyme. Moreover, the salivary epithelium preferentially branches through tip bifurcation, while mammary epithelium combined with the salivary mesenchyme has a mixed pattern of tip bifurcation and side branching (typical of the mammary gland). The authors thus concluded that the branching pattern is an intrinsic feature of the epithelium. However, a comparison between the percentage of tip bifurcation and side branching in the heterotypic combination and the homotypic combination between mammary epithelium and mammary mesenchyme is crucial to understand this point. Indeed, these results are not sufficient to exclude that the branching pattern is partially determined by intrinsic features and partially by extrinsic signals. The authors should carefully quantify the branching pattern in the homotypic combination and compare that to the heterotypic one. If the percentage of tip bifurcation do not change, their conclusion is correct; if this percentage increases in the heterotypic combination, it would be a sign of a partial effect of the signals of the mesenchyme.

      Our response:

      We thank the reviewer for raising this question. We have independently generated data on the type of mammary gland branching events in two papers with somewhat different culture and imaging conditions (Lindström et al., bioRxiv 2022 and Myllymäki et al., JCB, 2023, PMID: 37367826). Both analyses showed that in embryonic mammary glands, the majority of branching events (~70%) occur by side-branching. These data are in line with the current study, which we have now extended to include the mammary-mammary recombination experiments (revised Supplementary Video 1, revised Fig. 4b). Quantification of branching events revealed no significant difference in the type of branching events of mammary epithelia grown with salivary or mammary gland mesenchyme (revised Fig. 4c), further supporting our initial conclusions.

      9- Through the analysis of their transcriptomic data, Lan and colleagues found that the mammary mesenchyme expresses higher levels of negative regulators of Wnt pathway compared to the salivary mesenchyme. To demonstrate the value of their findings, they should confirm this in vivo, through staining of known Wnt proteins on the salivary and mammary mesenchymes at the embryonic stage.

      Our response:

      In mammals, there are 19 Wnt ligands, over a dozen secreted Wnt inhibitors, 10 Frizzled receptors, two Lrp co-receptors, and numerous other pathway modifiers that contribute to the net Wnt signaling activity in a complex manner. Furthermore, it has been “notoriously difficult to generate useful antibodies to vertebrate Wnt proteins...In general, these sera do not detect endogenous Wnt proteins in cell extracts, nor do they detect Wnt proteins in tissues by staining techniques. Hence, there are few data on Wnt protein distribution in intact vertebrate animals.” This is a direct citation from the Wnt Homepage, maintained by the Nusse Lab; https://web.stanford.edu/group/nusselab/cgi-bin/wnt/reagents#antibod.

      For all these reasons, we do not find this approach feasible or informative.

      Instead, in the revised manuscript, we report the expression levels of Axin2, the most commonly used transcriptional readout of canonical Wnt activity in our RNA-seq data (new Supplementary Fig. 4c). Axin2 levels are lowest in the E16 fat pad where mammary branching takes place, much lower than in any other tissues analyzed in the study.

      Plan for the final revision:

      To complement these findings, we will additionally provide expression data of a transgenic Wnt reporter from the same developmental stages and tissues that were used to generate the RNA-seq data.

      10- Since the ability of the salivary mesenchyme to promote a higher rate of branching in the mammary epithelium, the authors wanted to assess what could be the role of Wnt signalling. To do so, they used a mouse model where B-catenin is stabilised, allowing an increased Wnt signalling in the mammary mesenchyme. As a result, they observed increased branching in the mammary epithelium. They also found that IGF1 is a ligand regulated by Wnt pathway in the mesenchyme. Therefore, the use of exogenous IGF1 in their ex vivo model was able to increase the branching of the mammary epithelium. Moreover, Igf1r-/- embryos showed a significant decrease of mammary gland branching. The conclusion based on these experiments was that the Wnt-Igf1-Igf1r axis plays a pivotal role in the promotion of mammary gland branching during embryogenesis. This conclusion is overclaimed for different reasons. Firstly, the normalization of the ductal branching to the body weight is insufficient to exclude that the impact of the Igf1r knockout may have severe consequences on the mammary gland formation, upstream of the ductal branching. Another parameter for this normalization is required (e.g., size of the bud before branching, proliferation status, etc).

      Our response:

      We agree with the reviewer in that Igf1r knockout may affect mammary gland formation in multiple ways, and also prior to onset of branching, as already indicated in the original manuscript: “…apart from one study reporting the smaller size of the E14 mammary bud in IGF-1R deficient embryos …” (line 398-399 in the revised version) and ‘…mammary gland 3 that was consistently absent.’ (line 414-415 in the revised version).

      To assess whether the reduced size and branching of E16.5/E18.5 Igf1r-/- mammary glands is merely a consequence of the smaller anlage, the revised manuscript includes new data reporting quantification of the volume of mammary gland 2 of Igf1r-/- and wild type littermate embryos at E13.5, E16.5, and E18.5 from 3D confocal images of whole mount EpCAM stained mammary glands. As can be seen from the new Fig. 7g-h, the mutant mammary buds are about 60% of the size of the controls at E13.5, 25% at E16.5, and only 20% at E18.5, revealing a progressive reduction indicative of a specific defect at the outgrowth and branching stage. This conclusion was validated by normalization to the body weight: at E13.5 the size of the Igf1r-/- mammary anlage did not differ from that of the wild type embryos (p = 0.11), at E16.5 the sprouts were smaller in the mutants, though the difference did not reach statistical significance (p = 0.08), while at E18.5, the Igf1r-/- mammary glands were significantly smaller (p = 0.000021) (new Fig. 7i). We find these data compelling evidence for a specific role for Igf1r in outgrowth and branching of the embryonic mammary gland.

      The use of alternative models to specifically knockout the receptor in the epithelium or the ligand in the mesenchyme (e.g. viruses) would be even more useful to specifically focus on the role of this pathway for ductal branching excluding side effects.

      Our response:

      We thank the reviewer for this suggestion. Unfortunately, in our experience, viral shRNA delivery is not efficient enough for effective gene silencing. The situation is different for Cre delivery in a gain-of-function approach (used in the current study to flox out exon 3 of beta-catenin): when the endogenous pathway activity is very low, targeting even a subset of cells is sufficient for upregulation of paracrine factors.

      Plan for the final revision:

      To address the question on the autocrine or paracrine role of Igf1r, we will perform tissue recombination experiments between Igf1r-/- and control mammary epithelium and mesenchyme.

      Another limit of this model is the fact that Igfr1 can be bound by Igf2 as well and we cannot exclude that this has an impact too (except if Igf2 is not expressed at this stage). A quantification of Igf2 expression may be useful.

      Our response:

      Indeed, we cannot exclude the possibility that Igf2 also plays a role (Igf2 expression was similar to that of Igf1 in our RNA-seq dataset, see Supplementary Fig. 5). However, mesenchymal Wnt signaling activity was linked to Igf1, not Igf2; in fact, Igf2 was somewhat downregulated in the Wnt3A-treated sample reported by Wang et al. (Wang et al., 2021) (highlighted by an arrow in the revised Fig. 6). We have also clarified this point in the Discussion of the preliminary revised manuscript.

      11- From the experiments presented in this section it is clear that Wnt-Igf1-Igf1r axis has to be finely regulated to have the correct amount of ductal branching in the embryonic mammary epithelium. Nevertheless, the author just showed the RNA levels of Igf1 in the different compartments they have analysed. Stainings to see the effective presence of the ligand on the tissue is mandatory to clarify the role of this axis in the ductal branching in vivo.

      Our response:

      Igf1-Igf1r signaling plays a critical growth-promoting function during embryonic and postnatal development. The expression of Igf1 at RNA and protein level has been detected in almost all tissues in humans (Daughaday et al., Endocr. Rev., 1989; PMID: 2666112). Given that Igf1 is a secreted protein and that multiple Igf binding proteins (Igfbps), which regulate the bioactivity of Igf1 by sequestering it, are expressed in the mammary and salivary gland mesenchyme (Supplementary Fig. 5), we find it unlikely that Igf1 staining would provide any additional information to the current study, as it cannot be used to assess the source of Igf1 or the location of the signaling activity.

      Furthermore, as underlined by the authors, this axis is specifically important and upregulated in the salivary gland. Due the limit of the Igf1R-/- model, we cannot exclude that, although Wnt-Igf1-Igf1r axis is able to increase the branching ability of mammary epithelium, the normal branching rate observed in wt mice is due to other pathways.

      Our response:

      We agree with the reviewer in that other pathways are also important in regulating normal mammary gland branching, for example, Eda/NF-κB and FGF pathways as we described in the Introduction. Our results do not exclude the possibility that also pathways other than Wnt regulate Igf1 expression. The reviewer is correct that if a paracrine factor is expressed in the salivary gland but not in the mammary mesenchyme, its physiological effect may be limited to the salivary gland. Indeed, cluster 5 identified by the mFuzzy analysis (Fig. 5f) is likely to include some genes like that. This is why we decided to focus on cluster 6 genes like Igf1. In the revised manuscript, we have better highlighted the difference between cluster 5 and 6 genes.

      Unfortunately, with the currently available tools, we cannot test the importance of the endogenous mesenchymal Wnt signaling activity by inactivating Wnt signaling activity specifically in the mesenchyme at the time point when branching begins. This would require an inducible mesenchymal Cre line (mesenchymal β-catenin is essential for the early fate specification of the primary mammary mesenchyme; Hiremath et al., 2012, PMID: 23034629), and conditional β-catenin null mouse. We do not have such mice available and we find that these experiments are beyond the scope of the current study.

      12- Lastly, once claimed to have found the key factor necessary for ductal branching promotion, the authors should also test if the proliferation and lineage segregations are unaffected in this context, confirming their dispensable role claimed in the initial part of the manuscript.

      Our response:

      Igf1/Igf1r is well known for its growth-promoting function via cell proliferation. We have no reason to think that this would not also be the case in the mammary gland, and it was not our intention to give the impression that proliferation was not affected. In fact, Hiremath et al. (2012) already reported a defect in epithelial cell proliferation in Igf1r-/- mammary buds at E14. Our key finding is that, compared to other organs, the mammary gland is particularly sensitive to loss of Igf1r during branching morphogenesis. Finally, as pointed out earlier, better tools will be needed to assess the potential link between lineage segregation and onset of branching, a topic that we hope to address in the future.

      Minor comments: 1- An important paper on mammary gland ductal branching was published on Nature in 2017 by Scheele and colleagues and should be presented in the introduction, even though it is at later stages (after birth).

      Our response:

      We thank the reviewer for the suggestion. In the revised manuscript, we have added the findings of Scheele et al. 2017 to the Introduction.

      2- In line 136 and 139 the authors referred to Fig 2 but it should be Fig 1

      Our response:

      We apologize for these slips. They have been corrected in the revised manuscript.

      3- The sentence on line 142 should be rephrased, since "advanced developmental stages" may be referred to pubertal development. The authors should specify that they are talking about embryonic development.

      Our response: We apologize for the potential misunderstanding. In the revised manuscript, we have used the phrase “advanced embryonic developmental stage” to describe our conclusion more precisely.

      Reviewer #2 (Significance (Required)):

      Overall, the authors concluded that embryonic mammary gland development and branching are extremely sensitive to the loss of IGF1, normally produced by the mesenchyme. The topic of the paper is interesting, the experimental approaches are well conceived, the data are convincing and the findings are of interest to developmental biologists. Nevertheless, there are some significant points that need to be further investigated before considering the manuscript suitable for publication:

      Our response:

      We thank the reviewer for the careful assessment of and positive feedback on our manuscript. We have already addressed most of the points raised, and most of the remaining ones will be addressed in the final revised manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Here the authors use classical embryonic tissue recombination and pharmacological manipulation of explants in conjunction with cutting edge 3D imaging of tissue derived from highly sophisticated reporter and knock-out mouse models and state of the art transcriptomic analysis to masterfully delineate and dissect regulatory pathways critical for embryonic mammary development. Specifically, they set out to parse regulation of proliferation from that of branch patterning.

      While it has long been established that epithelial-mesenchymal interaction is necessary for mammary branching this work shows by heterochronic recombination that initiation mammary branching is not advanced by mesenchymal stage. By examining Fucci2a embryos the authors demonstrate that branching is preceded by a significant increase in basal cell-biased proliferation but, through further analysis of Eda gain and loss of function mice, conclude that proliferation per se does not cause branching. They show by heterotypic recombination with salivary tissue that early mammary epithelia rudiments require their own mesenchyme for survival and that although later E16.5 rudiments expand more robustly when in contact with salivary mesenchyme they nevertheless retain their characteristic mammary branch pattern. Thus, they establish that initiation and patterning are intrinsic properties of the epithelium but that early survival and later expansion/proliferation is regulated by the mesenchymal context. By transcriptomic comparison of mammary and salivary mesenchyme they reveal that genes encoding canonical Wnt attenuators and antagonists are highly expressed in early mammary mesenchyme and drop as branching ensues. The low expression of these negative regulators of Wnt signaling in salivary mesenchyme is proposed as an explanation for its growth and branch stimulating capability. In keeping with these observations, the authors show that experimental activation of mammary mesenchymal Wnt signaling augments both growth and branching. Lastly, they identify transcriptomic changes in IGF1 coincident with the initiation of mammary branching and confirm its role by extending analyses of the effects of gain and loss of function of IGF1 on embryonic mammary development.

      This is a thorough, well-constructed paper that adds new knowledge and important conceptual nuance and mechanistic insight to classical findings on branch patterning. This work is a technical tour de force and backed by solid quantitative and statistical analysis throughout. Their experimental approach is superb and the conclusions are sound. Their findings will be of great interest to the community of mammary gland biologists and to the wider field of embryologists focused on early development of a broad range of ectodermal appendages.

      I have some minor criticisms that I believe can be quickly remedied in a minor rewrite and suggestions for the authors consideration to improve the manuscript discussion as follows:

      Minor issues Abstract, line 37: The authors misuse the word "decompose" - it should be "deconstruct"

      Our response: We thank the reviewer for pointing out our mistake, which we have corrected in the revised manuscript.

      Results, p7 line 48: Add "The" to the sentence: "The majority...."

      Our response: Corrected in the revised manuscript.

      P8 line 173 This sentence refers to Figure 2G which is a quantitative plot. I would suggest replacing the word "cluster" which implies a spatial organization with the word "subset" or "significant fraction" The spatial data in Fig 2d support basal bias but do NOT to my eye show any clustering - in fact the proliferative basal cells appear to be evenly dispersed within the basal layer.

      Our response:

      We thank the reviewer for highlighting this aspect. We agree that “significant fraction” is a more suitable term than “cluster”.

      P9 line 188: The statement on basal cell lineage specification needs a reference.

      Our response:

      Following the suggestions from reviewers #1 and #3, we have removed the content about lineage segregation from the Results, together with this sentence.

      P10 line 201-216 I found the section on lineage specification (fig S2) weaker than the rest and a distraction from the main thrust of the paper making it difficult for the reader to focus. I suggest omitting this section and supplemental figures associated with it altogether.

      Our response:

      We agree with all reviewers that this part of the manuscript was not mature enough and provided only indirect evidence on the potential link between lineage segregation and branching ability. This is an important question in the field that merits a study of its own and should be addressed with better tools than those available to us at present. As suggested by reviewers #1 and #3, we have omitted this part in the revised manuscript.

      P9 line 190: "displays precocious onset of branching" it is sufficient to say: displays precocious branching - the use of both "precocious" and "onset' is redundant.

      P10 line 229 Similarly, delete "the onset of branching was delayed" it is sufficient to say: branching was delayed.

      Our response: Both sentences have been corrected in the revised manuscript.

      P11 line 243: Delete "on the regulation of the" and substitute the word "to" in the sentence: "Next, we shifted our focus on the regulation of the branching pattern, which is thought to be determined by mesenchymal cues."

      Our response: Corrected in the revised manuscript.

      P11 line 241 subtitle and Figure 4 title: The disparity in titles here is jarring for the reader: Results text subtitle: "Salivary gland mesenchyme is rich in growth-promoting cues, but does not alter the mode of branch point formation of the mammary epithelium". Figure 4 Title: "Mammary mesenchyme is indispensable for the branching ability of the mammary gland". I suggest to the authors divide the figure as well as the text to make the two points indicated by their disparate titles separately.

      Our response: We thank the reviewer for the suggestion to clarify the Results part of the manuscript. As suggested, we have split the data under two separate subtitles, but due to limitations on the number of figures, we prefer to report these data in one figure.

      P12 line 279 From here on out the manuscript has a tendency to use the term "growth" ambiguously - in many instances it is unclear do they mean expansion, proliferation, increased branch number/ morphology?? Please try to clarify.

      Our response:

      Our aim is to use the term "growth" to mean tissue growth (expansion). We hope that this is clearer in the revised manuscript.

      P16 line 341 use word "prompted" instead of word "promoted"

      Our response: We thank the reviewer for spotting the slip, which we have corrected in the revised manuscript.

      P16 line 382: include word "embryonic" before "mammary development"

      Our response: We have modified the text in the revised manuscript.

      Discussion P18 line 416: Add the words "later stage (E16.5)" to the sentence: "Importantly, we demonstrate that salivary gland mesenchyme could only promote the growth of later stage (E16.5) mammary epithelium"

      Our response: We thank the reviewer for the suggestion. We have modified the text in the revised manuscript.

      P19 line 437: Given the authors statement "Instead, cell motility is critical for branch point formation in the mammary gland" they should consider a brief sentence mentioning their transcriptomic findings on cadherin 11 and Tenascin.

      Our response: We thank the reviewer for appreciating our transcriptomic data. In the revised manuscript, we have added the following text to the Discussion: “Accordingly, we observed significantly increased expression of cell migration promoting genes such as Cdh11 (encoding Cadherin 11), and Tnc (encoding Tenascin C) 60,61 in the E16.5 mesenchyme compared to E13.5 (Supplementary Table 2).”

      P19 line 451: Similarly, given their statement "This observation suggests that mammary epithelium itself carries the instructions dictating the mode of branching" they could consider their transcriptomic data on Ltbp1 in "mammary specific" clusters 7,8,9 as a matrix molecule initially expressed by mammary mesenchyme but which becomes expressed by luminal epithelial cells at precisely the time they acquire lineage specification and intrinsic branching capability.

      Our response: This is an excellent suggestion. We have added the following text to the Discussion: “It is worth noting that certain mesenchymal factors, such as Ltbp1, began transitioning towards epithelium-specific expression around E16.5 69. Exploring the potential impact of these factors on the self-instructed branching capacity of the mammary epithelium could yield valuable insights.”

      P20 lines 462-470 The authors should address their theory of Wnt suppression in the mammary mesenchyme in the context, albeit conflictingly, of earlier studies showing expression of Wnt signaling reporters, in either epithelial or mesenchymal locations during early stages.

      Our response: We thank the reviewer for the suggestion. In the preliminary revised manuscript, we report Axin2 expression data as new Supplementary Fig. 4c. Axin2 expression data suggest that Wnt/β-catenin activity is lowest in the E16.5 fat pad (where branching takes place) compared to all other tissues analyzed in the study.

      Plan for the final revision:

      For the final revised manuscript, we will additionally generate transgenic Wnt reporter expression data (see also our response to point 3 of Reviewer #1). These results will be discussed in light of the published Wnt reporter literature.

      Reviewer #3 (Significance (Required)):

      Here the authors use classical embryonic tissue recombination and pharmacological manipulation of explants in conjunction with cutting edge 3D imaging of tissue derived from highly sophisticated reporter and knock-out mouse models and state of the art transcriptomic analysis to masterfully delineate and dissect regulatory pathways critical for embryonic mammary development. Specifically, they set out to parse regulation of proliferation from that of branch patterning.

      This is a thorough, well-constructed paper that adds new knowledge and important conceptual nuance and mechanistic insight to classical findings on branch patterning. This work is a technical tour de force and backed by solid quantitative and statistical analysis throughout. Their experimental approach is superb and the conclusions are sound. Their findings will be of great interest to the community of mammary gland biologists and to the wider field of embryologists focused on early development of a broad range of ectodermal appendages.

      Our response:

      We much appreciate the positive evaluation of our manuscript. We have addressed all the feedback provided by reviewer 3 in the preliminary revised manuscript, except the last point, which will be included in the final revision along with the new data on Wnt reporter expression.

      Field of expertise: Embryonic and adult mammary development, Wnt signaling, cell adhesion

    1. Author Response

      eLife assessment

      This useful paper examines changes (or lack thereof) in birds' fear response to humans as a result of COVID-19 lockdowns. The evidence supporting the primary conclusion is currently inadequate, because the model used does not properly account for many potentially confounding factors that could influence the study's outcomes. If the analytic approach were improved, the findings would be of interest to urban ecologists, behavioral biologists and ecologists, and researchers interested in understanding the effects of COVID-19 lockdowns on animals.

      Many thanks for these supportive words. We did our best to improve our manuscript according to the reviewers' and editor's comments. Importantly, we regret being unclear in the Methods, as our models already controlled for most of the confounds (see below) discussed by the reviewers.

      For example, given that a single observer collected the data at most sites, including site as a random intercept in the models also controls for observer effects (which is one of the reasons why site is in the model). We added details to the Methods (L352-356, see also “Statistical analyses” in the main text).

      The first reviewer asked us to use “some measure of urbanity (e.g. Human Footprint Index) that varies across the cities included here”. Our main results are now based on country-specific models and hence, the use of a single-value predictor for each city is not appropriate. Please see also below.

      The second reviewer is concerned about multicollinearity in our models because of the 0.95 correlation between Period and Stringency Index. However, these are key predictor variables of interest that have never been used within the same model as predictors. We now clearly explain this in the Methods (L458-538, 548-550) and in the legend of Figure S2.

      The third reviewer suggested that our models would benefit from controlling for day in the species-specific breeding cycle. Although we don’t have precise city-specific information on the timing of breeding stages in the sampled populations of birds, we partly control for these effects by including a random intercept of day within each year and species. This random factor explained most of the variance (see Tables S1-S2), something that could have been expected. In other words, we do control for what the third reviewer asked for. Similarly, we account for habitat features that may influence escape distance by including site in the models. Site usually refers to a specific park (we assume that within-park heterogeneity is lower than between-park variation) and hence partly addresses the reviewer’s concern. Again, we highlight this within the Methods (L466-476).

      Reviewer #1 (Public Review):

      This paper uses a series of flight initiation "challenges" conducted both prior to and during COVID-19-related restrictions on human movement to estimate the degree to which avian escape responses to humans changed during the "anthropause". This technique is suitable for understanding avian behavioral responses with a high degree of repeatability. The study collects an impressive dataset over multiple years across five cities on two continents. Overall the study finds no effect of lockdown on avian escape distance (the distance at which the "target" individual flees the approaching observer). The study considers the variable of interest as both binary (during lockdown or prior to lockdown) and continuous, using the Oxford Stringency Index (with neither apparently affecting escape distance). Overall this paper presents interesting results which may suggest that behavioral responses to humans are rather inflexible over "short" (~2 year) timespans. The anthropause represents a unique opportunity to disentangle the mechanistic drivers of myriad hypothesized impacts humans have on the behavior, distribution, and abundance of animals. Indeed, this finding would provide important context to the larger body of literature aimed at these ends.

      Thank you very much for your positive feedback.

      However, the paper could do more to carefully fit this finding into the broader literature and, in so doing, be a bit more careful about the conclusions they are able to draw given the study design and the measures used. Taking some of these points (in no particular order):

      Thank you. We did our best to address your comments (see below and the updated Methods, Results and Discussion sections).

      1) Oxford Stringency Index is a useful measure of governmental responses to the pandemic and it's true that in some scenarios (including the (Geng et al. 2021) study cited by this paper) it can correlate with human mobility. However, it is far from a direct measure of human mobility (even in the Geng study, to my reading, the index only explained a minority of the variation). Moreover, particular sub-components of the index are wholly unrelated to human mobility (e.g. would changes to a country's public information campaign lead to concomitant changes in urban human mobility?). Finally, compliance with government restrictions can vary geographically and over time (i.e. we might expect lower compliance in 2021 than in 2020) and the index is calculated at the scale of entire countries and may not be very reflective of local conditions. Overall this paper could do more to address the potential shortcomings of the Oxford Stringency Index as a measure of human mobility including attempting to validate the effect on human mobility using other datasets (e.g. the google dataset and/or those discussed in (Noi et al. 2022). This is of critical importance since the fundamental logic of the experimental design relies on the assumption that stringency ~ mobility.

      Thank you for this comment. First, the Oxford Stringency Index seemed to us the best available index for our purposes, i.e. to estimate people's mobility during the shutdown, because restrictions surely influenced the possibility that people would be outside, and because the index is a country-specific estimate. In addition, we have now checked all indices mentioned in Noi et al. 2022 and found only the Google Mobility Reports useful. We now use them because they (a) are publicly available, (b) are also available for territories outside the US, and (c) provide data for each city included in our dataset as well as for urban parks, where most of our data were collected. Note that some platforms no longer provide their mobility data (e.g. Apple).

      However, Google Mobility provides day-to-day variation in human mobility, whereas we are interested in overall increase/decrease in human mobility. Nevertheless, we correlated the Google mobility index with the Stringency index and found that human mobility generally decreases with the strength of the anti-pandemic measures adopted in sampled countries (albeit the effect for some countries, e.g. Poland, is small; Fig. 5).
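
      As a rough illustration of the kind of check described above, the sketch below computes a plain Pearson correlation between a stringency score and a mobility change. It is only a minimal sketch: the response does not specify the exact method used for this step, the function name `pearson_r` is hypothetical, and the numbers are invented toy values, not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Invented toy values (NOT study data): a stringency score per day and the
# percent change in park visits relative to a baseline on the same days.
stringency = [20.0, 40.0, 60.0, 80.0]
mobility_change = [-5.0, -15.0, -20.0, -40.0]

r = pearson_r(stringency, mobility_change)
# A negative r means mobility tends to fall as restrictions tighten.
```

      A per-country version would simply apply the same function to each country's subset, which is how a weak association in one country (such as the small effect noted for Poland) could be seen separately from the others.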

      Moreover, we also added an analysis using # of humans recorded directly in the field during escape trials (e.g. Fig. 6 and S6) and found that the link between # of humans and the Stringency Index or Google Mobility was weak and noisy, with 95% CIs widely crossing zero (Fig. 6).

      Importantly, if we use Google Mobility and # of humans, respectively, as predictors of escape distance, the results are qualitatively very similar to results based on Oxford Stringency Index (Fig. S6), or Period, with tiny effect sizes for both (95%CIs for Google Mobility -0.3 – 0.06, Table S5, for # of humans -0.12 – 0.02, Table S6) supporting our previous conclusions.

      Note that Google Mobility and the number of humans have their limitations (see our comment to the editor and the Methods section in the main manuscript, e.g. L418-433). The lack of Google Mobility data for years before the COVID-19 pandemic does not allow us to fully explore whether overall human activity decreased during COVID-19 or not (our test for the periods prior to and during COVID-19). If the year 2022 reflects a return to “normal” (which is disputable given the COVID-19-driven rise in home office use), then 2020 and 2021 had on average lower levels of human activity (Fig. 4). Whether such a difference is biologically meaningful to birds is unclear given the immense day-to-day change in human mobility and presence (Fig. 4). Moreover, the number of humans captures within- and between-day variation rather than long-term changes in human presence.

      We added details on the new analysis to the Methods and Results sections (e.g. Fig. 4-6; L142-165, 418-438, 495-535) and the Supplementary Information (Figs. S5-S9 and associated Tables) and discuss these issues accordingly. Moreover, to enhance clarity about country-specific effects (or their lack), we also added country-specific estimates to the Results (Fig. 1 and Fig. S6 and respective Tables). Finally, our statistical design and the random structure of the model allowed us to control for spatial and temporal variation in compliance with government restrictions.

      2) The interpretation of the primary finding (that behavioral responses to humans are inflexible) could use a bit more contextualization within the literature. Specifically, the study offers three potential explanations for the observed invariance in escape response: 1) these behaviors are consistent within individuals and this study provides evidence that there was no population turnover as a result of lockdowns; 2) escape response is linked to other urban adaptations such that to be an urban-dwelling species dictates escape response; and/or 3) these populations already exhibit maximum habituation and the reduction in human mobility would only have increased that habituation but that trait is already at a boundary condition. Some comments on each of these respectively:

      Thank you for these comments. We incorporated them into the main text (L293-329). Your point 1) corresponds to our point (i): “Most urban bird species in our sample may be relatively inflexible in their escape responses because the species may be already adapted to human presence” (L293-306); your point 2) to our point (ii): “Urban environment might filter for bold individuals (Carrete and Tella, 2013, 2010; Sprau and Dingemanse, 2017). Thus, the lack of consistent change in escape behaviour of urban birds during the COVID-19 shutdowns may indicate an absence (or low influx) of generally shy, less tolerant individuals and species from rural or less disturbed areas into the cities…” (L307-314); your point 3) to our point (iii): “Urban birds might have been already habituated to or tolerant of variation in human presence, irrespective of the potential changes in human activity patterns” (L315-329). To distinguish between (ii) and (iii) or the two from (i), individually-marked birds and comprehensive genetic analyses are needed, which we now note in the Discussion (L330-348). Importantly, we also discuss that the lack of response might be due to relatively small changes in human activity (L253-292), which we unfortunately could not fully quantify.

      a) Even had these populations turned over as a result of a massive rural-to-urban dispersal event, it's not clear that the escape distance in those individuals would be different because this paper does not establish that these hypothetical rural birds have a different behavioral response which would be constant following dispersal. Thus the evidence gathered here is insufficient to tell us about possible relocations of the focal species.

      Thank you for this point. We address this point in the Introduction and Discussion (L92-101, 307-314). Rural bird populations/individuals are on average less tolerant of humans than urban birds (e.g. Díaz et al. 2013, PloS One 8:e64634; Tryjanowski et al. 2020, J Tropic Ecol 36:1-5; Mikula et al. 2023, Nat Commun 14:2146) and at the same time, bird individuals seem consistent in their escape responses (Carrete & Tella 2010, Biol Lett 23:167–170; Carrete & Tella 2013, Sci Rep 3:1–7).

      Additionally, the paper cites several papers that found no changes in abundance or movements of animals in response to lockdowns but ignore others that do. For example: (Wilmers et al. 2021), (Warrington et al. 2022) (though this may have been published after this was submitted...), and (Schrimpf et al. 2021).

      We added the papers (L89-91). Thank you!

      There is a missed opportunity to consider the drivers of some of these results - the findings in this paper are interesting in light of studies that did observe changes in space use or abundance - i.e. changes in space use could arise precisely because responses to humans are non-plastic but the distribution and activities of humans changed.

      Thank you. Indeed, we now address this in the Discussion (L303-306): “However, some studies reported changes in the space use by wildlife (Schrimpf et al., 2021; Warrington et al., 2022; Wilmers et al., 2021) and these could arise, as our results indicate, from fixed and non-plastic animal responses to humans who changed their activities”.

      To wit, the primary finding here would imply that the reaction norm to human presence is apparently fixed over such timescales - however, and critically, the putative reduction in human activity/mobility combined with fixed responses at the individual level might then imply changes in avian abundance/movement/etc.

      Unfortunately, we have not measured changes in avian abundance or movements. Please note, however, that the change in human mobility in the sampled cities might not be as dramatic as initially thought, and we consider this scenario the most plausible explanation for the lack of significant differences in avian escape responses before and during the COVID-19 shutdowns (see Fig. 4). Nevertheless, we added your point to the Discussion: If our findings imply that in birds the reaction norm to human presence is fixed over the studied temporal scale, the putative changes in human presence might then imply changes in avian abundance or movement (L293 and text below it).

      b) If this were the case, wouldn't this be then measurable as a function of some measure of urbanity (e.g. Human Footprint Index) that varies across the cities included here? Site accounted for ~15% of the total variation in escape distance but was treated as a random effect - perhaps controlling for the nature of the urban environment using some e.g. remotely sensed variable would provide additional context here.

      Urbanity mirrors the long-term level of human presence in cities, whereas we were interested mainly in the rather short-term effects of potential changes in human presence on bird behaviour. Thus, we are not sure how adding such a variable would help elucidate the current results. Please also note that we added the country-specific analysis. Site indeed accounted for a considerable amount of the total variance in escape distance, and that is why it was included as a random intercept, which controls for the non-independence of data points from each site. This could partly help us to control for differences in habitat type (e.g. urbanization level) within cities.
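
      To make concrete what "site accounted for a share of the total variance" means, the sketch below estimates the fraction of variance attributable to a grouping factor with a one-way ANOVA intraclass correlation, ICC(1), for balanced groups. This is a simplified stand-in, not the mixed model used in the study; the function name `icc_oneway` is hypothetical and the escape distances are invented toy values.

```python
def icc_oneway(groups):
    """ICC(1) from a balanced one-way ANOVA.

    `groups` is a list of equally sized lists, one per site. The result is
    the estimated share of total variance attributable to between-group
    differences; it can be negative when groups differ less than chance.
    """
    n_groups = len(groups)
    if n_groups < 2:
        raise ValueError("need at least two groups")
    k = len(groups[0])
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("groups must be balanced with at least two values")
    means = [sum(g) / k for g in groups]
    grand_mean = sum(means) / n_groups  # valid because groups are balanced
    # Mean square between groups and mean square within groups
    msb = k * sum((m - grand_mean) ** 2 for m in means) / (n_groups - 1)
    msw = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    ) / (n_groups * (k - 1))
    denom = msb + (k - 1) * msw
    if denom == 0:
        raise ValueError("ICC is undefined when all values are identical")
    return (msb - msw) / denom


# Invented toy escape distances in metres (NOT study data), three per site.
toy_sites = [
    [8.0, 9.0, 10.0],
    [14.0, 15.0, 16.0],
    [11.0, 12.0, 13.0],
]
site_share = icc_oneway(toy_sites)
```

      In a mixed model the analogous quantity is the site random-intercept variance divided by the total variance; the ANOVA estimator here is used only because it can be written without external libraries.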

      c) Because it's not clear the extent to which the populations tested had turned over between years, the paper could do with a bit more caution in interpreting these results as behavioral. This study spans several years so any response (or non-response) is not necessarily a measure of behavioral change because the sample at each time point could (likely does) represent different individuals. In fact, there may be an opportunity here to leverage the one site where pre-pandemic measures were taken several years prior to the pandemic. How much variance in the change in escape distance is observed when the gap between time points far exceeds the lifetime of the focal taxa versus measures taken close in time?

      We believe the initial Fig. S4, now Figure 2, addresses this point. The between-year temporal variation in FIDs exceeds the variation due to lockdowns. This is true both for measures taken in consecutive years and for measures taken far apart.

      d) Finally, I think there are a few other potential explanations not sufficiently accounted for here:

      i) These behaviors might indeed be plastic, but not over the timescales observed here.

      We agree and have added this point (L301-303). Thank you.

      ii) Time of year - this study took place during the breeding season. The focal behavior here varies with the time of year, for example, escape distance for many of these species could be tied up in nest defense behaviors, tradeoffs between self-preservation and e.g. nest provisioning, etc.

      Please, note that we controlled for the date in our analyses. Date was used as a proxy for the progress in the breeding season (L463-464 and Fig. 1 caption). Note that we collected data only from foraging or resting individuals, and data were neither collected near the nest sites nor from individuals showing warning behaviours, which we now note (L400-401).

      iii) Escape behaviors from humans are adaptively evolved, strongly heritable, and not context dependent - thus we would only expect these behaviors to change on evolutionary timescales.

      We discussed this at L307-308 and 381-383. Escape behaviors from humans are highly consistent for individuals, populations, and species (Carrete & Tella 2010, Biol Lett 23:167–170; Díaz et al. 2013, PloS One 8:e64634; Mikula et al. 2023, Nat Commun 14:2146). Whether such behavior is consistent across contexts is less clear (e.g. Diamant et al. 2023, Proc Royal Soc B, in press; but see, e.g. Radkovic et al. 2019, J Ecotourism 18:100-106; Gnanapragasam et al. 2021, Am Nat 198:653-659). Escape distance is often not measured simultaneously with, for example, human presence. In other words, whereas the general level of human presence may have no effect on escape distance, the day-to-day or hour-to-hour variations might. We need studies on fine temporal scales (day-to-day or hour-to-hour) using marked individuals to elucidate this phenomenon.

      iv) See point one above - it's possible that the lockdown didn't modify human activity sufficiently to trigger a behavioral response or that the reaction norm to human behavior is non-linear (e.g. a threshold effect).

      We agree. We now also use Google Mobility Reports and data on the number of humans to elucidate this phenomenon, and have added such interpretations to L253-292 and, e.g., Fig. 4.

      LITERATURE CITED

      Geng DC, Innes J, Wu W, Wang G. 2021. Impacts of COVID-19 pandemic on urban park visitation: a global analysis. J For Res 32:553-567. doi:10.1007/s11676-020-01249-w

      Noi E, Rudolph A, Dodge S. 2022. Assessing COVID-induced changes in spatiotemporal structure of mobility in the United States in 2020: a multi-source analytical framework. Int J Geogr Inf Sci.

      Schrimpf MB, Des Brisay PG, Johnston A, Smith AC, Sánchez-Jasso J, Robinson BG, Warrington MH, Mahony NA, Horn AG, Strimas-Mackey M, Fahrig L, Koper N. 2021. Reduced human activity during COVID-19 alters avian land use across North America. Sci Adv 7:eabf5073. doi:10.1126/sciadv.abf5073

      Warrington MH, Schrimpf MB, Des Brisay P, Taylor ME, Koper N. 2022. Avian behaviour changes in response to human activity during the COVID-19 lockdown in the United Kingdom. Proc Biol Sci 289:20212740. doi:10.1098/rspb.2021.2740

      Wilmers CC, Nisi AC, Ranc N. 2021. COVID-19 suppression of human mobility releases mountain lions from a landscape of fear. Curr Biol 31:3952-3955.e3. doi:10.1016/j.cub.2021.06.050

      Reviewer #2 (Public Review):

      Mikula et al. have a large experience studying the escape distances of birds as a proxy of behavioral adaptation to urban environments. They profited from the exceptional conditions of social distance and reduced mobility during the covid-19 pandemic to continue sampling urban populations of birds under exceptional circumstances of low human disturbance. Their aim was to compare these new data with data from previous "normal" years and check whether bird behavior shifted or not as a consequence of people's lockdown. Therefore, this study would add to the growing body of literature assessing the effect of the covid-19 shutdown on animals. In this sense, this is not a novel study. However, the authors provide an interesting conclusion: birds have not changed their behavior during the pandemic shutdown. This lack of effects disagrees with most of the previously published studies on the topic. I think that the authors cannot claim that urban birds were unaffected by the covid-19 shutdown. I think that the authors should claim that they did not find evidence of covid-19-shutdown effects. This point of view is based on some concerns about data collection and analyses, as well as on evolutionary and ecological rationale used by the authors both in their hypotheses and results interpretation. I will explain my criticisms point by point:

      We are grateful for your positive appraisal of our manuscript, as well as for your helpful critical comments. We toned down the discussion to claim, as suggested by you, that we did not find evidence for effects of covid-19-shutdowns on escape behaviour of birds in urban settings (see Results and Discussion sections). In general, we attempted to provide a more nuanced discussion and reporting of our findings. We also changed the manuscript title to “Urban birds' tolerance towards humans was largely unaffected by the COVID-19 shutdowns” and added validation using Google Mobility Reports (Fig. 5 & S6, Table S3a and S5) and the actual number of humans (Fig. 6 and S6; Table S3b-e and S6). Note however that there is only a single robust study on the topic of shutdown and animal escape distances (Diamant et al. 2023, Proc Royal Soc B, in press), i.e. the topic is largely unexplored (e.g. L99-101), whereas we discuss our finding in light of shutdown influences on other behaviours (L293-329).

      1) The authors used ambivalent, sometimes contradictory, reasoning in their predictions and results interpretation. Some examples:

      We tried to clarify our reasoning and to make our claims in the Introduction more consistent. Please note that we simplified the Introduction and now provide one main expectation: FIDs of urban birds should increase with decreased human presence. This pattern is robustly empirically documented, regardless of the mechanism involved (e.g. Díaz et al. 2013, PloS One 8:e64634; Tryjanowski et al. 2020, J Tropic Ecol 36:1-5; Mikula et al. 2023, Nat Commun 14:2146). Please see our revised Discussion for a more comprehensive treatment of the mechanisms that could explain the patterns described in our study.

      1.1) The authors claimed that urban birds perceive humans as harmless (L224), but birds actually escape from us, when we approach them... Furthermore, they escape usually 5 to 20 m away. This is more distance that would be necessary just to be not trampled.

      We agree and have deleted mentions that humans are perceived as harmless.

      1.2) If we are harmless, why birds should spend time monitoring us as a potential threat (L102)? Indeed, I disagree with the second prediction of the authors. I could argue that reduced human activity should increase animal vigilance because real bird predators (e.g. raptors) may increase their occurrence or activity in empty cities. If birds should increase their vigilance because the invisible shield of human fear of their predators is no longer available, then I would expect longer escape distances.

      Thank you for this comment. We deleted this prediction and largely rewrote the Introduction based on your comments and comments from the other reviewers.

      1.3) To justify the same escape behavior shown by birds in pre- and pandemic conditions from an adaptive point of view, the authors argued a lack of plasticity and a strong genetic determination of such behavior. This contravenes the plasticity proposed in the previous point or the expected effect of the stringency index (L112).

      We have now attempted to write this more clearly while incorporating your suggestions. In the Discussion, we now propose various hypotheses that may, but need not, be mutually exclusive. Please note that we simplified the Introduction and now provide one main hypothesis: FIDs of urban birds should increase with decreased human presence.

      In my opinion, some degree of plasticity in the escape behavior would be really favorable for individuals from an adaptive perspective, as they may face quite different fear landscapes during their lives. Looking at the figures, one can see notable differences in the escape distance of the same species between sites in the same city. As I can hardly imagine great genetic differences between birds sampled in a park or a cemetery in Rovaniemi, for instance, I would expect a major role of plasticity to explain the observed variability. Furthermore, if escape behavior would not be plastic, I would not expect date or hour effects. By including them in their models, the authors are accepting implicitly some degree of plasticity.

      We regret being unclear. We do accept some degree of plasticity. Yet, our study design precludes assessing the degree of individual plasticity because sampled birds were not individually marked and approached repeatedly. We tried to soften the statements in our Discussion so as not to fully dismiss the possibility that urban birds have some degree of plasticity in their antipredator behaviour (L293-329). Note, however, that while our data collection was not designed to test how hour-to-hour changes in human numbers influence escape distance, the effect of the number of humans (i.e. hour-to-hour variation in human numbers) in our sample was tiny.

      The date and hour effects simply control for the particularities of the given day and hour (e.g. warm vs cold times or the time until sunset). In other words, the within-species differences (even from the same park) may have little to do with individual plasticity, but instead may reflect between-individual differences. We now address this issue in the Methods (L471-476): “This approach enabled us to control for spatial and temporal heterogeneity and specificity in escape behaviour of birds (e.g. species-specific responses, changes in escape distances with the progress in the breeding season, spatial and temporal variation in compliance with government restrictions or particularities of the given day and hour)....”

      2) Looking at the figures I do not see the immense stochasticity (L156, Fig. S3, S5) claimed by the authors. Instead, I can see that some species showed an obvious behavioral change during the shutdown. For instance, Motacilla alba, Larus ridibundus, or Passer domesticus clearly reduced their escape distances, while others like the Dendrocopos major, Passer montanus, or Turdus merula tended to increase it.

      At L138-141 and 327-329 we discussed the within- and between-genera and cross-country variation and stochasticity in response to the shutdowns (Fig. 2). The reference to species-specific plots was perhaps a little misleading. We think that the essential figure, which we now reference at this point, is Figure 2, which shows temporal trends and/or stochasticity that seem to have little in common with lockdowns. Please also look at Figure 3 and S3-S4. These show that in all selected genera/species, the trends did not significantly deviate from the central regression line, which indicates no change in FID before and during the COVID-19 shutdowns.

      On the other hand, birds in Poland tended to have larger escape distances during the shutdown for most species, while in Rovaniemi there was an apparent reduction of escape distances in most cases. The multispecies and multisite approach is a strength of this study, but it is an Achilles' heel at the same time. The huge heterogeneity in bird responses among species and sites counterbalanced and as a result, there was an apparent lack of shutdown effects overall. Furthermore, as most data comes from a few (European) species (i.e. Columba, Passer, Parus, Pica, Turdus, Motacilla) I would say that the overall results are heavily influenced (or biased) by them. The authors realize that results are often area- or species-specific (L203), therefore, does a whole approach make sense?

      We are grateful for this valuable comment. We believe the general approach makes sense as there is a general expectation about how birds should respond to changes in human presence. That is why we control for non-independence of data points in our sample. Thus, although much of the data comes from a few European species, this is corrected for by the model. Note that given the sheer number of sampled species, some site- or species-specific trends may have occurred by chance. Importantly, we believe that Figure 2, with species- and site-specific temporal trends, reveals that the between-year stochasticity in escape distances seems greater than any effects of lockdowns. Nevertheless, we have further dealt with this issue in the revised manuscript by running country-specific models, which again clearly showed no significant effect of Period on escape behaviour of birds (including no effects in Poland and Finland).

      3) The previous point is worsened by the heterogeneity of cities and periods sampled. For instance:

      3.1) I can hardly imagine any common feature between a small city in northern Finland (Rovaniemi) and a megacity in Australia (Melbourne). Thus, I would not be surprised to find different results between them.

      3.2) Prague baseline data was for 2014 and 2018, while for the rest of the study sites were for 2018 and 2019. If study sites used a different starting point, you cannot compare differences at the final point.

      We are slightly confused by these comments.

      3.1) The cities are expected to be different but (i) the difference may be smaller than imagined (e.g. park structures, managed grass cover, few shrubs and deciduous-dominated tree species) and (ii) we expect the effects of lockdowns to be similar across cities. Whether we have no people in Rovaniemi parks (which despite Rovaniemi’s small size are usually extremely well-visited) or no people in Melbourne parks should not make a difference in principle. Note, however, that to avoid overconfident conclusions, we allow for different reaction norms within cities. Please also note that we now provide country-specific results, which should identify whether shutdowns led to different reactions in the sampled countries. We found no strong effect of shutdowns in any of the sampled countries/cities.

      3.2) Because of the possible between-site differences at the starting point, we use study site as a random intercept and control for the between-site reaction norms by including the random slope of the period. In other words, such possible differences do not influence the outcomes of our models. Regardless, our a priori expectation is that the human activity levels in a given park were similar prior to covid and hence in 2014, 2018, and 2019. Again, we now provide country-specific results which identify whether shutdowns led to different reactions in the sampled countries, which they mostly did not.
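      For illustration only, the random-intercept and random-slope structure described in 3.2 can be sketched as below. This is a simplified form under stated assumptions, not the full model specification from the Methods: the notation is hypothetical, with y_{ij} the escape distance of observation i at site j, P_{ij} an indicator for period (0 = before, 1 = during the shutdowns), u_{0j} the site-specific deviation in baseline escape distance, and u_{1j} the site-specific deviation in the response to the shutdowns. Other fixed and random terms are omitted.

```latex
y_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,P_{ij} + \varepsilon_{ij},
\qquad
(u_{0j}, u_{1j})^\top \sim \mathcal{N}(\mathbf{0}, \Sigma_u),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

      Under this sketch, a site with a different pre-pandemic baseline (for example, one sampled in 2014 rather than 2019) shifts u_{0j}, while the overall period effect \beta_1 is estimated from within-site contrasts.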

      3.3) Due to the obvious seasonal differences between the northern and southern hemispheres, data collection in Australia began five months later than in the rest of the sites (Aug vs Mar 2020). There, urban birds faced already too many months of reduced human disturbances, while European birds were sampled just at the beginning of the lockdown.

      We agree that each city, or even each park within a city, has its specific environmental conditions (here including the time point of lockdown). That is why we control for city and park location in the random structure of the model (see the Methods section). We now add results per country that show no clear differences (e.g. Fig. 1).

      However, the aim of our study was to test for general, global effects of lockdowns, which are minimal. Note that we now specifically test for country-specific effects in separate models for each country (e.g. Fig. 1, Fig. S6), but all country-specific effects are small and still centre around zero.

      3.4) Some cities were sampled by a single observer, while others by many of them. Even if all of them are skilled birders, they represent different observers from a statistical point of view and consequently, observer identity was an extra source of noise in your data that you did not account for.

      We agree. In Finland and Hungary, data were collected by two closely cooperating observers. In Poland, all data were collected by a single observer. In the Czech Republic and Australia, a single observer (P.M. and M.W., respectively) sampled 46 sites out of 56 and 32 sites out of 37, respectively. Each site was sampled by the same observer both before and during the shutdowns. We now clearly state this in the Methods (L352-356). In other words, our models already largely control for the possible observer confound by having site as a random intercept. Moreover, a previous study showed that FID estimates do not vary significantly between trained observers (Guay et al. 2013, Wildlife Research, 40, 289-293).

      4) Although I liked the stringency index as a variable, I am not sure if it captured effectively the actual human activity every day. Even if restrictive measures were similar between countries, their actual accomplishment greatly depended on people's commitment and authorities' control and sanctions. I would suggest using a more realistic measure of human activity, such as google mobility reports.

      Thank you for this comment. We now validate the use of the stringency index with the Google Mobility Reports, showing that human mobility generally (albeit in some countries relatively weakly) decreases with the strength of governmental antipandemic measures. Please note that our main research question is related to the general change in human outdoor activity and not to the week-to-week, day-to-day or hour-to-hour changes captured by the stringency index, Google Mobility, or the number of humans present during an escape trial. Nevertheless, using Google Mobility and the number of humans as predictors led to results similar to those for the stringency index and Period (Fig. 1 and S6). Please see the extended discussion of this topic in our manuscript (L270-292).

      5) The authors used escape trials from birds on the ground and perched birds. I think that they are not comparable, as birds on the ground probably perceive a greater risk than those placed some meters above the ground, i.e. I would expect shorter escape distances for perched birds. As this can be strongly dependent on the species preferences or sampling site (i.e, more or less available perches), I wonder how this mixture of observations from birds on the ground and perched birds could be affecting the results.

      We have now added the information that most birds were sampled when on the ground (79%). Importantly, previous studies have found that perch height has a minimal effect on FIDs (e.g. Bjørvik et al. 2015, J Ornithol 156:239–246; Kalb et al. 2019, Ethology 125:430-438; Ncube & Tarakini 2022, Afr J Ecol 60:533–543; Sreekar et al. 2015, Tropic Conserv Sci 8:505-512). We added this information to the Methods section (L394-395).

      6) The authors did not sample the same location in the same breeding season to avoid repeated sampling of the same individuals (L331). This precaution may help, but it does not guarantee a lack of pseudoreplication. Birds are highly mobile organisms and the same individuals may be found in different places in the same city. This pseudoreplication seems particularly plausible for Rovaniemi, where sampling points must be necessarily close due to the modest size of this city.

      We appreciate your concern. We cannot fully exclude the possibility of sampling some individuals twice. However, we sampled during the breeding season, when most birds are territorial and active in the areas around their nests, and hence an individual switching parks is unlikely. Also, most sampled birds in our study are passerines, which have small territories (typically a few hundred square meters). Some larger birds may have larger territories and move larger distances to forage (e.g. kestrels, which often forage outside cities), but these birds represent a minority of our records and we have not sampled outside the cities.

      7) An intriguing result was that the authors collected data for 135 species during the shutdown, while they collected data only for 68 species before the pandemic. Such a two-fold increase in bird richness would not be expected with a 36% increase in sampling effort during 2020-21. I wonder if this could be reflecting an actual increase in bird richness in urban areas as a positive result of the shutdown and reduced human presence.

      There were 141 unique day-years before COVID and 161 during COVID. So, the sampling effort as calculated by days does not explain the difference in species numbers. Whether the actual effort, which was 381 vs 463 h of sampling, explains the difference is unclear, which we now note in the Methods (L476-483). If not, your proposition is possible, but we would like to avoid any speculation on this topic in the manuscript as it is difficult to infer species diversity from FID sampling.

      8) The authors dismissed the multicollinearity problem of explanatory variables unjustifiably (L383). However, looking at fig. S1, I can see strong correlations between some of them. For instance, period and stringency index were virtually identical (r=0.95), while temperature and date were also strongly correlated.

      We are confused by this comment and think this reflects a misunderstanding. Period and stringency index are explanatory variables of interest that were never included in the same model, and hence their correlation does not contribute to within-model multicollinearity. To avoid further confusion, we note this in the legend of Fig. S2. However, we must be cautious when interpreting the results from the models on period, Google Mobility, # of humans and stringency index, as the four measures are similar.

      We discuss multicollinearity of explanatory variables within the manuscript (L458-538, 548-550) and noted that, with the exception of temperature and day within the breeding season (r = 0.48), the correlations among explanatory variables were minimal. We thus used only temperature as an explanatory variable (i.e. fixed factor; also because temperature reflects both season and variation in temperature across a season), whereas the day was included as a random intercept to control for pseudoreplication within day. Collinearity between all other predictors was low (|r| < 0.36).
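      As a minimal sketch of this kind of collinearity screening (not the code used for the manuscript), pairwise Pearson correlations among candidate predictors can be computed and compared against a cut-off. The function names, the threshold, and all data values below are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def flag_collinear(predictors, threshold):
    """Return (name_a, name_b, r) for every predictor pair with |r| > threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Made-up values for illustration only; they are not data from the study.
example = {
    "temperature": [5.0, 8.0, 12.0, 15.0, 18.0, 21.0],
    "day": [1, 10, 25, 40, 55, 70],
    "flock_size": [3, 1, 4, 1, 5, 2],
}

# The 0.7 cut-off is an arbitrary illustrative choice.
flagged = flag_collinear(example, threshold=0.7)
for name_a, name_b, r in flagged:
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```

      In this toy example only the temperature and day pair is flagged, which mirrors the reasoning for keeping one of two correlated predictors as a fixed effect.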

      9) The random structure of the models is a key element of the statistical analyses but those random factors are poorly explained and justified. I needed to look up the supplementary tables to fully understand the complex architecture of the random part of the models. To the best of my knowledge, random variables aim to account for undesirable correlations in the covariance matrix, which is expected in hierarchical designs, such as the present one. However, the theoretical violation of data independence may happen or not. As the random structure is usually of little interest, you should keep it as simple as necessary, otherwise random factors may be catching part of data variability that you would like to explain by fixed variables. I think that this is what is happening (at least, in part) here, as the authors included a too-complex random structure. For instance, if you include the year as a random factor, I think that you are leaving little room for the period effect. The authors simplified the random structure of the models (L387), but they did not explain how. Nevertheless, this model selection was not important at all, as the authors showed the results for several models. I assume, consequently, that the authors are considering all these models equally valid. This approach seems quite contradictory.

      The random structure of the model controls for possible pseudoreplication in the data, that is, for cases where we have multiple data points that may not be independent and hence technically represent one. Apart from that, the random structure tells us where the variance in the data lies. This is often of interest, and your previous questions about city, site or species specificities can be answered with the random part of the model. To follow up on your example, year is included in the model because data from a single year are not independent (for example, because of a delayed breeding season in one year vs. another).

      We regret being unclear about the model specification and have attempted to clarify the methods (L466-476). We first specified a model with an ideal random structure that necessarily was complex (perhaps too complex). We then showed that using models with simpler random structures did not influence the outcomes. We now use a simpler model within the main text, but do keep the alternative models to show that the results are not dependent on the random structure of the model (Fig. S1 and Table S2).

      Reviewer #3 (Public Review):

      This study examined the changes in fear response, as measured by the flight initiation distances (FID), of birds living in urban areas. The authors examined the FIDs of birds during the pandemic (COVID-19 lockdown restrictions) compared to FIDs measured before the pandemic (mostly in 2018 & 2019). The main study justification was that human presence changed drastically during the pandemic lockdowns and the change in human presence might have influenced the fear response of birds as a result of changing the "landscape of fear". Human presence was quantified using a 'stringency' index (government-mandated restrictions). Urban areas were selected from within five different cities, which included four European cities (Czech Republic - Prague, Finland - Rovaniemi, Hungary - Budapest, Poland - Poznan), and one city in the global south (Australia - Melbourne). Using 6369 flight initiation distances across 147 different bird species, the authors found that FIDs were not significantly different before the pandemic versus during the pandemic, nor was the variation in FID explained by the level of 'stringency'.

      Major strengths: There are several strengths to this study that allows for understanding the variety of factors that influence a bird's response to fear (measured as flight initiation distances). This study also demonstrates that FIDs are highly variable between species and regions.

      Specifically,

      1) One of the major strengths of this paper is the focus on birds living in urban areas, a habitat type that is hypothesized to have changed drastically in the 'landscape of fear' experienced by animals during the pandemic lockdown restrictions (due to the presumed decrease in human presence and densities). Maintaining the focus on urban birds allowed for a deeper examination of the effect of human behaviour changes on bird behaviour in urban habitats, which are at the interface of human-wildlife interactions.

      2) This study accounted for several variables that are predicted to influence flight initiation distances in birds including species, genus, region (country), variability between years, pandemic year (pre- versus during), the strictness of government-mandated lockdown measures, and ecological factors such as the human observer starting distance, flock size, species-specific body size, ambient air temperature (also a proxy of the timing during the breeding season), time of day, date of data collection (timing within the regional [Europe or Australia] breeding season), and categorization of urban site type (e.g. park, cemetery, city centre).

      3) This study examined FIDs in two years previous to the pandemic (mostly 2018 and 2019, one site was 2014) which would account for some of the within- and between-year FID variation exhibited prior to the pandemic.

      4) This study uses strong statistical approaches (mixed effect models) which allows for repeat sampling, and a post hoc analysis testing for a phylogenetic signal.

      Thank you for your supportive and positive comments.

      Major weaknesses: The authors used government 'stringency' as a proxy for human presence and densities, however, this may not have been an accurate measure of actual human presence at the study sites and during measurements of FIDs. Furthermore, although the authors accounted for many factors that are predicted to influence fear response and FIDs in birds, there are several other factors that may have contributed to the high level of variation and patterns in FIDS observed during this study, thus resulting in the authors' conclusion that FIDs did not vary between pre- and during pandemic years.

      Thank you for your suggestions. We agree. To capture the general human presence in parks, we now incorporate an analysis using Google Mobility Reports (Fig. S6b), which directly measure human mobility in each of the sampled cities and specifically in urban parks, where most of our data were collected, and we also address the further concerns that you detail below. Although it is not the main interest of our study, we now also incorporate an analysis using the actual # of humans during an escape trial (Fig. S6c).

      Moreover, we think that including further possible confounds should not influence our conclusions. In other words, including further confounds will decrease the variance that can be explained by shutdowns and thus such shutdown effects (if any) would be tiny and hence likely not biologically meaningful.

      Specifically,

      1) The authors used "government stringency" as a measure of change in human activity, which makes the assumption that the higher the level of 'stringency', the fewer humans in urban areas where birds are living. However, the association between "stringency" and actual human presence at the study sites was not measured, nor was 'stringency' compared to other measures of human presence such as human mobility.

      Thank you for this essential comment. Initially, we viewed the Oxford Stringency Index as the best available index for our purposes. However, we now further acknowledge its limitations (L) and validate the Oxford Stringency Index with the Google Mobility Reports data, showing that both indices are generally negatively (albeit sometimes weakly) correlated across sampled cities (i.e. human mobility decreases with the increasing stringency index). Although other human presence indices were used in the past, e.g. Cuebiq, Descartes Labs and Maryland Uni index, Apple (see Noi et al. 2022, Int J Geograph Info Sci, 36, 585-616), we used only the Google Mobility index because it (a) is publicly available, (b) is also available for territories outside the US, and (c) provides data for urban parks within each city included in our dataset. Note, however, that Google Mobility data are inappropriate to answer our primary question, i.e. whether changes in human presence outdoors due to the COVID-19 shutdowns had any effect on avian tolerance towards humans. First, Google Mobility was available only for 2020-22, i.e. the baseline pre-COVID-19 data for 2018-2019 were unavailable. Thus, there was no way to check whether the human activity levels really changed during the COVID-19 years. Second, Google Mobility data are calculated as a change from the 2020 January–February baseline for each day of the week for each city and its location (here we used parks). In other words, the data are not comparable between days and cities, albeit we attempted to correct for this within the random structure of the mixed model. Also, the data may be influenced by extreme events within the 2020 Jan–Feb baseline period (see here). Third, Google Mobility varies greatly between days and across the season (see Fig 4 & S5 or the first figure in these responses), likely more than the possible change due to shutdowns.

      Nevertheless, we found that results based on Google Mobility are qualitatively very similar to results based on the stringency index. Moreover, we showed that the relationships between # of humans and both Google Mobility and the stringency index (Figure 6) are weak and noisy, with 95%CIs widely overlapping zero (Table S3b-e). Also, similarly to other predictors of human presence, # of humans only poorly predicted changes in avian escape distances. We added details on the new analysis to the Methods, Results, and Supplement (L134-165 and associated figures and tables, L415-535).
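      To make the second limitation concrete, the sketch below shows how a mobility value expressed as a percent change from a day-of-week baseline could be computed. It is an illustration under stated assumptions rather than Google's actual pipeline: the function names are hypothetical, the use of the median as the baseline statistic is an assumption, and all visit counts are made up.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def weekday_baseline(baseline_counts):
    """Median count per weekday (0 = Monday) over a baseline period."""
    by_weekday = defaultdict(list)
    for day, count in baseline_counts.items():
        by_weekday[day.weekday()].append(count)
    return {weekday: median(values) for weekday, values in by_weekday.items()}


def percent_change(day, count, baseline):
    """Percent change of a count relative to the baseline for the same weekday."""
    reference = baseline[day.weekday()]
    return 100.0 * (count - reference) / reference


# Hypothetical park visit counts in the January 2020 baseline window.
baseline_counts = {
    date(2020, 1, 6): 100,   # Monday
    date(2020, 1, 13): 120,  # Monday
    date(2020, 1, 20): 110,  # Monday
    date(2020, 1, 7): 200,   # Tuesday
    date(2020, 1, 14): 200,  # Tuesday
}
baseline = weekday_baseline(baseline_counts)

# A Monday and a Tuesday in April 2020, each compared with its own weekday baseline.
monday_change = percent_change(date(2020, 4, 6), 55, baseline)
tuesday_change = percent_change(date(2020, 4, 7), 300, baseline)
```

      Because each value is relative to a weekday- and location-specific reference, the same percent change can correspond to very different absolute numbers of people, which is why such values are not directly comparable between days and cities.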

      2) There was considerable variation in FID measurements, which can be seen in the figures, indicating that most of the variation in FID was not accounted for in the authors' models.

      We are confused by this statement. The fact that the FIDs varied does not directly imply that our models failed to account for the variation. Nevertheless, we do control for most of the discussed confounds (see further answers below). Importantly, it is unclear how including further possible confounds should influence our conclusions, unless the lockdown effects are tiny, in which case those might not be biologically meaningful.

      Factors that may have contributed to variation in FIDs that were not accounted for in this study are as follows:

      a. The authors accounted for the date of data collection using the 'day' since the start of the general region's breeding season (Europe: Day 1 = 1 April; Australia: Day 1 = 15 August). Using 'day' since the breeding season started probably was an attempt to quantify the effect of the breeding stage (e.g. territory establishment, nest young, fledgling) on FIDs. However, breeding stages vary both within- and between species, as well as between sub-regions (e.g. Finland vs. Hungary). As different species respond to predation or human presence differently depending on the stage during their breeding cycle, more specificity in the breeding cycle stage may allow for explaining the observed variation and patterns in FID.

      We agree. Although we don’t have precise city-specific information on the timing of breeding stages in the sampled bird populations, we partly control for these effects by including a random intercept of day within each year and species. This random factor explained a relatively high portion of the variance in our data (see Tables S1 and S2), perhaps as you expected.

      b. Variation in species-specific FIDs may also vary with habitat features within urban sites, such as the proximity of trees and other protective structures (e.g. perches and cover), the openness of the area, and the level of stressors present (e.g. noise pollution, distance to roads). Perhaps accounting for this habitat heterogeneity would account for the FID variation measured in this study.

      We agree. We don’t have such fine-scale data, but we included site identity (typically within a particular park or cemetery) which should account for the habitat heterogeneity among localities. Depending on the model, site explained relatively little variance (1-6%), indicating low heterogeneity between localities in these undescribed characteristics. Also note that park structure may be quite similar both within and between cities, i.e. managed green grass areas, with only a few shrubs and deciduous trees. Therefore, the possible minor habitat heterogeneity should not have any great impacts on our results.

      c. The authors accounted for species and genus within their models, however, FIDs may vary with other species-specific (or even specific populations of a species) characteristics such as whether the species/population is neophobic versus neophilic, precocial versus altricial, and the level of behavioural plasticity exhibited. These variables were not accounted for in the analysis.

      We agree that FIDs can be correlated with many possible factors. Here, we were interested in general patterns, while controlling for FID differences between species, as well as for possible species-specific reaction norms to lockdowns. Whether neophobic vs. neophilic populations or precocial vs. altricial species react differently to lockdowns might be of interest, but it is beyond the scope of this study. However, population and population-specific reaction norms explain little variation (Table S2a, 0-6% of variation), so such a confound should not substantially influence our conclusions. We do not have fine-scale data on the level of neophobia, but the effects of lockdowns seem similar for precocial (see Anas, Larus, Cygnus) and altricial (the remaining, mostly passerine) species in our dataset (see Fig. 3 and S3-S4). Please note that we sampled mainly adults (L386). Moreover, the effects for clades, which may differ in their cognitive skills, are also similar (e.g. Corvids vs. Anas or Cygnus; Fig. 3).

      d. Three different methods of measuring the distances between flight and the observer location were used, and FIDs were only measured once per bird, such that there were no measures of repeatability for a test subject. Thus, variation surrounding the measurement of FIDs would have contributed to the variation in FIDs seen during this study.

      While all observers were trained, the three methods may add some noise to the FID estimates. However, the FID estimates from a single method may still slightly differ between observers (so do well standardized morphology measurements; Wang, et al. 2019, PLoS Biology, 17, e3000156). Importantly, FID estimates are highly replicable among skilled observers (Guay et al. 2013, Wildlife Research 40:289-293), and we previously validated this approach and showed that distance measured by counting steps did not differ from distance measured by a rangefinder (Mikula 2014, Ardea 102:53-60), which we now explicitly state (L391-394). Importantly, we control for observer bias by specifying locality as a random intercept (see further details in our response to the Editor). Moreover, each site was sampled by the same observer both before and during the shutdowns.

      3) The sample design of this study may have influenced the FID variability associated with specific species, and specific populations of species. A different number of species were sampled across the time periods of interest; 68 species were sampled before the pandemic versus 135 species after the pandemic. However, the authors do not appear to have directly compared the FIDs for the same species before the pandemic compared to during the pandemic (e.g. the FIDs of Eurasian blackbirds before the pandemic versus during the pandemic). Furthermore, within the same country-city, it is unclear whether the species observed before the pandemic were observed at the same location (e.g. same habitat type such as the same park) during the pandemic. As a species' FID response may be influenced by population characteristics and features specific to each site (e.g. habitat openness), these factors may have influenced the variability in FID measurements in this study.

      We regret being unclear in our methods. Our full model uses all data, but alternative models (see e.g. Fig. S1) used data with ≥5 as well as ≥10 observations before and during lockdowns for a given species. Importantly, Figures 2 and 3 depict data for species sampled at specific sites. We now clarify this in the Methods (L460-483), the Results (L125-133 and associated figures), and the figure legends (Fig. S1).

      4) The models in this study accounted for many factors predicted to affect FIDs (see the section on major strengths), however, the number of fixed and random factors are large in number compared to the total sample size (N =6369), such that models may have been over-extended.

      The number of predictors and random effects is well within the limits for the given sample size (Korner-Nievergelt et al. 2015. Bayesian Data Analysis in Ecology Using Linear Models with R, BUGS, and Stan). Importantly, simpler models give similar results as the more complex ones (Fig. S1) and the visual (model free) representations of our raw and aggregated data confirm our model results. This, we suggest, makes our findings robust and convincing.

      Overarching main conclusion

      Overall, this study examines factors influencing FIDs in a variety of bird species and concludes that FIDs did not differ during the pandemic lockdowns compared to before the pandemic (2019 and earlier). Furthermore, FIDs were not influenced by the strictness of government-mandated restrictions. Although the authors accounted for many factors influencing the measurement of FIDs in birds, the authors did not achieve their aim of disentangling the effects of pandemic-specific ecological effects from ecological effects unrelated to the pandemic (such as habitat heterogeneity).

      We find this statement confusing. We accounted for most relevant confounding factors and found little evidence for a strong effect of the pandemic. Moreover, we have now added country-specific analyses that confirm the lack of evidence, highlighted Figure 3, which shows no clear shutdown effect, and explored how levels of human presence changed across and within years. Adding more possible confounds (although not many are left to add) might only further reduce the variation that could be explained by the pandemic; hence any such hypothetical effects of the pandemic would be small, if present at all, and thus likely not biologically meaningful.

      Their findings indicate that FIDs are highly variable both within- and between- species, but do not strongly support the conclusion that FIDs did not change in urban species during the pandemic lockdown. Therefore, this study is of limited impact on our understanding of how a drastic change in human behaviour may impact bird behaviour in urban habitats.

      It is unclear why you think our study lacks support for the conclusion that FIDs changed little during the pandemic, given that all results show no such effects. However, we toned down our Discussion and also highlighted potential issues linked to our approach, e.g. that sampled individuals were not marked and hence we cannot distinguish between various mechanisms that might explain the described pattern (L293-329), or that human presence may not have changed (L253-269). For further details see our previous response.

      Overall, the study demonstrates the challenges in using FIDs as a general fear response in birds, even during a pandemic lockdown when fewer humans are presumably present, and this study illustrates the large degree of variation in FIDs in response to a human observer.

      We appreciate and agree that our study demonstrates the challenges in quantifying human activity to understand bird escape distance and we added a paragraph on this topic to the discussion (L270-292).

      Nevertheless, we hope that our responses above clarify and address most of the issues you had with our manuscript. We tried to show that (a) most of your proposed controls are indeed included in our study design, models, and visualisations, and that (b) multiple lines of evidence (from models and from visualisation of raw and aggregated data) support the conclusion of no overall effect. We further emphasize the temporal and between- and within-species variability in FIDs in the Results and now specifically indicate that lockdowns did not influence FIDs above such variability (Fig. 2-3, Fig. S3). In other words, the natural (e.g. temporal) variation in FIDs seems far greater than the potential effects of lockdowns (Fig. 2). We believe that even if lockdowns had tiny effects that could have been detected with a more stringent experimental design (e.g. individually tagged birds) or even more complex models, such effects would be far from biologically meaningful.

    2. Reviewer #1 (Public Review):

      This paper uses a series of flight initiation "challenges" conducted both prior to and during COVID-19-related restrictions on human movement to estimate the degree to which avian escape responses to humans changed during the "anthropause". This technique is suitable for understanding avian behavioral responses with a high degree of repeatability. The study collects an impressive dataset over multiple years across five cities on two continents. Overall the study finds no effect of lockdown on avian escape distance (the distance at which the "target" individual flees the approaching observer). The study considers the variable of interest as both binary (during lockdown or prior to lockdown) and continuous, using the Oxford Stringency Index (with neither apparently affecting escape distance).

      Overall this paper presents interesting results which may suggest that behavioral responses to humans are rather inflexible over "short" (~2 year) timespans. The anthropause represents a unique opportunity to disentangle the mechanistic drivers of myriad hypothesized impacts humans have on the behavior, distribution, and abundance of animals. Indeed, this finding would provide important context to the larger body of literature aimed at these ends. However, the paper could do more to carefully fit this finding into the broader literature and, in so doing, be a bit more careful about the conclusions they are able to draw given the study design and the measures used. Taking some of these points (in no particular order):

      1) Oxford Stringency Index is a useful measure of governmental responses to the pandemic and it's true that in some scenarios (including the (Geng et al. 2021) study cited by this paper) it can correlate with human mobility. However, it is far from a direct measure of human mobility (even in the Geng study, to my reading, the index only explained a minority of the variation). Moreover, particular sub-components of the index are wholly unrelated to human mobility (e.g., would changes to a country's public information campaign lead to concomitant changes in urban human mobility?). Finally, compliance with government restrictions can vary geographically and over time (i.e., we might expect lower compliance in 2021 than in 2020) and the index is calculated at the scale of entire countries and may not be very reflective of local conditions. Overall this paper could do more to address the potential shortcomings of the Oxford Stringency Index as a measure of human mobility, including attempting to validate the effect on human mobility using other datasets (e.g., the Google dataset and/or those discussed in Noi et al. 2022). This is of critical importance since the fundamental logic of the experimental design relies on the assumption that stringency ~ mobility.

      2) The interpretation of the primary finding (that behavioral responses to humans are inflexible) could use a bit more contextualization within the literature. Specifically, the study offers three potential explanations for the observed invariance in escape response: 1) these behaviors are consistent within individuals and this study provides evidence that there was no population turnover as a result of lockdowns; 2) escape response is linked to other urban adaptations such that to be an urban-dwelling species dictates escape response; and/or 3) these populations already exhibit maximum habituation and the reduction in human mobility would only have increased that habituation but that trait is already at a boundary condition. Some comments on each of these respectively:

      a) Even had these populations turned over as a result of a massive rural-to-urban dispersal event, it's not clear that the escape distance in those individuals would be different because this paper does not establish that these hypothetical rural birds have a different behavioral response which would be constant following dispersal. Thus the evidence gathered here is insufficient to tell us about possible relocations of the focal species. Additionally, the paper cites several papers that found no changes in abundance or movements of animals in response to lockdowns but ignores others that do. For example: (Wilmers et al. 2021), (Warrington et al. 2022) (though this may have been published after this was submitted...), and (Schrimpf et al. 2021). There is a missed opportunity to consider the drivers of some of these results - the findings in this paper are interesting in light of studies that *did* observe changes in space use or abundance - i.e., changes in space use could arise precisely *because* responses to humans are non-plastic but the distribution and activities of humans changed. To wit, the primary finding here would imply that the reaction norm to human presence is apparently fixed over such timescales - however, and critically, the putative reduction in human activity/mobility combined with fixed responses at the individual level might then imply changes in avian abundance/movement/etc.

      b) If this were the case, wouldn't this be then measurable as a function of some measure of urbanity (e.g. Human Footprint Index) that varies across the cities included here? Site accounted for ~15% of the total variation in escape distance but was treated as a random effect - perhaps controlling for the nature of the urban environment using some e.g., remotely sensed variable would provide additional context here.

      c) Because it's not clear the extent to which the populations tested had turned over between years, the paper could do with a bit more caution in interpreting these results as behavioral. This study spans several years so any response (or non-response) is not necessarily a measure of behavioral change because the sample at each time point could (likely does) represent different individuals. In fact, there may be an opportunity here to leverage the one site where pre-pandemic measures were taken several years prior to the pandemic. How much variance in the change in escape distance is observed when the gap between time points far exceeds the lifetime of the focal taxa versus measures taken close in time?

      d) Finally, I think there are a few other potential explanations not sufficiently accounted for here:

      i) These behaviors might indeed be plastic, but not over the timescales observed here.

      ii) Time of year - this study took place during the breeding season. The focal behavior here varies with the time of year, for example, escape distance for many of these species could be tied up in nest defense behaviors, tradeoffs between self-preservation and e.g., nest provisioning, etc.

      iii) Escape behaviors from humans are adaptively evolved, strongly heritable, and not context dependent - thus we would only expect these behaviors to change on evolutionary timescales.

      iv) See point one above - it's possible that the lockdown didn't modify human activity sufficiently to trigger a behavioral response or that the reaction norm to human behavior is non-linear (e.g. a threshold effect).

      LITERATURE CITED

      Geng DC, Innes J, Wu W, Wang G. 2021. Impacts of COVID-19 pandemic on urban park visitation: a global analysis. J For Res 32:553-567. doi:10.1007/s11676-020-01249-w

      Noi E, Rudolph A, Dodge S. 2022. Assessing COVID-induced changes in spatiotemporal structure of mobility in the United States in 2020: a multi-source analytical framework. Int J Geogr Inf Sci.

      Schrimpf MB, Des Brisay PG, Johnston A, Smith AC, Sánchez-Jasso J, Robinson BG, Warrington MH, Mahony NA, Horn AG, Strimas-Mackey M, Fahrig L, Koper N. 2021. Reduced human activity during COVID-19 alters avian land use across North America. Sci Adv 7:eabf5073. doi:10.1126/sciadv.abf5073

      Warrington MH, Schrimpf MB, Des Brisay P, Taylor ME, Koper N. 2022. Avian behaviour changes in response to human activity during the COVID-19 lockdown in the United Kingdom. Proc Biol Sci 289:20212740. doi:10.1098/rspb.2021.2740

      Wilmers CC, Nisi AC, Ranc N. 2021. COVID-19 suppression of human mobility releases mountain lions from a landscape of fear. Curr Biol 31:3952-3955.e3. doi:10.1016/j.cub.2021.06.050

    3. Reviewer #2 (Public Review):

      Mikula et al. have extensive experience studying the escape distances of birds as a proxy of behavioral adaptation to urban environments. They profited from the exceptional conditions of social distance and reduced mobility during the covid-19 pandemic to continue sampling urban populations of birds under exceptional circumstances of low human disturbance. Their aim was to compare these new data with data from previous "normal" years and check whether bird behavior shifted or not as a consequence of people's lockdown. Therefore, this study would add to the growing body of literature assessing the effect of the covid-19 shutdown on animals. In this sense, this is not a novel study. However, the authors provide an interesting conclusion: birds have not changed their behavior during the pandemic shutdown. This lack of effects disagrees with most of the previously published studies on the topic. I think that the authors cannot claim that urban birds were unaffected by the covid-19 shutdown. I think that the authors should claim that they did not find evidence of covid-19-shutdown effects. This point of view is based on some concerns about data collection and analyses, as well as on evolutionary and ecological rationale used by the authors both in their hypotheses and results interpretation. I will explain my criticisms point by point:

      1) The authors used ambivalent, sometimes contradictory, reasoning in their predictions and results interpretation. Some examples:

      1.1) The authors claimed that urban birds perceive humans as harmless (L224), but birds actually escape from us when we approach them... Furthermore, they usually escape 5 to 20 m away. This is more distance than would be necessary just to avoid being trampled.

      1.2) If we are harmless, why should birds spend time monitoring us as a potential threat (L102)? Indeed, I disagree with the second prediction of the authors. I could argue that reduced human activity should increase animal vigilance because real bird predators (e.g., raptors) may increase their occurrence or activity in empty cities. If birds should increase their vigilance because the invisible shield of human fear of their predators is no longer available, then I would expect longer escape distances.

      1.3) To justify the same escape behavior shown by birds in pre- and pandemic conditions from an adaptive point of view, the authors argued a lack of plasticity and a strong genetic determination of such behavior. This contravenes the plasticity proposed in the previous point or the expected effect of the stringency index (L112). In my opinion, some degree of plasticity in the escape behavior would be really favorable for individuals from an adaptive perspective, as they may face quite different fear landscapes during their lives. Looking at the figures, one can see notable differences in the escape distance of the same species between sites in the same city. As I can hardly imagine great genetic differences between birds sampled in a park or a cemetery in Rovaniemi, for instance, I would expect a major role of plasticity to explain the observed variability. Furthermore, if escape behavior were not plastic, I would not expect date or hour effects. By including them in their models, the authors are accepting implicitly some degree of plasticity.

      2) Looking at the figures I do not see the immense stochasticity (L156, Fig. S3, S5) claimed by the authors. Instead, I can see that some species showed an obvious behavioral change during the shutdown. For instance, Motacilla alba, Larus ridibundus, or Passer domesticus clearly reduced their escape distances, while others like the Dendrocopos major, Passer montanus, or Turdus merula tended to increase it. On the other hand, birds in Poland tended to have larger escape distances during the shutdown for most species, while in Rovaniemi there was an apparent reduction of escape distances in most cases. The multispecies and multisite approach is a strength of this study, but it is an Achilles' heel at the same time. The huge heterogeneity in bird responses among species and sites counterbalanced and as a result, there was an apparent lack of shutdown effects overall. Furthermore, as most data comes from a few (European) species (i.e., Columba, Passer, Parus, Pica, Turdus, Motacilla) I would say that the overall results are heavily influenced (or biased) by them. The authors realize that results are often area- or species-specific (L203), therefore, does a whole approach make sense?

      3) The previous point is worsened by the heterogeneity of cities and periods sampled. For instance:

      3.1) I can hardly imagine any common feature between a small city in northern Finland (Rovaniemi) and a megacity in Australia (Melbourne). Thus, I would not be surprised to find different results between them.

      3.2) Prague baseline data was for 2014 and 2018, while for the rest of the study sites it was for 2018 and 2019. If study sites used a different starting point, you cannot compare differences at the final point.

      3.3) Due to the obvious seasonal differences between the northern and southern hemispheres, data collection in Australia began five months later than in the rest of the sites (Aug vs Mar 2020). There, urban birds had already faced too many months of reduced human disturbances, while European birds were sampled just at the beginning of the lockdown.

      3.4) Some cities were sampled by a single observer, while others by many of them. Even if all of them are skilled birders, they represent different observers from a statistical point of view and consequently, observer identity was an extra source of noise in your data that you did not account for.

      4) Although I liked the stringency index as a variable, I am not sure if it effectively captured the actual human activity every day. Even if restrictive measures were similar between countries, actual compliance with them greatly depended on people's commitment and authorities' control and sanctions. I would suggest using a more realistic measure of human activity, such as Google Mobility Reports.

      5) The authors used escape trials from birds on the ground and perched birds. I think that they are not comparable, as birds on the ground probably perceive a greater risk than those placed some meters above the ground, i.e. I would expect shorter escape distances for perched birds. As this can be strongly dependent on the species preferences or sampling site (i.e, more or less available perches), I wonder how this mixture of observations from birds on the ground and perched birds could be affecting the results.

      6) The authors did not sample the same location in the same breeding season to avoid repeated sampling of the same individuals (L331). This precaution may help, but it does not guarantee a lack of pseudoreplication. Birds are highly mobile organisms and the same individuals may be found in different places in the same city. This pseudoreplication seems particularly plausible for Rovaniemi, where sampling points must be necessarily close due to the modest size of this city.

      7) An intriguing result was that the authors collected data for 135 species during the shutdown, while they collected data only for 68 species before the pandemic. Such a two-fold increase in bird richness would not be expected with a 36% increase in sampling effort during 2020-21. I wonder if this could be reflecting an actual increase in bird richness in urban areas as a positive result of the shutdown and reduced human presence.

      8) The authors dismissed the multicollinearity problem of explanatory variables unjustifiably (L383). However, looking at fig. S1, I can see strong correlations between some of them. For instance, period and stringency index were virtually identical (r=0.95), while temperature and date were also strongly correlated.

      9) The random structure of the models is a key element of the statistical analyses but those random factors are poorly explained and justified. I needed to look up the supplementary tables to fully understand the complex architecture of the random part of the models. To the best of my knowledge, random variables aim to account for undesirable correlations in the covariance matrix, which is expected in hierarchical designs, such as the present one. However, the theoretical violation of data independence may happen or not. As the random structure is usually of little interest, you should keep it as simple as necessary, otherwise random factors may be catching part of data variability that you would like to explain by fixed variables. I think that this is what is happening (at least, in part) here, as the authors included a too-complex random structure. For instance, if you include the year as a random factor, I think that you are leaving little room for the period effect. The authors simplified the random structure of the models (L387), but they did not explain how. Nevertheless, this model selection was not important at all, as the authors showed the results for several models. I assume, consequently, that the authors are considering all these models equally valid. This approach seems quite contradictory.

    1. Abstract. Recent advances in bioinformatics and high-throughput sequencing have enabled the large-scale recovery of genomes from metagenomes. This has the potential to bring important insights as researchers can bypass cultivation and analyse genomes sourced directly from environmental samples. There are, however, technical challenges associated with this process, most notably the complexity of computational workflows required to process metagenomic data, which include dozens of bioinformatics software tools, each with their own set of customisable parameters that affect the final output of the workflow. At the core of these workflows are the processes of assembly - combining the short input reads into longer, contiguous fragments (contigs), and binning - clustering these contigs into individual genome bins. Both processes can be done for each sample separately or by pooling together multiple samples to leverage information from a combination of samples. Here we present Metaphor, a fully-automated workflow for genome-resolved metagenomics (GRM). Metaphor differs from existing GRM workflows by offering flexible approaches for the assembly and binning of the input data, and by combining multiple binning algorithms with a bin refinement step to achieve high quality genome bins. Moreover, Metaphor generates reports to evaluate the performance of the workflow. We showcase the functionality of Metaphor on different synthetic datasets, and the impact of available assembly and binning strategies on the final results. The workflow is freely available at https://github.com/vinisalazar/metaphor.

      This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.1093/gigascience/giad055) and has published the reviews under the same license. These are as follows.

      **Reviewer 1. Thomas Brüls**

      The authors present a Snakemake-based workflow to automate and chain the main computational ingredients (assembly and binning) of genome-centric metagenomics. They developed a technically sound tool for this purpose, and by itself it is certainly valuable to the research community and worthy of publication. However, even if the article is cast as a technical note (hence with an emphasis on the design, implementation and assessment of the tool), I feel that a more thorough discussion of both its abilities and inabilities (e.g. strain resolution, detection of low abundance organisms, identification of virus bins, etc.) would be worthwhile for a more general audience. By the same token, a deeper discussion of some of the results obtained with their tool (see below) would be of interest and would also illustrate useful use cases. I would suggest the following modifications/additions:

      - The experiments with the strain madness dataset suggest that the genomes (or fragments thereof, i.e. the bins) resolved should be viewed as "species" genomes, or composite genomes possibly originating from multiple strains. If so, do the authors think this represents a hard limit to the assembly + binning approach, or could further existing tools (e.g. performing variant detection on top of cross-assembly before the binning step) be integrated or developed in the future for strain resolution (i.e. to identify strains not dominant in any sample)?

      - Relatedly, a simple summary of the number of individual strains recovered in individual bins for the strain madness experiment would be interesting.

      - Another issue that would be worth discussing in my opinion is the impact of genome abundance on the recovery of corresponding bins and their quality. The platform developed by the authors appears to be well suited for this kind of analysis and the results would be of both theoretical and practical interest. To put it simply, what is the minimal initial coverage of genomes required in order for them to be recovered in bins of a given size and quality?

      - Remark: these two issues (strain-level diversity and individual strain genome abundances) likely interact to limit bin resolution, and this could be mentioned by the authors.

      - The data presented by the authors suggest that the MetaBAT binning engine significantly outperforms the other two tools (CONCOCT and VAMB, which are both widely used), see e.g. Figure 2. What would account for that, and do the authors think this is a general observation (i.e. beyond the specific CACB setting or marine metagenome shown in Fig. 2)?

      - A bin refinement step (based on DAS Tool and dereplication) is frequently mentioned but should be described in more detail (including a precise definition of the bin quality metric used).

      Further, rather minor comments:

      - In the abstract, when mentioning "technical challenges associated with...", it would be worth mentioning that algorithmic challenges are present as well.

      - In the introduction: "It is hypothesised that pooled assembly and binning may lead to improved results when analysing communities with high genetic diversity, and to poorer results when there is a high level of intraspecies/strain-level diversity". I would assume there are many instances in the real world that are both, i.e. that present both high inter-species and intra-species genetic diversity. What then?

      - In the future directions, the authors mention the identification of eukaryotic and viral contigs and bins, and could briefly elaborate on how this could be done properly.

      - The sentence "In summary, our assessment of ..." at the end of the manuscript appears to have a syntactic problem.

    1. Abstract: Hetnets, short for “heterogeneous networks”, contain multiple node and relationship types and offer a way to encode biomedical knowledge. One such example, Hetionet, connects 11 types of nodes (including genes, diseases, drugs, pathways, and anatomical structures) with over 2 million edges of 24 types. Previous work has demonstrated that supervised machine learning methods applied to such networks can identify drug repurposing opportunities. However, a training set of known relationships does not exist for many types of node pairs, even when it would be useful to examine how nodes of those types are meaningfully connected. For example, users may be curious not only how metformin is related to breast cancer, but also how the GJA1 gene might be involved in insomnia. We developed a new procedure, termed hetnet connectivity search, that proposes important paths between any two nodes without requiring a supervised gold standard. The algorithm behind connectivity search identifies types of paths that occur more frequently than would be expected by chance (based on node degree alone). We find that predictions are broadly similar to those from previously described supervised approaches for certain node type pairs. Scoring of individual paths is based on the most specific paths of a given type. Several optimizations were required to precompute significant instances of node connectivity at the scale of large knowledge graphs. We implemented the method on Hetionet and provide an online interface at https://het.io/search. We provide an open source implementation of these methods in our new Python package named hetmatpy.

      **Reviewer 2. Paolo Provero**

      In this work, Himmelstein and collaborators introduce a statistically controlled way of extracting significant node pairs in heterogeneous networks (hetnets) without relying on a ground truth and related training. The method "explains" why two nodes are significantly connected by extracting the metapaths most responsible for the enrichment. This is based on computing a null distribution of the DWPC, which allows a P-value to be assigned to each metapath joining two nodes and the individual paths responsible for the enrichment to be visualized. The method is novel and significant, and can in principle be applied to many hetnets, in the life sciences and beyond, when a ground truth is not available or not desirable because it would introduce bias. The software tools developed appear to be readily available to other researchers.

      Major comment: If I understand correctly, given two nodes (say "Alzheimer disease" and "Circadian rhythm"), the method extracts, in a statistically controlled way, the most significant metapaths joining the two nodes, and then the individual paths responsible for the enrichment. But this is not the most obvious question a life scientist would ask of the network, which would instead be something like "Which are the pathways most significantly connected to 'Alzheimer disease'?" Indeed, this is the type of question to ask when aiming for drug repurposing (possibly replacing "pathways" with "compounds" or "pharmacologic classes"). Based on Fig. 4A, the pathways are presented, or "suggested," in decreasing order of number of metapaths, but this is hardly a ranking by significance. Would it be possible to summarize the results in such a way as to rank the pathway nodes connected to a given disease node by significance (or, more generally, to rank the nodes of a certain type by the significance of their connection to a given node of another type)? This should be discussed.

      I also have several minor concerns. (1) The authors introduce and compute a null distribution of the DWPC which takes into account node degree in a statistically controlled way when evaluating the connectivity between two nodes. However, the DWPC itself does take into account node degree, as the name implies, and contains a tunable parameter that can be optimized, at least when a ground truth is available (as in Ref 39 by the same first author). I understand such tuning is not possible when, as in the present case, no ground truth is available, but the authors should make this point more clearly. (2) I find Fig. 1B a bit confusing: according to the legend, the top rows are known treatments, which should have higher than expected connectivity. However, based on the colors as explained by the legend, the bottom treatment/disease pairs seem to have higher connectivity. (3) The acronym DWPC is defined after it has been used several times. (4) The legend of Figure 2 should specify that these results apply to the nodes "Alzheimer disease" and "Circadian rhythm", although this becomes clear in Fig. 4. (5) I don't think Figure 3, representing the home page of the web site, is especially useful. (6) I found Fig. 4 confusing: the sum of the path counts for the selected metapaths in panel B is much larger than the 425 results shown in panel C. As far as I understand, no path can belong to more than one metapath, so is there some further selection here? (7) The "Frontend" section of the Methods seems a bit too detailed for the GigaScience audience.
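      To make point (1) concrete, the sketch below computes a degree-weighted path count on a toy network. It is an illustration under stated assumptions, not the hetmatpy implementation: the function names and the two toy adjacency matrices are hypothetical, the damping exponent is the tunable parameter referred to above, and the plain matrix product is only valid for metapaths that cannot revisit a node (here compound, gene, disease).

```python
def degree_weighted(adj, damping):
    """Scale each edge by (source_degree * target_degree) ** -damping."""
    row_deg = [sum(row) for row in adj]
    col_deg = [sum(col) for col in zip(*adj)]
    return [
        [
            (row_deg[i] * col_deg[j]) ** -damping if adj[i][j] else 0.0
            for j in range(len(adj[i]))
        ]
        for i in range(len(adj))
    ]


def matmul(a, b):
    """Plain-Python matrix product for small toy matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dwpc(adjacencies, damping=0.5):
    """Degree-weighted path count along a chain of bipartite adjacency matrices."""
    result = degree_weighted(adjacencies[0], damping)
    for adj in adjacencies[1:]:
        result = matmul(result, degree_weighted(adj, damping))
    return result


# Toy data (hypothetical): 2 compounds x 3 genes, and 3 genes x 2 diseases.
COMPOUND_GENE = [[1, 1, 0], [0, 1, 1]]
GENE_DISEASE = [[1, 0], [1, 1], [0, 1]]

if __name__ == "__main__":
    print(dwpc([COMPOUND_GENE, GENE_DISEASE], damping=0.0))  # plain path counts
    print(dwpc([COMPOUND_GENE, GENE_DISEASE], damping=0.5))  # high-degree paths downweighted
```

      With a damping of zero the result reduces to the raw path count, so the exponent controls how strongly paths through high-degree nodes are penalized. The null distribution discussed above addresses a separate question, namely what DWPC value would be expected from node degree alone.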

      Re-review: The authors have addressed all my comments in a satisfactory way.

    1. Author Response

      Reviewer #2 (Public Review):

      This work attempts to connect the diet of a mother to the physiology and feeding behaviors of multiple generations of her offspring. Using genetic and molecular biology approaches in the fruit fly model, the authors argue that this Lamarckian inheritance is mediated by germline-inherited chromatin and is regulated by the general activity of a histone methylase. However, many of the measured effects are small and variable, the statistical tests to prove their significance are missing or poorly described, and some experiments are inadequately described and lack important controls.

      1) The authors claim that the diet of a mother can influence the physiology of her progeny for several generations. However, the observed effects of maternal diet on later generations were small and variable for most assays (see Fig1C, S1.1A, B, D). Additionally, the effect size between F0 HSD to ND was often larger than the effect size between the progeny of F0 parents and ND. To put it another way, if the authors were to compare the F1, F2, etc. to the F0 HSD flies, they would conclude that the majority of the response to diet is not maternally transmitted, and is directly controlled by the diet of the individual being measured.

      We agree with the reviewer that the effect size of acute HSD exposure (in HSD-F0 flies) was larger than that of transgenerational inheritance (in HSD-F1/2/3/4 flies). Similar observations have been made in other studies (Klosin et al., Science, 2017; Bozler et al., eLife, 2019). We would argue that this difference in effect size is expected and biologically meaningful.

      For all living organisms, acute environmental changes (diet change included) have direct and profound influences on their survival and reproduction, and therefore need robust and immediate responses. In comparison, ancestral environmental changes may only provide some vague and indirect indications of the current living environment of the offspring. Such information may be beneficial for the survival and reproduction of the offspring, but the effect size is expected to be much smaller, or at least smaller than that of acute environmental changes.

      Studies of the Dutch Famine offer a good example. Individuals who were prenatally exposed to famine were found to have a greater risk of metabolic diseases (Ravelli et al., NEJM, 1976). Nevertheless, direct high-fat diet exposure remained a much stronger risk factor for obesity and metabolic disorders (Bray et al., Am J Clin Nutr, 1998; Jéquier et al., Int J Obes Relat Metab Disord, 2002).

      We have added additional discussions in the manuscript for clarification.

      Furthermore, since our current study aimed to investigate the mechanism of behavioral transgenerational inheritance, we focused on the comparison between HSD-F1 flies (and their progeny) vs. ND-fed flies. As the ancestors of HSD-F1/2/3/4 flies were exposed to HSD, whereas HSD-F1/2/3/4 flies themselves were never exposed to HSD, any difference we observed between the two groups could be solely attributed to transgenerational inheritance of ancestral HSD exposure. That said, to better distinguish the effects of acute HSD exposure from those of transgenerational inheritance upon ancestral HSD exposure, we re-analysed and presented the comparisons among ND, HSD-F0, and HSD-F1 data in the manuscript (Figure 1. B-E, Figure 1-figure supplement 1. A-E, Figure 1-figure supplement 2. A-D, Figure 3. D-E, Figure 3-figure supplement 1. B-D, Figure 3-figure supplement 2 and 3. A-B).

      2) The authors chose to study PER, which had the largest average effect sizes between conditions. However, PER was highly variable in the averaged data, with some individuals showing large effects and others having no effects. A better characterization of transgenerational PER may increase the robustness of this assay and confidence in its results. For example, the authors could measure PER in lineages derived from individual flies to determine when transgenerational effects on PER decline or disappear. This form of data collection could help to explain the high variation in the averaged data presented in the paper.

      We acknowledge that PER is, in general, quite a variable behavioural trait (as are most, if not all, behavioural measures). This is not surprising, since animal behaviours, as complex traits, can be influenced by numerous intrinsic and extrinsic factors, such as genetic background, developmental environment, diet, population density, and environmental conditions. Numerous PER studies have shown similar variability (Masek et al., PNAS, 2010; Marella et al., Neuron, 2012; Charlu et al., Nature Communications, 2013; Wang et al., Cell Metabolism, 2016; Wang et al., Cell Reports, 2020).

      Nevertheless, in our current study we were able to identify statistically significant behavioural differences between ND-fed flies and HSD-F1/2/3 flies, demonstrating that ancestral HSD exposure imposed transgenerational inheritance on sweet sensitivity. To further increase the robustness of the study, as suggested by the reviewer, we have conducted additional repetitions of many PER experiments and confirmed the phenotype with less variability and more statistical power (Figure 1. G-I, Figure 3. D-E, Figure 3-figure supplement 1. B-D, Figure 3-figure supplement 2 and 3. A-B). The reviewer also suggested the use of isogenic flies, which might help to minimize variation in genetic background. However, we think that demonstrating the behavioural difference in genetically diverse fly populations is a more credible way to show that such transgenerational inheritance is a reliable and generalizable phenomenon.

      3) What do the error bars represent on any figure? There are many examples where the data is highly variable and lies completely outside of the error bars. What is the statistical test for significance that is carried out in each figure? The brief comment about statistics in the methods section is inadequate. The authors should also supply the raw data used to generate the figures so that readers can perform their own statistical tests.

      Data in the manuscript were represented as means ± SEM (standard error of the mean) in all of our figures, which is a standard practice in the field (Masek et al., PNAS, 2010, Charlu et al., Nature Comm, 2013, Wang et al., Cell Metabolism, 2016). We have provided detailed explanations of the statistical tests in the manuscript. We have also prepared raw data files as suggested by the reviewer.
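      For readers weighing the error-bar question, the sketch below shows how SEM relates to the standard deviation. The PER values are made up for illustration and are not data from the manuscript. Because SEM equals SD divided by the square root of the sample size, SEM bars describe the uncertainty of the mean and are expected to be narrower than the spread of individual data points.

```python
import math
import statistics


def mean_sd_sem(values):
    """Return the mean, sample standard deviation and standard error of the mean."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)   # sample SD (n - 1 in the denominator)
    sem = sd / math.sqrt(n)         # SEM shrinks as the sample size grows
    return mean, sd, sem


# Hypothetical PER fractions from four replicate groups.
EXAMPLE_PER = [0.2, 0.4, 0.6, 0.8]

if __name__ == "__main__":
    mean, sd, sem = mean_sd_sem(EXAMPLE_PER)
    print(f"mean={mean:.3f} SD={sd:.3f} SEM={sem:.3f}")
```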

      The model that global H3K27me3 is regulated by ancestral diet is unconvincing without further experimental validation and explanation. Points 4-10 address specific issues.

      4) The authors performed ChIP on cycle 11 embryos. This stage is extremely short (11 min) and contains roughly 10 times less chromatin than embryos only 30 minutes older. These features make it very difficult to collect large numbers of precisely staged embryos without significant contamination. It is also debatable whether early cell cycles (including and preceding cycle 11) are slow enough to deposit and propagate histone marks in the presence of new histone incorporation. See the opposing arguments in Zenk et al 2017 and Li et al 2014. The authors could perform ChIP on older embryos to avoid this controversy.

      We thank the reviewer for the clarification. Our embryo collection protocol involved allowing flies to lay eggs freely in a cage for 30 minutes, followed by 50 minutes of incubation on a juice plate, and then completing the embryo sorting within 30 minutes. Therefore, to describe it more stringently, our embryos should be between cycles 10 and 12. We have corrected this information in the manuscript (Figure 2. A).

      Since all the embryos were sorted using the same morphological criteria within the same time frame, their developmental stages should be comparable (i.e. all from cycles 10-12). In several references we consulted, a broader range (cycles 9-13) was used for ChIP-seq analysis (for example, see Zenk et al., Science, 2017).

      Surely any maternally inherited information will also be present in cycle 14 or 15 embryos if it is to influence the development or physiology of the brain. The observed differences in global H3K27me3 levels in F1 vs ND flies could be explained by slightly different aged embryo collections or technical variations in the ChIP protocol. The authors could strengthen their conclusion by performing more ChIP replicates. Alternatively, the authors could use orthogonal approaches like antibody staining or western blots to measure global H3K27me3 levels in precisely staged embryos.

      We chose to use cycle 10-12 embryos because we aimed to identify epigenetic modulations directly transmitted through the maternal germline. Embryos in cycle 14-15 might reveal more profound changes, but since embryos in that stage had entered the zygotic phase and started the remodeling of histone modifications, we think it might mask the maternally transmitted changes we sought to identify.

      In addition, we conducted two biological replicates for each group for the ChIP-seq analysis, which is standard in the field (Zenk et al., Nature, 2021; Ing-Simmons et al., Nature Genetics, 2021). In the current study, we further verified the genes identified in the ChIP-seq analysis by RNA-seq and qPCR analysis.

      We further verified the ChIP-seq results by western blot, which showed a ~2-fold increase in H3K27me3 modification in HSD-F1 early embryos vs. ND-fed embryos, in line with the ChIP-seq data (Figure 2-figure supplement 1. B). We have also provided immunofluorescence results for embryos at cycle 13 and cycle 14, which clearly showed a significant increase in H3K27me3 modifications in HSD-F1 embryos (Figure 2-figure supplement 1. C).

      5) The authors measure PRC2 subunit mRNA levels in adult fly heads to attempt to explain the observed differences in inherited H3K27me3 levels in fly embryos. The authors should examine PRC2 components in germ cells and early embryos to understand how germ cells and early embryos generate H3K27me3 patterns.

      We have now shown that Pcl and E(z) mRNA expression in HSD-F0 flies was not significantly changed vs. ND-fed flies (Figure 2-figure supplement 2. D-G). Meanwhile, the H3K27me3 demethylase UTX and the H3K27ac acetyltransferase Cbp showed a significant decrease (Figure 2-figure supplement 2. H). Therefore, HSD exposure imposed complex epigenetic modifications in HSD-F0 flies, which then led to transmission of epigenetic marks to their progeny. Given that the main scope of this study was to understand which epigenetic program mediated the behavioral transgenerational inheritance upon ancestral HSD exposure (rather than the one mediating the response to acute HSD exposure), we focused our efforts on H3K27me3, which was significantly changed between HSD-F1 flies and ND-fed flies.

      6) The RNAi experiment targeting PRC2 components in embryos is uninterpretable without appropriate controls and an explanation of the genotypes used in the experimental paradigm. Are the authors crossing nosNGT mothers to UAS-RNAi fathers and assaying the progeny? What is the genotype of the F1 flies and how does it compare to the genotype of the ND flies? The authors should also note that the Gal4 drivers they use are not necessarily restricted to the ovary, and could directly affect other tissues controlling PER like neurons and muscle. Additionally, the authors should supply the appropriate controls to verify that their experimental paradigm has the intended effect. PRC2 proteins are presumably loaded into embryos and would be immune to zygotic-expressed RNAi. The authors could validate when PRC2 RNAi is effective by staining embryos for H3K27me3.

      We have now added schematic diagrams and detailed explanations in our revised manuscript to better explain the RNAi experiments (Figure 3-figure supplement 1. A). As shown in the diagram, we compared each RNAi treatment group to appropriate genetic controls. We have also noted in the manuscript that the GAL4 drivers we used were not restricted to the ovary.

      We have now verified that PRC2 knockdown reduces H3K27me3 in the female germline by both western blot and immunofluorescence staining (Figure 3. B-C).

      7) Although the authors do not note this, nosNGT>RNAi affects the PER of ND flies (compare Gal4>RNAi to just RNAi or just Gal4 in ND columns in Fig3A-D). This could be due to RNAi expression in neurons or muscles or some other indirect effect. Regardless of the mechanism, this result makes it difficult to interpret how RNAi treatments affect the transgenerational inheritance of PER if there is an equivalently strong nontransgenerational effect.

      Although nosNGT>RNAi appeared to slightly affect PER response of ND-fed flies, there was no statistically significant difference (Figure 3-figure supplement 1. B and D, Figure 3-figure supplement 2. A-B). Rather, the effect of E(z) knockdown was evident in HSD-F1 flies (Figure 3-figure supplement 1. B), further confirming the involvement of H3K27me3 in transgenerational inheritance of PER reduction.

      8) The matalpha gal4 experiment is inadequately explained in the text or methods. Are the authors expressing RNAi in the ovaries of the F0 flies that are fed an HSD? Does the ovary influence their PER somehow? Similar to point 7, there appears to be a nontransgenerational component to the RNAi phenotype that clouds the interpretation of the transgenerational effect (compare F0 in S3.1A-C).

      We have now added a schematic diagram and detailed explanations in our revised manuscript to better explain the RNAi experiments (Figure 3. A). As shown in the diagram, we compared each RNAi treatment group to appropriate genetic controls.

      Similar to point 7, although Mat-tub-GAL4>RNAi might seem to affect PER responses of ND-fed flies, there was no statistically significant difference (Figure 3. D-E). Rather, the effect of E(z) knockdown was evident in HSD-F1 flies (Figure 3. D), further confirming the involvement of H3K27me3 in transgenerational inheritance of PER reduction.

      9) For the EED inhibitor experiments (both PER and calcium imaging), it is unclear whether the authors fed the mothers or their adult progeny the EED inhibitor. If adult progeny were fed, what tissues were affected? The authors should stain various tissues with an H3K27me3 antibody to verify the effectiveness of their inhibitor. Finally, the effect of the EED inhibitor on calcium imaging was not convincing because the variation was so large.

      We have added a new schematic diagram and provided more detailed explanations in the manuscript for pharmacological interventions (Figure 4. A-D). To verify the effect of the drug treatment, we showed that compared to the control group fed with DMSO, flies fed with the inhibitor showed a significant decrease in H3K27me3 levels, demonstrating the effectiveness of the inhibitor (Figure 4-figure supplement 1. A).

      We acknowledged the unsatisfactory quality of our calcium imaging experiments in our initial submission. We have now improved our experimental procedures to reach better data quality, while the conclusions remained consistent (Figure 4. E).

      10) In all of the PRC2 RNAi and inhibitor experiments, are there any other phenotypes that would suggest that the treatments are working? There are many published PRC2 loss-of-function phenotypes (molecular and developmental) in different tissues. The authors could assure the reader that their treatments are working as expected by doing these controls.

      As discussed above, we have now used western blot and immunofluorescence staining to validate the efficiency of PRC2 RNAi in the female germline (Figure 3. B-C).

      11) The authors propose that a transgenerationally inherited state of the caudal gene is responsible for the transgenerationally inherited PER. However, the experiments investigating the methylation state and expression level of caudal are unconvincing. Cad mRNA abundance varied immensely in the ND RNAseq samples. When the authors compared cad levels across generations, the effect size was small. A single outlier in the ND sample in both the RNAseq and the RT-PCR experiments appears to drive up its mean and effect size. The H3K27me3 ChIP on cad is very similar in the F1 and ND samples and the acetylation peak on its promoter appears unchanged. The authors could vastly improve the caudal experiments in this paper by simply using cad antibodies to stain the relevant tissues that contribute to PER. For example, the authors could stain GR5a neurons for cad expression in different generations that inherit (or don't inherit) maternal PER to more accurately determine if cad levels are indeed transgenerationally regulated. The authors could also perform more ChIP experiments at a less variable stage to convincingly correlate epigenetic marks on cad with its expression level.

      As discussed above, we conducted two biological replicates for each condition of the ChIP-seq analysis, which is standard in the field (Zenk et al., Nature, 2021; Ing-Simmons et al., Nature Genetics, 2021). We have also performed western blot and immunofluorescence for H3K27me3 in ND vs. HSD-F1 embryos to further validate our ChIP-seq data (Figure 2-figure supplement 1. B-C).

      As for the Cad gene, H3K27me3 signals showed a statistically significant difference between ND-fed and HSD-F1 flies (Figure 5. D). We have also conducted additional qPCR experiments to verify the expression changes of the Cad gene (Figure 5. F, right), which were in line with the ChIP-seq data and further supported its validity.

      It is worth noting that during the developmental time window of our ChIP-seq analysis, the acetylation signals in the promoter region of cad were very low (Figure 5. D), making a comparison impossible.

      Reviewer #3 (Public Review):

      Jie Yang et al. investigated the transgenerational behavioral modification caused by a high-sugar diet (HSD) in Drosophila and revealed the underlying molecular and neural mechanisms. It has been reported that HSD exposure decreases sweet sensitivity in gustatory sensory neurons, resulting in a reduced sugar response (proboscis extension reflex, PER) in flies. The current study reports that this effect can be transmitted across generations through the maternal germline. Furthermore, the authors show that H3K27me3 modification is enhanced in the first-generation progeny of HSD-treated flies (F1), and that genetic or pharmacological disruption of the PCL-PRC2 complex blocks the behavioral change and restores sweet sensitivity in the Gr5a+ sweet sensory neurons. The authors further analyze the differentially expressed genes in the F1 flies. Among H3K27me3 hypermethylated regions, they focus on homeobox genes and find a transcription factor, Caudal (Cad), which shows decreased expression in the F1 flies. Knocking down Cad in Gr5a+ neurons results in a decreased PER response to sucrose.

      Transgenerational changes in physiology and metabolism have been broadly studied, while inherited changes at the behavioral level are much less investigated. This work provides convincing evidence for transgenerational modification of feeding behavior and uncovers the underlying molecular and neural mechanisms. However, there are still several concerns that need to be clarified.

      1) The epigenetic regulator PRC2 has been found to play an essential role in the 7d-HSD-induced modification of the PER response. In this study, it is important to clarify whether, for the transgenerational change, epigenetic modification is required in the flies exposed to HSD (F0), the progeny (F1), or both. It would be very helpful for interpretation if the procedures of HSD treatment in the RNAi experiments and the drug treatments were stated in more detail. In addition, the F0 flies should be examined as a control.

      In this current study our main scope was to understand the transgenerational influence of HSD exposure on the progeny. To this aim, we chose to study the physiological and behavioral differences between ND-fed flies vs. HSD-F1 flies (and their progeny on ND). HSD-F1 flies (and their progeny) were not exposed to HSD in their whole life cycle and therefore the physiological and behavioral changes we observed vs. ND-fed flies could be solely attributed to epigenetic modifications transmitted via germline cells from HSD-F0 flies. Therefore ND-fed flies were used as the main control.

      As for HSD-F0 flies, the acute effects of HSD exposure could be more complex. Epigenetic factors were likely involved, as is evident in Figure 3-figure supplement 1. C, Figure 3-figure supplement 3. A-B and Figure 4. C. In addition, HSD exposure might also directly affect gene expression and multiple signaling pathways in HSD-F0 flies (see Chen et al., Science China Life Sciences, 2020). Therefore, we did not aim to investigate how HSD exposure affected HSD-F0 flies in this current study. We have added additional discussions in the manuscript for clarification.

      That said, we still added more HSD-F0 flies as controls when needed (Figure 2-figure supplement 2. D-G, Figure 3-figure supplement 1. C, Figure 4. C, Figure 5. F, left).

      We have also modified the schematic diagrams and added more detailed explanations in the manuscript, in order to provide a clearer illustration of the experimental procedures (Figure 3. A, Figure 3-figure supplement 1. A, Figure 4. A, B and D). Specifically, we employed two different RNAi approaches. Firstly, we used genetic methods to obtain homozygous Mat-tub-gal4>UAS-gene X RNAi fly lines on chromosomes Ⅱ and Ⅲ for germline-specific knockdown (Figure 3, Figure 3-figure supplement 3). Secondly, we used heterozygous nosNGT-gal4>UAS-gene X RNAi flies for embryo-specific knockdown (Figure 3-figure supplement 1 and 2). Our drug experiments involved both treating the flies and measuring their PER (Figure 4. A-C) and treating the parental flies and measuring the PER of their progeny (Figure 4. D).

      2) The information on the drug treatment period is also missing for the imaging experiments (Fig. 4C). Moreover, the response curve is very different from those recorded in the same neurons in previous studies. What is the reason? Please also provide a representative image showing which part of the Gr5a neurons is recorded.

      The experimental procedures for the drug treatments are now shown in Figure 4. A. We fed adult flies with specific compounds for five days after eclosion and then measured the calcium signals of Gr5a+ neurons when flies were fed with sucrose.

      As suggested by the reviewer, we have now conducted calcium imaging experiments more carefully and thoroughly. We have now added the new data into the revised manuscript and the conclusions remained consistent (Figure 4. E). We recorded the calcium signal in the axons of Gr5a+ neurons in the SEZ.

      3) It's unclear whether the decreased Cad expression upon HSD treatment occurred specifically in Gr5a+ neurons or in many cells. If the change in gene expression is significant in the qPCR test, it should occur in a large number of cells, most likely including different types of gustatory sensory neurons. If lower cad expression led to a lower neural response and thereby a lower behavioral response, how could the PER response to sucrose be decreased specifically, but not the response to other tastes? Is HSD-induced desensitization specific to sucrose in the offspring?

      We agree that Cad expression might decrease in many cells, including Gr5a+ neurons in the proboscis. In order to investigate whether taste perception other than sweet sensing was also affected, we conducted PER experiments with fatty acids, which are another type of appetitive taste cue, like sugars. Perception of fatty acids is mediated by ionotropic receptors such as ir25a, ir76b, and ir56b (Ahn et al., eLife, 2017; Brown et al., eLife, 2021).

      Our results indicate that PER to fatty acids in HSD-F0 and HSD-F1 flies was not significantly reduced compared to the ND-fed controls (Figure 1-figure supplement 2. E-F). This suggests that the impact of Cad on gustatory sensory neurons might be specific to the sweet sensitivity of Gr5a+ neurons.

      4) In Fig. 2D, data are sorted for genomic regions showing an up-regulated modification of H3K27me3. It's unclear whether similar sorting was performed in panel C. This needs to be clarified.

      The analyses shown in Figure 2C and 2D were linked. For 2C, we identified genomic loci with enriched H3K27me3, H3K9me3, and H3K27ac peaks, and found that H3K27me3 peaks showed the most robust changes between ND-fed and HSD-F1 flies. We therefore concentrated on the loci where H3K27me3 modifications were significantly changed between the two groups and further analyzed their differences. As shown in Figure 2D, within these loci, H3K27ac modifications, which are functionally antagonistic to H3K27me3, were significantly reduced, whereas H3K9me3 signals remained unchanged. These results confirmed that ancestral HSD exposure induced robust H3K27me3 modifications in certain genomic loci.

    1. Abstract: Transformer-based language models are successfully used to address massive text-related tasks. DNA methylation is an important epigenetic mechanism and its analysis provides valuable insights into gene regulation and biomarker identification. Several deep learning-based methods have been proposed to identify DNA methylation and each seeks to strike a balance between computational effort and accuracy. Here, we introduce MuLan-Methyl, a deep-learning framework for predicting DNA methylation sites, which is based on five popular transformer-based language models. The framework identifies methylation sites for three different types of DNA methylation, namely N6-adenine, N4-cytosine, and 5-hydroxymethylcytosine. Each of the employed language models is adapted to the task using the “pre-train and fine-tune” paradigm. Pre-training is performed on a custom corpus of DNA fragments and taxonomy lineages using self-supervised learning. Fine-tuning aims at predicting the DNA-methylation status of each type. The five models are used to collectively predict the DNA methylation status. We report excellent performance of MuLan-Methyl on a benchmark dataset. Moreover, we argue that the model captures characteristic differences between different species that are relevant for methylation. This work demonstrates that language models can be successfully adapted to applications in biological sequence analysis and that joint utilization of different language models improves model performance. MuLan-Methyl is open source and we provide a web server that implements the approach.

      **Reviewer 2. Jianxin Wang **

      In this manuscript, the authors present MuLan-Methyl, a deep-learning framework for predicting 6mA, 4mC, and 5hmC sites. They use DNA sequence and taxonomic identity as features, and implement five popular transformer-based language models in MuLan-Methyl. MuLan-Methyl is open-sourced, and a web server is also provided for users to access it. Overall, I think the methodology of MuLan-Methyl is clear and innovative, and the experiments seem comprehensive. However, I do have several concerns that I believe should be addressed before the paper is accepted by GigaScience.

      Major

      1. One major concern is that, in my opinion, DNA methylation is dynamic. Cytosines in the same position of the DNA sequence may have different methylation status in different samples, different cells, or even in different developmental stages of a cell. So, how can we predict the methylation status of a site based on only its sequence (and taxonomic identity)?

      -- The authors should clarify, in the Introduction or Discussion section, in what cases MuLan-Methyl (as well as other methods that use only DNA sequence) can be used to study DNA methylation.

      -- The authors discuss motifs in Fig. 3, but only for positive samples. How about the motif distribution in the negative samples? Can I understand that this method is actually for discovering motifs (or sequence structures) that are highly correlated with methylation?

      -- How is the performance of MuLan-Methyl without taxonomic identity?

      2. The authors compared MuLan-Methyl against iDNA-ABF and iDNA-ABT, especially on the independent test set (Fig. 2E). I think the authors should clarify whether they trained the models of the three methods using the same training datasets. If not, the authors should clarify the reason.

      3. I'm curious about the computational efficiency of MuLan-Methyl. How many parameters are in its model? Does MuLan-Methyl have advantages over other methods in terms of computational efficiency?

      Minor

      1. I don't understand why the references were not ordered from 1 in the main text.

      2. I suggest that the authors re-organize the Introduction section. There are too many small paragraphs in this section.

      3. At the end of Page 2, "The type 4mC type is present in 4 species" should be corrected.

      Re-review:

      The authors have addressed most of my concerns. However, I still have one minor concern about the computational efficiency. The response of the authors is not convincing by only saying "The number of models that MuLan-Methyl need to train and test on is less than the others, thus it has better computational efficiency than other models to some extent". If possible, I strongly suggest that the authors show some data to compare how much time and resources (GPU/CPU/RAM) needed by each method.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper falls in a long tradition of studies on the costs of reproduction in birds and its contribution to understanding individual variation in life histories. Unfortunately, the meta-analyses only confirm what we know already, and the simulations based on the outcome of the meta-analysis have shortcomings that prevent the inferences on optimal clutch size, in contrast to the claims made in the paper.

      There was no information that I could find on the effect sizes used in the meta-analyses other than a figure listing the species included. In fact, there is more information on studies that were not included. This made it impossible to evaluate the data-set. This is a serious omission, because it is not uncommon for there to be serious errors in meta-analysis data sets. Moreover, in the long run the main contribution of a meta-analysis is to build a data set that can be included in further studies.

      It is disappointing that two referees comment on data availability, as we supplied a link to our full dataset and the code we used in Dryad with our submitted manuscript. We were also asked to supply our data during the review process and we again supplied a link to our dataset and code, along with a folder containing the data and code itself. We received confirmation that the reviewers had been given our data and code. We support open science and it was our intention that our dataset should be fully available to reviewers and readers. Our data and code are at https://doi.org/10.5061/dryad.q83bk3jnk.

      The main finding of the meta-analysis of the brood size manipulation studies is that the survival costs of enlarging brood size are modest, as previously reported by Santos & Nakagawa on what I suspect to be mostly the same data set.

      We disagree that the main finding of our paper is the small survival cost of manipulated brood size. The major finding of the paper, in our opinion, is that the effect sizes for experimental and observational studies are in opposite directions, therefore providing the first quantitative evidence to support the influential theoretical framework put forward by van Noordwijk and de Jong (1986), that individuals differ in their optimal clutch size and are constrained to reproducing at this level due to a trade-off with survival. We show that while the manipulation experiments have been widely accepted to be informative, they are not in fact an effective test of whether within-species variation in clutch size is the result of a trade-off between reproduction and survival.

      The comment that we are reporting the same finding as Santos & Nakagawa (2012) is a misrepresentation of both that study and our own. Santos & Nakagawa found an effect of parental effort on survival only in males who had their clutch size increased – but no effect for males who had their clutch size reduced and no survival effect on females for either increasing or reducing parental effort. However, we found an overall reduction in survival for birds who had brood sizes manipulated to make them larger (for both sexes and mixed sex studies combined). In our supplementary information, we demonstrate the overall survival effect of a change in reproductive effort to be close to zero for males, negative (though non-significant) for females and significantly negative for mixed sexes (which are not included in the Santos & Nakagawa study).

      The paper does a very poor job of critically discussing whether we should take this at face value or whether instead there may be short-comings in the general experimental approach. A major reason why survival cost estimates are barely significantly different from zero may well be that parents do not fully adjust their parental effort to the manipulated brood size, either because of time/energy constraints, because it is too costly and therefore not optimal, or because parents do not register increased offspring needs. Whatever the reason, as a consequence, there is usually a strong effect of brood size manipulation on offspring growth and thereby presumably their fitness prospects. In the simulations (Fig.4), the consequences of the survival costs of reproduction for optimal clutch size were investigated without considering brood size manipulation effects on the offspring. Effects on offspring are briefly acknowledged in the discussion, but otherwise ignored. Assuming that the survival costs of reproduction are indeed difficult to discern because the offspring bear the brunt of the increase in brood size, a simulation that ignores the latter effect is unlikely to yield any insight in optimal clutch size. It is not clear therefore what we learn from these calculations.

      The reviewer’s comment is somewhat of a paradox. We take the best studied example of the trade-off between reproductive effort and parental survival, a key theme in life-history and the biology of ageing, and subject this to a meta-analysis. The reviewer suggests we should interpret our finding as if there must be something wrong with the method or studies we included, rather than considering that the original hypothesis could be false or inflated in importance. We do not consider the reviewer’s inclination to question the premise of the data in favor of a held hypothesis to be the best scientific approach here. In many places in our manuscript we question and address issues in the underlying data and interpretation (L101-105, L149-150, L182-185 and L229-233). Moreover, we make it clear that we focus on the trade-off between current reproductive effort and subsequent parental survival and we are aware that other trade-offs could counter-balance or explain our findings, discussed on L189-191 & L246-253. Note that it is also problematic, when you do not find the expected response, to search for an alternative that has not been measured. In the case here, with trade-offs, there are endless possibilities of where a trade-off might be incurred between traits. We purposefully focus on the one well-studied and theorised trade-off. We clearly acknowledge, though, that when all possible trade-offs are taken into account a trade-off on the fitness level can occur, and we cite two famous studies (Daan et al., 1990 and Verhulst & Tinbergen 1991) that have done just that (L250-253).

      So, whilst we agree with the reviewer that the offspring may incur costs themselves, rather than costs being incurred by the parents, the aim of our study was to test for a generalised trend across species in the survival costs of reproductive effort. It is unrealistic to suggest that incorporating offspring growth into our simulations would add insight, as a change in offspring number rarely affects all offspring in the nest equally and there can even be quite stark differences; for example, this will be most evident in species that produce sacrificial offspring. This effect will be further confounded by catch-up growth, for example, and so it is likely that increased sibling competition from added chicks alters offspring growth trajectories, rather than absolute growth as the reviewer suggests. There are mixed results in the literature on the effect of altering clutch size on offspring survival, with an increased clutch size through manipulation often increasing the number of recruits from a nest.

      There are other reasons why brood size manipulations may not reveal the costs of reproduction animals would incur when opting for a larger brood size than they produced spontaneously themselves. Firstly, the manipulations do not affect the effort incurred in laying eggs (which also biases your comparison with natural variation in clutch size). Secondly, the studies by Boonekamp et al on Jackdaws found that while there was no effect of brood size manipulation on parental survival after one year of manipulation, there was a strong effect when the same individuals were manipulated in the same direction in multiple years. This could be taken to mean that costs are not immediate but delayed, explaining why single year manipulations generally show little effect on survival. It would also mean that most estimates of the fitness costs of manipulated brood size are not fit for purpose, because typically restricted to survival over a single year.

      First, our results did show a survival cost of reproduction for brood manipulations. We agree that there could be longer-term costs, and so our estimate of the survival cost for manipulated birds is likely to be an underestimate, meaning that our interpretation still holds: the cost of reproduction prevents individuals from laying beyond their optimal level. Note, however, that much theory is built on the immediate costs of reproduction and as such these costs are likely overinterpreted.

      We agree with the reviewer that lifetime manipulations could be even more informative than single-year manipulations. Unfortunately, there are currently too few studies available to be able to draw generalisable conclusions across species for lifetime manipulations. This is, however, the reason we used lifetime change in clutch size in our fitness projections, which the reviewer seems to have missed – please see methods line 360-362, where we explicitly state that this is lifetime enlargement. Of course such interpretations do not include an accumulation of costs that is greater than the annual cost, but currently there is no clear evidence that such an assumption is valid. Such a conclusion can also not be drawn from the study on jackdaws by Boonekamp et al (2014) as the treatments were life-long and, therefore, cannot separate annual from accrued (multiplicative) costs that are more than the sum of annual costs incurred.

      Details of how the analyses were carried out were opaque in places, but as I understood the analysis of the brood size manipulation studies, manipulation was coded as a covariate, with negative values for brood size reductions and positive values for brood size enlargements (and then variably scaled or not to control brood or clutch size). This approach implicitly assumes that the trade-off between current brood size (manipulation) and parental survival is linear, which contrasts with the general expectation that this trade-off is not linear. This assumption reduces the value of the analysis, and contrasts with the approach of Santos & Nakagawa.

      We thank the reviewer for highlighting a lack of clarity in places in our methods. We will add additional detail to this section in our revised manuscript.

      For clarity in our response, each effect size was extracted by performing a logistic regression with survival as a binary response variable and clutch size defined as the absolute number of offspring in the nest (i.e., for a bird who laid a clutch size of 5 but was manipulated to have -1 egg, we used a clutch size value of 4). The clutch size was also standardised and, separately, expressed as a proportion of the species mean.

      We disagree that our approach reduces the value of our analysis. First, our approach allows a direct comparison between experimental and observational studies, which is the novelty of our study. Our approach does differ from Santos & Nakagawa but we disagree that it contrasts. Our approach allows us to take into consideration the severity of the change in clutch size, which Santos & Nakagawa do not. Therefore, we do not agree that our approach is worse at accounting for non-linearity of trade-offs than the approach used by Santos & Nakagawa.

      Our analysis, alongside a plethora of other ecological studies, does assume that the response to our predictor variable is linear. However, it is common knowledge that there are very few (if any) truly linear relationships. We use linear relationships because they serve as a good approximation of the trend and provide a more rigorous test for an underlying relationship than would fitting nonlinear models. For many datasets there is not a range of chicks added for which a non-linear relationship could be estimated. The question also remains of what the shape of this non-linear relationship should be, which is hard to determine a priori. We will address non-linear effects in our revised manuscript.

      The observational study selection is not complete and apparently no attempt was made to make it complete. This is a missed opportunity - it would be interesting to learn more about interspecific variation in the association between natural variation in clutch size and parental survival.

      We clearly state in our manuscript that we deliberately made a tailored selection of studies that matched the manipulation studies (L279-282). We paired species extracted for observational studies with those extracted in experimental studies to facilitate a direct comparison between observational and experimental studies, and to ensure that the respective datasets were comparable. The reviewer’s focus in this review seems to be solely on the experimental dataset. This comment dismisses the observational component of our analysis and thereby fails to acknowledge the question being addressed in this study.

      Reviewer #2 (Public Review):

      I have read with great interest the manuscript entitled "The optimal clutch size revisited: separating individual quality from the costs of reproduction" by LA Winder and colleagues. The paper consists in a meta-analysis comparing survival rates from studies providing clutch sizes of species that are unmanipulated and from studies where the clutch sizes are manipulated, in order to better understand the effects of differences in individual quality and of the costs of reproduction. I find the idea of the manuscript very interesting. However, I am not sure the methodology used allows to reach the conclusions provided by the authors (mainly that there is no cost of reproduction, and that the entire variation in clutch size among individuals of a population is driven by "individual quality").

      We would like to highlight that we do not conclude that there is no cost of reproduction. Please see lines 258–260, where we state that our lack of evidence for trade-offs driving within-species variation in clutch size does not necessarily mean the costs of reproduction are non-existent. We conclude that individuals are constrained to their optima by the survival cost of reproduction. It is also an over-statement of our conclusion to say that we believe that variation in clutch size is only driven by quality. Our results show that unmanipulated birds who have larger clutch sizes also live longer, and we suggest this is evidence that some individuals are “better” than others, but we do not say, nor imply, that no other factors affect variation in clutch size.

      I write that I am not sure, because in its current form, the manuscript does not contain a single equation, making it impossible to assess. It would need at least a set of mathematical descriptions for the statistical analysis and for the mechanistic model that the authors infer from it.

      We appreciate this comment, but this is the first time we have been asked to put equations in a manuscript rather than explain them in terms that are accessible to a wider audience. Note however that our meta-analysis is standard and based on logistic regression and standard meta-analytic practices. We do not think we need to repeat such equations and we cite the relevant data. For the simulation, we simply simulated the resulting effects and this is not something that we feel is captured more accurately in equations rather than in text and the associated graphs. We of course supplied our code for this along with our manuscript (https://doi.org/10.5061/dryad.q83bk3jnk), though as we mentioned above, we believe this was not shared with the reviewers despite us making this available for the review process. We therefore understand the reviewer feels the simulations were not explained thoroughly. We will revise our text to see if we can add additional explanation where relevant in our revision.

      The text mixes concepts of individual vs population statistics, of within-individual vs among-individuals measures, of allocation trade-offs and fitness trade-offs, etc., which means it would also require a glossary of the definitions the authors use for these various terms, in order to be evaluated.

      We would like to thank the reviewer for highlighting this lack of clarity in our text. We will simplify the terminology and define terms in our revised manuscript.

      This problem is emphasised by the following sentence to be found in the discussion "The effect of birds having naturally larger clutches was significantly opposite to the result of increasing clutch size through brood manipulation". The "effect" is defined as the survival rate (see Fig 1). While it is relatively easy to intuitively understand what the "effect" is for the unmanipulated studies: the sensitivity of survival to clutch size at the population level, this should be mentioned and detailed in a formula. Moreover, the concept of effect size is not at all obvious for the manipulated ones (effect of the manipulation? or survival rate whatever the manipulation (then how could it measure a trade-off ?)? at the population level? at the individual level ?) despite a whole appendix dedicated to it. This absolutely needs to be described properly in the manuscript.

      We would like to thank the reviewer for bringing to our attention the lack of clarity on the details of our methodology. We will make this more clear in our revised manuscript.

      For clarity, the effect size for both manipulated and unmanipulated nests was survival, given the brood size raised. We performed a logistic regression with survival as a binary response variable (i.e., number of individuals that survived and number of individuals that died after each breeding season) and clutch size defined as the absolute number of offspring in the nest (i.e., for a bird who laid a clutch size of 5 but was manipulated to have -1 egg, we used a clutch size value of 4). This allows for direct comparison of the effect size (survival given clutch size raised) between manipulated and unmanipulated birds.
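      As an illustration only, the kind of effect-size extraction described above can be sketched as a logistic regression fitted to grouped survival counts. This is a minimal sketch, not the code deposited on Dryad: the function name `logistic_slope`, the Newton-Raphson fitting routine, and the example counts are all assumptions made for the example.

```python
import math

def logistic_slope(brood_sizes, survived, died, n_iter=50):
    """Fit logit(P(survive)) = b0 + b1 * brood_size by Newton-Raphson.

    Inputs are grouped counts: for each brood size raised, the number of
    parents that survived and the number that died. Returns the slope b1
    (change in log-odds of survival per extra offspring) and its
    approximate standard error.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, s, d in zip(brood_sizes, survived, died):
            n = s + d
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = n * p * (1.0 - p)      # binomial variance weight
            g0 += s - n * p            # gradient w.r.t. intercept
            g1 += (s - n * p) * x      # gradient w.r.t. slope
            h00 += w                   # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Variance of b1 is the matching diagonal entry of the inverse
    # information matrix, evaluated at the (converged) final iteration.
    se_b1 = math.sqrt(h00 / det)
    return b1, se_b1

# Hypothetical counts, not taken from any study in the meta-analysis.
# A nest with 5 eggs laid and 1 removed is entered with brood size 4.
brood_sizes = [3, 4, 5, 6, 7]
survived = [30, 28, 25, 22, 18]
died = [20, 22, 25, 28, 32]
slope, se = logistic_slope(brood_sizes, survived, died)
print(f"log-odds of survival per extra offspring: {slope:.3f} (SE {se:.3f})")
```

      Under these assumptions, the same function could be applied to the manipulated and the unmanipulated nests of a species, so that the two slopes are measured on the same scale.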

      Despite the lack of information about the underlying mechanistic model tested and the statistical model used, my impression is still that the interpretation in the introduction and discussion is not granted by the outputs of the figures and tables. Let's use a model similar to that of (van Noordwijk and de Jong, 1986): imagine that the mechanism at the population level is

      a.c_(i,q)+b.s_(i,q)=E_q

      where c_(i,q) and s_(i,q) are respectively the clutch size and the survival rate of individual i, which is of quality q, and E_q is the level of "energy" that an individual of quality q has available during the given time-step. Here a and b are constants turning the clutch size and survival rate into the energy cost of reproduction and the energy cost of survival, and they are both quite "high", so that an extra egg (c_(i,q) increased by 1) at the current time-step decreases s_(i,q) markedly (E_q is independent of the number of eggs produced); that is, we have strong individual costs of reproduction. Imagine now that the variance of c_(i,q) (when the population is not manipulated) among individuals of the same quality group is very small (and therefore the variance of s_(i,q) is also very small) and that the expectations of both are proportional to E_q. Then, in the unmanipulated population, the variance in clutch size is mainly due to the variance in quality. Therefore, the larger the clutch size c_(i,q), the higher E_q and the higher the survival s_(i,q).

      In the manipulated populations however, because of the large a and b, an artificial increase in clutch size, for a given E_q, will lead to a lower survival s_(i,q). And the "effect size" at the population level may vary according to a,b and the variances mentioned above. In other words, the costs of reproduction may be strong, but be hidden by the data, when there is variance in quality; however there are actually strong costs of reproduction (so strong actually that they are deterministic and that the probability to survive is a direct function of the number of eggs produced)
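      The thought experiment above can be sketched numerically. The following is a minimal simulation of the allocation rule a.c_(i,q)+b.s_(i,q)=E_q under assumed parameter values; the values of a, b, the proportionality constant k, the noise level, and the range of E_q are hypothetical and chosen only so that survival stays between 0 and 1.

```python
import random

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def simulate(n=2000, a=0.05, b=1.0, k=8.0, seed=1):
    """Simulate a*c + b*s = E_q with quality variation among individuals.

    Returns two slopes:
      natural      - survival vs. clutch size in unmanipulated birds
      experimental - survival vs. number of eggs added or removed
    """
    rng = random.Random(seed)
    clutch, survival = [], []
    added, survival_after = [], []
    for _ in range(n):
        energy = rng.uniform(0.5, 1.0)            # quality E_q (assumed range)
        c = k * energy + rng.gauss(0.0, 0.2)      # clutch size tracks quality
        s = (energy - a * c) / b                  # survival from the budget
        m = rng.choice([-2, -1, 0, 1, 2])         # random brood manipulation
        s_m = (energy - a * (c + m)) / b          # same budget, altered brood
        clutch.append(c)
        survival.append(s)
        added.append(m)
        survival_after.append(s_m)
    natural = ols_slope(clutch, survival)
    experimental = ols_slope(added, survival_after)
    return natural, experimental

natural, experimental = simulate()
print(f"unmanipulated slope (survival per egg): {natural:.3f}")
print(f"manipulation slope (survival per added egg): {experimental:.3f}")
```

      With these assumed values, the unmanipulated slope is positive because both clutch size and survival rise with E_q, while the manipulation slope is negative and close to -a/b, even though every individual obeys the same cost of reproduction.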

      We would like to thank the reviewer for these comments. Please note that our simulations only take the experimental effect of brood size on parental survival into account. Our model does not incorporate quality effects. The reviewer is right that the relationship between quality and the effects exposed by manipulating brood size can take many forms and this is a very interesting topic, but not one we aimed to tackle in our manuscript. In terms of quality we make two points: 1) overall quality effects connecting reproduction and parental survival are present 2) these effects are opposite in direction to the effects when reproduction is manipulated and similar in magnitude. We do not go further than that in interpreting our results. The reviewer is right however that we do suggest and repeat suggestions by others that quality can also mask the trade-off in some individuals or circumstances (L63-65, L85-88 & L237-240), but we do not quantify this as this is dependent on the unknown relationships between quality and the response to the manipulation. A focussed set of experiments in that context would be interesting and there is some data that could get at this, i.e. the relationship between produced clutch size and the relative effect of the manipulation. Such information is however not available for all studies and although we explored also analyzing this, currently this is not possible to do with sufficient confidence. We will include this rationale in our revision.

      Moreover, it seems to me that the costs of reproduction are a concept closely related to generation time. Looking beyond the individual allocative (and other individual components of the trade-off) cost of reproduction and towards a populational negative relationship between survival and reproduction, we have to consider the intra-population slow fast continuum (some types of individuals survive more and reproduce less (are slower) than other (which are faster)). This continuum is associated with a metric: the generation time. Some individuals will produce more eggs and survive less in a given time-period because this time-period corresponds to a higher ratio of their generation time (Gaillard and Yoccoz, 2003; Gaillard et al., 2005). It seems therefore important to me, to control for generation time and in general to account for the time-step used for each population studied when analysing costs of reproduction. The data used in this manuscript is not just clutch size and survival rates, but clutch size per year (or another time step) and annual (or other) survival rates.

      The reviewer is right that this is interesting. There are unexplained differences between temperate (seasonal) and tropical reproductive strategies. Most of our data come from seasonal breeders, however. Although there is some variation in second brooding and the like, these species often produce only one brood. We do agree that a wider consideration here is relevant, but we are not trying to explain all of life-history in our paper. It is clearly the case that other factors will operate and the opportunity for trade-offs will vary among species according to their respective life histories. However, our study focuses on the two most fundamental components of fitness – longevity and reproduction – to test a major hypothesis in the field, and we uncover new relationships that contrast with previous influential studies, and cast doubt on previous conclusions. We question the assumed trade-off between reproduction and annual survival. We show quality is important and that the effect we find in experimental studies is so small that it can only explain between-species patterns but is unlikely to be the selective force that constrains reproduction within-species. We do agree that there is a lot more work that can be done in this area. We hope we contribute to this by questioning this central trade-off. We will try to incorporate some of these suggestions in the revision where possible.

      Finally, it is important to relate any study of the costs of reproduction in a context of individual heterogeneity (in quality for instance), to the general problem of the detection of effects of individual differences on survival (see, e.g., Fay et al., 2021). Without an understanding of the very particular statistical behaviour of survival, associated to an event that by definition occurs only once per life history trajectory (by contrast to many other traits, even demographic, where the corresponding event (production of eggs for reproduction, for example) can be measured several times for a given individual during its life history trajectory).

      Thank you for raising this point. The reviewer is right that heterogeneity can dampen or augment selection. Note that by estimating the effect of quality here we give an example of how heterogeneity can possibly do exactly this. We thank the reviewer for raising that we should possibly link this to wider effects of heterogeneity and we aim to do so in the revision.

      References:

      Fay, R. et al. (2021) 'Quantifying fixed individual heterogeneity in demographic parameters: Performance of correlated random effects for Bernoulli variables', Methods in Ecology and Evolution, 2021(August), pp. 1-14. doi: 10.1111/2041-210x.13728.

      Gaillard, J.-M. et al. (2005) 'Generation time: a reliable metric to measure life-history variation among mammalian populations.', The American naturalist, 166(1), pp. 119-123; discussion 124-128. doi: 10.1086/430330.

      Gaillard, J.-M. and Yoccoz, N. G. (2003) 'Temporal Variation in Survival of Mammals: a Case of Environmental Canalization?', Ecology, 84(12), pp. 3294-3306. doi: 10.1890/02-0409.

      van Noordwijk, A. J. and de Jong, G. (1986) 'Acquisition and Allocation of Resources: Their Influence on Variation in Life History Tactics', American Naturalist, p. 137. doi: 10.1086/284547.

      Reviewer #3 (Public Review):

      The authors present here a comparative meta-analysis designed to detect evidence for a reproduction/survival trade-off, central to expectations from life history theory. They present variation in clutch size within species as an observation in conflict with expectations of optimisation of clutch size and suggest that this may be accounted for by weak selection on clutch size. The results of their analyses support this explanation - they found little evidence of a reproduction-survival trade-off across birds. They extrapolated from this result to show in a mathematical model that the fitness consequences of enlarged clutch sizes would only be expected to have a significant effect on fitness in extreme cases, outside of normal species' clutch size ranges. Given the centrality of the reproduction-survival trade-off, the authors suggest that this result should encourage us to take a more cautious approach to applying concepts of the trade-off in life history theory and optimisation in behavioural ecology more generally. While many of the findings are interesting, I don't think the argument for a major re-think of life history theory and the role of trade-offs in fitness maximisation is justified.

      The interest of the paper, for me, comes from highlighting the complexities of the link between clutch size and fitness, and the challenges facing biologists who want to detect evidence for life history trade-offs. Their results highlight apparently contradictory results from observational and experimental studies on the reproduction-survival trade-off and show that species with smaller clutch sizes are under stronger selection to limit clutch size.

      Unfortunately, the authors interpret the failure to detect a life history trade-off as evidence that there isn't one. The construction of a mathematical model based on this interpretation serves to give this possible conclusion perhaps more weight than is merited on the basis of the results of this necessarily quite simple meta-analysis. There are several potential complicating factors that could explain the lack of detection of a trade-off in these studies, which are mentioned and dismissed as unimportant (lines 248-250) without any helpful, rigorous discussion. I list below just a selection of complexities which perhaps deserve more careful consideration by the authors to help readers understand the implications of their results:

      We would like to thank the reviewer for their thoughtful response and summary of the findings we also agree are central to our study. The reviewer also highlights areas where our manuscript could benefit from a deeper discussion and we will add detail to our discussion in our revised manuscript.

      We would like to highlight that we do not interpret the failure to detect a trade-off as evidence that there isn’t one. First, and importantly, we do find a trade-off but show this is only incurred when individuals lay beyond their optimal level. Secondly, we also state on lines 258-260 that the lack of evidence to support trade-offs being strong enough to drive variation in clutch size does not necessarily mean there are no costs of reproduction.

      The statement that we have constructed a mathematical model based on the interpretation that we have not found a trade-off is, again, factually incorrect. We ran these simulations because the opposite is true – we did find a trade-off. There is a significant effect of clutch size when manipulated on annual parental survival. To appreciate whether this effect alone can explain why reproduction is constrained, we ran the simulations. From these simulations we find that this effect size is too small to explain the constraint so something else must be going on and we do spend a considerable amount of text discussing the possible explanations (L182-194). Note the possibly most parsimonious conclusion here is that costs of reproduction are not there so we also give that explanation some thought (L201-205 and L247-253).

      We are disappointed by the suggestion that we have dismissed complicating factors which could prevent detection of a trade-off, as this was not our intention. We were aiming to highlight that what we have demonstrated to be an apparent trade-off can be explained through other mechanisms, and that the trade-off between clutch size and survival is not as strong in driving within-species variation in clutch size as previously assumed. We will add further discussion to our revised manuscript to make this clear and give readers a better understanding of the complexity of factors associated with life-history theory, although we do feel we have addressed this (L248-255).

      • Reproductive output is optimised for lifetime reproductive success and so the consequences of being pushed off the optimum for one breeding attempt are not necessarily detectable in survival but in future reproductive success (and, therefore, lifetime reproductive success).

      We agree this is a valid point, which is mentioned in our manuscript in terms of alternative stages where the costs of reproduction might be manifested (L248-250). We would also like to highlight that in our simulations, the change in clutch size (and subsequent survival cost) was assumed for the lifetime of the individual, for this very reason.

      • The analyses include some species that hatch broods simultaneously and some that hatch sequentially (although this information is not explicitly provided; see below). This is potentially relevant because species which have been favoured by selection to set up a size asymmetry among their broods often don't even try to raise their whole broods but only feed the biggest chicks until they are sated; any added chicks face a high probability of starvation. The first point this observation raises is that the expectation of more chicks = more cost doesn't hold for all species. The second, more general point is that the very existence of the sequential hatching strategy to produce size asymmetry in a brood is very difficult to explain if you reject the notion of a trade-off.

      We agree with the reviewer that the costs of reproduction can be absorbed by the offspring themselves, and may not be equal across offspring (we also highlight this at L249 in the manuscript). However, we disagree that for some species the addition of more chicks does not equate to an increase in cost, though we do accept this might be less for some species. This is, however, difficult to incorporate into a sensible model as the impacts will vary among species and some species do also exhibit catch-up growth. So, without a priori knowledge on this, we kept our model simple in order to test whether the effect on parental survival (often assumed to be a strong cost) can explain the constraint on reproductive effort, and we conclude it does not.

      We would also like to make clear that we are not rejecting the notion of a trade-off. Our study shows evidence that a trade-off between survival and reproductive effort likely does not drive within-species variation in clutch size. We do explicitly say this throughout our manuscript, and also provide suggestions of other areas where a trade-off may exist (L246-250). The point of our study is not whether trade-offs exist or not, it is whether there is a generalisable across-species trend for a trade-off between reproductive effort and survival – the most fundamental trade-off in our field but for which there is a lack of conclusive evidence within species.

      • For your standard, pair-breeding passerine, there is an expectation that costs of raising chicks will increase linearly with clutch size. Each chick requires X feeding visits to reach the required fledge weight. But this is not the case for species which have precocial chicks, which are relatively independent and able to feed themselves straight after hatching - so the relationship of care and survival is unlikely to be detectable by looking at the effect of clutch size but, again, it doesn't mean there isn't a trade-off between breeding and survival.

      Precocial birds still provide a level of parental care, such as protection from predators. Though we agree that the level of parental care in provisioning food (and in some cases in all parental care given) is lower in precocial than altricial birds, this would only make our reported effect size for manipulated birds an underestimate. Again, we would like to draw the reviewer’s attention to the fact we did detect a trade-off in manipulated birds and we do not suggest that trade-offs do not exist. The argument the reviewer suggests here does not hold for unmanipulated birds, as we found that birds that naturally lay larger clutch sizes have higher survival.

      • The cost of raising a brood to adulthood for your standard pair-breeding passerine is bound to be extreme, simply by dint of the energy expenditure required. In fact, it was shown that the basal metabolic rate of breeding passerines was at the very edge of what is physiologically possible, the human equivalent being cycling the Tour de France (Nagy et al. 1990). If birds are at the very edge of what is physiologically possible, is it likely that clutch size is under weak selection?

      If birds are at the very edge of what is physiologically possible, then indeed it would necessarily follow that if they increase the resource allocated in one area then expenditure in another area must be reduced. In many studies however, the overall brood mass is increased when chicks are added and cared for in an experimental setting, suggesting that birds are not operating at their limit all the time. Our simulations show that if individuals increase their clutch size, the survival cost of reproduction counterbalances the fitness gained by increasing clutch size and so there is no overall fitness gain to producing more offspring. Therefore, selection on clutch size is constrained to the within-species level. We do not say in our manuscript that clutch size is under weak selection – we only ask why variation in clutch size is maintained if selection always favours high-producing birds.
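      The counterbalancing argument above can be illustrated with a deliberately simple toy calculation. This is not the authors' simulation: the function name, the linear survival cost, the assumption that the cost applies in every breeding season, and all numbers are hypothetical choices made only to show how a per-egg survival cost can cancel the gain from a larger clutch.

```python
def lifetime_offspring(clutch_size, baseline_survival, cost_per_extra_egg, optimal_clutch):
    """Expected lifetime offspring under a toy model (all assumptions hypothetical).

    - The bird breeds once per season and survives to the next season with a
      constant annual probability, so the expected number of breeding seasons
      is 1 / (1 - survival).
    - Each egg laid beyond `optimal_clutch` lowers annual survival by
      `cost_per_extra_egg`, in every season of the bird's life.
    """
    if not 0.0 <= baseline_survival < 1.0:
        raise ValueError("baseline_survival must be in [0, 1)")
    extra_eggs = max(0, clutch_size - optimal_clutch)
    survival = max(0.0, baseline_survival - cost_per_extra_egg * extra_eggs)
    expected_seasons = 1.0 / (1.0 - survival)
    return clutch_size * expected_seasons


# Made-up numbers: an optimum of 5 eggs and 50% baseline annual survival.
at_optimum = lifetime_offspring(5, 0.5, 0.10, 5)
one_extra_high_cost = lifetime_offspring(6, 0.5, 0.10, 5)
one_extra_low_cost = lifetime_offspring(6, 0.5, 0.05, 5)
print(at_optimum, one_extra_high_cost, one_extra_low_cost)
```

      With these made-up values, a survival cost of 0.10 per extra egg leaves lifetime output essentially unchanged when one egg is added, whereas a cost of 0.05 does not fully offset the gain. Whether real effect sizes fall on one side or the other of such a threshold is the empirical question the replies discuss.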

      • Variation in clutch size is presented by the authors as inconsistent with the assumption that birds are under selection to lay the Lack clutch. Of course, this is absurd and makes me think that I have misunderstood the authors' intended point here. At any rate, the paper would benefit from more clarity about how variable clutch size has to be before it becomes a problem for optimality in the authors' view (lines 84-85; line 246). See Perrins (1965) for an exquisite example of how beautifully great tits optimise clutch size on average, despite laying between 5-12 eggs.

      We would like to thank the reviewer for highlighting that our manuscript may be misleading in places; however, we are unsure which part of our conclusions the reviewer is referring to here. The question we pose is “why all birds don’t lay at the population optimum?”, and is central to the decades-long field of life-history theory. Why is variation maintained at such a level? As the reviewer outlines, it ranges massively, with some birds laying half of what other birds lay.

    1. How willing are we to acknowledge that our institutions, both their structures and cultures, have a history of, and may still in many ways be unsupportive and/or hostile to our students and their communities?

      I completely agree with this quote. I feel it relates closely to the education system over the 60+ years since POC started to receive an education from schools. I believe there have been positive changes that resulted in them attending school. But I also believe that the education system in the US continues to fail them, since POC are minorities in America and before were the poorest people on the planet. Still to this day, the education system fails to help underprivileged students succeed in social institutions like schools, especially because I feel the education system has not been changed for years and is outdated, and only very recently have there been a few changes to it. I think we need to reform the policies to help underprivileged students attending schools in America: first, by acknowledging that institutions failed POC; second, by reforming the information; and lastly, by creating more welcoming groups at schools to help them succeed socially, along with many other things to help students in the future.

    1. If I am understanding this right - I think the potential dilemma that arises from professional versus local archaeology is interesting. Local archaeology efforts could provide insight into the past that would've gone unresearched otherwise, but with lower budgets and potentially greater mistakes (due to it not being 'professional quality' (?)) they could harm the items in the dig site. Professional archaeology affords research in key places, motivated by economics, politics, or culture, and allows for the use of advanced techniques such as carbon dating. Unfortunately, this may remove the discovery that many of the locals of the area would've likely enjoyed performing (since it is their roots). How are we to weigh preservation, quality, and ethics together to form the idea of 'just' archaeology?


    1. Back in 1945, there was this guy, Vannevar Bush. He was working for the US government, and one of the ideas that he put forth was, "Wow, humans are creating so much information, and we can't keep track of all the books that we've read or the connections between important ideas." And he had this idea called the "memex," where you could put together a personal library of all of the books and articles that you have access to. And that idea of connecting sources captured people's imaginations.
      • for: memex, Vannevar Bush, Indyweb, Ted Nelson
    1. The Science Behind Hydrogen Rich Water Machine

      In the health and wellness world, a fascinating trend has emerged with the rise of hydrogen-infused water machines. These innovative devices promise to deliver a refreshing beverage beyond ordinary hydration: hydrogen-rich water. Packed with potential health benefits, the science behind these machines is captivating and sheds new light on how we think about water consumption and its impact on our well-being.

      Hydrogen: The Unsung Hero Of Molecules

      Before delving into the science of hydrogen-rich water machines, it's essential to understand the pivotal role of hydrogen itself. Hydrogen is the lightest and simplest element on the periodic table, consisting of a single proton and an electron. While hydrogen is generally known for its explosive nature, it has recently garnered attention for its potential health benefits when dissolved in water.

      The Power Of Hydrogen-Infused Water

      Hydrogen-infused water, often called hydrogen-rich water, is created when molecular hydrogen gas (H2) is dissolved into plain water. This process typically involves using advanced technologies found in hydrogen-rich water machines. The resulting beverage is touted for its potential antioxidant properties, which could contribute to various health improvements.

      Antioxidant Action: Hydrogen's Hidden Potential

      Antioxidants are essential for neutralizing dangerous chemicals known as free radicals, which may damage cells and contribute to a variety of health problems such as chronic illnesses and ageing. Molecular hydrogen is thought to have antioxidant characteristics that are more effective than well-known antioxidants such as vitamins C and E.

      Hydrogen's unique antioxidant potential lies in its ability to easily penetrate cell membranes and access cellular compartments, including the nucleus and mitochondria. This attribute gives hydrogen an edge in protecting cellular components from oxidative stress, potentially reducing the risk of oxidative damage.

      The Mechanism: How Hydrogen Works Its Magic

      The exact mechanism behind hydrogen's antioxidant effects is still an area of ongoing research, but several theories have been proposed. One prominent theory suggests that hydrogen is a selective scavenger of harmful free radicals, targeting the most reactive and damaging ones without affecting beneficial molecules like oxygen or nitric oxide.

      Another theory is that hydrogen has the power to modify signalling pathways within cells. By altering these pathways, hydrogen may elicit preventive responses that boost the body's natural defence systems against oxidative stress and inflammation.

      Hydrogen-Rich Water Machines: The Technology

      Hydrogen-rich water machines are designed to harness the power of molecular hydrogen by infusing it into plain drinking water. These devices commonly use electrolysis, which involves sending an electric current through water to divide it into hydrogen and oxygen gases. The hydrogen gas is subsequently dissolved in water, yielding a beverage high in this beneficial chemical.

      These machines are equipped with advanced membranes that allow only hydrogen molecules to pass through while preventing the escape of potentially harmful byproducts like ozone. This ensures the purity and safety of the resulting hydrogen-infused water.

      Potential Health Benefits

      While research on the health benefits of hydrogen-rich water is still in its infancy, preliminary studies have shown promising results. Some of the potential benefits include the following:

      • Antioxidant Defense: Hydrogen-rich water's antioxidant properties could help reduce oxidative stress and associated health risks.
      • Anti-Inflammatory Effects: Hydrogen may have anti-inflammatory effects that could benefit conditions like arthritis and other inflammatory disorders.
      • Cellular Health: Hydrogen might contribute to overall cellular health and function by protecting cellular components.
      • Exercise Performance: Some research suggests that hydrogen-rich water might enhance exercise performance and reduce muscle fatigue.

      Conclusion: A Glimpse Into The Future Of Hydration

      Hydrogen-rich water machines are ushering in a new era of hydration, where molecular hydrogen's benefits are harnessed to potentially enhance our well-being. While more research is needed to fully understand the extent of these benefits, the early findings are exciting and have sparked interest among health-conscious individuals.

      As technology advances, we can anticipate more refined hydrogen-infused water machines and a deeper understanding of how molecular hydrogen interacts with our bodies. Whether you're an early adopter or a cautious observer, the science behind these machines invites us to explore the intriguing potential of hydrogen-infused water and its impact on our health.

    1. Reviewer #1 (Public Review):

      Summary: This paper by Schommartz and colleagues investigates the neural basis of memory reinstatement as a function of both how recently the memory was formed (recent, remote) and its development (children, young adults). The core question is whether memory consolidation processes as well as the specificity of memory reinstatement differ with development. A number of brain regions showed a greater activation difference for recent vs. remote memories at the long versus shorter delay specifically in adults (cerebellum, parahippocampal gyrus, LOC). A different set showed decreases in the same comparison, but only in children (precuneus, RSC). The authors also used neural pattern similarity analysis to characterize reinstatement, though I have substantive concerns about how this analysis was performed and as such will not summarize the results. Broadly, the behavioural and univariate findings are consistent with the idea that memory consolidation differs between children and adults in important ways, and takes a step towards characterizing how.

      Strengths: The topic and goals of this paper are very interesting. As the authors note, there is little work on memory consolidation over development, and as such this will be an important data point in helping us begin to understand these important differences. The sample size is great, particularly given this is an onerous, multi-day experiment; the authors are to be commended for that. The task design is also generally well controlled, for example as the authors include new recently learned pairs during each session.

      Weaknesses: As noted above, the pattern similarity analysis for both item and category-level reinstatement was performed in a way that is not interpretable given concerns about temporal autocorrelation within the scanning run. Below, I focus my review on this analytic issue, though I also outline additional concerns.

      1. The pattern similarity analyses were not done correctly, rendering the results uninterpretable (assuming my understanding of the authors' approach is correct).

      a. First, the scene-specific reinstatement index: The authors have correlated a neural pattern during a fixation cross (delay period) with a neural pattern associated with viewing a scene as their measure of reinstatement. The main issue with this is that these events always occurred back-to-back in time. As such, the two patterns will be similar due simply to the temporal autocorrelation in the BOLD signal. Because of the issues with temporal autocorrelation within the scanning run, it is always recommended to perform such correlations only across different runs. In this case, the authors always correlated patterns extracted from the same run, which moreover have temporal lags that are perfectly confounded with their comparison of interest (i.e., from Fig 4A, the "scene-specific" comparisons will always be back-to-back, having a very short temporal lag; "set-based" comparisons will be dispersed across the run, and therefore have a much higher lag). The authors' within-run correlation approach also yields correlation values that are extremely high - much higher than would be expected if this analysis was done appropriately. The way to fix this would be to restrict the analysis to only cross-run comparisons, but I don't believe this is possible unfortunately given the authors' design; I believe the target (presumably reinstated) scene only appears once during scanning, so there is no separate neural pattern during the presentation of this picture that they can use. For these reasons, any evidence for "significant scene-specific reinstatement" and the like is completely uninterpretable and would need to be removed from the paper.

      b. From a theoretical standpoint, I believe the way this analysis was performed considering the fixation and the immediately following scene also means that the differences between recent and remote could have to do with either the reactivation (processes happening during the fixation, presumably) or differences in the processing of the stimulus itself (happening during the scene presentation). For example, people might be more engaged with the more novel scenes (recent) and therefore process those scenes more; such a difference would be interpreted in this analysis as having to do with reinstatement, but in fact could be just related to the differential scene processing/recognition, etc. It would be important when comparing scene-specific neural patterns as templates for reinstatement across conditions that, at the time of scene presentation itself, the two conditions are equal (e.g., no difference in familiarity and so on); otherwise, we do not know which trial period (and therefore which underlying process) is driving the differences.

      c. For the category-based neural reinstatement: (1) This suffers from the same issue of correlations being performed within the run. Again, to correct this the authors would need to restrict comparisons to only across runs (i.e., patterns from run 1 correlated with patterns for run 2 and so on). With this restriction, it may or may not be possible to perform this analysis, depending upon how the same-category scenes are distributed across runs. However, there are other issues with this analysis, as well. (2) This analysis uses a different approach of comparing fixations to one another, rather than fixations to scenes. The authors do not motivate the reason for this switch. Please provide reasoning as to why fixation-fixation is more appropriate than fixation-scene similarity for category-level reinstatement, particularly given the opposite was used for item-level reinstatement. Even if the analyses were done properly, it would remain hard to compare them given this difference in approach. (3) I believe the fixation cross with itself is included in the "within category" score. Is this not a single neural pattern correlated with itself, which will yield maximal similarity (Pearson r=1) or minimal dissimilarity (1-Pearson r=0)? Including these comparisons in the averages for the within-category score will inflate the difference between the "within-category" and "between-category" comparisons. These (e.g., forest1-forest1) should not be included in the within-category comparisons considered; rather, they should be excluded, so the fixations are always different but sometimes the comparisons are two retrievals of the same scene type (forest1-forest2), and other times different scene types (forest1-field1). (4) It is troubling that the results from the category reinstatement metric do not seem to conceptually align with past work; for example, a lot of work has shown category-level reinstatement in adults. Here the authors do not show any category-level reinstatement in adults (yet they do in children), which generally seems extremely unexpected given past work and I would guess has to do with the operationalization of the metric.
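      The two corrections requested in points 1a and 1c (compare patterns only across runs, and never correlate a pattern with itself) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the trials-by-voxels input layout, and the toy data are all assumptions made for the example.

```python
import numpy as np


def cross_run_category_similarity(patterns, categories, runs):
    """Mean within- and between-category Pearson r, using cross-run pairs only.

    patterns:   (n_trials, n_voxels) array, one activity pattern per trial
    categories: length n_trials, category label of each trial (e.g. "forest")
    runs:       length n_trials, scanning run each trial came from
    (This input layout is a hypothetical choice for the sketch.)
    """
    patterns = np.asarray(patterns, dtype=float)
    categories = np.asarray(categories)
    runs = np.asarray(runs)
    n_trials = len(categories)

    # Trial-by-trial correlation matrix (rows of `patterns` are the variables).
    corr = np.corrcoef(patterns)

    # Upper triangle with k=1: each pair counted once, self-correlations dropped.
    unique_pairs = np.triu(np.ones((n_trials, n_trials), dtype=bool), k=1)
    # Keep only pairs drawn from different runs to avoid within-run autocorrelation.
    different_run = runs[:, None] != runs[None, :]
    same_category = categories[:, None] == categories[None, :]

    valid = unique_pairs & different_run
    within = corr[valid & same_category]
    between = corr[valid & ~same_category]
    if within.size == 0 or between.size == 0:
        raise ValueError("no cross-run pairs available for one of the comparisons")
    return within.mean(), between.mean()


# Toy data: two "forest" and two "field" trials, each category split across two runs.
toy_patterns = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.5], [3.0, 2.0, 1.0], [3.0, 1.0, 1.0]]
toy_categories = ["forest", "forest", "field", "field"]
toy_runs = [1, 2, 1, 2]
within_r, between_r = cross_run_category_similarity(toy_patterns, toy_categories, toy_runs)
print(within_r, between_r)
```

      The function raises an error when a design leaves no cross-run pairs for a comparison, which is the situation the reviewer suspects for the scene-specific index, where each target scene appears only once during scanning.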

      2. I did not see any compelling statistical evidence for the claim of less robust consolidation in children. Specifically in terms of the behavioural results of retention of the remote items at 1 vs 14 days, shown in Figure 2B, the authors conclude that memory consolidation is less robust in children (line 246). Yet they do not report statistical evidence for this point, as there was no interaction of this effect with the age group. Children had worse memory than adults overall (in terms of a main effect - i.e. across recent and remote items). If it were consolidation-specific, one would expect that the age differences are bigger for the remote items, and perhaps even most exaggerated for the 14-day-old memories. Yet this does not appear to be the case based on the data the authors report. Therefore, the behavioural differences in retention do not seem to be consolidation specific, and therefore might have more to do with differences in encoding fidelity or retrieval processes more generally across the groups. This should be taken into account when interpreting the findings.

      3. Please clarify which analyses were restricted to correct retrievals only. The univariate analysis section states that correct and incorrect trials were modelled separately, but does not say which were considered in the main contrast (I assume correct only?). The item-specific reinstatement analysis states that only correct trials were considered, but the category-level reinstatement analysis does not say. Please include this detail.

      4. To what extent could performance differences be impacting the differences observed across age groups? I think (see prior comment) that the analyses were probably limited to correct trials, which is helpful, but still yields pretty big differences across groups in terms of the amount of data going into each analysis. In general, children showed more attenuated neural effects (e.g., recent/remote or session effects); could this be explained by their weaker memory? Specifically, if only correct trials are considered, that means that fewer trials would be going into the analysis for kids, especially for the 14-day remote memories, perhaps pushing the remote > recent difference for this condition towards 0. The authors might be able to address this analytically; for example, does the remote > recent difference in the univariate data at day 14 correlate with day 14 memory?

      5. Some of the univariate results reporting is a bit strange, as they are relying upon differences between retrieval of 1- vs. 14-day memories in terms of the recent vs. remote difference, and yet don't report whether the regions are differently active for recent and remote retrieval. For example in Figure 3A, neither anterior nor posterior hippocampus seem to be differentially active for recent vs. remote memories for either age group (i.e., all data is around 0). This difference from zero or lack thereof seems important to the message - is that correct? If so, can the authors incorporate descriptions of these findings?

      6. Please provide more details about the choices available for locations in the 3AFC task. (1) Were they different each time, or always the same? If they are always the same, could this be a motor or stimulus/response learning task? (2) Do the options in the 3AFC always come from the same area - in which case the participant is given a clue as to the gist of the location/memory? Or are they sometimes randomly scattered across the image (in which case gist memory, like at a delay, would be sufficient for picking the right option)? Please clarify these points and discuss the logic/impact of these choices on the interpretation of the results.

      7. Often p values are provided but test statistics, effect sizes, etc. are not - please include this information. It is at times hard to tell whether the authors are reporting main effects, interactions, pairwise comparisons, etc.

      8. There are not enough methodological details in the main paper to make sense of the results. For example, it is not clear from reading the text that there are new object-location pairs learned each day.

      9. The retrieval task does not seem to require retrieval of the scene itself, and as such it would be helpful for the authors to explain their reasoning for using this task to measure reinstatement. Strictly speaking, participants could just remember the location of the object on the screen. Was it verified that children and adults were recalling the actual scene rather than just the location (e.g. via self-report)? It's possible that there may be developmental differences in the tendency to reinstate the scene depending on e.g., their strategy.

      10. In general I found the Introduction a bit difficult to follow. Below are a few specific questions I had.

      a. At points findings are presented but the broader picture or take-home point is not expressed directly. For example, lines 112-127, these findings can all be conceptualized within many theories of consolidation, and yet those overarching frameworks are not directly discussed (e.g., that memory traces go from being more reliant on the hippocampus to more on the neocortex). Making these connections directly would likely be helpful for many readers.

      b. Lines 143-153 - The comparison of the Tompary & Davachi (2017) paper with the Oedekoven et al. (2017) paper reads as though the two analyses are directly comparable, but the authors were looking at different things. The Tompary paper is looking at organization (not reinstatement), while the Oedekoven et al. paper is measuring reinstatement (not organization). The authors should clarify how to reconcile these findings.

      c. Line 195-6: I was confused by the prediction of "stable involvement of HC over time" given the work reviewed in the Introduction that HC contribution to memory tends to decrease with consolidation. Please clarify or rephrase.

      d. Lines 200-202: I was a bit confused about this prediction. Firstly, please clarify whether immediate reinstatement has been characterized in this way for kids versus adults. Secondly, don't adults retain gist more over long delays (with specific information getting lost), at least behaviourally? This prediction seems to go against that; please clarify.

    1. Author Response

      We thank the reviewers for their work and their thoughtfulness. However, it seems to us that much (but not all) of the critique reflects a misunderstanding of the goals and methods of computational modeling. Details are below. We are grateful for the opportunity to include our views about this in the context of our replies to the Public Critiques of our paper. The comments of the reviewers were very helpful in allowing us to see what might not be clear to our readers.

      eLife assessment

      This useful modeling study explores how the biophysical properties of interneuron subtypes in the basolateral amygdala enable them to produce nested oscillations whose interactions facilitate functions such as spike-timing-dependent plasticity. The strength of evidence is currently viewed as incomplete because the relevance to plasticity induced by fear conditioning is viewed as insufficiently grounded in existing training protocols and prior experimental results, and alternative explanations are not sufficiently considered. This work will be of interest to investigators studying circuit mechanisms of fear conditioning as well as rhythms in the basolateral amygdala.

      Most of our comments below are intended to rebut the sentence: “The strength of evidence is currently viewed as incomplete because the relevance to plasticity induced by fear conditioning is viewed as insufficiently grounded in existing training protocols and prior experimental results, and alternative explanations are not sufficiently considered”. Details are below in the answer to reviewers.

      We believe this work will be interesting to investigators interested in dynamics associated with plasticity, which goes beyond fear learning. It will also be of interest because of its emphasis on the interactions of multiple kinds of interneurons that produce dynamics used in plasticity, in the cortex (which has similar interneurons) as well as BLA.

      We note that the model has sufficiently detailed physiology to make many predictions that can be tested experimentally. In the revision, we will be more explicit about this.

      We thank Reviewer #1 for stressing our work's important contribution to providing concrete hypotheses that can be tested in vivo and highlighting the importance of examining in the future the synergistic role of the BLA interneurons in fear learning. The weaknesses reported by the Reviewer concern deviations of the model compared to the experimental literature. We describe below why we think those differences are minor in the context of the aims of our model. Specifically,

      1) Some connections among neurons in the BLA reported by (Krabbe et al., 2019) have not been taken into account in the model. Some connections between cell types were excluded without adequate justification (e.g. SOM+ to PV+).

      In order to constrain our model, we focused on what is reported in (Krabbe et al., 2019) in terms of functional connectivity instead of structural connectivity. Thus, we included only those connections for which there was strong functional connectivity. For example, the SOM+ to PV+ connection is shown to be small (Supp. Fig. 4, panel t). We also omitted PV+ to SOM+, PV+ to VIP+, SOM+ to VIP+, VIP+ to excitatory projection neurons; all of these are shown in (Krabbe et al. 2019, Fig. 3 (panel l), and Supp. Fig. 4 (panels m,t)) to have weak functional connectivity, at least in the context of fear conditioning. See below for comments on modeling strategies. We will explain this better in our revision.

      2) The construction of the afferent drive to the network does not reflect the stimulus presentations that are given in fear conditioning tasks. For instance, the authors only used a single training trial, the conditioning stimulus was tonic instead of pulsed, the unconditioned stimulus duration was artificially extended in time, and its delivery overlapped with the neutral stimulus, instead of following its offset. These deviations undercut the applicability of their findings.

      Regarding the use of a single long presentation of US rather than multiple presentations (i.e., multiple trials): in early versions of this paper, we did indeed use multiple presentations. We were told by experimental colleagues that the learning could be achieved in a single trial. We note that, if there are multiple presentations in our modeling, nothing changes; once the association between CS and US is learned, the conductance of the synapse is stable. Also, our model does not need a long period of US if there are multiple presentations. This point will be made clearer in our revision.

      We agree that, in order to implement the fear conditioning paradigm in our in-silico network, we made several assumptions about the nature of the CS and US inputs affecting the neurons in the BLA and the duration of these inputs. A Poisson spike train to the BLA is a signal that contains no structure that could influence the timing of the BLA output; hence, we used this as our CS input signal. We also note that the CS input can be of many forms in general fear conditioning (e.g., tone, light, odor), and we wished to de-emphasize the specific nature of the CS. The reference mentioned in the Recommendations for authors, (Quirk, Armony, and LeDoux 1997), uses pulses 2 seconds long. At the end of fear conditioning, the response to those pulses is brief. However, in the early stages of conditioning, the response goes on for as long as the figure shows. The authors do show the number of cells responding decreases from early to late training, which perhaps reflects increasing specificity over training. This feature is not currently in our model, but we look forward to thinking about how it might be incorporated. Regarding the CS pulsed protocol used in (Krabbe et al., 2019), it has been shown that intense inputs (6kHz and 12 kHz inputs) can lead to metabotropic effects that last much longer than the actual input (200 ms duration) (Whittington et al., Nature, 1995). Thus, the effective input to the BLA may indeed be more like Poisson.
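
      As an illustration of what an unstructured CS input of this kind looks like, the following is a minimal sketch of a homogeneous Poisson spike train generator. It is not the code used in the manuscript: the function name, the 50 Hz rate, and the 2 s duration are hypothetical values chosen only for the example.

```python
import random


def poisson_spike_train(rate_hz, duration_s, seed=None):
    """Return sorted spike times (in seconds) of a homogeneous Poisson process.

    Inter-spike intervals are drawn independently from an exponential
    distribution with mean 1 / rate_hz, so the train carries no temporal
    structure beyond its mean rate.
    """
    rng = random.Random(seed)
    spikes = []
    t = rng.expovariate(rate_hz)  # time of the first spike
    while t < duration_s:
        spikes.append(t)
        t += rng.expovariate(rate_hz)  # add the next inter-spike interval
    return spikes


# Hypothetical parameters, chosen only for illustration.
cs_spikes = poisson_spike_train(rate_hz=50.0, duration_s=2.0, seed=1)
mean_rate = len(cs_spikes) / 2.0  # empirical rate, close to 50 Hz on average
```

      Because each interval is drawn independently, the timing of the BLA output in such a simulation cannot be inherited from the input, which is the property the response relies on.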

      Our model requires the effects of the CS and US inputs on BLA neuron activity to overlap in time in order to instantiate fear learning. Although the literature includes paradigms with both overlapping CS/US inputs (delay conditioning, where the US co-terminates with the CS (Lindquist et al., 2004) or immediately follows the CS (e.g., Krabbe et al., 2019)) and non-overlapping CS/US inputs (trace conditioning), we hypothesized that concomitant activity of CS- and US-encoding neurons should be crucial in both cases. This may be mediated by the memory effect, as suggested in the Discussion of our paper, or by metabotropic effects as suggested above, or by the contribution from other brain regions. We will emphasize in our revision that the overlap in time, however instantiated, is a hypothesis of our model. It is hard to see how plasticity can occur without some memory trace of the US. This is a consequence of our larger hypothesis that fear learning uses spike-timing-dependent plasticity; such a hypothesis about plasticity is common in the modeling literature. We will discuss these points in more detail in our revision.

      We thank Reviewer #2 for their comments. Below, we reply to each of them:

      1) Gamma oscillations are generated locally; thus, it is appropriate to model in any cortical structure. However, the generation of theta rhythms is based on the interplay of many brain areas therefore local circuits may not be sufficient to model these oscillations. Moreover, to generate the classical theta, a laminal structure arrangement is needed (where neurons form layers like in the hippocampus and cortex)(Buzsaki, 2002), which is clearly not present in the BLA. To date, I am not aware of any study which has demonstrated that theta is generated in the BLA. All studies that recorded theta in the BLA performed the recordings referenced to a ground electrode far away from the BLA, an approach that can easily pick up volume conducted theta rhythm generated e.g., in the hippocampus or other layered cortical structure. To clarify whether theta rhythm can be generated locally, one should have conducted recordings referenced to a local channel (see Lalla et al., 2017 eNeuro). In summary, at present, there is no evidence that theta can be generated locally within the BLA. Though, there can be BLA neurons, firing of which shows theta rhythmicity, e.g., driven by hippocampal afferents at theta rhythm, this does not mean that theta rhythm per se can be generated within the BLA as the structure of the BLA does not support generation of rhythmic current dipoles. This questions the rationale of using theta as a proxy for BLA network function which does not necessarily reflect the population activity of local principal neurons in contrast to that seen in the hippocampus.

      In both modeling and experiments, a laminar structure does not seem to be needed to produce a theta rhythm. A recent experimental paper (Antonoudiou et al., 2021) suggests that the BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings under certain conditions, such as reduced inhibitory tone. The authors draw this conclusion from ex vivo mouse slices. The currents that generate these rhythms are in the BLA, since the hippocampus was removed to eliminate hippocampal volume conduction and other nearby brain structures did not display any oscillatory activity. Also, in the modeling literature, there are multiple examples of the production of theta rhythms in small networks not involving layers; these papers explain the mechanisms producing theta in non-laminated structures (Dudman et al., 2009; Kispersky et al., 2010; Chartove et al., 2020). We are not aware of any model of the mechanisms of theta that requires layers.

      2) The authors distinguished low and high theta. This may be misleading, as the low theta they refer to is basically a respiratory-driven rhythm typically present during an attentive state (Karalis and Sirota, 2022; Bagur et al., 2021, etc.). Thus, it would be more appropriate to use breathing-driven oscillations instead of low theta. Again, this rhythm is not generated by the BLA circuits, but by volume conducted into this region. Yet, the firing of BLA neurons can still be entrained by this oscillation. I think it is important to emphasize the difference.

      Many rhythms of the nervous system can be generated in multiple parts of the brain by multiple mechanisms. We do not dispute that low theta appears in the context of respiration; however, this does not mean that other rhythms with the same frequencies are driven by respiration. Indeed, in the above answer we showed that theta can appear in the BLA without inputs from other regions. In our paper, the low theta is generated in the BLA by VIP+ neurons. Using intrinsic currents known to exist in VIP+ neurons (Porter et al., 1998), modeling has shown that such neurons can intrinsically produce a low theta rhythm. This is also shown in the current paper. This example is part of a substantial literature showing that there are multiple mechanisms for any given frequency band. We will emphasize these points in our revision; we note that, for any individual case, such as this one, the mechanism needs to be tested experimentally.

      3) The authors implemented three interneuron types in their model, ignoring a large fraction of GABAergic cells present in the BLA (Vereczki et al., 2021). Recently, the microcircuit organization of the BLA has been more thoroughly uncovered, including connectivity details for PV+ interneurons, firing features of neurochemically identified interneurons (instead of mRNA expression-based identification, Sosulina et al., 2010), synaptic properties between distinct interneuron types as well as principal cells and interneurons using paired recordings. These recent findings would be vital to incorporate into the model instead of using results obtained in the hippocampus and neocortex. I am not sure that a realistic model can be achieved by excluding many interneuron types.

      The interneurons and connectivity that we used were inspired by the functional connectivity reported in (Krabbe et al., 2019) (see above answer to Reviewer #1). As reported in (Vereczki et al., 2021), there are multiple categories and subcategories of interneurons; that paper does not report on which ones are essential for fear conditioning. We did use all the highly represented categories of the interneurons, except NPY-containing neurogliaform cells.

      The Reviewer says “I am not sure that a realistic model can be achieved by excluding many interneuron types”. We agree with the Reviewer that omitting other interneuron subtypes and more specific connectivity (soma-, dendrite-, and axon-targeting connections) may limit the ability of our model to describe all the details in the BLA. However, this work represents a first effort towards a biophysically detailed description of the BLA rhythms and their function. As in any modeling approach, assumptions about what to describe and test are determined by the scientific question; details postulated to be less relevant are omitted to obtain clarity. The interneuron subtypes we modeled, especially VIP+ and PV+, have been reported to have a crucial role in fear conditioning (Krabbe et al., 2019). Other interneurons, e.g. cholecystokinin and SOM+, have been suggested as essential in fear extinction. Thus, in the follow-up of this work to explain fear extinction, we will introduce other cell types and connectivity. In the current work, we have achieved our goals of explaining the origin of the experimentally found rhythms and their roles in the production of plasticity underlying fear learning. Of course, a more detailed model may reveal flaws in this explanation, but this is science that has not yet been done.

      4) The authors set the reversal potential of GABA-A receptor-mediated currents to -80 mV. What was the rationale for choosing this value? The reversal potential of IPSCs has been found to be -54 mV in fast-spiking (i.e., parvalbumin) interneurons and around -72 mV in principal cells (Martina et al., 2001, Veres et al., 2017).

      A GABA-A reversal potential around -80 mV is common in the modeling literature (Jensen et al., 2005; Traub et al., 2005; Kumar et al., 2011; Chartove et al., 2020). Other computational studies of the amygdala, e.g. (Kim et al., 2016), consider a GABA-A reversal potential of -75 mV based on the cortex (Durstewitz et al., 2000). The papers cited by the reviewer have a GABA-A reversal potential of -72 mV for synapses onto pyramidal cells; this is sufficiently close to our model that it is not likely to make a difference. For synapses onto PV+ cells, the papers cited by the reviewer suggest that the GABA-A reversal potential is -54 mV; such a reversal potential would lead these synapses to be excitatory instead of inhibitory. However, it is known (Krabbe et al., 2019; Supp. Fig. 4b) that such synapses are in fact inhibitory. Thus, we wonder if the measurements of Martina and Veres were made in a condition very different from that of Krabbe. For all these reasons, we consider a GABA-A reversal potential around -80 mV in the amygdala to be a reasonable assumption. We will discuss these points in our revision.
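
      The effect of the reversal potential on the direction of the synaptic current can be illustrated with the standard ohmic form I = g (V - E_rev). The sketch below is only a worked example, not part of the model code: the membrane potential of -65 mV and the conductance of 1 nS are assumed values, and the function names are hypothetical. It reports only the sign of the current at one membrane potential; it does not model shunting or spike threshold.

```python
def gaba_current(g_ns, v_mv, e_rev_mv):
    """Ohmic synaptic current I = g * (V - E_rev).

    With g in nS and voltages in mV the result is in pA.
    Positive values are outward (hyperpolarizing) currents,
    negative values are inward (depolarizing) currents.
    """
    return g_ns * (v_mv - e_rev_mv)


def current_direction(g_ns, v_mv, e_rev_mv):
    """Label the current by its sign at the given membrane potential."""
    i_pa = gaba_current(g_ns, v_mv, e_rev_mv)
    if i_pa > 0:
        return "hyperpolarizing"
    if i_pa < 0:
        return "depolarizing"
    return "none"


# Assumed membrane potential and conductance, for illustration only.
V_ASSUMED_MV = -65.0
G_ASSUMED_NS = 1.0

# Reversal potentials discussed in the text.
for e_rev in (-80.0, -75.0, -72.0, -54.0):
    direction = current_direction(G_ASSUMED_NS, V_ASSUMED_MV, e_rev)
    print(f"E_rev = {e_rev} mV: {direction}")
```

      Under these assumptions, the reversal potentials of -80, -75, and -72 mV all give a current of the same sign, whereas -54 mV gives the opposite sign.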

      5) Proposing neuropeptide VIP as a key factor for learning is interesting. Though, it is not clear why this peptide is more important in fear learning in comparison to SST and CCK, which are also abundant in the BLA and can effectively regulate the circuit operation in cortical areas.

      We do not think that VIP is necessarily more fundamental in fear learning, and certainly not for fear extinction. We will make this clear in the revision.

      We thank Reviewer #3 for their comments and for recognizing that we achieved our modeling aims. We reply to the criticisms below.

      Weaknesses:

      The main weakness of the approach is the lack of experimental data from the BLA to constrain the biophysical models. This forces the authors to use models based on other brain regions and leaves open the question of whether the model really faithfully represents the basolateral amygdala circuitry. Furthermore, the authors chose to use model neurons without a representation of the morphology. However, given that PV+ and SOM+ cells are known to preferentially target different parts of pyramidal cells and given that the model relies on a strong inhibition from SOM to silence pyramidal cells, the question arises whether SOM inhibition at the apical dendrite in a model representing pyramidal cell morphology would still be sufficient to provide enough inhibition to silence pyramidal firing. Lastly, the fear learning relies on the presentation of the unconditioned stimulus over a long period of time (40 seconds). The authors justify this long-lasting input as reflecting not only the stimulus itself but as a memory of the US that is present over this extended time period. However, the experimental evidence for this presented in the paper is only very weak.

      Many of these issues were addressed in the previous responses.

      1) Our neurons were constrained by electrophysiological properties measured in response to hyperpolarizing currents in the BLA (Sosulina et al., 2010). We chose the specific currents, known to be present in these neurons, to replicate those responses.

      2) Though a much more detailed description of BLA interneurons was given in (Vereczki et al., 2021), it is not clear that this level of detail is relevant to the questions that we were asking, especially since the experiments described were not done in the context of fear learning.

      3) It is true that we did not include the morphology, which undoubtedly makes a difference to some aspects of the circuit dynamics. As we described above, modeling requires the omission of many details to bring out the significance of other details.

      4) As described above, some form of memory or overlap in the activity of the excitatory projection neurons is necessary for spike-timing-dependent plasticity. In modeling, one must be specific about hypotheses, and describe why they are plausible, if not proved; indeed, modeling can explain known phenomena by showing how they are consequences of some (plausible) hypotheses, which themselves are open to experimental verification.

      5) The 40 seconds is not necessary if there are multiple presentations.

      Other critiques:

      1) It is correct that PV+ and SOM+ preferentially target different parts of excitatory projection neurons and that the model relies on a strong inhibition from SOM+ and PV+ to silence the excitatory projection neurons. This choice of parameters comes from using simplified models: it is standard in modeling to adjust parameters to compensate for simplifications.

      2) The SOM+ inhibition of the pyramidal cell firing can be seen as a hypothesis of our model. It is well known that VIP+ cells disinhibit pyramidal cells through inhibition of SOM+ and PV+ cells, which is all we are using in our model; hence this hypothesis is generally believed.

      The authors achieved the aim of constructing a biophysically detailed model of the BLA not only capable of fear learning but also showing spectral signatures seen in vivo. The presented results support the conclusions with the exception of a potential alternative circuit mechanism demonstrating fear learning based on a classical Hebbian (i.e. non-depression-dominated) plasticity rule, which would not require the intricate interplay between the inhibitory interneurons. This alternative circuit is mentioned but a more detailed comparison between it and the proposed circuitry is warranted.

      We agree with the reviewer that it would be good to have a more detailed comparison with the classical Hebbian rule (non-depression-dominated rule). However, we demonstrated in Supplementary Materials that the non-depression-dominated rule is less robust and only operates within a limited window of PV+ excitation. We will have a more robust discussion of plasticity in the revision.
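
      The distinction between a depression-dominated and a classical Hebbian rule can be made concrete with a generic pair-based STDP kernel. The sketch below is one common formalization and is not taken from the manuscript: the exponential form, the function names, and all amplitudes and time constants are assumptions made for illustration. Here a rule is called depression-dominated when the integral of the kernel over all spike-time differences is negative.

```python
import math


def stdp_dw(dt_ms, a_plus, a_minus, tau_plus_ms, tau_minus_ms):
    """Weight change for one spike pair, with dt_ms = t_post - t_pre.

    Pre-before-post (dt_ms > 0) potentiates; post-before-pre (dt_ms < 0)
    depresses. Both branches decay exponentially with |dt_ms|.
    """
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_plus_ms)
    if dt_ms < 0:
        return -a_minus * math.exp(dt_ms / tau_minus_ms)
    return 0.0


def kernel_area(a_plus, a_minus, tau_plus_ms, tau_minus_ms):
    """Integral of the kernel over all dt: A+ * tau+ - A- * tau-."""
    return a_plus * tau_plus_ms - a_minus * tau_minus_ms


def is_depression_dominated(a_plus, a_minus, tau_plus_ms, tau_minus_ms):
    """True when uncorrelated spike pairs weaken the synapse on average."""
    return kernel_area(a_plus, a_minus, tau_plus_ms, tau_minus_ms) < 0


# Hypothetical parameter sets, chosen only to contrast the two regimes.
depression_dominated = dict(a_plus=0.010, a_minus=0.012,
                            tau_plus_ms=20.0, tau_minus_ms=20.0)
hebbian_like = dict(a_plus=0.012, a_minus=0.010,
                    tau_plus_ms=20.0, tau_minus_ms=20.0)
```

      With a negative kernel area, uncorrelated pre- and postsynaptic firing weakens the synapse on average, so net potentiation requires consistently ordered, closely timed spike pairs.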

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Review:

      The authors report the first use of the bacterial Tus-Ter replication block system in human cells. A single plasmid containing two divergently oriented five-fold TerB repeats was integrated on chromosome 12 of MCF7 cells. ChIP and PLA experiments convincingly demonstrate the occupancy of Tus at the Ter sites in cells. Using an elegant Single Molecule Analysis of Replicated DNA (SMARD) assay, convincing data demonstrate the replication block at Ter sites dependent on the presence of the protein. As an orthogonal method to demonstrate fork stalling, ChIP data show the accumulation of the replicative helicase component MCM3 and the repair protein FANCM around the Ter sites. It is unclear whether the Ter sites integrated by a single copy plasmid have any effect on the replication of this region but the data show that the observed effects are dependent on expression of the Tus protein. The SMARD data do not reveal what proportion of forks are arrested at Tus/Ter, or how long the fork delay is imposed. Fork stalling led to a highly localized gammaH2AX response, as monitored by ChIP using primer pairs spread along the integrated plasmid carrying the Ter sites. This response was shown to be dependent on ATR using the ATR inhibitor VE-822. This contrasts with a single Cas9-induced DSB between the two Ter sites, which causes a more spread gammaH2AX response. While this was monitored only at a single distal site, the difference between the DSB and the Tus-induced stall is very significant. Interestingly, despite evidence for ATR activation through the gammaH2AX response, no evidence for phosphorylation of ATR-T1989, CHK1-S345, or RPA2-S33 could be found under fork stalling conditions. The global replication inhibitor hydroxyurea (HU) elicited phosphorylation of ATR-T1989, CHK1-S345, or RPA2-S33. In this context, it would have been of interest to examine if a single DSB in the Ter region leads to phosphorylation of ATR-T1989, CHK1-S345, or RPA2-S33 and cell cycle arrest. 
      It is not shown whether the replication inhibitor HU leads to the same widely spread gamma H2AX response. Overall, this is a well written manuscript, and the data provide convincing evidence that the Tus-Ter system poses a site-specific replication fork block in MCF7 cells leading to a localized ATR-dependent DNA damage checkpoint response that is distinct from the more global response to HU or DSBs.

      Author response to public review:

      “It is unclear whether the Ter sites integrated by a single copy plasmid have any effect on the replication of this region but the data show that the observed effects are dependent on expression of the Tus protein.”

      -The lack of perturbation of fork progression by the TerB sequence alone has been studied extensively in both Willis et al., 2014 and Larsen et al., 2014. Furthermore, as the detection of the SMARD signal at the TerB sites depends on the 7.5 kb probe that spans the TerB sites (orange probe, Fig 2B & 2D), it would be impossible to compare replication in this region with and without the integration of the single copy plasmid.

      “The SMARD data do not reveal what proportion of forks are arrested at Tus/Ter, or how long the fork delay is imposed.”

      -The percentage of fork stalling at the TerB sites, with and without Tus expression, has been quantified in Figure 2E & 2F. Essentially, 36% of forks stall at the TerB block when the Tus-TerB block is active, i.e. 18% stall in each of the 5’ to 3’ (orange) and 3’ to 5’ (blue) directions.

      “It is not shown whether the replication inhibitor HU leads to the same widely spread gamma H2AX response.”

      -While we have not shown gH2AX accumulation via ChIP after HU treatment, Supplementary Figure 5A & 5B clearly show increased gH2AX foci when the cells are treated with HU, suggesting a global replication stress response that is in stark contrast to the response to Tus-TerB.

      Recommendations for the authors:

      Lines 78, 95: In the experimental set-up there are two divergent 5-TerB sites in the orientation that is non-permissive for the fork progression notwithstanding the direction. This raises an obvious question: How is an intervening (~1kb-long) DNA segment being replicated? Does it stay under-replicated and then break?

      -The reviewers pose an important question about how the intervening sequence flanked by the two TerB sites is replicated, and if this leads to formation of anaphase bridges resulting in breaks. We think this is very plausible and this very question is part of ongoing studies in the lab with the aim to understand how the cell resolves a site-specific block. Unfortunately, this falls outside the scope of the current study.

      Also, it is unclear what is meant by non-permissive orientation. This depends on the predominant replication direction. As the construct has Ter repeats in opposite orientation, any direction is non-permissive. These descriptions could be rephrased to avoid confusion.

      -The text has been edited to clarify this.

      Fig 1A: It would be helpful to annotate the map to show the position of each primer relative to the Ter array. Why is there no signal for pp52?

      -Figure 1A has the map of the locus with the annotated primer pairs and their relative positions to the TerB array.

      -pp52 is positioned beyond the TerB array so binding of the Tus-His protein there is unlikely, confirming the specificity of the Tus binding to only the TerB array and not to the adjacent chromatin.

      Figure 1B: Change Tus to Tus-His to make it easier to understand that the anti-His ChIP is targeting Tus. Provide information what normalization method was used in the ChIP experiments.

      -Figure 1B has been edited to reflect this change.

      Line 113: Willis et al. 2014 also worked with chromosomal Ter sites, which should be acknowledged here.

      -The text has been modified to indicate this. We apologize for the oversight.

      Line 126: Define pWB15 and its significance in text.

      -The text has been edited to clarify this and mentions pWB15.

      Figure 2E, F: Define legend (blue, orange boxes and arrow heads).

      -The figure legend corresponding to Figure 2 has a detailed description of the boxes and the arrows.

      Figure 3E, 4C: Add map of primers like in Figures 1 and 2.

      -The map has been added to Figures 3 & 4 and the text has been updated.

      Figure 4: Showing that the gammaH2AX response is spread like with the single DSB would bolster the conclusion about the difference between a local and global response. Fig 4A, Lane-3: A loading control for the chromatin fraction is missing.

      -Measuring gH2AX chromatin spread after global replication stress can be challenging. We have tried to address the question of global versus local gH2AX response after replication stress by quantifying gH2AX foci in cells treated with and without hydroxyurea, and comparing them with cells that have a functional Tus-TerB block (Supplementary Figure 5A & 5B). A single fork block seems to elicit only a local response, while global replication stress leads to gH2AX accumulation throughout the cell.

      -Lamin A/C has been added to Fig 4A as a loading control for the chromatin fraction.

      Figure S4: Analyzing ATR, CHK1 and RPA phosphorylation as well as cell cycle profile under single DSB condition may reveal that different localized responses exist. I mention this because it was reported in yeast that a single DSB in G1 cells leads to a similarly localized Mec1 (ATR) -dependent response that does not elicit phosphorylation of Rad53 (CHK1) and other downstream targets, but leads to H2A phosphorylation as well as phosphorylation of RPA and the Rad51 paralog Rad55 (see PMCID: PMC2853130). It might be of interest to the reader to discuss this publication and the commonalities and differences between both localized checkpoint responses.

      -The reviewers raise an interesting question about the phosphorylation of ATR/CHK1/RPA and its effect on the cell cycle after a single DSB. The aim of using the Cas9 break site in this study was merely to corroborate previously published observations pertaining to the spread of gH2AX after a DSB and to contrast that with the local response seen with Tus-TerB. Thus, while this is an intriguing question, we do not think this particular experiment will help in understanding the localized checkpoint response after a single replication fork block. However, we have included the observations previously published in the yeast system (PMC2853130) in our discussion, as they help to compare and contrast fork blocks and DSBs further. It is worth noting, though, that the yeast studies were looking at the cellular response to a DSB in G1.

      Lines 256-260: In the discussion of ATRIP, unpublished data are discussed that show no increase in ssDNA. What is the effect of ATRIP depletion? Maybe delete this mention of unpublished data, if no new data can be provided. The authors are aware that this makes the mechanism of ATR activation at the 5-TerB site elusive.

      -This statement has been deleted and the text has been modified.

      Another possibility discussed by the authors is fork reversal. Since Tus/Ter complex block the CMG progression, fork reversal would result in a chicken foot structure with the long single-stranded 3'-overhang of an Okazaki fragment site. Such a structure should be protected by BRCA2 or RAD52 proteins from degradation. Any role for these proteins in the checkpoint activation at the TerB site?

      -The reviewers suggest an interesting scenario in which the reversed fork structure induced by the Tus-TerB block could be protected by the loading of known DNA repair proteins, which in turn could lead to signaling and checkpoint activation. While we have not tested this hypothesis, nor studied the temporal dynamics of the formation of the reversed fork with respect to gH2AX accumulation, we think the localized gH2AX signal observed in the vicinity of the block is what initiates the downstream DDR response, promoting fork stabilization, followed either by fork reversal and restart or fork collapse. If the reversed fork were responsible for the gH2AX signaling, one would expect the signal to be more widespread, perhaps decorating the entire stretch of DNA between the block and the reversed fork. However, further studies are warranted to tease out this mechanism and its spatio-temporal dynamics.

      Lines 292-294: The authors state that "unpublished work from our laboratory has demonstrated that replication forks are cleaved at or near the TerB site..." Unless the data are shown, it might be best to eliminate discussion of unpublished work, also because the occurrence of DNA ends at Ter sites was already described in Willis et al. 2017.

      -The statement has been deleted and Willis et al. 2017 has been referenced.

      Suppl Table 1: It would help to also show representative images of stretched fibers in addition to the summary data shown.

      -Since the data is negative, the fiber images do not show any discernible differences and we do not think it adds useful information.

      Suppl Fig 4. ChIP for gamma H2AX data. It would be helpful to show the distribution of the gamma H2AX signal along the chromosome for both the DSB response and the Tus/Ter response.

      -The gH2AX ChIP signal at PP0-2 and PP10 has been included in Supplementary Fig. 4D. Though not significant for PP0-2, the data strongly suggest that there is increased spread of gH2AX along the chromosome after a DSB, in contrast to the response after the Tus-TerB block. The text has been modified to include both primer pairs.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Response to Reviewer comments

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      __Summary__

      The manuscript by Parker et al addresses the important question of how different organisms have evolved pre-messenger RNA systems that are either more or less complex. This question underlies the evolution of complex organisms and the genome adaptation of simple organisms to their specific environments, so is an important question to answer. This manuscript now provides the underlying molecular mechanisms of how 5' splice site sequence preference may have evolved which is both an interesting and exciting advance for the field.

      We thank the reviewer for these kind comments.

      __Major comments__

      __This manuscript builds on the previous work from this group where they identified the role of adenosine N6 methylation (m6A) of the U6 small nuclear RNA (snRNA) of the spliceosome by METTL16 as being important for 5' splice site selection. This work led to the speculation that loss of a METTL16 ortholog, or potentially other splicing factors, in some species could contribute to an evolutionary change in 5' splice site sequence preference. Here the authors now use the power of phylogenetics, interspecies association mapping and the available spliceosome structures to provide convincing conclusions that 5' splice site sequence preferences in the extensive number of organisms examined correlate with the presence of the U6 snRNA methyltransferase METTL16 and the splicing factor SNRNP27K.__

      __An analysis of METTL16 conservation was first carried out by comparing the METTL16 methyltransferase domain (MTD) in 29 diverse eukaryotic species. All the METTL16 orthologs were found to have either one or two globular domains. Three domain types were identified and compared in detail. What was not clear from this analysis was the functional significance of orthologs having either one or two domains.__

      We identified several species, including Drosophila melanogaster, whose METTL16 orthologs do not contain a VCR domain. However, in this study we do not draw specific conclusions about the functional significance of orthologs having different domain topologies.

      __In addition, while this analysis provides important new information on the domain structure of METTL16 orthologs, especially where these domains had not been identified previously, the link between this section of the results and the following sections is not that apparent. __

      We agree that there is a significant difference in approach between the first section of the Results and the following sections. However, we are keen to keep this part of the manuscript because it provides an orthogonal line of evidence suggesting that the ancestral role of METTL16 in eukaryotes is specifically the methylation of U6 snRNA.

      __Next, novel bioinformatics pipelines were developed to compare both introns and orthologous groupings of protein coding genes between 227 Saccharomycotina genomes as well as 13 well-annotated eukaryote genomes. First, the 5' splice site sequence preference was compared and clearly indicates that the +4 position has the greatest variation in preferences within the Saccharomycotina. The ability to now compare a large number of genomes has provided novel information on the evolution of the 5' splice site sequence and the conclusion that there is more complexity to the 5' splice site in fungi than previously recognized. While it is apparent why only the 5' splice site signal was investigated here, with its relationship to the U6 snRNA and METTL16, it seems a shame the other splice site sequences were not analyzed using this novel pipeline. In any case, the complexity of the 5' splice site +4 position now allows, for the first time, interesting interspecies association studies. __

      We have now included the variance plots for 3’SS motifs (analogous to the 5’SS variance plots shown in Figure 2B) as Figure 2 - figure supplement 4A, and a traitgram for the 3’SS -3C to U ratio as Figure 2 - figure supplement 4B. We have included a short section of text in the Results section to describe these additional findings.

      __With the 5' splice site +4 variation identified, the next step was to determine the underlying molecular mechanisms that dictate the evolution of the various sequence preferences. Some obvious players here are the U1 and U6 snRNAs which directly interact with the 5' splice site during splicing. However, no association was found between these snRNAs and the 5' splice site +4 sequence. __

      __The powerful interspecies association mapping was then used to determine whether the presence or absence of METTL16 ortholog or a splicing factor correlated with the 5' splice site +4 sequence variation. Interestingly, a clear association was found between METTL16 and the 5' splice site +4 position; METTL16 presence was associated with +4A at the 5' splice site and METTL16 absence was associated with +4U at the 5' splice site. This is an exciting and significant finding. __

      We thank the reviewer for these comments on the importance of this study.

      __Interestingly, the next most significant association with the 5' splice site +4 position was with SNRNP27K. This result makes sense as in the cryo-EM structure of the pre-B spliceosome complex the C-terminal domain of SNRNP27K is found near the region of the U6 snRNA that will interact with the 5' splice site. Absence of SNRNP27K was associated with an increased preference for +4U at the 5' splice site. Now the real power of the interspecies association mapping was demonstrated by investigating whether any association could be determined specifically within the C-terminus of SNRNP27K. Significantly, the methionine 141 position in SNRNP27K was found to be associated with the +4 position of the 5' splice site. This finding fits nicely with previous studies where mutation of M141 caused a shift in 5' splice site selection away from +4A 5' splice sites, to 5' splice sites without +4A. What is not clear is whether M141 is conserved or invariant between all the species that were compared? __

      M141 is not completely conserved across the species that were compared for the SNRNP27K C-terminus analysis. We did not test positions with very strong sequence conservation, because without variation in both the genotype and phenotype it is not possible to test for an association. We have rephrased the relevant Results and Methods sections to make this point clearer. In addition, we have incorporated a sequence logo to illustrate the degree of conservation of each position in the SNRNP27K C-terminal domain as Figure 5 - figure supplement 1A. Finally, we have included an additional box-plot to illustrate the finding that species which have lost SNRNP27K, or have lost only the methionine equivalent to human SNRNP27K position 141, show a similar preference for +4U at 5’SSs. This is now included as Figure 5 - figure supplement 1B.
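
      The point about needing variation before an association can be tested can be illustrated with a minimal sketch. It is not the pipeline used in the manuscript: the function names, the toy alignment and the phenotype values are all hypothetical, and the comparison ignores phylogenetic relatedness, which a real analysis would need to account for.

```python
from collections import Counter
from statistics import mean


def variable_columns(alignment, min_minor=2):
    """Indices of columns where at least `min_minor` species differ from the
    most common residue. Invariant columns cannot be tested for association."""
    n_cols = len(next(iter(alignment.values())))
    keep = []
    for i in range(n_cols):
        counts = Counter(seq[i] for seq in alignment.values())
        major_count = counts.most_common(1)[0][1]
        if len(alignment) - major_count >= min_minor:
            keep.append(i)
    return keep


def column_effect(alignment, phenotype, col):
    """Mean phenotype of species carrying the most common residue at `col`
    minus the mean phenotype of species carrying anything else."""
    counts = Counter(seq[col] for seq in alignment.values())
    major = counts.most_common(1)[0][0]
    with_major = [phenotype[s] for s, seq in alignment.items() if seq[col] == major]
    without = [phenotype[s] for s, seq in alignment.items() if seq[col] != major]
    return mean(with_major) - mean(without)


# Hypothetical three-column alignment; only the middle column varies.
toy_alignment = {"sp1": "GMK", "sp2": "GMK", "sp3": "GTK", "sp4": "GLK", "sp5": "GMK"}
# Hypothetical phenotype, e.g. a log ratio of +4A to +4U 5'SSs per species.
toy_phenotype = {"sp1": 1.5, "sp2": 1.0, "sp3": -1.0, "sp4": -0.5, "sp5": 2.0}

for col in variable_columns(toy_alignment):
    print(col, column_effect(toy_alignment, toy_phenotype, col))
```

      In this toy data only the middle column is retained, because the two flanking columns are invariant and so carry no information about the phenotype.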

      Overall, this result reveals the power of the interspecies association approach and provides interesting and exciting information on the molecular determinants of 5' splice site evolution.

      We are grateful to the reviewer for these comments.

      __The final analysis was to investigate the interaction potentials of the U5 and U6 snRNAs with the 5' splice site in the Sacchromycotina genomes and try to relate this to species with fewer introns and less alternative splicing. Species with low intron numbers and low splicing complexity were revealed to have weaker U5 and U6 anti-correlation potentials and favor +4U at the 5' splice site. On the other hand, species with high intron number and presumably higher splicing complexity featured anti-correlated U5 and U6 snRNA interaction potentials and favored +4A 5' splice sites. This extensive analysis provides novel information on the interactions and splice site properties of species with simple and complex splicing. Again, I see why there is emphasis on the 5' splice site here but a similar analysis with the U2 snRNA and the branch site could also be informative. __

      We absolutely agree that inter-species association mapping could be applied to other splicing signal phenotypes including 3’ splice sites and intron branchpoints. Accordingly, we raise this subject in the final section of the Discussion. However, branchpoint sequences are challenging to predict from genomic data. Because preliminary analyses suggest independent variation in these other splicing signal phenotypes, we feel a separate focused study is required to properly explain (and substantiate) even the analytical approaches involved. We hope the reviewer would agree that incorporating U2 snRNA and branchpoint variation analyses into this manuscript as well could detract from the clarity of the conceptual advances that we make here.

      __Minor comments __

      __Should the Title include SNRNP27K? __

      We have included SNRNP27K in the revised title.

      Should the title specify that it is the evolution of only the 5’ splice site sequence preference being studied here?

      Because apostrophes in titles can compromise some scholarly online search engines (https://insights.uksg.org/articles/10.1629/uksg.534), we would prefer not to include 5’ in the title.

      Include information on intron number and 5’ splice site interaction potential of U5 and U6 snRNA in the Summary?

      We thank the reviewer for this suggestion. We have updated the Summary to include our findings on U5 and U6 interaction potential in species with reduced intron number.

      __Figure 1C is not referred to in the text? __

      We apologise for this oversight. We have added references to figure 1C in the appropriate Results section.

      Page 8, line 5 – better to say “splicing signal phenotypes”.

      We have amended this statement on Page 8 and at other places in the text where related phrasing was used.

      __What are the other points on Figure 3B? What is the next point below SNRNP27K? Is it U2A’? __

      The other points on Figure 3B represent Orthofinder orthogroups which contain human orthologs that are known components of the spliceosome. The list of spliceosomal components was taken from Sales-Lee et al. 2021. The third most significant point is indeed the orthogroup containing the human ortholog of U2A’. As we state in the text, however, the correlation of U2A’ with the 5’SS+4 A to U ratio phenotype is no longer significant once METTL16 presence/absence is controlled for, indicating that the correlation of U2A’ with the +4A phenotype is likely explained by similarity in the patterns of gene loss of U2A’ and METTL16.
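
      As a rough illustration of what "controlling for" METTL16 presence/absence means here, the sketch below compares a marginal difference in the phenotype with the difference within METTL16-present and METTL16-absent species separately. It is an assumption-laden toy, not the statistical model used in the manuscript: the data are invented, the function names are hypothetical, and no phylogenetic correction is applied.

```python
from statistics import mean


def marginal_effect(species, factor, trait):
    """Mean trait in factor-present species minus mean trait in factor-absent species."""
    present = [s[trait] for s in species if s[factor]]
    absent = [s[trait] for s in species if not s[factor]]
    return mean(present) - mean(absent)


def stratified_effect(species, factor, covariate, trait):
    """The same difference, computed separately within each level of `covariate`.
    Levels lacking either factor-present or factor-absent species are skipped."""
    effects = {}
    for level in (True, False):
        stratum = [s for s in species if s[covariate] == level]
        present = [s[trait] for s in stratum if s[factor]]
        absent = [s[trait] for s in stratum if not s[factor]]
        if present and absent:
            effects[level] = mean(present) - mean(absent)
    return effects


# Invented data: U2A' loss mostly coincides with METTL16 loss.
toy_species = [
    {"mettl16": True, "u2a": True, "ratio": 2.0},
    {"mettl16": True, "u2a": True, "ratio": 1.0},
    {"mettl16": True, "u2a": False, "ratio": 1.5},
    {"mettl16": False, "u2a": False, "ratio": -1.0},
    {"mettl16": False, "u2a": False, "ratio": -2.0},
    {"mettl16": False, "u2a": True, "ratio": -1.5},
]

print(marginal_effect(toy_species, "u2a", "ratio"))
print(stratified_effect(toy_species, "u2a", "mettl16", "ratio"))
```

      In this invented example U2A’ presence appears to shift the ratio when species are pooled, but the difference vanishes within each METTL16 class, which is the pattern expected when two genes simply share a similar history of loss.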

      __The second paragraph of the Discussion is vague and lacks a reference. “we could also identify an association with a methionine residue in the conserved C-terminal domain of SNRNP27K orthologs.” There are a few methionines in the C-terminus, which one? Please reference the statement “transcriptome analysis of C. elegans SNRP-27 M141T mutants..” __

      We apologise for the lower quality of writing in this section of the Discussion. We have updated the text, made the statements about the SNRNP27K C-terminus less ambiguous, and added the relevant citations as appropriate.

      Reviewer #1 (Significance (Required)):

      Overall, this is a well written and clearly presented study that provides some key molecular information on the splicing factors involved in the evolution of 5’ splice sites and shows the power of interspecies association studies. Some important conceptual principles have now been defined for the field going forward.

      We thank the reviewer for this kind comment on the importance of this work.

      __The question remains as to whether METTL16 and SNRNP27K are the sole determinants of 5’ splice site preference evolution at +4? __

      We cannot say for certain that METTL16 and/or SNRNP27K determine the 5’SS +4 phenotype, only that they are correlated with it. In our response to reviewer 3, and in a new Discussion section, we have detailed some of the scenarios that could explain these correlations. We also cannot rule out that changes in the presence/absence (or domain/sequence-level changes) of other, untested proteins correlate with the 5’SS +4 phenotype, and we allude to this in the final section of the Discussion.

      One splicing factor that immediately comes to mind is Prp8 where there is extensive evidence for involvement in splice site selection and is clearly in the right location throughout splicing to be involved. This question should at least be discussed but Prp8 would also be a very interesting candidate for the interspecies association mapping.

      Prp8 is a core component of spliceosomes and is conserved throughout the Saccharomycotina. For this reason, we were unable to associate splicing phenotypes with Prp8 presence or absence variation at the level of orthogroups. However, we revisited this question posed by the reviewer. Our experience with inter-species association mapping, so far, indicates it works well with orthogroup presence/absence or when straightforward amino acid substitutions can be detected in conserved and hence alignable protein sequence domains. We analysed the conserved U6 snRNA-interacting region of the Prp8 linker domain, which maps close to the 5’ splice site in cryo-EM models, using the profile HMM PF10596 available from Pfam. We found that the majority of this domain was extremely highly conserved with variation in only a few species and positions. The strongest correlation with the +4A to U ratio phenotype was at position 58, which is conserved as a glycine in all but 8 species (6 Dipodascaceae, 2 CUG-Ser1), which also tend to have a stronger preference for +4A. However, examination of the species contributing to this result (and to similar results at other positions) indicated that in the 6 Dipodascaceae species, this change is part of a larger deletion or replacement that makes the whole linker region align poorly to the model. Hence, the G58 position itself may not be specifically important for the +4 phenotype. Although the wholesale loss or replacement of the U6 snRNA-interacting region in these species is potentially interesting, these larger scale structural changes in a small number of species are difficult to interpret. Therefore, to maintain the focus of the manuscript and the clear links to METTL16 and SNRNP27K that have orthogonal support, we have decided not to add these results to the manuscript but present them here (figure not available in the bioRxiv commenting window).

      Also, as mentioned previously, only the 5’ splice site was investigated here and the manuscript could become a more substantial piece of work if the other splice sites were included in some way.

      We agree that it will be exciting to apply this approach to other splicing signal phenotypes and in other phylogenetic clades with emerging tree-of-life-scale genomics data. We have included variation in 3’ splice sites in the revised manuscript. As the first of its kind, this study should pioneer a wider use of this approach, by us and others, to understand the mechanisms and functions of molecular interactions not only in splicing but in other areas of biology too.

      __The obvious audience here are those directly in the splicing field but the overall principles are relevant for evolutionary biologists and those studying organismal complexity. __

      We thank the reviewer for recognising the broad importance of this work.

      My expertise is in yeast and human splicing mechanisms. I do not have the expertise to critically evaluate the bioinformatic pipelines but they were clearly explained and presented.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In their manuscript, Parker et al. investigate the evolutionary patterns of splice site preference, focusing on the A/U ratio at position A+4 on the 5´ splice site. Building upon prior studies in S. pombe and A. thaliana, the authors establish a strong correlation between this preference and the co-evolution of the METTL16 U6 snRNA methyltransferase. Furthermore, through inter-species association mapping, they identify the involvement of the splicing factor SNRNP27K in altered A/U ratios and highlight the significance of the residue Met-141 in SNRNP27K for this function. Overall, the paper effectively presents impactful new findings on the evolution of METTL16, U6 snRNA, and splicing.

      We thank the reviewer for these kind comments on the importance of our study.

      The computational analyses employed in this study are situated outside our field of expertise, preventing us from offering a comprehensive evaluation of the methodology’s appropriateness and rigor. Nonetheless, the identification of METTL16 through the authors’ methods, which aligns with previous research in S. pombe and A. thaliana, lends support to the validity of their approach. Notably, the close proximity between SNRNP27K and the methylated A43 residue in U6 snRNA within the spliceosome, particularly near Met-141, is an impressive finding. Previous studies have shown that a mutation at position M141T affects splicing at +4A introns, thus providing robust validation for their methods.

      We thank the reviewer for these kind comments on our work.

      The data presented in this study furnish crucial insights into the role of METTL16, U6 snRNA methylation, and splice site recognition. The authors expand upon recent observations that the “vertebrate conserved region” exists in non-vertebrates, despite the absence of primary sequence homology. These results will serve as a valuable guide for future molecular investigations into U6 snRNA methylation and its mechanisms in splicing. Furthermore, the implications of this paper extend to human evolution, as the plasticity in splicing is an essential factor in the evolution of developmental complexity.

      We thank the reviewer for these kind comments.

      Minor suggestions for improvement:

      1. __Given the significance of the interaction between U6 snRNA and the intron for understanding the data, it would be beneficial to include a figure illustrating the RNA-RNA base-pairing interactions between U6 snRNA and the 5´ splice site. This addition is particularly important if the paper is intended for publication in a journal with a general readership.__

      We thank the reviewer for this excellent suggestion. We have included this as Figure 3A.

      2. __Similarly, the section on U1 snRNA would be more comprehensible with the inclusion of U1 RNA-RNA intron diagrams and improved descriptions of both the figures and the assay. Despite being negative data in the supplement, clarifying this section is essential. As currently written, it is challenging to follow.__

      We agree that this section is difficult to follow. We have updated the text to improve the readability and included a figure of U1 snRNA:5’SS basepairing as Figure 3 – figure supplement 1A.

      3. __Whenever possible, consider increasing the figure and font sizes to enhance readability for readers.__

      We agree that some of the more complex figures can be difficult to read when embedded into a Word document/pdf. We hope that providing high-resolution figures for reading online will mitigate this.

      4. __In the text, there is no reference to Figure 1C.__

      We apologise for this oversight. We have resolved this issue with the appropriate references in the Results text.

      5. __In Figure 5B, the y-axis in the top panel is labelled “species,” but the legend only mentions U5/6p as the y-axis. Please revise the legend to include the appropriate information.__

      We apologise for the confusion caused by our poorly written legend for this plot. We have updated the legend so that the text clearly refers to either the scatter plot or the marginal histograms.

      Reviewer #2 (Significance (Required)):

      The data presented in this study furnish crucial insights into the role of METTL16, U6 snRNA methylation, and splice site recognition. The authors expand upon recent observations that the “vertebrate conserved region” exists in non-vertebrates, despite the absence of primary sequence homology. These results will serve as a valuable guide for future molecular investigations into U6 snRNA methylation and its mechanisms in splicing. Furthermore, the implications of this paper extend to human evolution, as the plasticity in splicing is an essential factor in the evolution of developmental complexity.

      We are grateful to the reviewer for these kind comments on the importance of this work.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Parker et al present a nice exploration of the evolutionary and mechanistic relationships between 5′ splice site consensus sequences, intron numbers and METTL16/SNRNP27K. By performing inter-species association mapping in Saccharomycotina species, they found that a T in position +4 is strongly associated with the absence of METTL16 (and/or in some cases SNRNP27K or mutations in it). They also provide solid structural modelling data in support of this association.

      In general, I think this is a very nice manuscript. I only have a few comments, which could be addressed by rewording specific parts and/or improving the current figures.

      We are grateful to the reviewer for the kind comments on this work.

      1) As the authors acknowledge, a key issue that cannot be fully resolved in this study is causality between the different events investigated. Overall, the authors are careful about this, but there are some exceptions that should be corrected. Probably the most important is in the abstract, where they write: “We conclude that variation in concerted processes of 5’ splice site selection by U6 snRNA is crucial to evolutionary change in splicing complexity”. I suggest they write something more open (and correct), such as: “We conclude that variation in concerted processes of 5’ splice site selection by U6 snRNA is associated with evolutionary changes in splicing complexity”. Similarly, other plausible scenarios should be discussed in the corresponding Discussion section.

      We agree with the reviewer that it is not possible to infer the causal relationship between METTL16 absence and 5’SS+4 preference change from the current data. We, therefore, apologise for failing to be more careful in the Summary and Introduction. We have reworded these statements to better reflect what we can currently say about the evolutionary relationship between METTL16 and 5’SS sequence preference.

      The correlation between METTL16 absence and 5'SS+4 sequence preference change could most likely be explained by one of several scenarios: (a) Sudden loss of METTL16 causes a rapid necessity to change 5'SS sequence preferences. This is unlikely, as such rapid change without widespread corresponding 5'SS changes would likely impose a high fitness cost. (b) Changes in 5'SS sequence preference occur first, driven by some other selective pressure, until there is no longer a benefit to retaining the METTL16 gene. (c) Gradual changes in the expression or catalytic efficiency of METTL16 reduce the stoichiometry of U6 snRNA m6A modification, which permits gradual change in 5'SS+4 sequence preference until complete loss of METTL16 no longer imposes a major fitness cost. As we suggest in the Discussion, future work could examine this question by determining whether the METTL16 orthologs found in Zygosaccharomyces and Eremothecium species, which have altered their 5'SS+4 preference to a U, are expressed and functional. We have updated the Discussion to include a new section that addresses these scenarios.

      2) I do not agree with the statement that "The extent of alternative splicing is the best genomic predictor of developmental complexity". To start with, there are many ways to quantify "extent of alternative splicing" and there are also different types of alternative splicing that might have different prevalence and biological impact. Then, this claim is usually related to exon skipping, which is tightly linked with intron length, and that is likely a better predictor of complexity (yet clearly not causative). My concern is: to what extent has this claim been formally and properly assessed by comparing splicing prevalence with other genomic features, such as intergenic region length, intron length, or average distance between enhancer-promoter interactions (arguably the most relevant predictor, in light of many other studies)? Moreover, I found it a bit misleading to frame the work presented in this study as directly related to developmental (or even splicing) complexity. The work is very interesting on its own, and I doubt their findings on +4 position preference in Saccharomycotina have anything to do with developmental complexity (as the Abstract and Introduction seem to imply).

      On reflection, we agree with the reviewer. Some of our framing of the text isn’t balanced with other studies on the scaling of alternative splicing with developmental complexity. We have edited the Summary and Introduction sections accordingly and cited other references that broaden the consideration of this subject. We are grateful to the reviewer for this suggestion because the changes we make improve the focus of the manuscript since our findings relate more to splicing simplification than to an understanding of increased developmental complexity.

      __3) I found Figure 2 and its associated supplementary figure very difficult to follow. I suggest the authors try to improve it and make it clearer. Also, other trees summarizing the results might be helpful. __

      We apologise for the complexity of these figures. We opted to show phylogenetic trees with phenotypes plotted on the y axis, rather than simply trait histograms or box-plots, because the underlying structure of the tree is important for demonstrating that multiple independent changes in the 5’SS phenotype have occurred in the Saccharomycotina. We have tried to improve the comprehensibility of the figures in the following ways: (a) We have added 5’SS sequence motifs to the x-axis of figure 2B to make what the plot represents clearer, (b) as suggested by the reviewer, we have created a pruned tree showing the 5’SS motifs of a selection of Saccharomycotina species, which demonstrates that the changes in 5’SS+4 position preferences seen in S. cerevisiae and C. albicans are likely to be a result of convergent evolution. We have added this tree as Figure 2 - figure supplement 3.

      __4) I also found the Results section corresponding to Figure 5B a bit confusing. I would argue (as I think the authors do) that there are two main patterns here: below 500 introns, there is no association, while above 500 introns there is an increasingly negative association (correlation). I think it would help to more explicitly distinguish these two patterns. Then, for the intron-poor species: is the correlation (or lack of) for species with a T or an A in position +4 different? __

      We do indeed think that there are two patterns here, as indicated by the reviewer. In the previous version of the manuscript, we separated species into those having an overall preference for A at the +4 position, and those having +4U. By showing regression lines for these two classes, rather than for the general relationship between intron number and U5/6rho, we somewhat imply that the switch in +4 base preference might be causing the loss of correlation between U5/6rho and intron number. However, since essentially all species with a 5'SS +4U preference are intron poor, it seems more likely that these trends are the result of a loss of the negative correlation between intron number and U5/6rho in intron poor species, as suggested by the reviewer. To address this issue, we have replaced the regression lines on Figure 6B with a single loess (locally estimated scatterplot smoothing) regression line for all species and updated the text to make it clearer that we think loss of U5/6rho and +4A preference are separate traits of intron poor species. Although this is not exactly what the reviewer requested, we hope that it satisfies their issue with the analysis.
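
      For readers unfamiliar with loess, the idea can be sketched as a local, distance-weighted linear fit evaluated at each x value (here, intron number, with U5/6rho on the y axis). The code below is a simplified illustration without robustness iterations; it is not the implementation used to draw the figure, and the `frac` parameter and toy data are assumptions made for the example.

```python
def loess(xs, ys, frac=0.5):
    """Simplified loess: at each x, fit a tricube-weighted straight line to the
    nearest `frac` of the points and return the fitted value at that x."""
    n = len(xs)
    k = min(n, max(3, round(frac * n)))  # neighbourhood size
    fitted = []
    for x0 in xs:
        # Indices of the k points closest to x0.
        nearest = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:k]
        # Bandwidth: distance to the farthest neighbour (1.0 if all x coincide).
        h = max(abs(xs[i] - x0) for i in nearest) or 1.0
        # Tricube weights fall smoothly from 1 at x0 to 0 at the bandwidth edge.
        w = [(1 - (abs(xs[i] - x0) / h) ** 3) ** 3 for i in nearest]
        sw = sum(w)
        mx = sum(wi * xs[i] for wi, i in zip(w, nearest)) / sw
        my = sum(wi * ys[i] for wi, i in zip(w, nearest)) / sw
        sxx = sum(wi * (xs[i] - mx) ** 2 for wi, i in zip(w, nearest))
        sxy = sum(wi * (xs[i] - mx) * (ys[i] - my) for wi, i in zip(w, nearest))
        slope = sxy / sxx if sxx > 0 else 0.0  # fall back to a weighted mean
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Toy check: points that lie exactly on a line are reproduced by the local fits.
toy_x = list(range(10))
toy_y = [2 * x + 1 for x in toy_x]
print(loess(toy_x, toy_y))
```

      Because each fit uses only nearby species, a single curve of this kind can be flat where intron numbers are low and slope downwards where they are high, without assuming in advance where the two regimes meet.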

      __Reviewer #3 (Significance (Required)): __

      __This is a very interesting study that sheds light on an intriguing evolutionary pattern: the change in consensus sequence at position +4 of the 5' splice site. This topic is relevant since it is closely associated with intron loss and splicing efficiency and evolution. __

      We thank the reviewer for the kind and constructive comments on this study.

    2. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      The manuscript by Parker et al addresses the important question of how different organisms have evolved pre-messenger RNA systems that are either more or less complex. This question underlies the evolution of complex organisms and the genome adaptation of simple organisms to their specific environments, so is an important question to answer. This manuscript now provides the underlying molecular mechanisms of how 5' splice site sequence preference may have evolved which is both an interesting and exciting advance for the field.

      We thank the reviewer for these kind comments.

      Major comments

      This manuscript builds on the previous work from this group where they identified the role of adenosine N6 methylation (m6A) of the U6 small nuclear RNA (snRNA) of the spliceosome by METTL16 as being important for 5' splice site selection. This work led to the speculation that loss of a METTL16 ortholog, or potentially other splicing factors, in some species could contribute to an evolutionary change in 5' splice site sequence preference. Here the authors now use the power of phylogenetics, interspecies association mapping and the available spliceosome structures to provide convincing conclusions that 5' splice site sequence preferences in the extensive number of organisms examined correlate with the presence of the U6 snRNA methyltransferase METTL16 and the splicing factor SNRNP27K. 

      An analysis of METTL16 conservation was first carried out by comparing the METTL16 methyltransferase domain (MTD) in 29 diverse eukaryotic species. All the METTL16 orthologs were found to have either one or two globular domains. Three domain types were identified and compared in detail. What was not clear from this analysis was the functional significance of orthologs having either one or two domains.

      We identified several species, including Drosophila melanogaster, whose METTL16 orthologs do not contain a VCR domain. However, in this study we do not draw specific conclusions about the functional significance of orthologs having different domain topologies.

      In addition, while this analysis provides important new information on the domain structure of METTL16 orthologs, especially where these domains had not been identified previously, the link between this section of the results and the following sections is not that apparent.

      We agree that there is a significant difference in approach between the first section of the Results and the following sections. However, we are keen to keep this part of the manuscript because it provides an orthogonal line of evidence suggesting that the ancestral role of METTL16 in eukaryotes is specifically the methylation of U6 snRNA.

      Next, novel bioinformatics pipelines were developed to compare both introns and orthologous groupings of protein coding genes between 227 Saccharomycotina genomes as well as 13 well-annotated eukaryote genomes. First, the 5' splice site sequence preference was compared and clearly indicates that the +4 position has the greatest variation in preferences within the Saccharomycotina. The ability to now compare a large number of genomes has provided novel information on the evolution of the 5' splice site sequence and the conclusion that there is more complexity to the 5' splice site in fungi than previously recognized. While it is apparent why only the 5' splice site signal was investigated here, with its relationship to the U6 snRNA and METTL16, it seems a shame the other splice site sequences were not analyzed using this novel pipeline. In any case, the complexity of the 5' splice site +4 position now allows, for the first time, interesting interspecies association studies.

      We have now included the variance plots for 3’SS motifs (analogous to the 5’SS variance plots shown in Figure 2B) as Figure 2 - figure supplement 4A, and a traitgram for the 3’SS -3C to U ratio as Figure 2 - figure supplement 4B. We have included a short section of text in the Results section to describe these additional findings.

      With the 5' splice site +4 variation identified, the next step was to determine the underlying molecular mechanisms that dictate the evolution of the various sequence preferences. Some obvious players here are the U1 and U6 snRNAs which directly interact with the 5' splice site during splicing. However, no association was found between these snRNAs and the 5' splice site +4 sequence. 

      The powerful interspecies association mapping was then used to determine whether the presence or absence of METTL16 ortholog or a splicing factor correlated with the 5' splice site +4 sequence variation. Interestingly, a clear association was found between METTL16 and the 5' splice site +4 position; METTL16 presence was associated with +4A at the 5' splice site and METTL16 absence was associated with +4U at the 5' splice site. This is an exciting and significant finding.

      We thank the reviewer for these comments on the importance of this study.

      Interestingly, the next most significant association with the 5' splice site +4 position was with SNRNP27K. This result makes sense as in the cryo-EM structure of the pre-B spliceosome complex the C-terminal domain of SNRNP27K is found near the region of the U6 snRNA that will interact with the 5' splice site. Absence of SNRNP27K was associated with an increased preference for +4U at the 5' splice site. Now the real power of the interspecies association mapping was demonstrated by investigating whether any association could be determined specifically within the C-terminus of SNRNP27K. Significantly, the methionine 141 position in SNRNP27K was found to be associated with the +4 position of the 5' splice site. This finding fits nicely with previous studies where mutation of M141 caused a shift in 5' splice site selection away from +4A 5' splice sites, to 5' splice sites without +4A. What is not clear is whether M141 is conserved or invariant between all the species that were compared?

      M141 is not completely conserved across the species that were compared for the SNRNP27K C-terminus analysis. We did not test positions with very strong sequence conservation, because without variation in both the genotype and phenotype it is not possible to test for an association. We have rephrased the relevant Results and Methods sections to make this point clearer. In addition, we have incorporated a sequence logo to illustrate the degree of conservation of each position in the SNRNP27K C-terminal domain as Figure 5 -figure supplement 1A. Finally, we have included an additional box-plot to illustrate the finding that species which have lost SNRNP27K or have only lost the Methionine equivalent to human SNRNP27K position 141, show a similar preference for +4U at 5’ SSs. This is now included as Figure 5 - figure supplement 1B.

      Overall, this result reveals the power of the interspecies association approach and provides interesting and exciting information on the molecular determinants of 5' splice site evolution.

      We are grateful to the reviewer for these comments.

      The final analysis was to investigate the interaction potentials of the U5 and U6 snRNAs with the 5' splice site in the Saccharomycotina genomes and try to relate this to species with fewer introns and less alternative splicing. Species with low intron numbers and low splicing complexity were revealed to have weaker U5 and U6 anti-correlation potentials and favor +4U at the 5' splice site. On the other hand, species with high intron number and presumably higher splicing complexity featured anti-correlated U5 and U6 snRNA interaction potentials and favored +4A 5' splice sites. This extensive analysis provides novel information on the interactions and splice site properties of species with simple and complex splicing. Again, I see why there is emphasis on the 5' splice site here but a similar analysis with the U2 snRNA and the branch site could also be informative.

      We absolutely agree that inter-species association mapping could be applied to other splicing signal phenotypes including 3’ splice sites and intron branchpoints. Accordingly, we raise this subject in the final section of the Discussion. However, branchpoint sequences are challenging to predict with genomic data. Because preliminary analyses suggest independent variation in these other splicing signal phenotypes, we feel a separate focused study is required to properly explain (and substantiate) even the analytical approaches involved. We hope the reviewer would agree that incorporating U2 snRNA and branchpoint variation analyses into this manuscript as well, could detract from the clarity of the conceptual advances that we make here.

      Minor comments

      Should the Title include SNRNP27K?

      There is certainly a case that the title should include SNRNP27K. Our aim was to make the title as short and informative as possible without too many acronyms that need explaining. Since the clearest correlation is with METTL16 and this has broader implications for understanding the role of this enzyme not only in splicing but in possibly modifying other RNA targets too, we think not including SNRNP27K is a suitable compromise. In addition, retaining the current title simplifies the tracking of the manuscript from pre-print through to journal publication.

      Should the title specify that it is the evolution of only the 5’ splice site sequence preference being studied here?

      Because apostrophes in titles can compromise some scholarly online search engines (https://insights.uksg.org/articles/10.1629/uksg.534), we would prefer not to include 5’ in the title.

      Include information on intron number and 5’ splice site interaction potential of U5 and U6 snRNA in the Summary?

      We thank the reviewer for this suggestion. We have updated the Summary to include our findings on U5 and U6 interaction potential in species with reduced intron number.

      Figure 1C is not referred to in the text?

      We apologise for this oversight. We have added references to figure 1C in the appropriate Results section.

      Page 8, line 5 – better to say “splicing signal phenotypes”.

      We have amended this statement on Page 8 and at other places in the text where related phrasing was made.

      What are the other points on Figure 3B? What is the next point below SNRNP27K? Is it U2A’? 

      The other points on Figure 3B represent Orthofinder orthogroups which contain human orthologs that are known components of the spliceosome. The list of spliceosomal components was taken from Sales-Lee et al. 2021. The third most significant point is indeed the orthogroup containing the human ortholog of U2A’. As we state in the text, however, the correlation of U2A’ with the 5’SS+4 A to U ratio phenotype is no longer significant once METTL16 presence/absence is controlled for, indicating that the correlation of U2A’ with the +4A phenotype is likely explained by similarity in the patterns of gene loss of U2A’ and METTL16.
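The logic of controlling for METTL16 can be shown with a simple stratified comparison. The sketch below is not the method used in the manuscript: it ignores phylogenetic non-independence between species, and the record keys, function name, and phenotype values are all hypothetical.

```python
from statistics import mean

def mean_difference(records, factor, stratum=None):
    """Mean phenotype with `factor` present minus mean with it absent.

    records: list of dicts with boolean presence/absence keys and a numeric
    "phenotype". If `stratum` is a (key, value) pair, only records where
    record[key] == value are used, which holds that key constant.
    """
    if stratum is not None:
        key, value = stratum
        records = [r for r in records if r[key] == value]
    present = [r["phenotype"] for r in records if r[factor]]
    absent = [r["phenotype"] for r in records if not r[factor]]
    if not present or not absent:
        return None  # the factor does not vary within this subset
    return mean(present) - mean(absent)

# Invented toy species: U2A' loss mostly coincides with METTL16 loss.
species = [
    {"METTL16": True, "U2A": True, "phenotype": 2.0},
    {"METTL16": True, "U2A": True, "phenotype": 2.2},
    {"METTL16": True, "U2A": False, "phenotype": 2.1},
    {"METTL16": False, "U2A": False, "phenotype": -1.0},
    {"METTL16": False, "U2A": False, "phenotype": -1.2},
    {"METTL16": False, "U2A": True, "phenotype": -1.1},
]
```

In this toy data the marginal difference for U2A' is large, but it vanishes within each METTL16 stratum, which is the pattern expected when an apparent association is explained by correlated gene loss.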

      The second paragraph of the Discussion is vague and lacks a reference. “we could also identify an association with a methionine residue in the conserved C-terminal domain of SNRNP27K orthologs.” There are a few methionines in the C-terminus, which one? Please reference the statement “transcriptome analysis of C. elegans SNRP-27 M141T mutants..”

      We apologise for the lower quality of writing in this section of the Discussion. We have updated the text, made the statements about the SNRNP27K C-terminus less ambiguous, and added the relevant citations as appropriate.

      Reviewer #1 (Significance):

      Overall, this is a well written and clearly presented study that provides some key molecular information on the splicing factors involved in the evolution of 5’ splice sites and shows the power of interspecies association studies. Some important conceptual principles have now been defined for the field going forward.

      We thank the reviewer for this kind comment on the importance of this work.

      The question remains as to whether METTL16 and SNRNP27K are the sole determinants of 5’ splice site preference evolution at +4?

      We cannot say for certain that METTL16 and/or SNRNP27K determine the 5’SS +4 phenotype – only that they are correlated with it. In our response to reviewer 3, and in a new Discussion section, we have detailed some of the scenarios that could explain these correlations. We also cannot rule out whether there are changes in the presence/absence (or domain/sequence-level changes) of other, untested proteins that correlate with the 5’SS +4 phenotype and we allude to this in the final section of the Discussion.

      One splicing factor that immediately comes to mind is Prp8 where there is extensive evidence for involvement in splice site selection and is clearly in the right location throughout splicing to be involved. This question should at least be discussed but Prp8 would also be a very interesting candidate for the interspecies association mapping.

      Prp8 is a core component of spliceosomes and is conserved throughout the Saccharomycotina. For this reason, we were unable to associate splicing phenotypes with Prp8 presence or absence variation at the level of orthogroups. However, we revisited this question posed by the reviewer. Our experience with inter-species association mapping, so far, indicates it works well with orthogroup presence/absence or when straightforward amino acid substitutions can be detected in conserved and hence alignable protein sequence domains. We analysed the conserved U6 snRNA-interacting region of the Prp8 linker domain, which maps close to the 5’ splice site in cryo-EM models, using the profile HMM PF10596 available from Pfam. We found that the majority of this domain was extremely highly conserved with variation in only a few species and positions. The strongest correlation with the +4A to U ratio phenotype was at position 58, which is conserved as a Glycine in all but 8 species (6 Dipodascaceae, 2 CUG-Ser1), that also tend to have a stronger preference for +4A. However, examination of the species contributing to this result (and to similar results at other positions) indicated that in the 6 Dipodascaceae species, this change is part of a larger deletion or replacement that makes the whole linker region align poorly to the model. Hence, the G58 position itself may not be specifically important for the +4 phenotype. Although the wholesale loss or replacement of the U6 snRNA-interacting region in these species is potentially interesting, these larger scale structural changes in a small number of species are difficult to interpret. Therefore, to maintain the focus of the manuscript and the clear links to METTL16 and SNRNP27K that have orthogonal support, we have decided not to add these results to the manuscript but present them here (Figure not available on bioRxiv commenting window).

      Also, as mentioned previously, only the 5’ splice site was investigated here and the manuscript could become a more substantial piece of work if the other splice sites were included in some way.

      We agree that it will be exciting to apply this approach to other splicing signal phenotypes and in other phylogenetic clades with emerging tree-of-life-scale genomics data. We have included variation in 3’ splice sites in the revised manuscript. As the first of its kind, this study should pioneer a wider use of this approach, by us and others, to understand the mechanisms and functions of molecular interactions not only in splicing but in other areas of biology too.

      The obvious audience here are those directly in the splicing field but the overall principles are relevant for evolutionary biologists and those studying organismal complexity.

      We thank the reviewer for recognising the broad importance of this work.

      My expertise is in yeast and human splicing mechanisms. I do not have the expertise to critically evaluate the bioinformatic pipelines but they were clearly explained and presented.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In their manuscript, Parker et al. investigate the evolutionary patterns of splice site preference, focusing on the A/U ratio at position A+4 on the 5´ splice site. Building upon prior studies in S. pombe and A. thaliana, the authors establish a strong correlation between this preference and the co-evolution of the METTL16 U6 snRNA methyltransferase. Furthermore, through inter-species association mapping, they identify the involvement of the splicing factor SNRNP27K in altered A/U ratios and highlight the significance of the residue Met-141 in SNRNP27K for this function. Overall, the paper effectively presents impactful new findings on the evolution of METTL16, U6 snRNA, and splicing.

      We thank the reviewer for these kind comments on the importance of our study.

      The computational analyses employed in this study are situated outside our field of expertise, preventing us from offering a comprehensive evaluation of the methodology’s appropriateness and rigor. Nonetheless, the identification of METTL16 through the authors’ methods, which aligns with previous research in S. pombe and A. thaliana, lends support to the validity of their approach. Notably, the close proximity between SNRNP27K and the methylated A43 residue in U6 snRNA within the spliceosome, particularly near Met-141, is an impressive finding. Previous studies have shown that a mutation at position M141T affects splicing at +4A introns, thus providing robust validation for their methods.

      We thank the reviewer for these kind comments on our work.

      The data presented in this study furnish crucial insights into the role of METTL16, U6 snRNA methylation, and splice site recognition. The authors expand upon recent observations that the “vertebrate conserved region” exists in non-vertebrates, despite the absence of primary sequence homology. These results will serve as a valuable guide for future molecular investigations into U6 snRNA methylation and its mechanisms in splicing. Furthermore, the implications of this paper extend to human evolution, as the plasticity in splicing is an essential factor in the evolution of developmental complexity.

      We thank the reviewer for these kind comments.

      Minor suggestions for improvement:

      1. Given the significance of the interaction between U6 snRNA and the intron for understanding the data, it would be beneficial to include a figure illustrating the RNA-RNA base-pairing interactions between U6 snRNA and the 5´ splice site. This addition is particularly important if the paper is intended for publication in a journal with a general readership.

      We thank the reviewer for this excellent suggestion. We have included this as Figure 3A.

      1. Similarly, the section on U1 snRNA would be more comprehensible with the inclusion of U1 RNA-RNA intron diagrams and improved descriptions of both the figures and the assay. Despite being negative data in the supplement, clarifying this section is essential. As currently written, it is challenging to follow.

      We agree that this section is difficult to follow. We have updated the text to improve the readability and included a figure of U1 snRNA:5’SS basepairing as Figure 3 – figure supplement 1A.

      1. Whenever possible, consider increasing the figure and font sizes to enhance readability for readers.

      We agree that some of the more complex figures can be difficult to read when embedded into a Word document/pdf. We hope that providing high-resolution figures for reading online will mitigate this.

      1. In the text, there is no reference to Figure 1C.

      We apologise for this oversight. We have resolved this issue with the appropriate references in the Results text.

      1. In Figure 5B, the y-axis in the top panel is labelled “species,” but the legend only mentions U5/6p as the y-axis. Please revise the legend to include the appropriate information.

      We apologise for the confusion caused by our poorly written legend for this plot. We have updated the legend so that the text clearly refers to either the scatter plot or the marginal histograms.

      Reviewer #2 (Significance):

      The data presented in this study furnish crucial insights into the role of METTL16, U6 snRNA methylation, and splice site recognition. The authors expand upon recent observations that the “vertebrate conserved region” exists in non-vertebrates, despite the absence of primary sequence homology. These results will serve as a valuable guide for future molecular investigations into U6 snRNA methylation and its mechanisms in splicing. Furthermore, the implications of this paper extend to human evolution, as the plasticity in splicing is an essential factor in the evolution of developmental complexity.

      We are grateful to the reviewer for these kind comments on the importance of this work.

      Reviewer #3 (Evidence, reproducibility and clarity):

      In this manuscript, Parker et al present a nice exploration of the evolutionary and mechanistic relationships between 5′ splice site consensus sequences, intron numbers and METTL16/SNRNP27K. By performing inter-species association mapping in Saccharomycotina species, they found that a T in position +4 is strongly associated with the absence of METTL16 (and/or in some cases SNRNP27K or mutations in it). They also provide solid structural modelling data in support of this association.

      In general, I think this is a very nice manuscript. I only have a few comments, which could be addressed by rewording specific parts and/or improving the current figures.

      We are grateful to the reviewer for the kind comments on this work.

      1) As the authors acknowledge, a key issue that cannot be fully resolved in this study is causality between the different events investigated. Overall, the authors are careful about this, but there are some exceptions that should be corrected. Probably the most important is in the abstract, where they write: “We conclude that variation in concerted processes of 5’ splice site selection by U6 snRNA is crucial to evolutionary change in splicing complexity”. I suggest they write something more open (and correct), such as: “We conclude that variation in concerted processes of 5’ splice site selection by U6 snRNA is associated with evolutionary changes in splicing complexity”. Similarly, other plausible scenarios should be discussed in the corresponding Discussion section.

      We agree with the reviewer that it is not possible to infer the causal relationship between METTL16 absence and 5’SS+4 preference change from the current data. We, therefore, apologise for failing to be more careful in the Summary and Introduction. We have reworded these statements to better reflect what we can currently say about the evolutionary relationship between METTL16 and 5’SS sequence preference.

      The correlation between METTL16 absence and 5'SS+4 sequence preference change could most likely be explained by one of several scenarios: (a) sudden loss of METTL16 causes a rapid necessity to change 5'SS sequence preferences. This is unlikely as such rapid change without widespread corresponding 5'SS changes would likely impose a high fitness cost. (b) Changes in 5'SS sequence preference occur first, driven by some other selective pressure, until there is no longer a benefit to retaining the METTL16 gene. (c) Gradual changes in the expression or catalytic efficiency of METTL16 reduce the stoichiometry of U6 snRNA m6A modification, which permits gradual change in 5'SS+4 sequence preference until complete loss of the METTL16 no longer imposes a major fitness cost. As we suggest in the Discussion, future work could examine this question by determining whether the METTL16 orthologs found in Zygosaccharomyces and Eremothecium species, which have altered their 5'SS+4 preference to a U, are expressed and functional. We have updated the Discussion to include a new section that addresses these scenarios.

      2) I do not agree with the statement that "The extent of alternative splicing is the best genomic predictor of developmental complexity". To start with, there are many ways to quantify "extent of alternative splicing" and there are also different types of alternative splicing that might have different prevalence and biological impact. Then, this claim is usually related with exon skipping, which is tightly linked with intron length, and that is likely a better predictor of complexity (yet clearly not causative). My concern is: to what extent has this claim been formally and properly assessed by comparing splicing prevalence with other genomic features, such as intergenic region length, intron length, or average distance between enhancer-promoter interactions (arguably the most relevant predictor, in light of many other studies)? Moreover, I found it a bit misleading to frame the work presented in this study as directly related with developmental (or even splicing) complexity. The work is very interesting on its own, and I doubt their findings on +4 position preference in Saccharomycotina have anything to do with developmental complexity (as the Abstract and Introduction seem to imply).

      On reflection, we agree with the reviewer. Some of our framing of the text isn’t balanced with other studies on the scaling of alternative splicing with developmental complexity. We have edited the Summary and Introduction sections accordingly and cited other references that broaden the consideration of this subject. We are grateful to the reviewer for this suggestion because the changes we make improve the focus of the manuscript since our findings relate more to splicing simplification than to an understanding of increased developmental complexity.

      3) I found Figure 2 and its associated supplementary figure very difficult to follow. I suggest the authors try to improve it and make it clearer. Also, other trees summarizing the results might be helpful. 

      We apologise for the complexity of these figures. We opted to show phylogenetic trees with phenotypes plotted on the y axis, rather than simply trait histograms or box-plots, because the underlying structure of the tree is important for demonstrating that multiple independent changes in the 5’SS phenotype have occurred in the Saccharomycotina. We have tried to improve the comprehensibility of the figures in the following ways: (a) We have added 5’SS sequence motifs to the x-axis of figure 2B to make what the plot represents clearer, (b) as suggested by the reviewer, we have created a pruned tree showing the 5’SS motifs of a selection of Saccharomycotina species, which demonstrates that the changes in 5’SS+4 position preferences seen in S. cerevisiae and C. albicans are likely to be a result of convergent evolution. We have added this tree as Figure 2 - figure supplement 3.

      4) I also found the Results section corresponding to Figure 5B a bit confusing. I would argue (as I think the authors do) that there are two main patterns here: below 500 introns, there is no association, while above 500 introns there is an increasingly negative association (correlation). I think it would help to distinguish these two patterns more explicitly. Then, for the intron-poor species: is the correlation (or lack of) for species with a T or an A in position +4 different? 

      We do indeed think that there are two patterns here, as indicated by the reviewer. In the previous version of the manuscript, we separated species into those having an overall preference for A at the +4 position, and those having +4U. By showing regression lines for these two classes, rather than for the general relationship between intron number and U5/6rho, we somewhat imply that the switch in +4 base preference might be causing the loss of correlation between U5/6rho and intron number. However, since essentially all species with a 5'SS +4U preference are intron poor, it seems more likely that these trends are the result of a loss of the negative correlation between intron number and U5/6rho in intron poor species, as suggested by the reviewer. To address this issue, we have replaced the regression lines on Figure 6B with a single loess (locally estimated scatterplot smoothing) regression line for all species and updated the text to make it clearer that we think loss of U5/6rho and +4A preference are separate traits of intron poor species. Although this is not exactly what the reviewer requested, we hope that it satisfies their issue with the analysis.
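For readers unfamiliar with loess, the idea is to fit a weighted straight line around each point, so that a single smooth curve can follow a relationship that changes across the range of intron numbers. The following is a minimal standard-library sketch of that technique, not the implementation used for the figure; the `frac` parameter and the toy data are assumptions made here for illustration.

```python
import math

def loess(xs, ys, frac=0.5):
    """Locally weighted linear regression evaluated at each x in xs.

    For every target point, the bandwidth is the distance to its
    ceil(frac * n)-th nearest point, weights follow the tricube kernel,
    and a weighted least-squares line is fitted and evaluated at the target.
    """
    n = len(xs)
    k = max(2, math.ceil(frac * n))
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1.0  # fall back to 1.0 if all k points coincide
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Invented toy data: flat at low x, then decreasing at high x.
toy_x = [float(i) for i in range(12)]
toy_y = [0.0] * 6 + [-0.5 * (i - 5) for i in range(6, 12)]
toy_fit = loess(toy_x, toy_y, frac=0.4)
```

Because each fit is local, the curve can stay flat where there is no trend and bend downwards where a negative relationship appears, without forcing one global slope on all species. In practice a library implementation would normally be preferred.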

      Reviewer #3 (Significance):

      This is a very interesting study that sheds light on an intriguing evolutionary pattern: the change in consensus sequence at position +4 of the 5' splice site. This topic is relevant since it is closely associated with intron loss and splicing efficiency and evolution. 

      We thank the reviewer for the kind and constructive comments on this study.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1. This is a good paper dealing with a gap in our knowledge in understanding the reasons for ICB failures. The subject being difficult, it is expected that the design and content of such an experiment will be complex. But the authors forget the practicality of the reader's attention and of making the paper appear interesting. They need to organise and maybe classify the varied information in such a way that the reader can find a rhythm in excavating data more easily. It appears confusing at times, so they may try to make it simpler. In this way they may concentrate more on methods and classify results too. A thorough revision is suggested, to make it concise.

      Authors’ answer: We thank the Reviewer for his positive evaluation and constructive feedback. We appreciate the complexity of single-cell RNA-sequencing analyses. In order to simplify our manuscript, our revised manuscript now focuses on the transitional states of tumor-resident and circulating T cells found in ovarian cancer patients. Our study is timely as it is the first to report the developmental relationship of TILs in ovarian cancer. We substantially edited our manuscript to make it clear that our findings suggest a gradual acquisition of the exhaustion program initiated by effector-like cells (cluster CD8_GZMH) that eventually gives rise to more terminal states with features of tissue residency and chemotaxis (clusters CD8_CCL4, CD8_XCL1, and CD8_CXCL13). We also include new analyses revealing the presence and proportion of these T cell states in different cancer patients (New Fig. 4A-B), and how these T cell states associate with clinical responses to immune checkpoint blockade (ICB). We hope the Reviewer will find our revised manuscript easier to read.

      Reviewer #2. I think the first half of the article, in which the GZMH-CD8 cluster is considered to be in an intermediate state of transition to exhaustion, is interesting, and I feel that the single-cell seq and TCR data are well analyzed to make the point. On the other hand, I feel that the latter part of the paper may not be anything more than a hypothesis. In particular, the part claiming that it is related to prognosis or applicable to the prediction of the effect of ICB is insufficient, since their gene signature is not described in detail and the contents of the Figure are not mentioned in the manuscript. In the latter part, the effects of GPR184 and 25-HC, or the effects of IL21, would require experiments to verify (to verify whether the addition of chemokine or the inhibition of the receptor changes the specific CD8 population).

      Author’s answer: Thank you for discussing the limitation of the signature employed. We agree with the reviewer’s comment. Old Figure 5 has been removed from the revised manuscript.

      Reviewer #2. Minor point: In particular, there is little mention of Figure 5 in the text, making it difficult to understand.

      Author’s answer: Thank you for your comment. As we previously discussed, we have removed Figure 5 from the revised manuscript. The method used to generate the signature was found to be inappropriate.

      Reviewer #2. The latter part is difficult to understand. To begin with, it is already known that ovarian cancer does not contribute much to ICB, so what does it mean to analyze the CD8 population, which is known as a marker of ICB response in other carcinomas, as an indicator? Especially for clinicians like us, it is hard to imagine that the results will lead to clinical trials that will attempt to sort out the population that ICB is favored in.

      Author’s answer: Although immune checkpoint blockade has demonstrated limited effectiveness against ovarian cancer, subset analyses suggest superior efficacy for some patients and according to subtype. Combination anti-PD-1/CTLA-4 therapy, for instance, achieved response rates up to 31% (Zamarin et al., 2020), and superior benefit for single agent PD-1 blockade has been reported in clear cell ovarian cancer. Moreover, encouraging clinical results have recently been reported in studies exploring combinations with PARP and VEGF inhibitors. As an example, interim analysis of the phase 3 DUO-O trial (NCT03737643) showed a statistically significant and clinically meaningful improvement in PFS in patients with newly diagnosed advanced ovarian cancer without a BRCA1/2 mutation (Harter et al., 2023).

      Our study aimed to better understand how ovarian tumor-infiltrating T cells acquire their exhaustion program after migrating from the periphery and whether these mechanisms are unique or shared amongst cancer types. Recent studies in other cancer types had shown the dynamics of T cells, demonstrated the clonal replacement of intratumoral T cells after ICB, and emphasized the role of peripheral clones in this process (Wu et al., 2020; Yost et al., 2019). In lung cancer, a transitional state between precursor and terminally differentiated cells has been proposed (Gueguen et al., 2021). Our study demonstrates, for the first time in ovarian cancer, the presence of similar transitional states of CD8 T cells. Our revised manuscript also now includes new data revealing that pre-effector GZMK- and intermediary GZMH-expressing CD8 cells are better biomarkers of ICB response than terminally differentiated XCL1- and CXCL13-expressing CD8 T cells (New Figure 4). Altogether, our study provides important and novel insights on the development of tumor-infiltrating T cells in ovarian cancer patients, which may serve to better select ovarian cancer patients for ICB therapy.

      Reviewer #2. Since the first half of the study is very interesting, we feel that it is more important to confirm the mechanism of exhaustion from the blood via the intermediate (GZMH_CD8), including functional experiments. Also, as a clinician, we are very interested in the perspective of whether some of the fractions identified in this study are different in proportion in different patients and whether they correlate with the clinical course of the disease since the study only analyzed a sample of 5 patients.

      Author’s answer: We thank the reviewer for proposing to extend our analysis. As suggested, our revised manuscript now includes new analyses that reveal the different proportions of our identified T cell states in different cancer patients (New Figure 4). We further investigated whether these T cell states associate with clinical responses and observed that pre-effector GZMK- and intermediary GZMH-expressing CD8 T cells are better biomarkers of ICB response than terminally exhausted XCL1- and CXCL13-expressing CD8 T cells (New Figure 4).

      Reviewer #3. Question 1: Whether the distribution patterns of CD4+ and CD8+ T cell clusters in Figure 1B were comparable among the 5 patient samples? Whether the proportion of five types of clones in Figure 3C are comparable among the 5 patient samples?

      Author’s answer: Thank you for the question. We included the results to answer these questions in the supplementary material (fig. S1C-D). For each patient, we calculated the proportion of a cluster among T cells in the blood or tumor. As observed in the boxplot (fig. S1C), the proportion of some subsets was higher in certain patients, such as the higher proportion of CD8_GZMK in the tumor of patient p09454. A recent study classified patients’ tumors based on the spatial distribution of CD8 T cells and performed scRNA-seq to identify cell subsets enriched in the groups inflamed/infiltrated (characterized by the distribution of CD8 T cells within the tumor epithelium), excluded (infiltrating CD8 T cells are restricted to the tumor stroma) or desert (T cells are not present or have low frequency) (Hornburg et al., 2021). Interestingly, this subset of CD8_GZMK cells was enriched in desert tumors, suggesting that the difference we observed in our dataset might reflect the spatial distribution of CD8 T cells in patient p09454. Regarding the TCR-seq data, the frequency of the five types of clones was different among patients. To show these data, we included a barplot (fig. S2D), showing, for example, a higher proportion of tumor-expanded clones in patient p10329.
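The per-patient calculation described above (the proportion of each cluster among T cells in a given tissue) amounts to a grouped count. A minimal sketch follows; the patient identifiers and cluster names are taken from the text, while the function name and the cell counts are invented purely for illustration.

```python
from collections import Counter, defaultdict

def cluster_proportions(cells):
    """Compute per-sample cluster fractions.

    cells: iterable of (patient, tissue, cluster) tuples, one per T cell.
    Returns {(patient, tissue): {cluster: fraction of that sample's T cells}}.
    """
    counts = defaultdict(Counter)
    for patient, tissue, cluster in cells:
        counts[(patient, tissue)][cluster] += 1
    return {
        sample: {c: n / sum(ctr.values()) for c, n in ctr.items()}
        for sample, ctr in counts.items()
    }

# Hypothetical cell labels; the counts are invented for illustration.
cells = (
    [("p09454", "tumor", "CD8_GZMK")] * 6
    + [("p09454", "tumor", "CD8_GZMH")] * 2
    + [("p10329", "tumor", "CD8_GZMK")] * 1
    + [("p10329", "tumor", "CD8_GZMH")] * 3
)
props = cluster_proportions(cells)
```

Normalising within each patient and tissue makes samples with different numbers of sequenced cells comparable, which is what a boxplot of cluster proportions across patients requires.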

      Reviewer #3. Question 2: In Figure S2C, only a very small number of cells in the CD8-GZMK K-22 population. Are these cells representative? Do they generally exist in multiple samples or only in one sample?

      Author’s answer: Thank you for your comment. The subcluster K_22 indeed has a smaller number of cells compared to other subclusters. Nevertheless, the K_22 cluster was found in every patient and in every healthy donor. To clarify, we edited our revised manuscript to include a statement that cluster K_22 was composed of fewer cells than other clusters.

      Reviewer #3. Question 3: In the Fig.S6 legend, the authors stated "Our results suggest the differentiation of cluster CD8-GZMK into the effector-like subset CD8-GZMH." However, there seems to be no corresponding analysis in the main text to support this conclusion.

      Author’s answer: We appreciate your attention to this statement. We agree that the results of our study do not support this statement, and so we have removed it from the revised manuscript.

      Reviewer #3. Question 4: Is there more detailed clinical information that can be provided for the 5 patients included in the study? Per the methods all patients were receiving debulking surgery and were treatment naïve, but did they differ in stage, age, comorbidities, etc.?

      Author’s answer: Thank you for your comment on this. We have included a table with clinical information on the stage, age, and menopause status of the five patients.

      Reviewer #3. Question 5: Were any cells included for sequencing from adjacent 'normal' tissue uninvolved with tumor (these samples are from surgical debulking of primary tumors, which may include such areas of non-involved tissue.) While shared TCR clonotypes between blood and intratumoral T cells strongly suggests the tumor-resident populations are recruited from the blood, the degree of sharing with normal tissue-resident T cells would be of interest as well.

      Author’s answer: Thank you for your comment. Samples were provided for sc-RNA-seq after pathology review and validation of tumor histology. We did not perform sc-RNA-seq on normal adjacent tissue (NAT). We agree this would be interesting as a follow-up study, since in other cancer types (renal, colon and lung) it has been demonstrated that T cell clones expanded in the tumor and NAT are also present in peripheral blood (Wu et al., 2020).

      Reviewer #3. Question 6: Very little is discussed about HGSOC itself in the main text (eg clinical background, prior literature on the composition of infiltrating immune populations and potential reasons for at best modest poor responses to IO) until the first sentence of the discussion. As the entirety of the new data produced in this study is from HGSOC tumors there should be more focus on this tumor type and conversation with the prior literature on it (mainly from prior studies on the immune environment of HGSOC). Further, how distinct do the authors suspect the cell populations found in their study to be to ovarian as opposed to other epithelial tumor types?

      Author’s answer: Thank you for the suggestion. We have now included more background information on immunotherapy of HGSOC. Specifically, we added the following paragraph in our introduction: “In ovarian cancer, the presence of both T and B cells improves patients' survival (Nelson, 2015; Nielsen et al, 2012). They are usually organized in lymphoid aggregates ranging from a small group of cells to a well-organized TLS (Kroeger et al, 2016). Organized TLSs correlate with better survival, such as observed in patients treated with ICB. Although immunotherapy has demonstrated limited effectiveness against ovarian cancer, subsets of patients may thus benefit from ICB. In support of this, combination anti-PD-1/CTLA-4 therapy can achieve response rates above 30% (Zamarin et al., 2020), and encouraging clinical results have recently been reported when combining ICB with PARP and VEGF inhibitors (Harter et al., 2023)”.

      Reviewer #3. Question 7: Were the signature genes used for analysis in Figure 5 chosen in a formal, unbiased manner, or simply hand-picked as representative of the respective cell types? This information is not provided in the supplement.

      Author’s answer: Another reviewer has also expressed similar concerns. The genes selected to represent cell types were chosen manually, which we acknowledge is not the best method for defining a signature. As a result, we have decided to exclude Figure 5 from the manuscript under review. We believe an unbiased approach is more suitable for characterizing the cell network proposed in our study.

      Reviewer #3. Question 8: While the NicheNet analysis of potential interactions among lymphocyte populations raises some strong hypotheses, it would be interesting to extend the interaction analysis to all CD45+ populations, given the sequencing was done on CD45+ immune cells.

      Author’s answer: Thank you for suggesting this analysis. We have included the results of the cell interaction analysis including all CD45+ cells (fig. S3). We observed CD40L as one of the top predicted ligands highly expressed in the CD4_CXCL13 subset, mediating a response in subsets of antigen-presenting cells, such as B cells (cluster B), plasma cells (cluster PC_2), and plasmacytoid dendritic cells (cluster pDC). Interestingly, this result also supports the hypothesis of Tfh-like cells (cluster CD4_CXCL13) coordinating the action of intratumoral immune cells involved in the antitumor immune response.

      Reviewer #3. Question 9: A sample size of 5 patients is relatively small for current single cell RNAseq studies of human tumor patients.

      Author’s answer: We agree with the reviewer that a sample size of 5 patients is relatively small. Thus, to validate our results in other patients, we included in the revised manuscript the analysis of scRNA-seq data from 47 patients across 10 cancer types (dataset from Zheng et al., 2021). As demonstrated in figure 3 and figure 5, we could identify the subsets of CD8 and CD4 T cells from our ovarian cancer patients in that dataset of 10 cancer types.

      Reviewer #3. Minor

      1. In lines 96-97, "CD8-GZMB" was mentioned twice in the description.

      2. In line 126, this section did not discuss residency markers, yet a conclusion about residency was made in this sentence.

      Author’s answer: We appreciate you bringing these errors to our attention. We fixed them in the updated version of the manuscript.

      References:

      Gueguen, P., Metoikidou, C., Dupic, T., Lawand, M., Goudot, C., Baulande, S., … Amigorena, S. (2021). Contribution of resident and circulating precursors to tumor-infiltrating CD8 T cell populations in lung cancer. Science Immunology, Vol. 6, p. eabd5778. doi:10.1126/sciimmunol.abd5778

      Harter, P., Trillsch, F., Okamoto, A., Reuss, A., Kim, J.-W., Rubio-Pérez, M. J., … Aghajanian, C. (2023). Durvalumab with paclitaxel/carboplatin (PC) and bevacizumab (bev), followed by maintenance durvalumab, bev, and olaparib in patients (pts) with newly diagnosed advanced ovarian cancer (AOC) without a tumor BRCA1/2 mutation (non-tBRCAm): Results from the randomized, placebo (pbo)-controlled phase III DUO-O trial. Journal of Clinical Oncology, 41(17_suppl), LBA5506–LBA5506.

      Hornburg, M., Desbois, M., Lu, S., Guan, Y., Lo, A. A., Kaufman, S., … Wang, Y. (2021). Single-cell dissection of cellular components and interactions shaping the tumor immune phenotypes in ovarian cancer. Cancer Cell. doi:10.1016/j.ccell.2021.04.004

      Wu, T. D., Madireddi, S., de Almeida, P. E., Banchereau, R., Chen, Y.-J. J., Chitre, A. S., … Grogan, J. L. (2020). Peripheral T cell expansion predicts tumour infiltration and clinical response. Nature. doi:10.1038/s41586-020-2056-8

      Yost, K. E., Satpathy, A. T., Wells, D. K., Qi, Y., Wang, C., Kageyama, R., … Chang, H. Y. (2019). Clonal replacement of tumor-specific T cells following PD-1 blockade. Nature Medicine. doi:10.1038/s41591-019-0522-3

      Zamarin, D., Burger, R. A., Sill, M. W., Powell, D. J., Jr, Lankes, H. A., Feldman, M. D., … Aghajanian, C. (2020). Randomized Phase II Trial of Nivolumab Versus Nivolumab and Ipilimumab for Recurrent or Persistent Ovarian Cancer: An NRG Oncology Study. Journal of Clinical Oncology: Official Journal of the American Society of Clinical Oncology, 38(16), 1814–1823.

      Zheng, L., Qin, S., Si, W., Wang, A., Xing, B., Gao, R., … Zhang, Z. (2021). Pan-cancer single-cell landscape of tumor-infiltrating T cells. Science, 374(6574), abe6474.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary: This study used single-cell transcriptomics and T cell receptor profiling to identify the developmental relationships of T cell populations in ovarian cancer patients. The researchers proposed a model of differentiation pathway that showed how an intermediate GZMH-expressing CD8 T cell subset progressively reinforces exhaustion and tissue residency programs towards terminally exhausted cells. Then they also focus on the nature of TPEX, dual-expanded clone, which is considered an important indicator for the efficacy of ICB, and argue that it is strongly related to GPR183, 25-OHC, and IL21. Based on the analysis of clinical samples, they argue that their proposed gene signature may also be prognostically relevant and predictive of ICB efficacy.

      Major comment: I think the first half of the article, in which the GZMH-CD8 cluster is considered to be in an intermediate state of transition to exhaustion, is interesting, and I feel that the single-cell seq and TCR data are well analyzed to make the point. On the other hand, I feel that the latter part of the paper may not be anything more than a hypothesis. In particular, the part claiming that it is related to prognosis or applicable to the prediction of the effect of ICB is insufficient, since their gene signature is not described in detail and the contents of the Figure are not mentioned in the manuscript. In the latter part, the effects of GPR183 and 25-HC, or the effects of IL21, would require experiments to verify (to verify whether the addition of chemokine or the inhibition of the receptor changes the specific CD8 population).

      Minor point: In particular, there is little mention of Figure 5 in the text, making it difficult to understand.

      Significance

      It is interesting to note that the authors simultaneously analyze immune cells in the blood and in the tumor, and examine in detail what is characteristic of the blood, what is characteristic of the tumor, and what is seen in both. It is also very interesting that they specifically propose an intermediate group that is recruited from the blood to the tumor and is in the process of becoming exhausted. I am sure there are many studies on TILs and TLSs, but this study would be helpful to understand how they are concentrated locally (near the tumor) in comparison with immune cells in the blood as well.

      However, the latter part is difficult to understand. To begin with, it is already known that ovarian cancer responds poorly to ICB, so what does it mean to analyze the CD8 population, which is known as a marker of ICB response in other carcinomas, as an indicator? Especially for clinicians like us, it is hard to imagine that the results will lead to clinical trials that attempt to identify the population in which ICB is favored.

      Since the first half of the study is very interesting, we feel that it is more important to confirm the mechanism of exhaustion from the blood via the intermediate state, including functional experiments. Also, as a clinician, we are very interested in the perspective of whether some of the fractions identified in this study are different in proportion in different patients and whether they correlate with the clinical course of the disease, since the study only analyzed a sample of 5 patients.

    1. Reviewer #2 (Public Review):

      Kraus, Aurora et al. investigated the potential immune response of the olfactory bulb after exposure to the infectious hematopoietic necrosis virus (IHNV) via the olfactory epithelia. Specifically, they show a) viral-specific neuronal activation of "OSNs" (crypt cells), b) changes in behaviour of both adult and larval zebrafish after viral exposure, and c) that pituitary adenylate-cyclase-activating polypeptide (PACAP) was enriched when assayed by single cell transcriptomic profiling of cells in the OB after OSNs are exposed to IHNV.

      Although the paper does have strengths in principle, the weaknesses of the manuscript are that these strengths are not directly demonstrated and the referencing of the manuscript omits many references important for the understanding of the questions and the results of the study. Furthermore, the data presented are not sufficient to fully support the key claims in the manuscript. In particular:

      a) Viral-specific neuronal activation of OSNs:

      What type of neurons? The authors are a bit elusive and do not clearly state that the neurons are crypt cells (Sepahi et al.: rainbow trout), which have a very specific axonal projection to the brain and whose response characteristics are not well characterized (see work of the Korsching lab). Crypt cells are not present in the olfactory epithelia of mammals. Furthermore, in their previous work the crypt cells die; so how do they think the (inflammatory) virus response is transmitted to the olfactory bulbs in order to protect the brain?

      The authors state from previous work that they never detected virus in the brain, but why would they? Does IHNV move trans-synaptically?

      The neuronal activity was monitored using a pan-neuronal marker; thus these data are of limited use when trying to understand the role of neuronal activity (crypt cells) in the IHNV-triggered activity. The authors may be looking at a generalized inflammation response, and the image presented is not particularly informative, so it is difficult to decipher the results. The authors assume IHNV is an odorant without carefully ruling out the possibility of a generalized inflammation response.

      b) Changes in behaviour of both adult and larval zebrafish after viral exposure:

      What is the motivating question for looking at the behaviour of the virus-infected animals? Do we know the effects of crypt cell loss on behaviour in any fish species? The authors need to build a better conceptual framework for the behaviour experiments.

      c) Pituitary adenylate-cyclase-activating polypeptide (PACAP) was enriched when assayed by single cell transcriptomic profiling of cells in the OB after OSNs are exposed to IHNV:

      The authors draw many generous conclusions from limited data. They seem to have forgotten to cite previously published papers showing that PACAP-38 has anti-viral activities in fish (VHSV: trout), such as: Velasquez et al 2020, First in vivo evidence of pituitary adenylate cyclase-activating polypeptide antiviral activity in teleost.

      The histology for PACAP presented in the manuscript is not convincing. The antibody is against the human form of PACAP; thus any labelling should be treated with caution (and called PACAP-38-like).

      Summary: The authors need to better develop their model (perhaps a diagram would be helpful) explaining exactly which neurons are transmitting the information. Because of the elusive nature of some referencing and the skirting of important issues such as clearly stating which neurons are affected (crypt cells), what the point of the behaviour is (relate to neuronal type infected by virus), and, the lack of an antibody specific to the zebrafish protein, the model appears to be built on an unstable base.

    1. Joint Public Review:

      In this manuscript, the authors challenge the fundamental concept that all neurons are derived from ectoderm. The key points of the authors' argument are as follows:

      1) Roughly half of the cells in the small intestinal longitudinal muscle-myenteric plexus (LM-MP) that express a pan-neuronal marker do not, by lineage tracing, appear to be derived from the neural crest.

      2) Lineage tracing and marker gene imaging suggest that these non-neural crest derived neurons originate in the mesoderm, leading to their designation as mesodermal-derived enteric neurons (MENs).

      3) Single-cell sequencing of LM-MP tissues confirms the mesodermal origin of MENs.

      4) MENs progressively replace neural crest derived enteric neurons as mice age, eventually representing the bulk of the EN population.

      There is broad agreement among the reviewers that the identification and description of this cell population is important, and that the failure of these cells to be labeled by neural crest lineage tracers is not artifactual. The work with transgenic lines is convincing that some presumptive neurons in the enteric nervous system (ENS) likely originate from an alternative source in the postnatal intestine and that this population increases in aging mice.

      There is, however, ongoing disagreement between the authors and reviewers about whether the authors' provocative and potentially paradigm-changing proposal that these are neurons of mesodermal origin has been established. While the authors believe they have addressed the reviewers' concerns in multiple rounds of review (much of this prior to submission), the reviewers remain unconvinced and continue to request additional data and analyses.

      A key premise of the preprint review system is that the best interests of science are not served by endlessly litigating disagreements around papers by either compelling the authors to do extensive and expensive additional experiments that they do not believe to be necessary or by treating the authors' claim as established in the face of continued skepticism. Accordingly the editor believes it is time to present this work, which everyone agrees contains important observations and valuable data, along with the following editor's synthesis of the reviewers' concerns and author responses about the question of these cells' origins. We encourage anyone interested in the details to review the already posted reviews and authors' response.

      The following key issues have been raised during review:

      * Is the lineage tracing and marker gene expression data definitive as to mesodermal origin?

      * Are the cells analyzed in the genomic experiments the same as those identified in the lineage tracing experiments?

      * Does the genomic data establish that the sub-population of cells the authors focus on are of mesodermal origin?

      * Are there alternative explanations for the lineage tracing and genomic observations than a mesodermal origin?

      *Is the lineage tracing and marker gene expression data definitive as to mesodermal origin?*

      The proximal evidence that the authors present for a mesodermal origin of the non-NC derived cells is presented in Figure 2, which establishes the presence, via lineage tracing of Tek+ and Mesp1+ (and therefore mesoderm derived) and Hu+ (and therefore neuronal) cells. The fraction of lineage labeled cells in each case (~50%) corresponds roughly to the fraction of cells that do not appear to be NC derived.

      The reviewers raise several technical questions about the lineage tracing experiments, including issues of incomplete labeling, ectopic labeling and toxicity. The authors have addressed each of these with data and/or citations, and the editor believes they have demonstrated, subject to the broader limits of lineage tracing experiments, that there are Hu+ cells in the tissue that are derived from cells that do not express NC markers and that do express mesodermal markers.

      One reviewer raised the question of whether these cells are neurons. This appears to the editor to be a valid question, in that specific neuronal activity of these cells has not been established. But the authors' argument is persuasive that their Hu+ state would have led them to be designated neurons and that changing that designation based on not being derived from NC is circular. However the possibility that, despite this accepted designation, these cells are not functionally neurons should be noted by readers.

      *Are the cells analyzed in the genomic experiments the same as those identified in the lineage tracing experiments, and does this data establish mesodermal origin?*

      To provide orthogonal evidence for the presence of mesodermally derived enteric neurons, the authors carried out single-cell sequencing of dissociated cells from hand-dissected longitudinal muscle - myenteric plexus (LM-MP) tissue. They use standard methods to identify clusters of cells with similar transcriptomes, and designate, based on marker gene expression, two clusters to be neural crest derived enteric neurons (NENs) and mesoderm derived enteric neurons (MENs). However the reviewers raised several issues about the designation of the cells MENs, and therefore their equation with the cells identified in lineage tracing.

      While the logic behind specific choices made in the single-cell analysis is not always clear in the manuscript, such as why genes not-specific to MENs were used to identify the MEN cluster and how genes were selected for subsequent analysis (although both issues are explained better in the authors' response to reviewers), they in the end identify a single large cluster that has the characteristics of MENs (it expresses both neuronal and mesodermal markers) that is (by immunohistochemistry) broadly associated with the previously described tissue MENs.

      The standard methods for the delineation of clusters in single-cell sequencing data (which the authors use) are stochastic and defy statistical interpretation, and the way these data and analyses are used is often subjective. The editor shares the reviewers' confusion about aspects of the analysis, but also finds the authors' assertions that they have described a cluster of cells that express both neuronal and mesodermal genes, and that this cluster corresponds to the tissue MENs described in lineage tracing, to be broadly sound.

      The biggest weakness in the single-cell data and analysis - identified by all reviewers - is the massive overrepresentation of MENs relative to NENs. The authors' explanation - that some cells are more sensitive to manipulations required to prepare cells for sequencing - is certainly well-represented in the literature and is therefore plausible. But it isn't fully satisfactory, especially because it undermines the notion that the MENs and NENs are functionally equivalent (though one could argue in response that increased fragility of NENs is why they are progressively replaced by MENs).

      There are many additional questions about the single cell analysis that are difficult to resolve with the data in hand. I think everyone would agree that an ideal analysis would have more cells, deeper sequencing, and comprehensive validation of the identity of each cluster of cells. But given the time and expense required to carry out such experiments, we cannot demand them, and must take the data for what they are rather than what they could be. And in the end, it is the editor's view that these data and analyses bolster the authors' claims, without conclusively establishing them. That is, these data should neither be dismissed nor, on their own, considered definitive.

      *Are there alternative explanations for the data than that they are mesodermally derived neurons?*

      As discussed above, the reviewers generally agree that the lineage tracing experiments are careful and well-executed, and the authors have provided data demonstrating that the results are highly unlikely to be due to either incomplete or ectopic lineage marking. The reviewers raise several possible alternative hypotheses, some based on the literature and some based on the genomic data. The authors discuss each in detail in their response. The editor would note that, at this stage in the history of single-cell analysis, the criteria for using single cell sequencing data to establish cell type and cell origin are not well established, and that neither the presence nor the absence of specific sets of genes in single cells should, for both technical and biological reasons, be considered dispositive as to identity.

      *Additional aspects of paper:*

      There are additional intriguing aspects of the paper, especially the increase in the number of MENs relative to NENs over time, suggesting functional replacement of one population with the other, and some evidence for and speculation about what might be regulating this evolution. However these are somewhat secondary points relative to the central question at hand of whether the authors have discovered a population of mesodermally derived neurons.

      *Editor's summary and comment:*

      The editor believes it is a fair summary to say that the authors believe they have gone to great lengths to provide multiple lines of evidence that support their hypothesis, but that these reviewers, while appreciating the potential importance of the authors' discovery of an unusual cell type, are not yet convinced of its origin.

      In an ideal world, the authors, reviewers and editor would all ultimately agree on what claims the data presented in a paper supports, and indeed this is what the traditional journal publishing system tries to achieve. But the system fails in cases like this where no consensus between authors and reviewers can be reached, as it neither makes sense to "accept" the paper and imply that it has been endorsed by the reviewers, nor to "reject" it and keep the work in peer review limbo.

      There is certainly enough here to warrant the idea and the data and arguments behind it being digested and considered by people in the field. It may very well be that the authors - who have spent years working on this problem and likely know more about this population of cells than anyone on Earth - are right that they have discovered something that changes how we think about the development of the nervous system. To the extent the reviewers are representative, people are likely to need additional data to be convinced. But it is time to put that to the test.

    1. Reviewer #3 (Public Review):

      This manuscript aims to exploit experimental measurements of the extracellular voltages produced by colliding action potentials to adjust a simplified model of action potential propagation that is then used to predict the extracellular fields at axon terminals. The overall rationale is that when solving the cable equation (which forms the substrate for models of action potential propagation in axons), the solution for a cable with a closed end can be obtained by a technique of superposition: a spatially reflected solution is added to that for an infinite cable and this ensures by symmetry that no axial current flows at the closed boundary. By this method, the authors calculate the expected extracellular fields for axon terminals in different situations. These fields are of potential interest because, according to the authors, their magnitude can be larger than that of a propagating action potential and may be involved in ephaptic signalling. The authors perform direct measurements of colliding action potentials, in the earthworm giant axon, to parameterise and test their model.
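
The superposition argument summarised above can be written out briefly. The block below is a sketch for the linear (passive) cable only, and its notation is an assumption of this note, not taken from the manuscript: $V_\infty$ is the infinite-cable solution, $L$ is the position of the sealed end, $\lambda$ and $\tau$ are the length and time constants, and a prime denotes $\partial/\partial x$. For an active, nonlinear membrane, strict superposition does not hold; there the zero-flux condition follows instead from the mirror symmetry of two identical action potentials colliding head-on.

```latex
% Passive cable equation (assumed linear form)
\lambda^{2}\,\frac{\partial^{2} V}{\partial x^{2}}
  = \tau\,\frac{\partial V}{\partial t} + V

% Sealed end at x = L: add the mirror image of the infinite-cable solution
V(x,t) = V_\infty(x,t) + V_\infty(2L - x,\,t), \qquad x \le L

% Axial current is proportional to -\partial V / \partial x,
% which vanishes at the boundary because the two terms cancel
\left.\frac{\partial V}{\partial x}\right|_{x=L}
  = V_\infty'(L,t) - V_\infty'(L,t) = 0
```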

      Although simplified models can be useful and the trick of exploiting the collision condition is interesting, I believe there are several significant problems with the rationale, presentation, and application, such that the validity and potential utility of the approach is not established.

      Simplified model vs. Hodgkin and Huxley

      The authors employ a simplified model that incorporates a two-state membrane (in essence resting and excited states) and adds a recovery mechanism. This generates a propagating wave of excitation and key observables such as propagation speed and action potential width (in space) can be adjusted using a small number of parameters. However, even if a Hodgkin-Huxley model does contain a much larger number of parameters that may be less easy to adjust directly, the basic formalism is known to be accurate and typical modifications of the kinetic parameters are very well understood, even if no direct characterisations already exist or cannot be obtained. I am therefore unconvinced by the utility of abandoning the Hodgkin-Huxley version.

      In several places in the manuscript, the simplified model fits the data well whereas the Hodgkin-Huxley model deviates strongly (e.g. Fig. 3CD). This is unsatisfying because it seems unlikely that the phenomenon could not be modelled accurately using the HH formulation. If the authors really wish to assert that it is "not suitable to predict the effects caused by AP [collision]" (p9) they need to provide a good deal more analysis to establish the mechanism of failure.

      (In)applicability of the superposition principle

      The reflecting boundary at the terminal is implemented using the symmetry of the collision of action potentials. However, at a closed cable there is no reflecting boundary in the extracellular space and this implied assumption is particularly inappropriate where the extracellular field is one objective of the modelling, as here. I believe this assumption is not problematic for the calculation of the intracellular voltage, because extracellular voltage gradients can usually be neglected, but the authors need to explain how the issue was dealt with for the calculation of the extracellular fields of terminals. I assume they were calculated from the membrane currents of one-half of the collision solution, but this does not seem to be explained. It might be worth showing a spatial profile of the calculated field.

      Missing demonstrations

      Central analytical results are stated rather brusquely, notably equations (3) and (4) and the relation between them. These merit an expanded explanation at the least. A better explanation of the need for the collision measurements in parameterising the models should also be provided.

      Adjusted parameters

      I am uncomfortable that the parameters adjusted to fit the model are the membrane capacitance and intracellular resistance. These have a physical reality and could easily be measured or estimated quite accurately. With a variation of more than 20-fold reported between the different models in Appendix 2 we can be sure that some of the models are based upon quite unrealistic physical assumptions, which in turn undermines confidence in their generality.

      On p8, the values of both the extracellular (100 Ohm m) and intracellular (1 Ohm m) resistivity appear to be in error, especially the former.

      (In)applicability to axon terminals

      The rationale of the application of the collision formalism to axon terminals is somewhat undermined by the fact that they tend not to be excitable. There is experimental evidence for this in the Calyx of Held and the cerebellar pinceau. The solution found via collision is therefore not directly applicable in these cases.

      Comparison with experimental data

      More effort should be made to compare the modelling with the extracellular terminal fields that have been reported in the literature.

      Choice of term "annihilation"

      The term annihilation does not seem wholly appropriate to me. The dictionary definitions are something along the lines of complete destruction by an external force or mutual destruction, for example of an electron and a positron. I don't think either applies exactly here. I suggest retaining the notion of collision which is well understood in this context.

    1. Author Response

      We thank the Editor and the Reviewers for the kind words, the helpful suggestions, and the points of critique, which have all helped us substantially strengthen the manuscript. We have made the aesthetic changes requested by Reviewer 2.

      Response to Reviewer 2

      We thank the Reviewer for their thorough feedback. We provide point by point responses below.

      Concern 1

      In paragraph 4.2, I found it unclear why the authors find it unsurprising that different experiments would correspond to different betas. I think that this point should be discussed, as beta and N appear in combination in determining the interaction strength. Otherwise, they could try to fit all distributions with the same beta, which would be more natural for me. I guess that the fits would look good to the eye anyway, though quantitatively suboptimal (which could be quantified with the distance introduced).

      The reviewer raises a valid concern. As shown in Fig. 3, the chosen values for beta, the additional fitting parameter introduced in the agent-based simulation, are β = 0.18, 0.13, 0.12 and 0.64 for N = 5, 10, 15 and 20, respectively. We (RS, OM, and OP) find it intriguing that the optimum beta clusters around similar values for N = 5, 10, 15, while the optimum beta for N = 20 is significantly different. We acknowledge that we do not have an explanation for why the fitted parameter values are what they are, but note that the fitting curve is flat, implying that several beta values could achieve a satisfactory fit. While further agent-based simulations could explore these findings more systematically, we believe that investigating this matter is outside the scope of this paper. Instead, we have acknowledged these points explicitly in the revised discussion.

      Portion added to discussions: “As shown in Fig. 3, the chosen values for beta, the additional fitting parameter introduced in the agent-based simulation, are: β = 0.18, 0.13, 0.12 and 0.64 respectively for N = 5, 10, 15, 20. Perhaps it is intriguing that the optimum beta clusters around similar values for N = 5, 10, 15, while the optimum beta for N = 20 is significantly different. While we do not currently have an explanation for why the fitted parameter values are what they are, we note that the fitting curve is flat, implying that several beta values could possibly achieve a satisfactory fit. Further agent-based simulations could explore these findings more systematically, and provide useful insights.”

      Concern 2

      Citation of previous work on dynamical quorum sensing (lines 51 & 52) I think misses two important points: first, these works (and others following them) deal with the appearance of collective oscillations at high density (therefore, the same general problem addressed here); second, Taylor et al. also studied a transition where the oscillators involved did not oscillate at low density, whereas above a density threshold they display coherent collective oscillations whose period decreases with density, similar to what is observed here. I do not think this takes anything away from the originality of this work, which refers to a different system, and models it with different equations, but the parallelism between integrate-and-fire dynamics with quenched noise and excitable dynamics in the presence of noise should in my opinion not be overlooked.

      We have explicitly mentioned this in the revised text.

      Concern 3

      As the authors stress in lines 105 and 132, the analytical model shows that all that really matters in this phenomenon is the fastest frequency of the system. This could be used as an argument to say that the actual frequency distribution of individual fireflies is not all that important, as long as their fastest frequency is comparable. The assumption that they are identical would then sound less radical. Ideally, one could use the numerical simulations to check this, as well as the fact that the phenomenon does not break down when the shortest individual interburst interval Tbmin is narrowly distributed (which could also explain why having a few individuals who can flash at a higher frequency does not affect the outcome).

      We thank the reviewer for these observations.

      Concern 4

      I still feel that the agreement between the model and observations is a bit overstated (line 120). At least, I think the authors may stress that whereas the model predicts that the frequency of the 7-14 minutes oscillations should increase a lot with N, this is not observed in the data. Maybe this mismatch would be reduced if inter-individual variability was added.

      Please see the last three paragraphs of the discussion section. In reality, as the swarm size increases, we expect that swarms will no longer be all-to-all connected, and the dynamics of the system will depend upon the speed of propagation of information across the swarm. Precisely how this happens is outside of the scope of the current experimental work and theoretical description presented here.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      In the manuscript entitled "Aurora A mediated new phosphorylation of RAD51 is observed in Nuclear Speckles", the authors identify serine S97 as a novel phosphorylation site of the RAD51 recombinase and show, using a set of in vitro and in cellulo experiments, that this phosphorylation is mediated by the Aurora A kinase. The authors also report that the phosphorylated form is found in the nucleus, specifically in nuclear speckles, where mRNA maturation and splicing occur, suggesting a role of RAD51 in the latter. The confocal microscopy images provided to test this hypothesis are convincing. However, also using confocal images, the authors claim that foci of RAD51 phosphorylated at S97 do not colocalize with the DNA damage marker γ-H2AX, and hence that its function is not related to DNA damage; the data provided do not fully support this statement. In this study, Alaouid et al. utilize mutants of RAD51 that alter S97 phosphorylation to further study its function and provide data that support RAD51 as an RNA binding protein. Overall, the manuscript shows some interesting observations that are worth pursuing; however, the in vitro and in cellulo results are not aligned, some controls are lacking, and many points should be reconsidered.

      Major comments:

      • Are the key conclusions convincing?

      Not as stated.

      Fig. 1A. The authors conclude that pS97-RAD51 favors RAD51 strand invasion capacity using the D-loop assay. Indeed, the S97D phosphomimic increased the D-loop activity 2.5-fold compared to WT RAD51. However, the S97A mutant, which is the non-phosphorylated form, also increased the D-loop activity by 2-fold compared to WT (figure 1C). So, the phosphorylation or the absence of it seems to promote strand invasion. So, what is the role of the phosphorylation? There is no discussion about this. Besides, no representative image of the D-loop assay is shown; this is very important as these experiments need to be run with the relevant controls to be meaningful.

      Fig. 1D. The polymerization rate of RAD51 is probably irrelevant for its function in the absence of DNA. What do they want to get at with this assay?

      In figure 2B, the authors conclude that RAD51 phosphorylation at S97 is dynamically regulated throughout the cell cycle. Indeed, the pS97-RAD51 is well observed in asynchronous cells, and the double thymidine block time course experiment followed by PI staining shows the oscillation of the pS97-RAD51 from G1 to G2/M stage. The authors quantified the ratio of pS97-RAD51/total RAD51 to conclude this. However, it would be more accurate to also divide the above over the intensity of the loading control (tubulin) because in figure 3A for example, they quantified the ratio of pS97-RAD51/tubulin but did not consider the levels of RAD51 in their quantifications.

      In figure 3B, the authors state that pS97-RAD51 is decreased after CPT treatment and that the pS97-RAD51 foci do not localize with the DNA damage marker γ-H2AX. The signal of γ-H2AX is already weird as it does not change from Ctrl to CPT conditions (especially in HCC1806 cells). A pre-extraction of soluble protein with CSK should be used to then look at the co-localization; with the pan-staining of the two signals it is difficult to draw any conclusions of colocalization. Nevertheless, the signal of RAD51 seems equal in all conditions in the images shown and it does not seem to be reduced after CPT.

      In figure 4A, the authors show that Aurora A is responsible for the S97-RAD51 phosphorylation in cellulo. Indeed, the use of an Aurora A inhibitor reduces the pS97-RAD51 signal, however, this is only true in one cell line (HCC1806) but no effect was observed in HeLa cells. Is this effect cell-specific?

      The authors find that RAD51 binds both DNA and RNA and measure the affinities of the RAD51 bearing the S97D and S97A mutations. S97D shows the highest affinity for ssDNA and RNA in Fig. 7A, B, however the opposite is true for dsDNA in Fig 7C, D. All three forms of RAD51 bind RNA although with different affinities however no error bars are shown. The description of the results does not seem accurate. Importantly, these data should somehow correlate/be discussed with respect to the D-loop assay performed in Fig. 1. The authors conclude that the binding to RNA is reduced in S97D-RAD51 suggesting that the pRAD51 that they observe at nuclear speckles would be probably not associated with RNA at these nuclear speckles, right? this goes against their idea of this phosphorylated form being related to RNA splicing...

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      The manuscript seems to be in early days and requires lots of editing, rewriting to relate the in vitro and in cell data and make a coherent story

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      The authors performed chromatin fractionation to determine the correct localization of the pS97-RAD51 and looked for the phosphorylated form by western blots. But then they confirmed the finding using immunofluorescence. I think it would be more convincing and consistent if the authors did a pre-extraction before they use the antibody because, that way, they would indeed be confirming the localization of the protein they are looking at, which is specifically in the nucleus.

      As well, in order to test the specificity of the pS97-RAD51 antibody they generated, a simple treatment of the lysates with phosphatases would be a good control for the specificity of their antibody. These and the criticisms mentioned above need to be addressed.

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      This manuscript is not ready for submission

      • Are the data and the methods presented in such a way that they can be reproduced?

      Yes. However, the legends of the images are way too concise.

      • Are the experiments adequately replicated and statistical analysis adequate?

      In Fig. 2B, the authors performed a double thymidine block followed by a time course release to track cell cycle progression of the cells and phosphorylation of RAD51 at S97. They do not indicate the biological replicates they performed. There are no error bars in the estimated KD shown in Fig.7.

      Minor comments:

      • Specific experimental issues that are easily addressable.

      The authors conclude that the S97 is specifically phosphorylated by the Aurora A kinase. How? Have they looked at other documented kinases known to phosphorylate RAD51?

      In figure 6 the authors overexpress HA-tagged RAD51 proteins corresponding to WT, S97D and S97A mutants in cells and label them for immunofluorescence. Maybe it would be better to downregulate the endogenous RAD51 to discard possible combined effects.

      In figure

      • Are prior studies referenced appropriately?

      The authors show in their manuscript that RAD51 protein CAN interact with RNA in vitro, a finding not previously described to my knowledge. However, a recent study entitled "RAD51-dependent recruitment of TERRA lncRNA to telomeres through R-loops, Nature, 2020" provides in vitro data showing the binding of RAD51 to TERRA, a lncRNA, which I think would be worth mentioning in their manuscript.

      The authors should mention previous contributions in the field especially when it comes to RAD51 in the HR pathway post DNA damage, which is quite documented and updated. For example, in this section of the introduction, "RAD51 is a recombinase protein implicated in the strand exchange mechanism during the DSB repair by the Homologous Recombination (HR) pathway. In the absence of DNA Damage (DD), RAD51 is predominantly cytoplasmic and translocates to the nucleus during the DNA Damage Response (DDR) to manage HR repair. As it needs the undamaged sister chromatid as a template, the HR repair pathway occurs mainly in the late S, G2 phases of the cell cycle. However, it has been documented that HR repair can also occur during G1 and early S phases, and in this case, the undamaged template used for the repair could be the homologous chromosome or an RNA transcript2". This statement is definitely worth more references.

      The same problem is recurrent in the rest of the introduction; therefore, it needs to be updated and better referenced.

      • Are the text and figures clear and accurate?

      The text needs a lot of editing to accurately describe the results, see for example: "The resulting KD evaluation shows that the S97D mutant had a dsDNA binding affinity lower to that of the WT (a KD of 2.26 μM for the S97D-RAD51 vs a KD of 0.38 μM for the WT RAD51). Concerning, the S97A mutant comparison to the WT RAD51, we observed modified association and dissociation curves that resulted in an identical affinity to dsDNA (a KD of 0.33 μM for the S97A-RAD51 vs a KD of 0.38 μM for the WT RAD51). We can conclude that in our in vitro conditions, the Ser97 phosphorylation has a high impact on RAD51 affinity for DNA by dividing its affinity by 5.8." Besides, the figures are of low quality and should be more carefully crafted and presented. Some experiments (such as the D-loop) are not represented in the figures.

      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      Using a different representation for the graphs would be a plus (also see previous comments)

      Referees cross-commenting

      I think the other reviewers and I have raised very important and complementary points that will help the authors improve the quality of the manuscript substantially.

      Significance

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      The discovery of a new phosphorylation site in RAD51 (S97) by Aurora A is potentially interesting for the field of the maintenance of genome stability as it could broaden the understanding of how such an important recombinase may be regulating the maintenance of genome integrity throughout the cell cycle. Also, the idea of RAD51 being involved in splicing and mRNA maturation seems very attractive and a very important conceptual advance. However, given the premature status of the text and the figures, the manuscript falls short of showing convincing evidence.

      • Place the work in the context of the existing literature (provide references, where appropriate).

      Many works are highlighting the role of RNA binding proteins as an integral part of the DNA damage response. In addition, a wealth of evidence in the literature suggests that many DNA repair proteins are RNA binding proteins, and that RNA is an important player in the DDR. The possible finding that RAD51 interacts with RNA and localizes to nuclear speckles, possibly acting in splicing, is very interesting and attractive. How Aurora A is involved in this, what the trigger is, and whether RAD51 is binding RNA at these sites are still unclear.

      • State what audience might be interested in and influenced by the reported findings.

      Labs working in genome integrity mechanisms and the crosstalk between transcription and DNA repair would be interested.

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view.

      Genome Instability, homologous recombination

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the Reviewers for their careful reading and the many thoughtful suggestions to improve our manuscript, as well as both the Editors and Reviewers for the generally positive evaluations and encouraging statements.

      Editorial assessment:

      This important work presents an interesting perspective for the generation and interpretation of phase precession in the hippocampal formation. Through numerical simulations and comparison to experiments, the study provides solid evidence for the role of the DG-CA3 loop in generating theta-time scale correlations and sequences, which would be reinforced through the clarification of the concepts introduced in the study, in particular the notion of intrinsic and extrinsic sequences. This study will be of interest for the hippocampus and neural coding fields.

      We appreciate that our work has been considered important. In our revision we made a considerable effort to improve on the presentation of our results and the justification of our model assumptions. Particularly we aimed to clarify the meaning of intrinsic and extrinsic sequences by additional figure panels as well as fleshing out their definition via spike-timing correlations being independent or dependent on the direction of the running trajectory, respectively. To address all the requests, we added 3 new Figures, multiple new Figure panels and simulated a new model variant.

      Reviewer #1 in their public review assessed “The manuscript has the potential to contribute to the way we interpret hippocampal temporal coding for navigation and memory.”

      They criticized

      • The findings generally relate to network models of phase precession (reviewed in e.g., Maurer and McNaughton, 2007, Jaramillo and Kempter, 2017). An important drawback of these models with respect to explaining specific experimentally observed features of phase precession, is that they cannot straightforwardly explain phase precession upon first exposure onto a novel track. This is because, specific connectivity in network models may require experience-dependent plasticity, which would not be possible upon first exposure. This is essential, given that the manuscript addresses the possible origin of phase precession in terms of network models and at minimum, this weakness should be discussed.

      We agree with Reviewer #1 (and also with Reviewer #2, who brought up a similar point) that models based on recurrence struggle to explain how the recurrent connectivity matrix should come about. While we feel that a full model of how the 2-d topology in the recurrent weights can be learned goes far beyond the scope of this paper (and to our knowledge has not been solved so far in any existing model), we added a new model variant (new Figure 6 and Supplementary Figure 1), which explains the basic phenomenology of extrinsic and intrinsic sequences without the need of recurrent connections, only using feed-forward synaptic facilitation. Thus, assuming recurrent connection is not necessary for our main findings. However, we would like to point out that this does not exclude the possibility that recurrent connections, if set up in an appropriate way, also contribute to phase precession and theta sequences.

      • An important and perhaps essential component of the manuscript, is the distinction between extrinsic and intrinsic models. However, the main concepts on which this hinges, namely extrinsic and intrinsic sequences (and the related extrinsicity and intrinsicity) could be better explained and illustrated. Along these lines, the result suggested by the title, namely, hippocampal theta correlations, may be important yet incidental in light of the new concepts (e.g., extrinsicity, intrinsicity) and computational models (e.g., DG-CA3 recurrent loop) that are put forward.

      We have added substantial new explanatory material to the figures, captions and text to more didactically introduce the concepts of intrinsicity and extrinsicity. We have also completely rewritten the abstract and added a subtitle: “extrinsic and intrinsic sequences”

      • The study seems to put forward novel computational ideas related to neural coding. However, assessing novelty is challenging as this manuscript builds on previous work from the authors, including published (Leibold, 2020, Yiu et al., 2022) and unpublished (Ahmadi et al., 2022. bioRxiv) work. For example, the interpretation of intrinsic sequences in terms of landmarks had been introduced in Leibold, 2020.

      We agree with the reviewer that this paper touches on many related ideas from previous papers (not only of our lab) and is supposed to tie up loose ends. Thus, the novel contribution is a biologically plausible mechanistic model of how intrinsic sequences and 2-d place maps interact on the level of interconnected spiking neurons. Such a level of explanation has not yet been available in previous work. We have considerably extended the Discussion section in our revision detailing the bigger picture underlying this theory. Also, our addition of the non-recurrent model variant (see above) adds considerable novelty, since it provides an account of phase precession and preplay in novel environments.

      • The significance of the readout tempotron neuron could be expanded on. In particular, there is room for interpretation of the output signal of that neuron (e.g., what is the significance of other neurons downstream? What is the rationale for this output being theta-modulated?)

      We have added an additional Figure 8 to better illustrate the inner workings of the tempotron. We also extended the discussion to better explain the potential use of the tempotron output (see above). In short, we consider the tempotron to signal a unique behaviorally important context that is independent of remapping induced by changes of sensory cues, which is a new prediction of the model. Since the context signal results from DG loops, it requires a stable code to also exist in the DG. Evidence for such long-term stability in DG has been found in Hainmüller & Bartos (2018).

      Reviewer #2 in their public review finds “this research topic to be both important and interesting” and appreciates “the clarity of the paper”, commending our “efforts to integrate previous theories into their model and conduct a systematic comparison”.

      We are very happy about these positive remarks and sincerely would like to thank the reviewer!

      Reviewer #1 made the following specific recommendations for changes:

      The abstract is somewhat difficult to parse. I have identified some words and/or sections that could be improved.

      • ‘....inherently 1 dimensional’. This statement seems to be related to an a priori interpretation of the authors. On the other hand, if offline sequences are trivially 1 dimensional because they are sequences (i.e., they constitute a vector), then online sequences would be 1-dimensional as well. What is the key difference between offline and online? Is it the omnidirectional place fields in two dimensions? Perhaps more importantly, how relevant is this fact with respect to the main results of the manuscript, which concern extrinsic and intrinsic sequences?

      We indeed meant that the sequences are trivially 1-dimensional. The main challenge that we would like to address in this paper is how a 2-d topology of place cells (and direction dependent theta sequences) and a 1-d sequence topology of intrinsic theta correlations and during (p)replay can be reconciled. We hope this has become clearer in the rewritten abstract.

      • The language in lines 36-38 is overly technical. I suggest modifying it; the language was less technical and more understandable in the body of the manuscript, which should also be reflected in the Abstract.

      We would like to apologize for making the abstract too technical. Also in response to Reviewer #2, we decided to rewrite the abstract entirely.

      The authors use a mixture of conductance based models and Izhikevich neurons, presumably for the spiking generating mechanism. The conductance component can be readily interpreted in terms of the underlying biophysics. The Izhikevich neuron model, however, is phenomenological. I suggest you address 1) the rationale for using the Izhikevich model, 2) its biophysical interpretation, and 3) its combination with conductance-based currents.

      The reviewer is correct that spike generation is modelled using Izhikevich’s model whereas synaptic integration is included in a conductance-based manner. As suggested by the reviewer, we have added further explanation in the Methods part, explaining that the Izhikevich approach allows to adjust burst firing properties with only few parameters by efficiently emulating the bifurcation structure of spike generation in the full biophysical model (1&2) and otherwise has no effect on the integration of conductance-based synaptic currents in a subthreshold regime (3).
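
      For readers unfamiliar with this hybrid scheme, the following is a minimal sketch, not the model used in the manuscript: a standard Izhikevich spike generator driven by a single conductance-based excitatory synaptic current, integrated with forward Euler. The function name and all parameter values (the textbook regular-spiking set a, b, c, d, the constant synaptic conductance `g_syn` and the reversal potential `e_syn`) are illustrative assumptions.

```python
def simulate_izhikevich(t_max_ms=1000.0, dt_ms=0.5, g_syn=0.2, e_syn=0.0,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Return spike times (ms) of one Izhikevich neuron with a
    conductance-based synaptic input. All defaults are hypothetical."""
    v = c          # membrane potential (mV), started at the reset value
    u = b * v      # recovery variable
    spike_times = []
    for step in range(int(t_max_ms / dt_ms)):
        # Conductance-based synaptic current: depends on the driving force.
        i_syn = g_syn * (e_syn - v)
        # Izhikevich's phenomenological spike-generation dynamics.
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + i_syn
        du = a * (b * v - u)
        v += dt_ms * dv
        u += dt_ms * du
        if v >= 30.0:              # spike cutoff: record and reset
            spike_times.append(step * dt_ms)
            v = c
            u += d
    return spike_times


if __name__ == "__main__":
    print(len(simulate_izhikevich()), "spikes with synaptic drive")
    print(len(simulate_izhikevich(g_syn=0.0)), "spikes without drive")
```

      The synaptic term enters only through the driving force `e_syn - v`, so below threshold the integration of synaptic input is conductance-based, while the quadratic term and the reset rule alone determine the spiking pattern.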

      Line 126: when you say preferred angle, do you mean preferred (heading) direction? If so, please maintain consistency throughout.

      We thank the reviewer for pointing out the inconsistency. We have added the word “heading” throughout the manuscript whenever appropriate. To further improve the consistency, we have clarified the meanings of “best” (or “worst”) direction and reserved the use of it solely for cases when trajectory direction is compared with the preferred heading direction, namely, “best” (“worst”) direction when trajectory is along (opposite) the preferred heading direction.

      Line 174: When discussing cross-correlation, sometimes you refer to a cross-correlation function between two place fields and sometimes to the histogram of all such correlations. Please clarify.

      We used histograms to empirically estimate the underlying cross-correlation function. For clarity, we have specified that it is a cross-correlation histogram in the revised manuscript whenever we refer to the empirical estimate.
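
      As an illustration of what such an empirical estimate looks like, here is a minimal sketch of a cross-correlation histogram between two spike trains. It is not the authors' analysis code; the function name, the choice of time unit and the binning convention (lags in the half-open window from -max_lag to +max_lag) are assumptions made for the example.

```python
def cross_correlation_histogram(spikes_a, spikes_b, max_lag, bin_width):
    """Count spike-time differences (t_b - t_a) that fall within
    [-max_lag, +max_lag), binned at bin_width. Hypothetical helper."""
    n_bins = int(round(2 * max_lag / bin_width))
    counts = [0] * n_bins
    for t_a in spikes_a:
        for t_b in spikes_b:
            lag = t_b - t_a
            if -max_lag <= lag < max_lag:
                index = int((lag + max_lag) // bin_width)
                # Guard against floating-point rounding at the upper edge.
                counts[min(index, n_bins - 1)] += 1
    edges = [-max_lag + k * bin_width for k in range(n_bins + 1)]
    return counts, edges


if __name__ == "__main__":
    # Cell B fires 10 time units after cell A in both of two passes.
    counts, edges = cross_correlation_histogram(
        [0.0, 100.0], [10.0, 110.0], max_lag=50.0, bin_width=10.0)
    print(counts)
```

      The histogram counts are the empirical estimate; the underlying cross-correlation function is what these counts approximate as the number of spike pairs grows.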

      Figure 3:

      Understanding the difference between extrinsic and intrinsic sequences is fundamental for the manuscript. I suggest that in the section that refers to Figure 3 (or Figure 3 itself), you kindly provide an example depicting how extrinsic and intrinsic sequences can

      1) coexist yet be distinctly identified

      2) depend on trajectory

      3) depend on DG input

      By coexistence, we meant the heterogeneous population of extrinsic and intrinsic cell pairs and, hence, the extrinsic and intrinsic theta correlations, as shown in Figure 3J. To improve the clarity, we added the following sentence in the section that refers to Figure 3: “In our simulation, extrinsically and intrinsically driven cell pairs are both present in the population (Figure 3J), indicating a coexistence of extrinsic and intrinsic sequences.” To illustrate how extrinsic and intrinsic sequences depend on both trajectory and DG recurrence, we have also added annotations in Figure 3F to mark the extrinsic and intrinsic part of the sequence.

      Moreover, the caption of Figure 3 refers to the directionality of the theta sequences. How does this again relate to the extrinsic/intrinsic distinction?

      We hope the highlighting in panel F of Figure 3 has resolved this problem.

      Figure 5:

      • This is a crucial figure that should illustrate the differences between extrinsic and intrinsic sequences, as the figure caption suggests. Surprisingly, it is not at all clear where (i.e., in which panel) and how (i.e., methodologically) should one distinguish one type of sequence from another. I suggest that at least one such panel is dedicated to illustrating the difference and/or detection of these sequences in time and/or from phase precession plots. Moreover, there is significant visual crowding that makes the interpretation challenging (e.g., insert a space between G and E)

      We would like to apologize that in the previous version of the manuscript, we seemed to have evoked the impression that the difference between intrinsic and extrinsic sequences should be mainly illustrated in Figure 5. We hope that our revisions of Figures 1 and 3 have made this point sufficiently clear. The main purpose of Figure 5 was (and is) to illustrate how intrinsic sequences can lead to out-of-field firing. We have modified the figure caption (and text) accordingly. To address the visual crowding problem in Figure 5, we have inserted a space between panels and also removed repeated labels.

      Tempotron neuron and Figure 6:

      From the reviewer’s questions on Figure 6, we feel that our presentation caused considerable confusion about the motivation and interpretation of the tempotron simulations. We therefore rewrote parts of the associated text and Figure caption. We hope that the revised presentation clarifies the issues. We therefore only briefly respond to the reviewer’s points here, because we think they largely resulted from misunderstandings.

      • Intuitively, and as the manuscript results suggest, late phases are associated to extrinsic mechanisms while early phases are associated to intrinsic. Why not construct a simpler classifier readout based on this fact? How does it compare to a tempotron?

      Contrary to the reviewer’s comment, extrinsic mechanisms are visible at early phases (late in the field), intrinsic mechanisms at late phases (early in the field). In fact, what the tempotron does is learn to identify the intrinsic (late phase) part and to disregard the extrinsic (early phase) part.

      • What is the significance of theta-modulated output of the tempotron (readout) neuron?

      The theta modulation of the tempotron output is a trivial result of the theta-modulation of the input, i.e., the detection of the intrinsic sequence pattern is done once every cycle.

      Suggestion for Figure 6 related to Tempotron readout: Focus on ‘with DG loop condition’, as the challenge and most important point here is to identify extrinsic and intrinsic sequences. The No-loop condition could be left as a supplementary figure or side panel.

      The no-loop condition is the essential control showing that the tempotron only responds to the previously learned intrinsic pattern and cannot identify spatial location based on the extrinsic pattern.

      Further work/predictions.

      Lines 196-198. “Since intrinsic sequences can also propagate outside the trajectory (Figure 5) and activate place cells non-locally, our model predicts direction-dependent expansion of place fields.” If remote activation is ‘sufficiently’ remote, wouldn’t this predict two separate place fields instead of an expansion?

      The reviewer is completely correct. Out-of-field spiking can also affect remote locations, if the intrinsic sequences link to remote place fields. This would lead to double fields; however, the intrinsic part would only be active at late theta phases. For simplicity, we have not added such a case in our paper, but we would like to thank the reviewer for this comment, since it leads to a nice prediction of the model, which can be experimentally tested and therefore was included in the Discussion.

      Lines 556-558. ”In our model, firing rate is determined by both low-phase spiking from sensory input and high-phase spike arrivals of DG-CA3 loops, both producing opposing effects on the phase distribution.” Is it possible to make a differential prediction based on lesions here, e.g., along the lines of reduced range phase precession, for either high phases or for low phases?

      We thank the reviewer for this great suggestion. Lesion of DG in the model does indeed reduce the phase range and mean spike phase. This further corroborates the effect of the DG-loop on theta compression and high-phase spiking. We have included a new panel D in Figure 4 and a corresponding mention in the Results section.

      Line 570. ”We speculate that the functional roles of intrinsic sequences may not be limited to spatial memories.”. Is there any relationship to replay and/or sleep-dependent memory consolidation? Some speculation in the Discussion section would be welcome and appropriate.

      We have added some further speculative ideas to the last section of the Discussion. We propose that replay and preplay reflect the intrinsic sequences that express the current expectation of the animal. We have not yet thought well enough about their relation to memory consolidation to phrase this in the manuscript, but would suggest that they could serve to signal multimodal context information to the neocortex, where it can evoke retrieval of unimodal memory traces.

      The description of the results, as stated in the public review, can be improved. A key component is the definition and identification of extrinsic and intrinsic sequences.

      Some comments:

      • I think that the words ’extrinsic’ and ’intrinsic’ are problematic as both types of sequences/models rely on external (spatial) input, hence both are in some sense ’extrinsic’. On the other hand, both are network mechanisms, thus in some sense ’intrinsic’, where the asymmetry is either programmed directly onto the weights or due to synaptic depression. To add to the confusion, ’intrinsic’ mechanisms very often refer to cellular mechanisms in neurophysiology. I kindly ask you to, ideally, reconsider the terminology, or at the very least, be very thorough and precise when describing the mechanisms. For example, sometimes extrinsic (intrinsic) ’models’ are referred to, sometimes ’sequences’, sometimes ’factors’, sometimes ’pairs’, etc.

      We understand and appreciate the reviewer’s argument, but would like to keep the terminology, since it was already used in our prior publication. We have made considerable effort to improve the explanation and illustration of extrinsic vs. intrinsic pairs in the main text and in Figures 1 and 3 to highlight our definition, which is based on pair correlations: Extrinsic pairs flip the correlation lag with reversal of running direction, intrinsic pairs don’t. This is simply a functional definition and should not be confused with potential microscopic mechanisms. One of those (DG-loops) is suggested in our paper.
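
      The functional definition above can be sketched in a few lines of code. This is a minimal illustration and not the analysis used in the manuscript: it assumes that each field pair is summarized by a single correlation peak lag (in radians of theta phase) per running direction, and the function names and the tie tolerance are hypothetical.

```python
import math

def wrap_phase(x):
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (x + math.pi) % (2 * math.pi) - math.pi

def classify_pair(lag_forward, lag_backward, tie_tol=1e-9):
    """Label a field pair from its correlation peak lags in the two running directions.

    Extrinsic: the lag flips sign when the running direction reverses.
    Intrinsic: the lag stays the same.
    """
    # Circular distance between the backward lag and the sign-flipped forward lag
    dist_if_flipped = abs(wrap_phase(lag_backward + lag_forward))
    # Circular distance between the backward lag and the unchanged forward lag
    dist_if_same = abs(wrap_phase(lag_backward - lag_forward))
    if abs(dist_if_flipped - dist_if_same) < tie_tol:
        return "ambiguous"
    return "extrinsic" if dist_if_flipped < dist_if_same else "intrinsic"

print(classify_pair(0.8, -0.8))  # lag reverses with running direction
print(classify_pair(0.8, 0.8))   # lag is preserved
```

      The comparison uses circular distances because lags are theta phases; a pair whose lag is equally close to both hypotheses is reported as ambiguous rather than forced into one class.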

      • As discussed in the public review, network mechanisms may require experience-dependent plasticity and hence cannot easily explain phase precession on the first pass. Please discuss why and/or how your model fits with this observation.

      We agree that the two models under consideration both require the recurrent network to be set up appropriately and that there is no theory so far that would explain how. We chose these two models because they are well known in the community and relatively similar. We reasoned that a comparison between an intrinsic model and an extrinsic model would make most sense if the two are as similar as possible. Nevertheless, we extended the manuscript by a new set of simulations in which we do not use recurrent CA3 connections and obtain phase precession solely by feed-forward synaptic facilitation (new Figure 6 and supplementary Figure S1). The new simulations show that the basic phenomenology can also be obtained without using recurrent CA3 connections; however, as expected when removing one mechanism of phase precession, the phase range is somewhat reduced compared to the full model.

      Along a similar vein, phase precession in Figure 1E only has a range of pi/2, which is about half of the typical range of phase precession for single runs. This should be characterized as a weakness of the intrinsic model.

      The precession range in spiking models is highly sensitive to a large number of parameters such that it is hard to make such definite claims (see also above response). In the original Tsodyks et al. 1996 paper the phase range went up to 270 degrees with a slightly different implementation to ours in terms of current vs. conductance-based synapses, an exponential instead of a Gaussian recurrent weight function, and 1-d (original) vs 2-d (ours). We chose conductance-based synapses, and a Gaussian weight profile for better comparison with the Romani and Tsodyks (2015) model. In the original non-spiking implementation by Romani and Tsodyks (2015), the phase range was hardly 70 degrees. Our model implementation of the Romani and Tsodyks (2015) model fits the experimentally reported phase ranges of about 70 to 180 degrees in CA3 (Harris et al., 2001).

      Lines 282-284: ”...since phase precession properties change in relation to running directions, nor are they solely intrinsic since reversal of correlation is still observed in most of the sequences (Huxter et al., 2008; Yiu et al., 2022).”. To which extent is this a consequence of the phase precession model (extrinsic vs intrinsic) or the fact that place fields are sometimes directional?

      The reversal of sequences with reversed running direction is how we define extrinsic correlation. We hope our changes in relation to Figure 1 have clarified this point.

      Figure 2: Is it i) directional input or ii) short-term facilitation that gives rise to lower phase? (or perhaps both?) Please clarify.

      It’s both. This is now clarified in the revised version of the Results sections related to Figure 2: higher depolarization always yields earlier phases in spiking models, however, pair correlations are not affected by either of the two mechanisms.

      Line 320. ”...onset of phase precession”. Do you mean in CA3/CA1/DG?

      Thank you for pointing this out. We have clarified that this statement refers to CA3.

      Line 323. ”....at a different location”. Please add rationale why it has to be at a different location and a reference to the appropriate equation.

      The sequence rationale as well as the equation number have been added.

      Line 384. ” ... predicting that loss of DG inputs is compensated for by the increase of release probability in the spared afferent synapses from the MEC.”. It wasn’t clear whether this was a ’homeostasis prediction’, or an implementation in the model. Please clarify.

      Since the model explained the experimental observations by implementing an increased probability of release, the model predicts that in animals with DG lesion the probability of release should be enhanced. We have modified the wording to avoid confusion.

      Line 428 ”...and near future locations) is obvious, the potential role of the lesser expressed intrinsic sequence contributions is not straightforward.”. Similar to my comments above regarding terminology, please clarify what are both contributions and why are intrinsic sequences ’lesser expressed’.

      We have rewritten this passage to avoid unclear wording.

      Line 474. ”...we showed that the trajectory-independent sequences”. Do you mean ’intrinsic sequences’?

      We thank the reviewer for the careful reading! We have changed the wording to ”intrinsic sequences” in the revision.

      Line 482. ”...field pairs being extrinsic”. Please clarify, as the usage of extrinsic now refers to field pairs.

      Thank you for pointing this out. We went through the whole manuscript and clarified the terms.

      Line 245 (heading). Consider rewriting as ’Dependence of theta sequences on heading directions’. Extrinsic and Intrinsic models have not yet been introduced.

      Since the main purpose of the first Results section is to explain the difference between extrinsic and intrinsic sequences we kept these terms in the heading but modified it to ”Dependence of theta sequences on heading directions: Extrinsic and intrinsic sequences”. Additionally, we have put more emphasis on introducing the terms ”extrinsic” and ”intrinsic” in this section.

      Figure 1.

      • I suggest using the same font - C and D, and F and G are too close to each other, consider adding space. For example, the exponent, 10⁻² makes reading cumbersome. Line 300. Phase tail means offset phase? Phase tail may be too informal. Line 325: DG loop. Do you mean CA3-DG projection?

      We thank the reviewer for the suggestions. In the revised manuscript, we have ensured that the same font is used in all of the figures. To improve the readability of Figure 1, we have added space between panels as suggested, removed repeated axis labels and downsized the text ”10⁻²”. Furthermore, we have rewritten the referenced line without using the word ”tail”, and clarified the meaning of DG loop as the short form of CA3-DG projection.

      Figure 4 caption: ”DG lesion reduces temporal correlations...”. It is more precise to say that the lesion reduces the slope of the fitted lag vs distance. And how is this related to sequence compression?

      In the paragraph referring to Figure 4, we have elaborated on the meaning of theta compression and its relation to the lag-distance plot. However, we argue that ”reduces the slope of the fitted curve” is not comprehensive enough to express our summarized conclusion in a caption title. We have modified the wording to be ”DG lesion reduces theta compression”.

      In addition, we have changed the slope unit to be radians per cm rather than radians per maximum pair distance, in conformity to unit standards.

      General comment about terminology with regards to tuning and connectivity: it is not formally correct to compare connectivity with trajectories (e.g., lines 388-395, caption of Figure 5A, etc). Perhaps compare tuning to particular directions/preference or receptive field?

      We have corrected the wording such that the direction of DG-loop projection is compared to the direction of trajectory.

      Line 470. ’...fixed recursive loop.” Sentence is not clear, do you mean recurrent loops?

      The reviewer is correct. We have corrected the wording.

      Reviewer #2 had the following recommendations.

      M1. The abstract focuses on the differences between online and offline hippocampal replays. However, the replay topic is not touched upon in the rest of the manuscript. I found this very confusing when I first read the paper. I suggest the authors reconsider the best way to approach the opening or at least discuss if and how their model would incorporate replay phenomena.

      Also in response to Reviewer #1, we have rewritten the abstract, focusing on the problem of how to generate 2-d topology from 1-d sequences. In addition, also in response to Reviewer #1, we added a paragraph in the discussion detailing a hypothesis on how we think replay and intrinsic sequences work together.

      m2. On lines 89-91, the authors provide the selection of neuronal parameters for excitatory pyramidal cells and inhibitory cells in the Izhikevich model. While the choice of model is reasonable, it would be helpful to clarify the source of these neuronal parameters, especially for readers who are not familiar with the model.

      Again, also in response to reviewer # 1, we have added more motivation for the Izhikevich model.
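
      For readers unfamiliar with the Izhikevich model, a minimal single-neuron sketch is given here. It is an illustration only and not the network implementation of the manuscript: the parameter values (a, b, c, d) are the standard regular-spiking values published with the model and are an assumption here, not necessarily those listed in the manuscript; the input current, time step and function name are likewise hypothetical.

```python
def simulate_izhikevich(current, a=0.02, b=0.2, c=-65.0, d=8.0,
                        t_max_ms=1000.0, dt_ms=0.25):
    """Forward-Euler integration of one Izhikevich neuron; returns spike times in ms."""
    v = -70.0   # membrane potential (mV), started at the resting fixed point for zero input
    u = b * v   # recovery variable
    spike_times = []
    n_steps = int(t_max_ms / dt_ms)
    for step in range(n_steps):
        # Izhikevich dynamics: quadratic voltage equation plus linear recovery variable
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
        du = a * (b * v - u)
        v += dt_ms * dv
        u += dt_ms * du
        if v >= 30.0:  # spike cutoff: record the spike, reset v, increment u
            spike_times.append(step * dt_ms)
            v = c
            u += d
    return spike_times

spikes = simulate_izhikevich(current=10.0)
print(f"{len(spikes)} spikes in 1 s of constant input")
```

      The four parameters set the recovery time scale (a), the coupling of the recovery variable to voltage (b), the reset potential (c) and the spike-triggered adaptation (d), which is why different cell classes are obtained by changing only these values.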

      M3. On lines 94-98, the model considers a 2D sheet of CA3 neurons. One of the most significant assumptions is that each 2x2 tile of place cells is considered a unit with four directional angles. What is the basis for this assumption? Is there any experimental result supporting this, or is it a completely artificial design for the model? This is important since the organization of CA3 cells also affects the network architecture discussed later and impacts the realism of the model.

      This comment is related to Reviewer #1’s concern on experience-dependent plasticity: How is this connectivity pattern established? We fully agree that this is an open problem for the Tsodyks et al.-type networks. The main reason for choosing them (as argued in our response to Reviewer #1) is to have two published models, representing one type of sequence each, that are similar enough for comparison. In addition, we added new simulations (new Figure 6 and Supplementary Figure S1), showing that the basic phenomenology can also be obtained in a model without recurrent connections (see also response to Reviewer #1).

      m4. Similarly, on lines 111 and 140, the model uses 500 ms for the timescales of short facilitation and short-term synaptic depression. The choices of these two timescales are vital for producing directionality in extrinsic and intrinsic sequences, yet their experimental sources are not clarified.

      In the Methods section of the revised manuscript, we have included the sources of previous experimental data and modelling work to support our choice of the time constants.

      M5. On line 126, the authors assume that the synaptic strengths between CA3 cells, Wij, are given by the distances between neurons and the similarity between their directional preferences. While this assumption seems reasonable in the sensory cortex, I am unsure if this is also the case in the hippocampus, and the authors should clarify the basis for this assumption.

      The distance dependence simply reflects the original Romani and Tsodyks 2015 model (see response to M3) and we share the concern of the reviewers. The increased connectivity for neurons with the same directional preference was necessary to recover the direction-dependent phase precession properties (Figure 2) in the realm of the Romani and Tsodyks 2015 model. Please also see our new Figure 6 showing simulations without the recurrent matrix.

      More importantly, the existing connections within CA3 and DG cells completely determine the ”intrinsic” sequences. But wouldn’t this be fragile when place cells undergo global remapping, which can take place within only a few seconds? The author should comment on this in the discussion.

      We would like to thank the reviewer for bringing up this interesting point. In our thinking, the DG-CA3 connectivity is fixed (multiple 1-d trajectories, not necessarily requiring 2-d topology), i.e., the same intrinsic sequence should show up in multiple environments (and should not remap), although it may just not be active in some environments. This is a prediction of our model and we have added it to the Discussion.

      M6. I found the setup of DG place cells unreasonable. DG place cells are found to be granule cells rather than pyramidal cells. Moreover, the model does not consider recurrent connections between DG cells (These setups are closer to CA1 place cells).

      We agree with the reviewer that DG granule cells should rather be modelled as high-input-resistance EIF neurons. However, the feedback loop via the dentate is not a direct one. It involves hilar mossy cells plus multiple hierarchies of feedback inhibition (this is probably what the reviewer means by recurrent connections between DG neurons, because granule cells are not recurrently connected in the non-pathological state). To our knowledge, a biologically realistic model of the hilar-DG network does not exist and it would be far beyond the scope of this paper to develop one. We therefore see our DG feedback model rather as phenomenological. The discussion paragraph on the anatomy of the dentate gyrus touches on these points.

      Therefore, a significant concern is: Why should it be the DG feedback projection to CA3 responsible for the ”intrinsic” sequences instead of projections from other brain areas?

      The reviewer is generally correct: any brain structure that implements fixed sequences via a loop would do. The reason why we suggest the DG to be the best candidate is purely empirical, referring to papers with dentate lesions: Sasaki et al. 2018 and Ahmadi et al. 2022. We have added a similar argument to the discussion.

      m7. On line 166, the authors claim that there are no connections between inhibitory cells at all. While I understand that this is for simplification of the model, the lack of recurrent inhibition between interneurons may have limited the model’s ability to produce gamma-band dynamics (referring to PING and ING mechanisms), which are robust rhythms produced in CA3. I am very curious if the model can incorporate theta-gamma coupling by introducing connections between CA3 inhibitory cells.

      We have omitted the gamma oscillation for simplicity, because we do not have a hypothesis for a functional role in the context of distinguishing extrinsic from intrinsic sequences (Occam’s razor) and, as the reviewer correctly anticipates, gamma oscillations unavoidably show up when inhibitory interneurons connect to each other (e.g. Thurley et al. 2013). Of course, one could envision situations in which gamma for intrinsic sequences may have a different frequency than for extrinsic ones, by differentially manipulating the CA3 and DG basket cell networks, but, as long as there is no experimental data, it would be pure speculation and thus we have not included it in the model.

      m8. The authors should clarify the source of parameters in Table 1, especially the synaptic strengths. These values are vital for extrinsic and intrinsic theta sequences.

      The weight values have been chosen to allow for large theta phase precession range, coexistence of extrinsic and intrinsic sequences, and stability of the network activity. A similar statement has been added to the manuscript.

      M9. I have another concern regarding the measurements of ”extrinsicity” and ”intrinsicity” defined on lines 185-196. Are they the best measures? To distinguish the cause of spike correlations, the ”extrinsicity” and ”intrinsicity” of a pair of spikes should not be high at the same time. However, this is clearly not the case in the model, according to Figs 3 and 5. Moreover, in the data analysis carried out later, spike pairs are considered extrinsic or intrinsic merely by comparing the two measurements. I suggest the authors consider counterfactual methods in causal inference. For example, would a spike pair (cell1, cell2) still exist if we change the sensorimotor inputs or the DG-CA3 projections? If this is difficult to implement, the authors should at least discuss how different choices of measurements would impact the conclusions of the paper.

      The problem the reviewer has identified arises from the fundamental symmetry of theta phase quantification: if spikes of a pair of place fields have a phase difference of 180° one cannot say which cell leads and which cell follows, hence, the phase difference is both intrinsic (because the peak doesn’t flip) and extrinsic (because the peak flips and ends up at the same phase). The fact that in some cases extrinsicity as well as intrinsicity are high simply means that the field pair has a correlation peak lag close to 180°. Since in the experimental data set in (Yiu et al. 2022) only field pairs were available, we have not been able to use a different quantification then and decided to apply the same quantification in our model for comparison. Moreover, Figure 5F nicely shows that the measures are able to retrieve the ground-truth intrinsic DG-loop structure when considered on the population level.

      In our model, though, we can go beyond 2nd-order statistics and derive sequence similarity measures including multiple cells, e.g., Chenani et al. 2019. However, since we already know the ground truth by construction, we decided not to use these methods. We added a paragraph in the discussion elaborating on sequence quantification beyond 2nd order.
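
      The 180° symmetry argument can be illustrated with a toy calculation. The sketch below is not the extrinsicity/intrinsicity quantification of the manuscript: it assumes a cosine-shaped cross-correlogram over one theta cycle and uses Pearson correlations with the flipped and unflipped correlogram of the opposite running direction as stand-ins for the two measures. All function names are hypothetical.

```python
import math

def toy_correlogram(peak_lag, n_bins=36):
    """Cosine-shaped cross-correlogram over one theta cycle, peaking at peak_lag (radians)."""
    centers = [-math.pi + 2 * math.pi * (k + 0.5) / n_bins for k in range(n_bins)]
    return [1.0 + math.cos(c - peak_lag) for c in centers]

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def toy_measures(peak_forward, peak_backward):
    """Return (extrinsicity-like, intrinsicity-like) correlations for one field pair."""
    forward = toy_correlogram(peak_forward)
    backward = toy_correlogram(peak_backward)
    # Bin centers are symmetric about zero, so reversing the list maps lag -> -lag
    flipped = backward[::-1]
    return pearson(forward, flipped), pearson(forward, backward)

for peak in (math.pi / 2, math.pi):
    ext, intr = toy_measures(peak, peak)  # same peak lag in both running directions
    print(f"peak lag {math.degrees(peak):.0f} deg: extrinsic-like {ext:+.2f}, intrinsic-like {intr:+.2f}")
```

      In this toy setting, a pair with an unchanged peak lag of 90° scores high only on the intrinsic-like measure, whereas at 180° the flipped and unflipped correlograms coincide and both measures are high at the same time.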

      m10. The authors begin discussing ”intrinsic sequences” from line 316. However, it is not defined before that (and in the rest of the paper as well), causing confusion when reading the paper. The exact definitions of extrinsic and intrinsic sequences should come earlier.

      We hope that our changes to the beginning of the results section (Figure 1), also asked for by Reviewer # 1 could clarify the confusion.

      m11. On lines 345-347, the authors claim that ”the intrinsic sequences are played out backward as determined by the direction of fixed recurrence (Figure 3F),” which is vague. If such sequences are present in that panel, it should be more explicitly indicated graphically.

      Also in response to Reviewer #1, we have graphically highlighted the two types of sequences.

      M12. On lines 309, 356, 484, 495, 515, and possibly other instances, the authors repeatedly claim that the model simulations are in ”quantitative agreement” with their previous experimental paper. However, no experimental data or comparison with the simulations are presented in this paper. The authors should at least create one figure to demonstrate the degree of consistency between them, instead of merely asking the reader to refer back to their previous paper.

      We agree with the reviewer that the experimental data of our previous paper should be presented in the manuscript. However, creating more panels or figures is likely to clutter the already crowded visuals and obscure our main message. We therefore decided to give numerical comparisons with the previous findings in the main text whenever appropriate, specifically, in the sections referring to Figures 2 and 3 and in the Discussion.

    1. Author response

      Reviewer #1 (Public Review):

      The potential role of the CaMKII holoenzyme in synaptic information processing, storage, and spread has fascinated neuroscientists ever since it has been described that self-phosphorylation of CaMKII at T286 (pT286) can maintain the kinase in an activated state beyond the initial Ca2+ stimulus that induced kinase activation and pT286. The current study by Lučić et al utilizes biochemical and biophysical methods to re-examine two pT286 mechanisms and finds:

      (1) that a previously proposed activation-induced subunit exchange within the holoenzyme can not provide pT286 maintenance or propagation; and

      (2) that pT286 can occur not only within a holoenzyme but also between two holoenzymes, at least at sufficiently high concentrations.

      For the observation regarding the subunit exchange, the authors go above and beyond to demonstrate that a previously proposed activation-induced subunit exchange does not actually occur in their hands and that the previous appearance of such a subunit exchange may instead be due to activation-induced interactions between the kinase domains of separate holoenzymes. This provides important clarification, as the imagination about the possible functions of this subunit exchange has been running wild in the literature.

      By contrast, pT286 between holoenzymes at sufficiently high concentrations was largely predicted by the previously reported concentration-dependence of pT286 between monomeric truncated CaMKII (although these previous experiments did not rule out that such pT286 could have been excluded for intact full-length holoenzymes). Notably, the reaction rate reported here for pT286 between two holoenzymes is more than two orders of magnitude slower compared to the previously described rate of the pT286 reaction within a holoenzyme.

      The only point on which we disagree (and we think it’s unarguable) is that the current consensus is that inter-holoenzyme phosphorylation simply doesn’t happen (whether or not monomers can phosphorylate each other). The reviewer is of course right that this view seems now less and less likely. We now performed new experiments to investigate this critical point further (see below).

      The probable reason for the discrepancy in reported half-time of phosphorylation measured in earlier reports and in our paper is the fact that earlier reports (for example Bradshaw et al., 2002) measured autophosphorylation rate of wild-type CaMKII holoenzymes, at catalytically-competent enzyme concentrations of 0.1-5 µM. We are reporting the phosphorylation rate of 4 µM kinase-dead CaMKII, which is only a substrate, by 10 nM catalytically competent enzyme (CaMKII wild-type). There is up to 500 times less catalytically competent enzyme in our reactions, which is probably the reason why the reaction itself is several orders of magnitude slower.
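
      The concentration argument can be checked with simple arithmetic. The sketch below uses only the concentrations quoted above (0.1-5 µM catalytically competent enzyme in earlier reports, 10 nM in our reactions); the function name is hypothetical, and the calculation compares enzyme amounts only, without modelling reaction kinetics.

```python
def fold_less_enzyme(earlier_enzyme_nm, our_enzyme_nm=10.0):
    """How many times less catalytically competent enzyme our reaction contains."""
    return earlier_enzyme_nm / our_enzyme_nm

low = fold_less_enzyme(100.0)    # 0.1 µM, lower end of the earlier range, in nM
high = fold_less_enzyme(5000.0)  # 5 µM, upper end of the earlier range, in nM
print(f"{low:.0f}x to {high:.0f}x less catalytically competent enzyme")
```

      The upper end of this range corresponds to the factor of up to 500 mentioned above.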

      In summary, this study contains two somewhat disparate parts: (1) one technical tour-de-force to provide evidence that argues against activation-induced subunit exchange, which was a tremendous effort that provides influential novel information, and (2) another set of experiments showing the somewhat predictable potential for pT286 between holoenzymes, but without indication for the functional relevance of this rather slow reaction. Unfortunately, in the current/initial title of the manuscript, the authors chose to emphasize the weaker part of their findings.

      We agree with the reviewer that the title should be modified to emphasize both findings of our study. We also hope that our new experiments do bolster our findings with regard to pT286 between holoenzymes, as the reviewer puts it.

      The seemingly slow inter-holoenzyme phosphorylation is only slow under conditions in which one of the proteins is kinase-dead. In a situation in which all CaMKII holoenzymes are wild-type and therefore capable of performing phosphorylation (both intra- and inter-holoenzyme), the reaction rates for pT286 are expected to be orders of magnitude faster than those reported here for the phosphorylation of T286 on kinase-dead protein.

      Reviewer #2 (Public Review):

      This well-written manuscript provides a technical tour-de-force to provide a novel mechanism for sustaining CaMKII autophosphorylation through an interholoenzyme reaction mechanism the authors term inter-holoenzyme phosphorylation (IHP). The authors use molecular engineering to create designer molecules that permit detailed testing of the proposed interholoenzyme reaction mechanism. By catalytically inactivating one population of enzymes, they show using standard assays that the inactive enzyme can be phosphorylated by active holoenzymes. They go on to show that in cells, the inactive enzyme is phosphorylated only in the presence of co-expressed active CaMKII and that this does not appear to be due to active and inactive subunits mixing within the same holoenzyme. The authors suggest reasons for why previous experiments failed to expose IHP and in some experiments provide evidence that reproduces and then extends earlier studies. Some noted differences from earlier experiments are the reaction temperature, the time course of the reactions, and that significantly higher concentrations of the inactive (substrate) kinase in the present study amplify the IHP. These are plausible reasons for earlier studies not finding significant evidence for IHP and the presented data is well-controlled and of high quality.

      The authors then take on the idea of subunit exchange employing multiple strategies. Using genetic expansion, they engineer an unnatural amino acid into the hub domain of the kinase (residue 384). In the presence of the photoactivatable crosslinker BZF and UV illumination, a ladder of subunits was generated indicating intraholoenzyme crosslinks were established. Using this cross-linked enzyme, presumably incapable of subunit exchange, the authors show significant phosphorylation of the kinase-dead mutant. This further supports that IHP is the cause of phosphorylation and not subunit exchange. Extending these experiments, they could not find evidence when CaMKIIF394BZF was mixed with the kinase-dead mutant and exposed to UV light, that there was evidence of the kinase-dead subunits exchanged into CaMKIIF394 (active) enzymes.

      Just a note, instead of residue 384, this should read 394.

      With an entirely different approach, the authors use isotopic labeling of different pools of wt CaMKII (N14 or N15) followed by bifunctional cross-linking and mass spec to assess potential intra- and interholoenzyme contacts. Several interesting findings came of these studies detailed in Figure 4, mapped in detail in Figure 5, and extensively documented in supplementary tables. Critically, numerous crosslinks were found between different domains of the enzyme (catalytic, regulatory, hub) that are themselves a nice database of proximity measurements, but critical to the hypothesis, no heterotypic cross-links were found in the hub domains at any activated state or time point of incubation. This data supports two findings, that catalytic domains come into close proximity between holoenzymes when activated, supporting the potential for IHP, but that no subunit exchange occurs.

      The authors then pursue the approach used originally to provide evidence of subunit mixing, single molecule-based fluorescence imaging. Using pools of CaMKII labeled with spectrally separable dyes, the authors reproduce the earlier findings (Stratton et al, 2016) showing that under activating conditions, but not basal conditions, colocalized spots were detected. Numerous controls were done that confirm the need for full activation (Ca2+/CaM + Mg2+/ATP) to visualize co-localized CaMKII holoenzymes. Extending these studies, the authors mix holoenzymes, fully activate them, and after sufficient time for subunit exchange (if it occurs), the reactions were quenched, and then samples were analyzed. The result was that no evidence of dual-colored holoenzymes was present; if subunits had mixed between holoenzymes, dual-colored spots should have been evident after quenching the reactions. This was not the case. Further, experiments repeated with pools of differentially labeled kinase dead enzymes produced no colocalization, as predicted, if activation of the catalytic domains is necessary to establish IHP.

      Finally, the authors employ mass photometry to investigate the potential for interholoenzyme interactions. At basal conditions, only a mass peak consistent with CaMKII dodecamers was evident. Upon activation, a small fraction of dimeric complexes was evident (with Ca2+/CaM bound) but the majority of the peak was a dodecamer with 12 associated CaM molecules, and importantly, a significant fraction of a mass population was found consistent with a pair of holoenzymes with associated CaM. As an aside, the holoenzyme population appeared to be modestly destabilized as evidence of a minor fraction of dimers appeared as the authors diluted the enzyme, but the pools of holoenzyme and pairs of holoenzymes (with CaM) remained the dominant species when activated under all three enzyme concentrations assessed. Supporting the importance of activation for interactions between holoenzymes, the catalytically dead kinase even under activating conditions, shows no evidence of dimers of holoenzymes.

      Each of the approaches is well-controlled, the data is of uniformly high quality, and the authors' interpretations are generally well-supported.

      We are very grateful for these supportive comments.

      Reviewer #3 (Public Review):

      CaMKII is a multimeric kinase of great biologic interest due to its crucial roles in long-term memory, cardiac pacemaking, and fertilization. CaMKII subunits organize into holoenzymes comprised of 12–14 subunits, adopting a donut-like, double-ringed structure. In this manuscript, Lucic et al challenge two models in the CaMKII field, which are somewhat related. The first is a longstanding topic in the field about whether the autophosphorylation of a crucial residue, Thr286, can be phosphorylated between intact holoenzymes (inter-holoenzyme phosphorylation). The second is a more recent biochemical finding, which tested the long-running theory that CaMKII exchanges subunits between holoenzymes to create mixed oligomers. These two models are connected by the idea that subunit exchange could facilitate phosphorylation between subunits of different holoenzymes by allowing subunits to integrate into a different holoenzyme and driving transphosphorylation within the CaMKII ring. Here, the authors attempt to show that one intact holoenzyme phosphorylates another intact holoenzyme at Thr286. The authors also provide evidence suggesting that subunit exchange is not occurring under their conditions, and therefore not driving this phosphorylation event. The authors propose a model where instead of exchanging subunits, two holoenzymes interact via their kinase domains to enable transphosphorylation at Thr286 without integrating into the holoenzyme structure. In order for the authors to successfully convince readers of all three facets of this new model, they need to provide evidence that 1) transphosphorylation at Thr286 happens when subunit exchange is blocked, 2) subunit exchange does not occur under their conditions, and 3) there are interactions between kinases of different holoenzymes that lead to productive autophosphorylation at Thr286.

      Strengths:

      The authors have designed and performed a battery of cleverly designed and orthogonal experiments to test these models. Using mutagenesis, they mixed a kinase-dead mutant with an active kinase to ask whether transphosphorylation occurs. They observe phosphorylation of the kinase-dead variant in this experiment, which indicates that the active kinase must have phosphorylated it. A few key questions arise here: 1) whether this phosphorylation occurred within a single CaMKII holoenzyme ring (which is the canonical mechanism for Thr286 phosphorylation), 2) whether the phosphorylation occurred between two separate holoenzyme rings, and 3) why was this not observed in previous literature? To address questions 1 and 2, the authors implemented an innovative strategy introducing a genetically encoded photocrosslinker in the oligomerization domain, which when crosslinked using UV light, should lock the holoenzyme in place. The rate of phosphorylation was the same when comparing uncrosslinked and crosslinked CaMKII variants, indicating that phosphorylation is occurring between holoenzymes, rather than through a subunit exchange mechanism that would require some type of disassembly and reassembly (presumably blocked by crosslinking). The 3rd question remains as to why this has not been previously observed, as it has not been for lack of effort. The authors mention low temperature and low concentration as culprits, however, Bradshaw et al, JBC v. 277, 2002 carry out a series of careful experiments that indicated that autophosphorylation at T286 is not concentration-dependent (meaning that the majority of phosphorylation occurs via intra-holoenzyme), and this is done over a concentration and temperature range. It is possible that due to the mutants used in the current manuscript, it allows for the different behavior of the kinase-dead domains, which will have an empty nucleotide-binding pocket. Further studies will need to elucidate these details, and importantly, understand what physiological conditions facilitate this mechanism.

      We thank the reviewer for their assessment of our work.

      The paper cited by the reviewer (Bradshaw et al, JBC v. 277, 2002) is indeed a carefully designed biochemical investigation of CaMKII activity. As the reviewer pointed out, one of the conclusions of the paper is that the autophosphorylation of CaMKII is not concentration dependent, implying that it has to occur exclusively intra-holoenzyme. However, there are some limitations which colour the interpretation of this classic paper. Bradshaw and colleagues used only CaMKII wild-type protein, so the autophosphorylation which is taking place in their reactions is possible both within holoenzymes and between holoenzymes, but this is impossible to distinguish. The authors of the cited paper then used “Autonomous activity assay” (not any measurement of pT286 on CaMKII itself) in which they first stopped the initial autophosphorylation reaction at T286 by adding a quench solution which contained a mixture of EDTA and EGTA, and then measured phosphorylation of the peptide-substrate of CaMKII (autocamtide-2), in the absence of Calmodulin binding (autonomous activity). They also diluted the autophosphorylation reaction to 10 nM CaMKII before adding it to the “Autonomous activity assay”.

      As a side point, each reaction was quenched and diluted to the same final CaMKII concentration of 10 nM. They measured the activity of this dilution with phosphorylation of a peptide-substrate (autocamtide-2), in the absence of CaM binding. The authors contend that autonomous activity reported in this way reflects the amount of pT286, which is not impossible, but it is not a direct measure of pT286.

      In sum, wild-type CaMKII was allowed to autophosphorylate at concentrations ranging from 0.1 to 4.6 µM in the presence of 10 µM Ca/CaM and 500 µM Mg/ATP. This is a very fast reaction: the concentrations of enzyme (CaMKII wild-type), activator (Ca/CaM) and ATP/Mg are all high at the beginning of the autophosphorylation reaction and would be expected to allow maximal autophosphorylation in very short times (seconds). Most importantly, this experiment does not exclude an inter-holoenzyme reaction slower than the intra-holoenzyme one. It certainly could not detect it.

      In any case, to relate these concepts to our experiments and current understanding of CaMKII, we performed a new set of experiments modelled on the Bradshaw paper. Critically, we used CaMKII wild-type as the enzyme and kinase-dead CaMKII as the substrate. Intra-holoenzyme phosphorylation cannot occur in this reaction, which was designed to detect a concentration-dependent phosphorylation reaction. We used a fixed concentration of the substrate kinase (4 µM), and four different concentrations of CaMKIIWT ranging from 0.5 to 100 nM. In our assay, the level of phosphorylation on substrate CaMKII (CaMKIIKD) was dependent on the concentration of enzyme CaMKII (CaMKIIWT) (Figure 1-figure supplement 3), adding more evidence to the hypothesis that CaMKII autophosphorylation can occur inter-holoenzyme.
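
      The logic of this concentration-dependence test can be sketched with simple mass-action kinetics. This is only an illustrative sketch under stated assumptions, not the analysis used in the manuscript: the rate constants k_intra and k_inter, the phosphorylated fraction f, and the initial rate v_0 are hypothetical symbols introduced here, and the early phase of the reaction is treated as simple first-order or second-order kinetics.

```latex
% Intra-holoenzyme reaction (assumed first order): enzyme and substrate sit in
% the same holoenzyme, so the phosphorylated fraction f(t) does not depend on
% the total CaMKII concentration.
\[
\frac{df}{dt} = k_{\mathrm{intra}}\,(1 - f)
\quad\Longrightarrow\quad
f(t) = 1 - e^{-k_{\mathrm{intra}}\,t}
\]

% Inter-holoenzyme reaction (assumed second order): wild-type holoenzymes (WT)
% phosphorylate separate kinase-dead holoenzymes (KD), so the initial rate
% scales with the enzyme concentration when the substrate is held fixed.
\[
\frac{d[\mathrm{pKD}]}{dt} = k_{\mathrm{inter}}\,[\mathrm{WT}]\,[\mathrm{KD}]
\quad\Longrightarrow\quad
v_0 = k_{\mathrm{inter}}\,[\mathrm{WT}]_0\,[\mathrm{KD}]_0
\]
```

      Under these assumptions, holding the substrate at a fixed concentration and varying the enzyme concentration should change the amount of substrate phosphorylation only if an inter-holoenzyme route exists, which is the qualitative signature the experiment above was designed to detect.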

      The possibility that an empty nucleotide-binding pocket is influencing the phosphorylation status of T286 in the regulatory domain of kinase-dead CaMKII is highly unlikely. One could maybe envision that an empty nucleotide-binding pocket might expose the regulatory domain in kinase-dead CaMKII for phosphorylation, which would be prevented in CaMKIIWT, but in all available structures of CaMKII (Chao et al., 2011; Myers et al., 2017; Buonarati et al., 2021), the regulatory domain is docked to the kinase domain of CaMKII, even though the nucleotide-binding pocket is empty (either by mutation of residue K42 and/or simply by not adding the ATP/Mg, to reduce chemical dispersity of the sample). The only time the regulatory domain was not docked on the kinase domain was when CaMKII was in complex with Calmodulin (Rellos et al., 2010). Finally, in our crosslinking mass spectrometry experiments, we used both heavy and light forms of CaMKII wild-type, and there we can clearly see interactions between kinase/regulatory domains of two different species of CaMKIIWT, which are dependent on activation.

      The most convincing data that subunit exchange does not occur is from the crosslinking mass spectrometry experiment. The authors created mixtures of 'light' and 'heavy' CaMKII holoenzymes, either activated or not and then used a Lys-Lys crosslinker (DSS) to trap the enzyme in its final state. The results of this experiment indicate that subunit exchange is not occurring under their conditions. A caveat here is that there are not many lysines at hub-hub interfaces, which is the crux of this experiment. If there is no subunit exchange under their conditions, how does transphosphorylation occur between holoenzymes? The authors show very nice mass photometry data indicating that there are populations of 24-mers, which corresponds to a double-holoenzyme. Paired with the data from their crosslinking mass spectrometry which shows crosslinks between kinase domains of different holoenzymes, this indicates that perhaps kinases between holoenzymes do interact, and they do so in a competent manner to allow transphosphorylation to occur.

      It is true that there are “only” 6 lysines in the hub domain of CaMKII. However, it is clear from our crosslinking mass spectrometry data that we can detect hub:hub peptides coming from the same holoenzymes (homocrosslinks, either 14N:14N or 15N:15N species), but never between holoenzymes (14N with 15N). The fact that peptides can be detected in the homocrosslinks speaks to the validity of using lysine crosslinkers in this experiment.

      Weaknesses:

      The authors should be commended for performing three orthogonal experiments to test whether CaMKII holoenzymes exchange subunits to form heterooligomers. However, there are technical issues that dampen the strength of the results shown here. For simplicity, let's consider that CaMKII holoenzymes are comprised of two stacked hexameric rings. It has been proposed that the stable unit of CaMKII assembly and perhaps also disassembly and subunit exchange is a vertical dimer unit (comprised of one subunit from each hexameric ring). In the UV crosslinking data shown in this paper, the authors have a significant number of monomers, some crosslinked dimers (of which there are two populations), and fewer higher-order oligomers. To effectively block subunit exchange, robust crosslinking into hexamers is necessary, which the authors have not done. Incomplete crosslinking results in smaller species that can still exchange (and/or dissociate), confounding the results of this experiment. In addition, Figure 3 shows a trapping experiment, where if the exchange was occurring, there would be an oligomeric band in Lane 8, which is visible and highlighted with a blue arrow by the authors. This result is explained by nonspecific UV effects, however by eye it is not clear if there is an equivalent band in lane 10. The overall issue here is inefficient crosslinking.

      We agree with the reviewer that the robustness of the UV-induced crosslinking is not extremely high. However, we do observe higher-order oligomers on the gel (Figure 2 and Figure 3B, pT286 blot), which indicates that at least a portion of the holoenzymes is crosslinked. On the other hand, the UV-induced crosslinking does not slow down the trans-phosphorylation reaction, which would be expected if subunit exchange were the prevailing mechanism for the spread of kinase activity between holoenzymes.

      In Figure 3, lanes 8 and 10 show a small portion of dimers (less than 5% by densitometry), at the absolute limit of detection. This dimer band is most likely due to nonspecific UV-induced disulfide bridging (we already lessened it by adding 50 mM TCEP prior to UV treatment; Figure 3-figure supplement 1B and C). Previous reviewers of this manuscript criticized the small dimer band in lane 8, and we wanted to address this transparently in the submission to eLife.

      Unfortunately, if we absolutely crank up the contrast to see this band in lane 10, we start to see other features in the noise as well. We have now edited the image in Figure 3B to highlight these minor bands more clearly, but this is also not ideal.

      With regard to the trapping experiment, the overall problem is not inefficient crosslinking, because we see that the P-T286 signal is quite nicely represented in higher-order bands from F394BzF protein, but kinase-dead protein (Avi-tagged signal in Figure 3) is almost entirely absent. Any crosslinking of Avi-tagged protein (possibly corresponding to subunit exchange) is a minor process at the limit of detection on WB.

      Unfortunately, we have not yet found any better crosslinking sites than the two we report (we have tried about 10). But the results we did obtain encouraged us to employ other techniques to probe subunit exchange (for example, the MS X-linking).

      The authors also employ a single-molecule TIRF experiment to further interrogate subunit exchange. Upon inspection of the TIRF images, it is not clear that the authors are achieving single molecule resolution (there are evident overlapping and distorted particles). The analysis employed here is Pearson's correlation coefficient, which is not sufficient for single molecule analysis and would not account for particle overlap, particles that are too bright, and/or particles that are too dim. For example, an alternative explanation for the authors' results is that activation results in aggregation (high correlation), and subsequent EGTA treatment leads to dissociation at these low concentrations (low correlation). However, further experimentation and analysis are necessary.

      In the manuscript, we present raw, unprocessed images. As described in the Materials and Methods, we thresholded the images for further processing. All colocalization methods have drawbacks, but we found that our thresholding combined with the Pearson coefficient was highly reproducible. We did also look at Manders coefficients, but these are less straightforward to interpret, whilst in our hands still giving the same answer. We agree that there are more experiments that can be done, with particular predictions based on our new mechanism. We are doing them and will report them when they are ready.

      At the risk of repeating ourselves, the reversible loss of overlap of the two labelled populations is the key result and cannot be explained by spurious dim or bright particles, or by a few overlapping profiles.
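
      For readers less familiar with the two colocalization measures mentioned above, the following is a minimal sketch of how a Pearson coefficient and thresholded Manders coefficients can be computed for a pair of single-channel images. It is not the analysis pipeline used in the manuscript: the function names, the thresholds, and the toy 5x5 images are hypothetical and chosen only for illustration.

```python
import numpy as np


def pearson_coloc(ch1, ch2, mask=None):
    """Pearson correlation of pixel intensities between two channels.

    If a boolean mask is given, only the masked pixels are used
    (for example, pixels that survive thresholding).
    """
    a = np.asarray(ch1, dtype=float)
    b = np.asarray(ch2, dtype=float)
    if mask is None:
        mask = np.ones(a.shape, dtype=bool)
    a = a[mask] - a[mask].mean()
    b = b[mask] - b[mask].mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else float("nan")


def manders_coloc(ch1, ch2, thr1, thr2):
    """Thresholded Manders coefficients (M1, M2).

    M1 is the fraction of above-threshold channel-1 intensity found in
    pixels where channel 2 is also above its threshold; M2 is the converse.
    """
    a = np.asarray(ch1, dtype=float)
    b = np.asarray(ch2, dtype=float)
    a_t = np.where(a > thr1, a, 0.0)  # zero out sub-threshold pixels
    b_t = np.where(b > thr2, b, 0.0)
    m1 = a_t[b_t > 0].sum() / a_t.sum() if a_t.sum() > 0 else float("nan")
    m2 = b_t[a_t > 0].sum() / b_t.sum() if b_t.sum() > 0 else float("nan")
    return float(m1), float(m2)


# Toy images: one bright spot per channel, at different positions.
red = np.zeros((5, 5))
red[1, 1] = 1.0
green = np.zeros((5, 5))
green[3, 3] = 1.0

print(pearson_coloc(red, red), manders_coloc(red, red, 0.5, 0.5))
print(pearson_coloc(red, green), manders_coloc(red, green, 0.5, 0.5))
```

      In this toy case, identical channels give a high Pearson coefficient and Manders coefficients of one, whereas non-overlapping spots give a Pearson coefficient near zero and Manders coefficients of zero. Both measures therefore report a loss of overlap, although the Manders values depend directly on the chosen thresholds.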

      Taken together, the authors have provided important food for thought regarding inter-holoenzyme phosphorylation and subunit exchange. However, given the shortcomings discussed here, it remains unclear exactly what mechanisms are at play within and between CaMKII holoenzymes once activated.

      We thank the reviewer for their critical assessment of our manuscript. We will continue to investigate the relevant points and refine the overall picture of CaMKII, to better clarify the mechanisms.

    1. Author Response

      Reviewer #1 (Public Review):

      Sučević and Schapiro investigated a neurobiologically inspired model of human hippocampal structure and computation in category learning. In three separate simulations, the model (C-HORSE) is presented with learning tasks defined by various category structures from prior work and evaluated for its ability to learn the category structure, generalize categorization to novel stimuli, and accurately recognize previously encountered stimuli. Although originally conceived of as a computational model of associative memory, C-HORSE is demonstrated to quite naturally account for human-like learning of the three category tasks. Notably, the authors characterize the mechanisms underlying the model's learning by way of additional simulations in which "lesions" to the model's monosynaptic pathway (MSP; direct connections between ERC and CA1) are contrasted with lesions to its trisynaptic pathway (TSP; pathway connecting ERC-DG-CA3-CA1). These in silico lesions offer key insight into the computational principles underlying theorized hippocampal functions in category learning: whereas MSP provides incremental learning of shared features diagnostic to category membership that are important for category generalization, TSP learns item-specific information that drives recognition behaviour. The authors propose that C-HORSE's successful account of a broad set of category learning datasets provides clear support for the role of complementary hippocampal functions mediated by MSP and TSP in category learning. This work adds compelling computational evidence to a growing literature linking hippocampus to a broader role in cognition that extends beyond declarative memory.

      The model simulations are clear and properly conducted. The three datasets examined offer a relatively broad set of findings from the category learning literature; that the models provide reasonable accounts of human performance in all three speaks to the model's generalizability. Overall, I find this work exciting and an important step in linking longstanding well-established formal learning theories of psychology with neurobiological mechanism. Several weaknesses dampen this excitement, each of which are detailed below:

      1) C-HORSE is presented as a new entry into a rich field of formal computational models of category learning. As noted above, the datasets examined span a broad range of learning contexts and structures and the model's ability to account for learning behaviour is compelling. However, no other models are leveraged to perform a direct evaluation. In other words, C-HORSE's predictions are compelling, but is it better than other competing models in the literature? To be clear, C-HORSE offers a novel alternative with its fundamental mechanisms originating from anatomical structure and connectivity. As such, a proof-of-concept showing that such a neurobiologically inspired framework can account for category learning behaviour is a worthwhile contribution in its own right and a clear strength of this paper. However, how to consider this model relative to existing theoretical frameworks is not well described in the manuscript.

      We very much appreciate this point — see response to Editor summary point #3 above.

      2) Relatedly, C-HORSE is evaluated in terms of qualitative fit to behaviour measures from prior studies and in all three simulations restricted to measure of end of learning performance. Again, an appeal to the proof-of-concept nature of the current work may provide an appropriate context for this paper. But, a hallmark of well-established category learning models (e.g., SUSTAIN, DIVA, EBRW, SEA, etc.) is their ability to account for both end of learning generalization (and in some cases, recognition) and behaviour throughout the learning process. C-HORSE does provide predictions of how learning unfolds over time, but how well this compares to human measures is not considered in the current manuscript. Such comparisons would strengthen the support for C-HORSE as a viable model of category learning and help position it in the busy field of related formal models.

      We completely agree about the value of this, and we have added empirical timecourse data for comparison with all simulations, as described in response to Editor summary point #7, above.

      3) A consistent finding across all three simulations is that the TSP provides item-specific encoding. Evidence for this can be inferred by contrasting categorization and recognition performance across the TSP- and MSP-only model variants. In the discussion, the authors draw a parallel between exemplar theories of category learning and the TSP, which is a compelling theoretical position. However, as noted by the authors, unlike exemplar theories, the TSP-only model was notably impaired at categorization. The authors' suggestions for extensions to C-HORSE that would enable better TSP-based categorization are interesting. But, I think it would be helpful to understand something about the nature of the representations being formed in the TSP-only model. For example, are they truly item-specific, are the shared category features simply lost to heightened encoding of item-unique features, are category members organized similarly to the intact model just with more variability, and so on. Characterizing the nature of these representations to understand the limitations of the TSP-only model seems important to understanding the representational dynamics of C-HORSE, but are not included in the current manuscript.

      The RSA results, now included for Simulations 2 and 3 in addition to Simulation 1, provide the information needed to characterize the nature of the TSP representations. Generally speaking, they are truly item specific, meaning that each item is represented by its own distinct set of units. This is a demonstration of the classic pattern separation function of this pathway, taking similar inputs and projecting them to orthogonal populations of neurons. Simulation 1 is the clearest example of this, where there is virtually no similarity and very low variability in the item similarity structure in DG and CA3. The new Simulation 3 RSA shows us where the limit is to this pattern separation ability of the TSP, with highly typical items being represented by somewhat overlapping populations of neurons in DG and CA3. To the extent that the TSP can succeed in generalization, it seems to involve this pattern separation failure.

      We have made these points more explicit in new discussion of the RSA results:

      • Simulation 1: “In the initial response, there was no sensitivity at all to category structure in DG and CA3: items were represented with distinct sets of units. This is a demonstration of the classic pattern separation function of the TSP, applied to this domain of category learning, where it is able to take overlapping inputs and project them to separate populations of units in DG and CA3.”

      • Simulation 3: “As in the prior simulations, DG and CA3 represented the items more distinctly than CA1, and settled activity after big-loop recurrence increased similarity, especially in CA1. This simulation was unique, however, in that DG and CA3 showed clear similarity structure for the prototype and highly prototypical items. There is a limit to the pattern separation abilities of the TSP, and these highly similar items exceeded that limit. This explains why, at high typicality levels, the TSP could be quite successful on its own in generalization (Figure 5e), and why it struggled with atypical feature recognition for these items (Figure 5f).”
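
      The kind of representational similarity analysis (RSA) described in these points can be illustrated with a minimal sketch. It is not the model's own analysis code: the function names and the toy activation patterns are hypothetical, chosen only to contrast a pattern-separated code (each item on its own units, as described for DG and CA3) with an overlapping code (category members sharing units, as described for CA1).

```python
import numpy as np


def similarity_matrix(patterns):
    """Item-by-item Pearson correlation of activation patterns.

    patterns has shape (n_items, n_units); each row is one item's pattern.
    """
    return np.corrcoef(np.asarray(patterns, dtype=float))


def category_structure(sim, labels):
    """Mean within-category and between-category similarity.

    Self-similarity on the diagonal is excluded from the within-category mean.
    """
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    within = sim[same & off_diag].mean()
    between = sim[~same].mean()
    return float(within), float(between)


labels = ["A", "A", "B", "B"]

# Pattern-separated toy code: every item activates its own unit.
separated = np.eye(4)

# Overlapping toy code: two shared units per category plus one unique unit.
overlapping = np.array([
    [1, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0, 0, 1],
])

for name, pats in [("separated", separated), ("overlapping", overlapping)]:
    within, between = category_structure(similarity_matrix(pats), labels)
    print(name, round(within - between, 3))
```

      In the separated toy code, within-category and between-category similarity are equal, so no category structure is visible. In the overlapping toy code, within-category similarity exceeds between-category similarity, which is the signature of category sensitivity.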

      4) In general, a detailed description that links model mechanisms and analyses to the learning constructs of interest for the different simulations is lacking. For example, RSA results for simulation 1 are contrasted for initial and settled representations, but what is meaningful about these two timepoints is not directly stated (moreover, what initial and settled response mean in terms of the current model is not explained). The authors do briefly suggest that differences between initial and settled representations may reflect encoding dynamics before and after big-loop recurrence, but this is not established as a key metric for evaluating the nature of the model representations. In general, more motivation is needed to understand what the chosen analyses reveal about the nature of the model's learning process and representations.

      We have added more description of the motivation for our analyses. See response to Editor summary point #6 above.

      5) I appreciate the comparison in the discussion to extant models of categorization. Certainly, the exemplar and prototype models are fixtures of the category learning literature and they somewhat align with the type of learning that TSP and MSP, respectively, provide. REMERGE and SUSTAIN are also briefly mentioned, but their discussion is limited which is unfortunate as they are actually more functionally equivalent to C-HORSE. I think, however, that the authors are missing an opportunity to discuss how C-HORSE offers a means for bridging levels of analysis to connect neurobiological mechanisms with these notably successful psychological models of category learning. Rather than framing C-HORSE as a competitor to existing models, it should be viewed as an account existing on a different level of analysis. In this sense, it complements existing approaches and potentially extends a theoretical olive branch between the psychology and neuroscience of category learning.

      We love this point about bridging levels of analysis and have added it to our discussion of the model’s relationship to other models, see Editor summary point #3 above.

      6) The discussion takes a broad perspective on covering evidence concerning hippocampal contributions to category learning. Although comprehensive, some sections are not well connected back to the main thrust of the paper. For example, a section on neuropsychological accounts of the hippocampus and category learning summarizes central aspects of this literature but is never reflected on through the lens of the current findings. I do think this prior work is relevant, especially since it has a central theme of the hippocampus not being necessary for category/concept learning, but its connection back to the current study is not well argued. Similarly, the section on consolidation and sleep is relevant, but in its current form does not seem to fit with the rest of the paper.

      We have implemented these suggestions through very significant revisions to the Discussion. We now better connect the sections to the main argument of the paper and made cuts throughout, including removing the section on consolidation and sleep.

      Reviewer #2 (Public Review):

      The authors present a model of the hippocampal region that incorporates both the (indirect) trisynaptic and (direct) mono-synaptic pathways from entorhinal cortex (EC) to CA1 - the former incorporating projections from EC to dentate gyrus (DG), DG to CA3, and CA3 to CA1, and exhibiting a higher learning rate. They demonstrate that exposing this network to stimuli consistent with standard empirical tests of category learning (e.g. where within-category exemplars share a set of common features) allows the network to reliably assign both novel and previously encountered stimuli to the correct category (e.g. the network can learn to classify stimuli and generalise this knowledge to new examples). They show that the tri-synaptic pathway (TSP) preferentially supports the encoding of individual exemplars (e.g. analogous to episodic memory) while the mono-synaptic pathway (MSP) preferentially supports category learning.

      The manuscript is well written, the simulation details appear sound, and the results are clearly and accurately presented. This model builds on a long tradition of computational modelling of hippocampal contributions to human memory function, strongly grounded in anatomical and electrophysiology data from both rodents and humans, and is therefore able to link phenomena at the level of individual cells and circuits to emergent behaviour - a major strength of this, and similar, work. However, I have two major concerns relating to the relationship between these findings and previously published work by the same and other authors.

      First, it is not clear to me - from the manuscript - whether these results represent a significant novel advance on previous publications from the same senior author. Figures 1 and 3D are almost identical to figures published in Schapiro et al. (2017) Phil Trans B, and the take-home message (that the MSP might support statistical learning) is the same. In brief, it seems that the authors have subjected an identical network to some new (but related) tasks and reached the same set of conclusions. I see no distinction between learning to extract 'statistical regularities' (in previous work) and learning 'the structure of new categories' (described here). As an aside, demonstrating that an autoencoder network can learn stimulus categories and generalise to new exemplars is also well established.

      We appreciate the opportunity to better articulate the novelty and importance of applying the model to the domain of category learning. There are crucial differences between statistical learning and category learning that make these simulations nontrivial (it did not have to be the case that the results would replicate for these category learning paradigms), and, importantly, many of the insights in the current work are category-learning specific (e.g., the effects of atypical features, trade-offs between generalization and recognition of exemplar-specific features). On the other hand, we of course agree that there are principles in common between statistical learning and category learning that are leading to the consistent findings. We added new material to the Introduction to explain the importance of these new simulations in the domain of category learning, and the value we see in demonstrating convergence across domains. See response to Editor point #1 above.

      Second, I have some concerns with the relationship between the properties of this hippocampal network model and well described properties of single cells in the rodent and human hippocampus. In particular, the CA1 units in this model (and to some extent, also the CA3 units) come to respond strongly to all exemplars from within each category (e.g. as shown in Figure 3D, bottom right panel). This appears to be at odds with the known properties of place and concept cells from the rodent and human hippocampus, respectively, which show little generalisation across related concepts (i.e. the Jennifer Aniston neuron does not fire in response to other actors from Friends, for example). If the emergent properties of this model are not consistent with existing data, then it is not a valid model.

      We appreciate the opportunity to discuss connections to the physiology literature. See response to Editor summary point #2 above.

      More generally, the authors are clear that this model is "a microcosm of [the] hippocampus-neocortex relationship" and that the properties of the MSP "mirror those of neocortex". Why not assume that category learning is supported by an interaction between hippocampus and neocortex, then, as in the complementary learning systems (CLS) model? Aside from some correlational fMRI data and partial deficits in hippocampal amnesics - either of which could have a myriad of different explanations - what empirical data is better accounted for by this model than CLS? Put differently, what grounds are there for rejecting the CLS model? To some extent, this model appears to account for less empirical data than CLS, with the exception of a few recent neuroimaging studies (which are hard to interpret at the level of single cells).

      This is an important point for us to clarify, so we very much appreciate this comment. The crucial issue with CLS that motivated the microcosm theory is that the neocortex in the CLS framework learns far too slowly to support the kind of category learning studied in these paradigms, which unfolds over the course of minutes or hours. The neocortex in CLS was proposed to learn novel structure across days, months, and years.

      We have added the following to the Introduction:

      • “Despite its analogous properties, the MSP is not redundant with neocortex in this framework: the MSP allows rapid structure learning, on the timescale of minutes to hours, whereas the neocortex learns more slowly, across days, months, and years. The learning rate in the MSP is intermediate between the TSP (which operates as rapidly as one shot) and neocortex. The proposal is thus that the MSP is crucial to the extent that structure must be learned rapidly.”

      We also have this description in the Discussion:

      • “The MSP in our model has properties similar to the neocortex in that framework, with relatively more overlapping representations and a relatively slower learning rate, allowing it to behave as a miniature semantic memory system. The TSP and MSP in our model are thus a microcosm of the broader Complementary Learning Systems dynamic, with the MSP playing the role of a rapid learner of novel semantics, relative to the slower learning of neocortex.”

      Reviewer #3 (Public Review):

      The current work aimed to determine how the hippocampus may be able to detect regularities across experiences and how such a mechanism may serve to support category learning and generalization. Rapid learning in the hippocampus is critical for episodic memory and encoding of individual episodes. However, the rapid binding of arbitrary associations and one-shot learning was long thought suboptimal for finding regularities across experiences to support generalization, which were instead ascribed to other, slower-learning memory systems. More recent work has started to highlight hippocampal role in generalization, renewing the question of how generalization can be accomplished alongside memory for episodic details within a single memory structure. The current paper offers a reconciliation, presenting a biologically-inspired model of the hippocampus that is able to learn categories alongside stimulus-specific information comparably to human performance. The results convincingly demonstrate how distinct pathways within the hippocampus may differentially serve these complementary memory functions, enabling the single structure to support both episodic memory and categorization.

      Major strengths and contributions

      The paper includes simulation of three distinct categorization tasks, with a clear explanation of the unique aspects of each task. The key results are consistent across tasks, lending further support to the main conclusions of the role of distinct hippocampal pathways in learning specific details vs. regularities. Together with prior work on how the same architecture can support statistical learning in other types of tasks, this work provides important evidence of the broad role of the hippocampus in rapid integration of related information to serve many forms of cognition.

      Throughout the paper, the authors nicely explain in conceptual terms how the same underlying computations may serve all three categorization tasks as well as statistical learning and episodic inference tasks. Thus, the paper will be of broad interest, beyond researchers focused on modeling and/or categorization.

      On a conceptual level, this work provides a fruitful framework for understanding hippocampal functions, representations and computations. It provides a highly plausible mechanistic explanation of how category learning and generalization can be accomplished in the hippocampus and how distinct types of representations may emerge in distinct hippocampal subfields. The framework can be used to derive new testable predictions, some of which the authors themselves introduced. It also provides new insights into how the outputs of different pathways influence each other, providing a more nuanced view of the division of labor and interactions between hippocampal subfields. For example, the big loop recurrence would eventually lead to category influences even on the initially sparse, pattern separated representations in the CA3, which is an idea consistent with empirical observations.

      The presented computational model of the hippocampus is currently the most detailed and biologically plausible hippocampal model easily applicable in the area of cognitive neuroscience and behavioral simulations. The commonalities and differences with other related models (conceptual and computational) are well explained. Both the conceptual and technical descriptions of the model are exceptionally clear and detailed. The model is also publicly available for download for any researcher to use with their own task and data. All these aspects make it likely that other researchers may adopt the model in a wider range of tasks, stimulating new discoveries.

      The autoencoder nature of the model and the use of categorization tasks meant that some measures of interest, like recognition of exemplar-specific information, could not be evaluated by direct reading of the output layer to compare with some label (like old/new). The authors, however, came up with clever ways to evaluate recognition performance in each task that were sensible and highlighted the multiple ways in which one may think about information contained in neural representations in each layer. This approach can also be utilized by others for evaluating item-specific and category information in activation patterns, for example in analyses of fMRI.

      Finally, I thought the current paper and provided model may also serve as an excellent introduction to computational modeling for those new to this approach. The exceptional clarity of the conceptual and technical description of this model and the clear logic of how one may model a cognitive task and interpret results made this paper fairly accessible. Furthermore, the paper offered new insights and predictions based on analyzing the model's hidden layers, lesion performance, and/or noting some patterns of behavior unique to specific tasks. This was also instructive for highlighting the distinctive contributions that the computational modeling approach can have for furthering our understanding of cognition and the brain.

      We are extremely appreciative of the value the Reviewer sees in this work.

      Weaknesses

      The paper's strengths far outnumbered the weaknesses, which are minor. For one, the selected categorization tasks nicely complemented each other, but only covered stimuli with discrete-value dimensions (features like color, shape, symbol, etc). The degree to which the results generalize (or not) to continuous-value stimuli and different category structures (for instance information-integration or rule-based in COVIS framework) is not clear. How the model could be adjusted for continuous-value stimuli was not specified.

      We agree that the simulation of only discrete valued dimensions is a limitation. We chose to do this simply because it is easier to use discrete values in the model as currently implemented, but future work will certainly need to test whether the model can simulate the various paradigms that make use of continuous-valued dimensions. We have added an explicit acknowledgement of this issue in the Methods:

      • “The inhibition simulates the action of inhibitory interneurons and is implemented using a set-point inhibitory current with k-winner-take-all dynamics (O’Reilly, Munakata, Frank, Hazy, & Contributors, 2014). All simulations involved tasks with discrete-valued dimensions, as these are more easily amenable to implementation across input/output units whose activity tends to become binarized as a result of these inhibition dynamics. It will be important for future work to extend to implementations of category learning tasks with continuous-valued dimensions.”

      There is compelling evidence for the dissociation between different hippocampal pathways and subfields (CA1 vs. CA3) that the model is based on. As the authors noted, there is also compelling evidence for functional dissociations along the long hippocampal axis, with anterior portions more geared towards coarse, generalized representations while posterior towards more detailed, specific representations. The authors nicely pointed out that these proposals of within-hippocampus division of labor are less orthogonal than they may first appear, as there is a greater proportion of CA1 in the anterior hippocampus. However, it is premature to imply that this resolves the CA1/CA3 vs. anterior/posterior question; the idea that existing anterior findings may be simply CA1 findings is currently only speculation. Furthermore, first studies indicating that anterior/posterior representational gradients may exist within each subfield are beginning to emerge.

      We completely agree that this is speculative at this point, which needed acknowledgment. See response to Editor summary point #2 above.

    1. Under Our Team, can Maternity Leave be replaced with just "on leave"? I don't think it's necessary to share why they are on leave. May as well add Jazz Cook as well. Also, should we not include our pronouns here?

    1. Courses and programs within the “academic” curriculum emphasize subject-matter knowledge and the development of broadly applicable skills—think history, science, language studies, etc.

      Trades are academic programs. In Hairstyling alone we learn history - how has the trade evolved? Which cultures developed certain styles and why? Science - formulating colours is chemistry, mixing disinfectants is math and science! Technical terminology is language. My trade may not identify as one single category but it includes several dimensions of learning. It offers students the opportunity to indulge in a variety of aspects, and perhaps that's why it has become increasingly interesting to students who possess multiple intelligences.

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary: In this study, the authors generate a Drosophila model to assess disease-linked allelic variants in the UBA5 gene. In humans, variants in UBA5 have been associated with DEE44, characterized by developmental delay, seizures, and encephalopathy. Here, the authors set out to characterize the relationship between 12 disease-linked variants in UBA5 using a variety of assays in their Drosophila Uba5 model. They first show that human UBA5 can substitute all essential functions of the Drosophila Uba5 ortholog, and then assess phenotypes in flies expressing the various disease variants. Using these assays, the authors classify the alleles into mild, intermediate, and severe loss-of-function alleles. Further, the authors establish several important in vitro assays to determine the impacts of the disease alleles on Uba5 stability and function. Together, they find a relatively close correlation between in vivo and in vitro relationships between Uba5 alleles and establish a new Drosophila model to probe the etiology of Uba5-related disorders.

      Strengths: Overall, this is a convincing and well-executed study. There is clearly a need to assess disease-associated allelic variants to better understand human disorders, particularly for rare diseases, and this humanized fly model of Uba5 is a powerful system to rapidly evaluate variants and relationships to various phenotypes. The manuscript is well written, and the experiments are appropriately controlled.

      Reviewer #2 (Public Review):

      Relative simplicity and genetic accessibility of the fly brain make it a premier model system for studying the function of genes linked to various diseases in humans. Here, Pan et al. show that human UBA5, whose mutations cause developmental and epileptic encephalopathy, can functionally replace the fly homolog Uba5. The authors then systematically express in flies the different versions of the gene carrying clinically relevant SNPs and perform extensive phenotypic characterization such as survival rate, developmental timing, lifespan, locomotor and seizure activity, as well as in vitro biochemical characterization (stability, ATP binding, UFM-1 activation) of the corresponding recombinant proteins. The biochemical effects are well predicted by (or at least consistent with) the location of affected amino acids in the previously described Uba5 protein structure. Most strikingly, the severity of biochemical defects appears to closely track the severity of phenotypic defects observed in vivo in flies. While the paper does not provide many novel insights into the function of Uba5, it convincingly establishes the fly nervous system as a powerful model for future mechanistic studies.

      One potential limitation is the design of the expression system in this work. Even though the authors state that "human cDNA is expressed under the control of the endogenous Uba5 enhancer and promoter", it is in fact the Gal4 gene that is expressed from the endogenous locus, meaning that the cDNA expression level would inevitably be amplified in comparison. The fact that different effects were observed when some experiments were performed at different temperatures (18 vs. 25) is also consistent with this. While I do not think this caveat weakens the conclusions of this paper, it may impact the interpretation of future experiments that use these tools, and thus should be clearly discussed in the paper. Especially considering the authors argue that most disease variants of UBA5 are partial loss-of-functions, the amplification effect could potentially mask the phenotypes of milder hypomorphic alleles. If the authors could also show that the T2A-Gal4 expression pattern in the brain matches well with that of endogenous RNA or protein (e.g. using HCR-FISH or antibody), it would help to alleviate this concern.

      We thank the reviewer for pointing out this limitation.

      Regarding the humanization strategy we used in the study, we agree that this is a binary system which may lead to overexpression of the target protein. However, as the reviewer also points out, this temperature-sensitive system also enables us to flexibly adjust the expression level of the target protein, which is especially useful to study partial LoF variants such as the UBA5 variants in this study. In our study we have successfully compared the relevant allelic strength of most of the variants, which supports the use of our system in future studies. However, we do agree that the gene dosage effect could vary widely, so it is difficult to directly predict the effects of one variant in humans based upon results obtained in a model organism.

      We agree with the reviewer that a masking effect may exist in our system due to its gene overexpression nature. However, we cannot conclude that this masking effect really affects the interpretation of Group IA variants in our tests. The three variants are mild LoF, which is also supported by the biochemical assays. Hence, the variants may not cause any phenotype even when they are expressed at a physiological level.

      Regarding the temporal and spatial expression pattern of the T2A-GAL4, the Bellen lab has generated T2A-GAL4 lines for more than 3,000 genes. The expression pattern of the vast majority of these GAL4 lines faithfully reflects the expression pattern of the endogenous genes, which has been documented in our previous publications (PMIDs 25824290, 29565247, 31674908, 35723254).

      Reviewer #3 (Public Review):

      Summary: Variants in the UBA5 gene are associated with rare developmental and epileptic encephalopathy, DEE44. This research developed a system to assess in vivo and in vitro genotype-phenotype relationships between UBA5 allele series by humanized UBA5 fly models and biochemical activity assays. This study provides a basis for evaluating current and future individuals afflicted with this rare disease.

      Strengths: The authors developed a method to measure the enzymatic reaction activity of UBA5 mutants over time by applying the UbiReal method, which can monitor each reaction step of ubiquitination in real time using fluorescence polarization. They also classified fruit flies carrying humanized UBA5 variants into groups based on phenotype. They found a correlation between biochemical UBA5 activity and phenotype severity.

      Weaknesses: In the case of human DEE44, compound heterozygotes with both loss-of-function and hypomorphic forms (e.g., p.Ala371Thr, p.Asp389Gly, p.Asp389Tyr) may cause disease states. The presented models have failed to evaluate such cases.

      We agree with the reviewer that our model did not reflect the situation of the individuals who are compound heterozygous for a Group IA variant (p.Ala371Thr, p.Asp389Gly, or p.Asp389Tyr) and a strong LoF variant. However, we argue that our results do show that the Group IA variants alone do not cause disease. As discussed in the manuscript, individuals homozygous for the p.Ala371Thr variant are healthy and do not present with obvious phenotype. This is consistent with our findings in flies, and shows that the p.Ala371Thr variant is a mild LoF variant.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We greatly appreciate the thoughtful suggestions made by the Reviewers. We have addressed all of their comments below, with our responses bulleted and in italics. We believe these changes have helped clarify the manuscript and strengthen it overall.

      Reviewer 1

      1) Figures 1B and Supp. Figure 1A: It would be worth mentioning that the wave-form in the 129 strain in response to QLA starts out like AJ and B6, but transitions to looking like the wild-derived strain. So, although not quite as drastic as the NZO and NOD strains, it is not quite like the other classical inbred strains.

      • We thank the reviewer for pointing this out. We have added further language to clarify the point:

      “Additionally, even with the clear separation between the clusters, inter-strain variation was still observed within the clusters (e.g. more 129 islets had plateau responses to 8G/QLA than the B6 or AJ).”

      2) The figures are generally excellent and really help to clarify the work in the paper. For Figure 2A, it would help even further if you could number the six different Ca++ parameters that are measured. They're all there, but it takes a bit of time to find them on the figure and numbering will make it easier on your reader.

      • We appreciate this suggestion and have implemented it in our revised Figure 2A. The Ca2+ parameters are now numbered, and the description of this figure has been adjusted accordingly in the results section.

      We added the revised text in the results section:

      “To elucidate strain differences in Ca2+ dynamics, we focused on six parameters of the Ca2+ waveform (Figure 2A): 1) peak Ca2+ (the top of each oscillation); 2) period (the length of time between two peaks); 3) active duration (the length of time for each Ca2+ oscillation measured at half of the peak height, also known as the oxidative “secretory” phase, or “MitoOx” (8)); 4) pulse duration (active duration plus extra time for Ca2+ extrusion); 5) silent duration (the electrically-silent “triggering” phase, also known as “MitoCat” (8), which culminates in KATP closure and membrane depolarization); and 6) plateau fraction (the active duration divided by the period, or the fraction of time spent in the active “secretory” phase).”
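
      As an illustration only, several of the parameters listed above can be estimated from a regularly oscillating trace as sketched below. This is not the authors' analysis code. The function name `waveform_parameters`, the use of the trace minimum as baseline, and the use of successive half-height upward crossings as a stand-in for the peak-to-peak period are all assumptions. Pulse duration and silent duration are left out because the quoted text does not give enough detail to compute them.

```python
from statistics import mean


def waveform_parameters(times, trace):
    """Estimate peak, period, active duration and plateau fraction.

    Assumptions (not taken from the manuscript): the baseline is the trace
    minimum, half height is measured from that baseline, and the period is
    approximated by the spacing of upward half-height crossings.
    """
    baseline = min(trace)
    peak = max(trace)
    half = baseline + 0.5 * (peak - baseline)

    rises, falls = [], []
    for i in range(1, len(trace)):
        if trace[i - 1] < half <= trace[i]:
            rises.append(times[i])      # upward crossing of half height
        elif trace[i - 1] >= half > trace[i]:
            falls.append(times[i])      # downward crossing of half height

    if len(rises) < 2:
        raise ValueError("need at least two oscillations to estimate a period")

    period = mean(b - a for a, b in zip(rises, rises[1:]))

    # Active duration: time from each upward crossing to the next downward one.
    durations = []
    for r in rises:
        later = [f for f in falls if f > r]
        if later:
            durations.append(later[0] - r)
    if not durations:
        raise ValueError("no complete oscillation found")
    active = mean(durations)

    return {
        "peak": peak,
        "period": period,
        "active_duration": active,
        "plateau_fraction": active / period,
    }
```

      With a synthetic square wave that is high for 4 of every 10 samples, the sketch returns a period of 10, an active duration of 4 and a plateau fraction of 0.4.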

      3) Figure 4A, B: I was expecting to see Ca++ vs insulin parameters in the different strains/sexes. In addition to the heat maps, it would be useful to see the regression plots, showing where each strain and sex falls for the insulin and Ca++ parameters.

      • This is an excellent suggestion, and we have added a new Supplemental Figure 5 to provide examples of various strain/sex patterns that drive the correlations used for the heatmap and histogram in Figure 4A and B.

      We added text in the results section referring to this point:

      “Clustering the Ca2+ responses into distinct groups based on our observations of the waveforms (Figure 1B, Figure 4C-E, and Supplemental Figures 1 and 2) also occurs when correlating individual Ca2+ parameters to ex vivo secretion and clinical data (Supplemental Figure 5). For example, the anticorrelation between the 1st frequency component in 8G and percent insulin secreted in 8.3G/QLA (Supplemental Figure 5A) separates the classic inbred, wild-derived, and diabetes-susceptible strains into distinct groups despite the variability in the trait. The correlation between the silent duration in 8G/QLA and insulin secretion in 8.3G/QLA likewise groups by strain (Supplemental Figure 5B). Finally, some correlations, such as that between 8G/QLA/GIP silent duration and plasma insulin at sacrifice (Supplemental Figure 5C), can be strongly influenced by outlier strains; e.g., NZO. Collectively, these data demonstrate that genetics has a profound influence on key parameters of islet Ca2+ oscillations.”

      4) Please include methods for the insulin measurements collected in Fig. 4.

      • Thank you for pointing out this missing information. We have clarified that prior insulin measurements (plasma insulin and ex vivo static insulin secretion that were used in Figure 4 for correlation analysis) were completed in another previously published cohort of mice (reference 17: Mitok KA, Freiberger EC, Schueler KL, Rabaglia ME, Stapleton DS, Kwiecien NW, et al. Islet proteomics reveals genetic variation in dopamine production resulting in altered insulin secretion. The Journal of biological chemistry. 2018;293(16):5860-77).

      We added this new text (highlighted) to the results section to help clarify this point:

      “Fasting blood glucose and insulin levels were measured in mice at 19 weeks of age, except for the NZO males which were measured at 12 weeks of age. Glucose was analyzed by the glucose oxidase method using a commercially available kit (TR15221, Thermo Fisher Scientific), and insulin was measured by radioimmunoassay (RIA; SRI13K, Millipore). This is the same assay that was used to measure plasma insulin for the previously published cohort used for the correlation analysis in Figure 4 (17).”

      5) In the methods, please include details on the four conditions used for Ca++ imaging of the islets, and the timing for each condition.

      • We appreciate this guidance in clarifying our manuscript, and we have now included the conditions and timing for each condition in the methods section.

      We added the following text to the results section to help clarify this:

      “The solutions included 8 mM glucose (8G), 8 mM glucose + 2 mM glutamine, 0.5 mM leucine, and 1.5 mM alanine (8G/QLA), 8G/QLA + 10 nM glucose-dependent insulinotropic polypeptide (8G/QLA/GIP), and 2 mM glucose (2G), each of which was kept in a 37°C water bath.”

      Reviewer 2

      One major critique is that the authors studied "the human orthologues of the correlated mouse proteins that are proximal to the glycemia-associated SNPs in human GWAS". This implies two assumptions - (1) human and mouse proteins do not differ in terms of islet physiology and calcium signaling; (2) the proteins proximal to the SNPs are the causal factors for functional differences, though the SNPs could affect protein/gene function distant from the SNPs.

      • Thank you very much for highlighting this limitation in our study. We think this is very important to address, which we have done in our discussion section.

      We have added the following text to discuss this important issue:

      “Our approach to merge human GWAS with our findings in mouse assumes that the glycemic-related SNPs we nominated alter the abundance or function of the human orthologues. Most SNPs that are strongly associated with phenotypes in human GWAS are noncoding, residing within introns, promoters, 3’UTRs, or intergenic regions (e.g. Figure 6). Therefore, a limitation of our approach is the assumption that SNPs regulate the gene they are proximal to, which is not always accurate (76-78). To infer a more direct link between SNPs and potential target genes, we incorporated human islet chromatin data (37). Physical contact between a region containing SNPs and a distal gene supports a regulatory role, as for ACP1 (Figure 6B). Additionally, SNPs within regions of open chromatin (ATAC-seq) and actively transcribed regions (histone markers) suggest a higher likelihood of regulating transcription factor access. While this approach does not conclusively show a link between the SNPs and expression of the orthologue for our candidate proteins, these chromatin data more strongly suggest that the orthologue expression may be regulated by the candidates’ SNPs.”

    1. Reviewer #3 (Public Review):

      The authors report a study in which they use intracranial recordings to dissociate subjectively aware and subjectively unaware stimuli, focusing mainly on prefrontal cortex. Although this paper reports some interesting findings (the videos are very nice and informative!) the interpretation of the data is unfortunately problematic for several reasons. I will detail my main comments below. If the authors address these comments well, I believe the paper may provide an interesting contribution to further specifying the neural mechanisms important for conscious access (in line with Gaillard et al., Plos Biology 2009).

      The main problem with the interpretation of the data is that the authors have NOT used a so-called "no-report paradigm". The idea of no report paradigms is that subjects passively view a certain stimulus without the instruction to "do something with it", e.g., detect the stimulus, immediately or later in time. Because of the confusion of this term, specifically being related to the "act of reporting", some have argued we should use the term no-cognition paradigm instead (Block, TiCS, 2019, see also Pitts et al., Phil Trans B 2018). The crucial aspect is that, in these types of paradigms, the critical stimulus should be task-irrelevant and thus not be associated with any task (immediately or later). Because in this experiment subjects were instructed to detect the gratings when cued 600 ms later in time, the stimuli are task relevant, they have to be reported about later and therefore trigger all kinds of (known and potentially unknown) cognitive processes at the moment the stimuli are detected in real-time (so stimulus-locked). You could argue that the setup of this delayed response task excludes some very specific report related processes (e.g., the preparation of an eye-movement), which is good, however this is usually not considered the main issue. For example when comparing masked versus unmasked stimuli (Gaillard et al., 2009 Plos Biology), these conditions usually also both contain responses but these response related processes are "averaged out" in the specific contrasts (unmasked > masked). In this paper, RT differences between conditions (which are present in this dataset) are taken care of by the delayed response, which is a nice feature and is not the case for the example set-up above.

      Given the task instructions, and this being merely a delayed-response task, it is to be expected that prefrontal cortex shows stronger activity for subjectively aware versus subjectively unaware stimuli. Unfortunately, given the nature of this task, the novelty of the findings is severely reduced. The authors cannot claim that prefrontal cortex is associated with "visual awareness", or what people have called phenomenal consciousness (this is the goal of using no-cognition paradigms). The only conclusion that can be drawn is that prefrontal cortex activity is associated with accessing sensory input: and hence conscious access. This less novel observation has been shown many times before and there is also little disagreement about this issue between different theories of consciousness (e.g., global workspace theory and local recurrency theories both agree on this).

      The best solution at this point seems to rewrite the paper entirely in light of this. My advice would be to state in the introduction that the authors investigate conscious access using iEEG and then not refer too much to no-cognition paradigm or maybe highlight some different strategies about using task-irrelevant stimuli (see Canales-Johnson et al., Plos Biology 2023; Hesse et al., eLife 2020; Hatamimajoumerd et al., Curr Bio 2022; Alilovic et al., Plos Biology 2023; Pitts et al., Frontiers 2014; Dwarakanath et al., Neuron 2023 and more). Obviously, the authors should then also not claim that their results solve debates about theories regarding visual awareness (in the "no-cognition" sense, or phenomenal consciousness), for example in relation to the debate about the "front or the back of the brain", because the data do not inform that discussion. Basically, the authors can just discuss their results in detail (related to timing, frequency, synchronization etc) and relate the different signatures that they have observed to conscious access.

      I think the authors have to discuss the Gaillard et al PLOS Biology 2009 paper in much more detail. Gaillard et al also report a study related to conscious access contrasting unmasked and masked stimuli using iEEG. In this paper they also report ERP, time frequency and phase synchronization results (and even Granger causality). Because of the similarities in approach, I think it would be important to directly compare the results presented in that paper with results presented here and highlight the commonalities and discrepancies in the Discussion.

      In the Gaillard paper they report a figure plotting the percentage of significant frontal electrodes across time (figure 4A) in which it can be seen that significant electrodes emerge after approximately 250 ms in PFC as well. It would be great if the authors could make a similar figure to compare results. In the current paper there are much more frontal electrode contacts than in the Gaillard paper, so that is interesting in itself.

      In my opinion, some of the most interesting results are not highlighted: the findings that subjectively unaware stimuli show increased activations in the prefrontal cortex as compared to stimulus absent trials (e.g., Figure 4D). Previous work has shown PFC activations to masked stimuli (e.g., van Gaal et al., J Neuroscience 2008, 2010; Lau and Passingham J Neurosci 2007) as well as PFC activations to subjectively unaware stimuli (e.g., King, Pescetelli, and Dehaene, Neuron 2016) and this is a very nice illustration of that with methods having more detailed spatial precision. Although potentially interesting, I wonder about the objective detection performance of the stimuli in this task. So please report objective detection performance for the patients and the healthy subjects, using signal detection theoretic d'. This gives the reader an idea of how good subjects were in detecting the presence/absence of the gratings. Likely, this reveals far above chance detection performance and in that case I would interpret these findings as "PFC activation to stimuli indicated as subjectively unaware" and not unconscious stimuli. See Stein et al., Plos Biology 2021 for a direct comparison of subjectively and objectively unaware stimuli.
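
      For readers less familiar with the measure requested here, d' is the difference between the z-transformed hit rate and the z-transformed false-alarm rate. A minimal sketch follows. The function name `d_prime` and the log-linear correction (adding 0.5 to each count so that rates of exactly 0 or 1 do not produce infinite z-scores) are assumptions for illustration and are not taken from the manuscript under review.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection sensitivity: z(hit rate) - z(false-alarm rate).

    A log-linear correction is applied (an assumption of this sketch):
    0.5 is added to each count so neither rate can be exactly 0 or 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

      A value near zero indicates chance-level detection of stimulus presence versus absence. Clearly positive values would support the reading suggested above, namely that the stimuli were reported as unseen while still being objectively detectable.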

      In Figure 7 of the paper the authors want to make the case that the contrast does not differ between subjectively aware stimuli and subjectively unaware stimuli. However, so far they have done the majority of their analyses across subjects, and for this analysis the authors only performed within-subject tests, which is not a fair comparison in my opinion. Because several P values are very close to significance I anticipate that a test across subjects will clearly show that the contrast level of the subjectively aware stimuli is higher than of the subjectively unaware stimuli, at the group level. A solution to this would be to subselect trials from one condition (NA) to match the contrast of the other condition (NU), and thereby create two conditions that are matched in contrast levels of the stimuli included. Then do all the analyses on the matched conditions.
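
      One way such a matching step could look is sketched below. It pairs trials from the two conditions one-to-one whenever their contrasts differ by no more than a chosen tolerance, and drops trials that cannot be paired. The function name `match_by_contrast`, the tolerance parameter and the greedy pairing rule are hypothetical choices for illustration; other matching schemes would serve equally well.

```python
def match_by_contrast(aware, unaware, tolerance):
    """Pair trials across two conditions by stimulus contrast.

    `aware` and `unaware` are lists of per-trial contrast values.
    Returns (aware_index, unaware_index) pairs whose contrasts differ by
    at most `tolerance`; each trial is used at most once.
    """
    order_a = sorted(range(len(aware)), key=lambda i: aware[i])
    order_u = sorted(range(len(unaware)), key=lambda j: unaware[j])

    pairs = []
    j = 0
    for i in order_a:
        # Skip unaware trials too low in contrast for this or any later aware trial.
        while j < len(order_u) and unaware[order_u[j]] < aware[i] - tolerance:
            j += 1
        if j < len(order_u) and abs(unaware[order_u[j]] - aware[i]) <= tolerance:
            pairs.append((i, order_u[j]))
            j += 1  # consume the matched unaware trial
    return pairs
```

      All subsequent analyses would then be restricted to the trials that appear in the returned pairs, so that both conditions have closely matched contrast distributions.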

      Related, Figure 7B is confusing and the results are puzzling. Why is there such a strong below chance decoding on the diagonal? (also even before stimulus onset) Please clarify the goal and approach of this analysis and also discuss/explain better what they mean.

      I was somewhat surprised by several statements in the paper and it felt that the authors may not be aware of several intricacies in the field of consciousness. For example a statement like the following "Consciousness, as a high-level cognitive function of the brain, should have some similar effects as other cognitive functions on behavior (for example, saccadic reaction time). With this question in mind, we carefully searched the literature about the relationship between consciousness and behavior; surprisingly, we failed to find any relevant literature." This is rather problematic for at least two reasons. First, not everyone would agree that consciousness is a high-level cognitive function and second there are many papers arguing for a certain relationship between consciousness and behavior (Dehaene and Naccache, 2001 Cognition; van Gaal et al., 2012, Frontiers in Neuroscience; Block 1995, BBS; Lamme, Frontiers in Psychology, 2020; Seth, 2008 and many more). Further, the explanation for the reaction time differences in this specific case is likely related to the fact that subjects' confidence in that decision is much higher in the aware trials than in the unaware trials, hence the speeded response for the first. This is a phenomenon that is often observed if one explores the "confidence literature". Although the authors have not measured confidence I would not make too much out of this RT difference.

      I would be interested in a lateralized analysis, in which the authors compare the PFC responses and connectivity profiles using PLV as a function of stimulus location (thus comparing electrodes contralateral to the presented stimulus and electrodes ipsilateral to the presented stimulus). If possible this may give interesting insights into the mechanism of global ignition (global broadcasting), supposing that for contralateral electrodes information does not have to cross from one hemisphere to another, whereas for ipsilateral electrodes that is the case (which may take time). Gaillard et al refer to this issue as well in their paper, and this issue is sometimes discussed in relation to global workspace theory. This would add novelty to the findings of the paper in my opinion.
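
      For reference, the phase-locking value (PLV) between two electrodes is commonly defined as the magnitude of the mean unit vector of their phase differences, taken across trials at a given time point and frequency. The sketch below assumes that instantaneous phases (in radians) have already been extracted, for example with a Hilbert or wavelet transform. The function name `phase_locking_value` is hypothetical, and the manuscript's exact implementation is not described in this review.

```python
import cmath


def phase_locking_value(phases_a, phases_b):
    """PLV between two channels from per-trial phases in radians.

    Returns a value between 0 (no consistent phase relation) and
    1 (identical phase difference on every trial).
    """
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("phase lists must be non-empty and of equal length")
    unit_vectors = [cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b)]
    return abs(sum(unit_vectors) / len(unit_vectors))
```

      The lateralized comparison proposed above would compute this quantity separately for electrode pairs contralateral and ipsilateral to the stimulus and then compare magnitude and latency between the two sets.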

    1. We’ve chosen to keep highlights private to avoid pages being cluttered by highlights that have no surrounding discussion. We understand that people may want to share highlights with others, and we think there are effective ways we can address that in the future.

      You would imagine that by now you would be able to share some of your highlights without having to add some weird annotation to them, especially when you are trying to share them in a private group.

      It is also quite worrisome that the last comment about this was in 2019. It is almost as if no work or effort has been put into this.

      So as much as there's a comment about "thinking of effective ways", there's no clear indication that any work is going into it.

    1. This is the main concern raised by the public, a risk of large-scale or even unprecedented impacts on public health or the biosphere. This is one example of many:

       I am extremely concerned that this proposed action could potentially contaminate native life forms on Mars and/or bring back alien virus, bacteria, or other life forms from Mars to Earth. I understand that there are planetary protection protocols. However, Murphy's Law says that if something horrible could happen, it eventually may indeed occur. History is filled with examples where Acts of God and/or human arrogance caused otherwise unforeseen disasters. .... The Earth is already dealing with increasingly serious problems from invasive or alien species being transported to new locations, and viruses mutating and causing deadly pandemics. We have not been able to solve many of these problems. What happens if a Mars life form escapes containment and, without evolving in Earth's ecosystems, spreads uncontrollably and devastates Earth's species including us humans? There might be no way to reverse or even mitigate for that devastation. I support scientific research when it is safe and in the public interest. However, I oppose research when there is no absolute guarantee of safety and when the risks outweigh the potential benefits. (Spotts, 2022)

       I provide direct links to all the comments submitted in the final round of public comments with a brief summary of the level of concern for each one here: Most public comments share Sagan's priority that NASA can't take a risk of large-scale harm

       NASA's response to Spotts was: "Refer to the previous response for HS-002" (NASA, 2023 : B-5)

       HS-002 is their answer to another similar question:

       Granger: Are you certain that in any way, this mission won’t end with the total annihilation of the entire planet, or force us to live in biomes for the rest of time?

       NASA: As discussed in Section 3.2 of the PEIS, the exact nature of the Mars sample constituents regarding biosignatures and potential biological activity is currently unknown. The PEIS cites several sources supporting the position that contamination of Earth by Martian microorganisms is extremely unlikely to pose a risk of significant harmful effects. However, the risk cannot be demonstrated to be zero (see Response ID HS-001 for information regarding containment measures). As a result, a comprehensive quantitative analysis of the potential impacts of a sample release in the event of an off-nominal landing and the effects of Mars samples on Earth’s environment cannot be accomplished with current data; any such analysis would be theoretical at best, involving substantial speculation and supposition. For this reason, the emphasis of the MSR approach is on sample containment (NASA, 2023 : B-43)

       So even in response to a concern by a member of the public who asked NASA if it is possible that one consequence would be that we have to live in biomes on Earth for the rest of time or total annihilation of the planet (presumably meaning extinction of all terrestrial life) NASA were not able to rule this out as a possible consequence of their mission. Instead NASA responds by saying that the emphasis is on sample containment, since they can't predict consequences if the samples are not contained. As we saw at the start expert opinion is that the risk of such scenarios is very low, and the analogy of a house fire and a smoke detector fits them well. But we take great care to protect our houses from the very low risk scenario of a house fire.

       Smoke detector analogy for the low risk of large-scale harm to human health and Earth's biosphere

       Later in this paper we look at a couple of examples of a likely very low risk but of unprecedented harm. The mirror life scenario in worst case where we can't engineer microbes to stop it could be incompatible with our ecosystems and take over the soils, and then we'd need to maintain the terrestrial ecosystems in biomes and keep out mirror life. It wouldn't happen instantly but as it radiates and spreads through the ecosystems we'd then need to work to rescue them and the only solution might be large dome-like biomes covering them and barriers in the soil and then measures within to sterilize them of mirror life and to keep it out.

       Detailed scenarios of mirror life and a novel fungal genus to motivate biosafety planning

       This doesn't fit their conclusion in the PEIS itself that any environmental effects would not be significant. A non zero risk of large-scale harm to Earth's biosphere that could lead to humans having to live in biomes for the rest of time is NOT identical to NASA's conclusion in the PEIS of no risk of global harm. Chester Everline, the expert on probabilistic risk assurance who commented on the last day of public comments put it like this:

       Given our lack of scientific insight into possible life on Mars, relics of life we may return from Mars, or simply organic substances from Mars that could interact with certain life forms on Earth, how can we possibly assert with confidence that MSR poses an acceptable risk to Earth's biosphere, even if the incredibly difficult target of a 99.9999% target for successful containment is satisfied? Given that sample return missions of the type proposed for MSR have never been attempted before, is it even feasible to do enough testing to assure that a 99.9999% target can be achieved? (NASA, 2023 : B38)

       NASA's response:

       Please see Response IDs HS-001 and HS-002 regarding risks to Earth’s biosphere and NASA’s approach to addressing that. With regards to the assurance case (HS-017), no outcome in science and engineering processes can be predicted with 100% certainty. NASA’s extensive testing activities serve to support the assurance case (NASA, 2023 : B38)

       NASA's statement there "no outcome in science and engineering processes can be predicted with 100% certainty" is not valid. It is frequently the case that we can predict outcomes in science and in engineering with 100% certainty. In this case, for instance, we can predict with 100% certainty that if NASA doesn't return these samples, there is no risk to Earth's biosphere or inhabitants from the samples that Perseverance is currently caching on Mars. We can also achieve the very high level of "no appreciable risk" or essentially 100% safety by sterilizing all samples returned to Earth with a sufficiently high level of ionizing radiation. We are not required to take ANY risks with Earth's biosphere. Whether to take such a risk is an ethical decision and not a decision that can be mandated by scientists or engineers. Chester Everline continues:

       Does NASA intend to impose a threshold for acceptable risk (i.e., a value above which the mission is considered too risky to proceed)? A possible consequence of unsuccessful containment is an ecological catastrophe. Although such an occurrence is unlikely, NASA should at least be clear regarding what level of risk it is willing to assume (for the biosphere of the entire planet)

      I think there is no mention of experts already having problems with the way NASA is dealing with this. If there is, ignore this comment. But I feel this should be mentioned much earlier, with a pointer down here for more information.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript describes that simultaneous inhibition of LOXL2 and BRD4 reduces proliferation of TNBC in vitro and reduces growth in vivo.

      This observation is followed by extensive mechanistic studies that suggest physical interaction between LOXL2 and short isoform of BRD4-MED1. Inferences from ChIP-seq analyses suggest that this interaction is involved in regulation of multiple transcriptional programs. The authors focus on differential activation of the DREAM complex to claim that this interaction "is fundamental for proliferation of TNBC". The manuscript is very well written and mechanistic inferences are based on a set of sophisticated epigenetic analyses and bioinformatical inferences. The phenotypic effects from LOXL2 inhibition by itself, or in combination with BRD4 inhibition, are relatively modest. These modest effects, as well as many of the reported changes in gene expression, are clearly inconsistent with the frequently used adjectives such as "dramatic", "fundamental", "deeply affected", "drastically hampered" etc. Given the modest phenotypic effects, many of the key claims and conclusions are not supported by the data.

      We thank the reviewer for appreciating our work, defining the manuscript as well-written, and saying that it comprises extensive mechanistic studies as well as sophisticated epigenetic analysis.

      We apologize if some of our statements seemed exaggerated. In this revised version, we revisited some of our conclusions to moderate them.

      Moreover, we took the reviewer's criticism as an opportunity to strengthen our findings. In the revised version of the manuscript, we included an additional TNBC PDX (PDX-127), and results from this experiment clearly reinforce our claims (Fig. 6D and Fig. EV9E-F). In this new in vivo experiment, we selected a PDX model in which the expression of BRD4L is not detectable, while BRD4S is clearly expressed. Therefore, the treatment with JQ1 would specifically affect the activity of BRD4S, making the treatment selective. Additionally, we halved the dose of JQ1 administered to limit the effect of BRD4S inhibition alone on tumor growth. The combinatorial treatment (JQ1+PXS) had a clearly superior effect in this setting compared with single-agent treatments. In addition, we ruled out that the observed growth reduction results from the inhibition of LOXL2 alone, which could affect FAK/Src activity or extracellular collagen crosslinking. In conclusion, our data show that the combinatorial inhibition of LOXL2 and BRD4S is effective in reducing tumor proliferation in TNBC in vivo models, independently of the inhibition of BRD4S and of other pathways known to be regulated by LOXL2.

      Specifically:

      1) It is unclear why the authors generalize their conclusions to TNBC. Figure 1B demonstrates synergy for 1/3 cell lines, which is chosen for the follow-up study. Even for MDA231, the synergy is confined to low concentrations of BRD4i (S1c). While the MDA231 cell line is frequently used in experimental studies of TNBC, it is quite dissimilar to the majority of clinical TNBC, and contains mutant RAS, which is rare in this disease.

      The synergistic effect is observed in MDA-MB-231 cells because only this cell line expresses both BRD4S and LOXL2. Indeed, in Fig. 1C we show that MDA-MB-468 cells do not express LOXL2, while BT549 cells express only minimal BRD4 levels.

      To corroborate this hypothesis, in the revised version of the manuscript we added:

      1. A new cell line (Cal51) expressing the same LOXL2 and BRD4 levels (Fig. EV8C) but showing greater resistance to JQ1 than MDA-MB-231 (Fig. EV8D). In this cell line too, the combinatorial treatment had a greater effect on cell viability than single-agent treatment (Fig. EV8E).
      2. A western blot panel of different TNBC PDXs showing that the majority of them express medium to high levels of both BRD4S and LOXL2 proteins, as is the case for MDA-MB-231 (Fig. EV9E) and Cal51 (Fig. EV8C). This result suggests that the combinatorial treatment could be used in the majority of TNBC patients, as they are expected to express both BRD4S and LOXL2.
      3. Finally, as explained above, another in vivo experiment using a PDX that expresses BRD4S (but not BRD4L) and LOXL2 (PDX-127) (Fig. 6D and Fig. EV9E-F). In this new model too, the combinatorial inhibition had a greater effect than single treatments.

      2) In vivo, the effect appears to be modest even in the MDA231 model, selected for evidence of synergy in vitro. In vivo, the combination appears to have an additive effect. Tumor growth rates are reduced, but no shrinkage is occurring. In the PDX model, LOXL2i does not have an effect as a monotherapy, while modestly enhancing the impact of BRD4i. These results are at odds with the claim of the interaction being fundamental for proliferation.

      We agree with the reviewer that the combinatorial inhibition appears to have an additive effect in vivo using the MDA-MB-231 model.

      1. For that reason, we have now performed the in vivo PDX experiment mentioned above (PDX-127; Fig. 6D and Fig. EV9E-F), in which we halved the dose of JQ1 to avoid a strong effect on tumor growth from BRD4 inhibition alone. In this new experiment, the synergistic effect is evident. While single-agent treatment showed a very moderate effect (0% or 20% tumor growth reduction for the LOXL2 inhibitor and JQ1, respectively), the combinatorial treatment showed a 50% reduction in tumor volume, further supporting our conclusions.
      2. We also performed either BRD4 or MED1 pull-down experiments in the presence of PXS and JQ1. We show that upon PXS treatment, the interaction between LOXL2 and BRD4S is maintained while the interaction with MED1 is reduced (Fig. 5A-C). However, in the presence of JQ1, the interaction between LOXL2 and MED1 is maintained while BRD4S-LOXL2 and BRD4S-MED1 interactions are impaired (Fig. 5D-F). These new results explain why monotherapy does not have a sufficient effect in vivo and set the rationale for the use of the combinatorial treatment. We believe that these new results corroborate our initial findings, and we hope to have addressed the reviewer's comments.
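
      One common way to check whether a combination exceeds additivity is the Bliss independence model. The sketch below applies it to the percentages quoted above. It is an assumption for illustration only: the reply does not state which synergy model was used, the function names are hypothetical, and the quoted figures mix "tumor growth reduction" and "reduction in tumor volume", which are treated here as the same fractional effect.

```python
def bliss_expected(effect_a: float, effect_b: float) -> float:
    """Expected fractional effect of two independently acting agents (Bliss model)."""
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(effect_a: float, effect_b: float, observed_combo: float) -> float:
    """Observed minus expected combined effect; positive values suggest synergy."""
    return observed_combo - bliss_expected(effect_a, effect_b)


# Fractional reductions quoted in the reply for the PDX-127 experiment.
pxs_alone = 0.0  # LOXL2 inhibitor alone
jq1_alone = 0.2  # low-dose JQ1 alone
combo = 0.5      # JQ1 + PXS

expected = bliss_expected(pxs_alone, jq1_alone)
excess = bliss_excess(pxs_alone, jq1_alone, combo)
print(f"expected={expected:.2f} observed={combo:.2f} excess={excess:.2f}")
```

      Under these assumptions the expected combined effect equals the JQ1-only effect, so any reduction beyond that counts as an excess over Bliss additivity. A formal claim would still need replicate-level data and a statistical test.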

      3) No analysis of cell proliferation was shown in vivo. The authors should have performed BrdU or Ki67 staining to support the claim. For in vitro analyses, the authors also used indirect assays for proliferation. PI staining by itself does not have sufficient resolution to clearly capture the modest effects that the authors demonstrate. BrdU-PI double staining would have been much more useful.

      We appreciate the reviewer’s comment. In the revised manuscript we have added Ki67 and H3S10p staining of the tumor samples from the new in vivo PDX experiment (Fig. 6E and Fig. EV10A-C). We show that the combinatorial treatment significantly reduces both proliferation markers, in agreement with the reduced tumor volume. Regarding the in vitro analysis, we used not only PI staining to show a reduced proliferation state but also H3S10p staining (Fig. 4B) and an SLBP1 fluorescent reporter MDA-MB-231 cell line (Fig. 4D, Fig. EV6B, E, and Movie EV). In the revised version of the manuscript, we included a new FACS-PI analysis (Fig. 4A, C) to better represent the effects we see on the cell cycle.

      Minor points:

      Dose dependent decrease in phosphorylated H3 is not at all obvious from eyeballing the data in S1A; the only effect that I see is a modest reduction at the highest concentration of the inhibitor. Authors need to quantify the results to support the claim.

      We agree with the reviewer and we apologize for the misinterpretation. We have changed the revised manuscript as follows: “The selective LOXL2 inhibitor PXS-538224 (hereafter, PXS) efficiently reduced the levels of oxidized histone H3 (H3K4ox) in MDA-MB-231 cells at 40 μM (Fig. EV6C), indicating an efficient inhibition of LOXL2 catalytic activity in the nucleus.”

      Most breast cancer cell lines are derived from metastatic disease, including pleural effusion; thus the argument that the MDA231 cell line is metastatic because it is derived from pleural effusion does not have a sufficient logical foundation.

      Many publications have shown the high metastatic capacity of MDA-MB-231 cells (e.g. https://doi.org/10.1016/j.bbabio.2011.04.015, doi: 10.1038/s41467-017-01829-1), which are therefore used as a metastatic TNBC model. The aim of the analysis reported in Fig. 6C was just to show whether any of the treatments used could reduce the metastatic capacity of this cell line. We believe we do not overstate the results but just report them as they are.

      How is loss of cell-cell junction in vitro consistent with LOXL2 role in modulating ECM? There is no evidence of ECM production in MDA231 in vitro. On the other hand, this loss is associated with EMT.

      We thank the reviewer for identifying this mistake. In the revised manuscript we changed the text as follows: “Gene set enrichment analysis (GSEA) revealed that LOXL2 KD induced upregulation of processes involved in cell morphology, secretion, membrane trafficking, and cell differentiation, with cell-cell junction being one of the most significantly affected pathways (Fig. EV5E). These results agree with the role of LOXL2 in regulating epithelial-to-mesenchymal transition, corroborating the high quality of our dataset.”

      Reviewer #1 (Significance (Required)):

      Discovery and characterization of LOXL2-BRD4 interaction is advancing the ever-deepening understanding of molecular mechanisms of regulation of gene expression. The studies and analyses appear to be sufficiently rigorous and reported with clarity, and the claimed discovery of the biological interaction between LOXL2 and BRD4 is well supported. However, given the magnitude of the reported (rather than claimed) effects of this interaction, and concerns about generalizability of the authors' conclusions, it is not clear how these results are promising for the development of new therapies in TNBC. Moreover, in contrast to luminal BC, there is no clear evidence for utility of cytostatic drugs in constraining TNBC. Therefore, the biological and clinical significance of the authors' discovery is unclear, and claims in this regard appear to be overblown.

      We thank the reviewer for stating that our analysis is rigorous and reported with clarity. We really took the criticisms as an opportunity to strengthen our findings, as explained above.

      For the newly presented in vivo PDX model, we performed immunohistochemistry of Ki67, H3S10p and Cleaved Caspase 3 to check whether the reduction of tumor volume observed in the combinatorial treatment was a result of a cytotoxic and/or a cytostatic effect (Fig. 6E and Fig. EV10A-C). As shown in the figure, the combination of the two inhibitors induced a greater decrease in Ki67 and H3S10p and a clear increase in Cleaved Caspase 3. Therefore, these new data indicate that the combinatorial treatment has not only a cytostatic but also a cytotoxic effect, suggesting that it could be exploited clinically for the treatment of TNBC patients.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In their study, Pascual-Reguant et al. show that combined inhibition of BRD4 and LOXL2 can synergize to restrict triple-negative breast cancer (TNBC) proliferation. BRD4 and LOXL2 are transcription regulators that can read and write epigenetic information, respectively. The authors employ three distinct breast cancer cell lines and mouse models with cell line-derived xenografts, and they show that combined inhibition of BRD4 and LOXL2 can be superior to single BRD4/LOXL2 inhibition in these model systems. In an attempt to identify a connection between BRD4 and LOXL2, the authors find that the two proteins can bind to each other. The authors performed most of the experiments in the breast cancer cell line MDA-MB-231. To assess the impact of LOXL2-inhibition on transcription, the authors assessed changes of the transcriptome in MDA-MB-231 cells following LOXL2 knockdown. They found that genes related to cell differentiation and morphology were upregulated, while genes related to the cell cycle were downregulated. ChIP-seq data of BRD4 showed that BRD4 can bind to cell cycle gene promoters and that this binding was enhanced upon loss of LOXL2. The authors found that LOXL2 and BRD4 interacted with the transcriptional cell cycle regulators B-MYB, FOXM1, and LIN9, which are components of the MYB-MuvB-FOXM1 (MMB-FOXM1) complex that is known to promote the expression of late cell cycle genes with important functions during mitosis. The authors conclude that LOXL2/BRD4 interact with each other and with the MMB-FOXM1 complex to drive the expression of cell cycle genes and cell proliferation. Vice versa, they conclude that inhibition of LOXL2/BRD4 reduces cell proliferation through inhibiting the expression of cell cycle genes.

      Major:

      • The data and methods are presented well. The experiments are adequately replicated and analyzed. However, except for the first section, all experiments were performed using only one cell line. It is important to validate key findings in at least a second cell line.

      We thank the reviewer for valuing our work.

      To address the reviewer’s comment, in the revised manuscript we added a further cell line (Cal-51) that expresses levels of LOXL2 and BRD4 similar to those of MDA-MB-231 (Fig. EV8C). Even though this cell line is clearly more resistant to JQ1 than the MDA-MB-231 cell line (Fig. EV8D), the combinatorial treatment is significantly more effective than single-agent treatment (Fig. EV8E).

      Moreover, we have also performed an additional in vivo experiment using another TNBC PDX (PDX-127) that expresses LOXL2 and BRD4S, but not BRD4L. Given that JQ1 can inhibit both BRD4 isoforms, this in vivo system allowed us to demonstrate that the tumor antiproliferative capacity of the combinatorial treatment is due to the simultaneous inhibition of LOXL2 and BRD4S (rather than BRD4S and L) (Fig. 6D and Fig. EV9E-F).

      • There appears to be a misunderstanding of the concept of cell cycle-dependent gene regulation by the DREAM complex and its related factors. Early (G1/S) cell cycle genes contain E2F promoter motifs, while late (G2/M) cell cycle genes contain CHR promoter motifs. The DREAM complex can bind both, while RB-E2F and MuvB recognize only E2F and CHR motifs, respectively. B-MYB and FOXM1 bind to MuvB and regulate late cell cycle genes, but they do not bind to early cell cycle genes. Given this concept, the authors' rationale to connect BRD4/LOXL2 through MuvB/B-MYB/FOXM1 with E2F promoter sequences and early cell cycle genes and the subsequent conclusions must be corrected.

      We thank the reviewer for their expert explanation. We corrected our conclusion in the revised version of the manuscript following the reviewer’s comment.

      • I felt that the suggested functional connection between LOXL2/BRD4 and DREAM is not strongly supported by the authors' data. Figure S6E: A similarity score of Fig. EV6E: We agree with the reviewer that a similarity score of Fig. 4E: We thank the reviewer for this comment. The performed pulldown showed that BRD4S, LOXL2, and MED1 interact with Lin9 and B-Myb, but not with FOXM1; thus FOXM1 itself is an internal negative control of the pulldown. Additionally, BRD4L does not show the same interaction pattern as BRD4S, LOXL2, and MED1, again acting as an internal negative control. We therefore believe that the pulldown is properly controlled and that the observed interaction is trustworthy. We furthermore agree with the reviewer that it would be interesting to characterize the interactions between the DREAM complex and BRD4S, LOXL2, and MED1. However, we believe that dissecting these interactions at the mechanistic level would require a deeper study, which can be a project in itself that we aim to explore in the future. For example, it would be interesting to investigate whether either the inhibition or the downregulation of LOXL2 and/or BRD4S specifically impairs the formation of the DREAM complex or the recruitment of specific DREAM complex subunits, as well as how these effects impair the DREAM complex chromatin binding. We are afraid that the suggested pulldowns would not be sufficient to answer these questions, which would require extensive cross-interaction studies in both BRD4/LOXL2 and BRD4+LOXL2 inhibition or downregulation, followed by ChIP-seq and transcriptomics for all the conditions. We believe that the provided data, together with the functional characterization (both in vitro and in vivo) of the phenotypes triggered by BRD4S and LOXL2 inhibition, make a strong case for our manuscript and place the suggested experiments beyond its scope. We hope the reviewer will understand our explanation and will appreciate that we are planning to pursue this further in the future.

      Fig. 3: We thank the reviewer for this important comment. The ChIP-seq technique very often does not provide exhaustive results due to sequencing depth limits and antibody performance. We believe that the fraction of DREAM target genes found in our dataset as bound by BRD4S is not exhaustive and that the analysis proposed by the reviewer would not lead to clear conclusive results. However, we understand the importance of verifying that DREAM target genes whose promoter is bound by BRD4 are indeed downregulated when LOXL2 is inhibited. To address this question, in the revised manuscript we added gene expression analysis of selected DREAM target genes upon treatment with JQ1, PXS, or their combination. We could show that both JQ1 and PXS treatment impair the transcription of the selected DREAM target genes, whereas the combinatorial treatment almost completely shuts down their expression, in agreement with our hypothesis (Fig. 5J).

      • The authors state that it is surprising to find that LOXL2 can promote target gene transcription because it is rather known as a transcriptional repressor. To this point, the authors should perform standard analyses using their RNA-seq and ChIP-seq data. Compare differential expression of genes that are bound by BRD4S/L/S+L and genes not bound by BRD4. Perform motif search and enrichment analyses for transcription factor and co-factor binding data (public ChIP-seq repositories). Such analyses may suggest what gene sets are up- and downregulated by LOXL2 through BRD4S/L and what other factors could be involved in LOXL2-dependent up- and downregulation of gene transcription.

      We thank the reviewer for this valuable comment that certainly provides the rationale for a follow-up project. However, we believe that the proposed study goes beyond the scope of our work at this moment.

      Minor:

      • I felt that background information on the BRD4 isoforms was missing. The short and long isoforms of BRD4 should be introduced briefly.

      We agree with the reviewer. In the revised manuscript, we addressed this by presenting the BRD4 isoforms in the Introduction.

      • Given that BRD4 inhibition is known to activate p53 (e.g., PMID 23317504 and 33431824) and p21 (PMID 31265875), the authors should discuss the p53 status of their cell lines (largely mutant). In general, I felt that the authors could better cite and discuss the current literature on BRD4 and LOXL2.

      We appreciate the comment of the reviewer regarding p53. Given that p53 is mutant in MDA-MB-231, we believe that the proliferation defect observed with the combinatorial treatment may be due to the activation of alternative cytostatic or cytotoxic signaling cascades, independently of p53 activation. We have now briefly mentioned this point in the manuscript discussion.

      • It was unclear to me why the authors did not actually test experimentally whether their predicted interaction models 2 or 4 are likely true (Figure 2E+G).

      We understand the reviewer’s comment. The fact that JQ1 treatment almost abrogates the interaction between LOXL2 and BRD4S strongly suggests that models 1 and 3 are likely wrong, therefore pointing towards models 2 and 4 as the correct ones. To test whether models 2 and 4 are indeed the correct models, we are now performing extensive mutagenesis studies, which are producing preliminary results suggesting that models 2 and 4 are indeed correct. The reason why we did not include this study in the current manuscript is that we started a parallel line of investigation aimed at identifying residues fundamental for the interaction that can be exploited in compound screening campaigns to identify molecules able to block the described interaction and thus cancer proliferation. Publishing these preliminary results at this stage could jeopardize the drug discovery campaign, and we hope that the reviewer will understand our constraints.

      • The transcription of cell cycle genes depends on the cell cycle (i.e., reduced cell cycle entry correlates with reduced cell cycle gene expression). Given that the authors showed that LOXL2 inhibition reduces MDA-MB-231 cell proliferation, they should note that reduced expression of cell cycle-related genes is expected upon LOXL2 knockdown.

      We understand the reviewer’s comment. We believe that we provide sufficient data supporting our hypothesis that LOXL2 controls the expression of cell cycle genes at the transcriptional level together with BRD4S. In addition, the sole inhibition of LOXL2 has practically no effect on tumor proliferation in vivo but largely enhances the antiproliferative effect of low-dose JQ1 (Fig. 6D). We hope these clarifications will satisfy the reviewer.

      • The authors specify in their discussion that their data show a function of LOXL2/BRD4 in the cell cycle interphase, while there were no experiments that support that specific conclusion. At least it is unclear to me why the authors rule out a function in mitosis?

      We thank the reviewer for this comment. We referred to interphase genes because these are the early cell cycle genes, while mitotic genes are the late ones. We do not rule out a possible function for BRD4S and LOXL2 in regulating mitotic progression; however, we believe this would be a consequence of dysregulated G1-S-G2 gene expression rather than a direct transcriptional effect. This conclusion derives from the fact that while we observe interactions of LOXL2, BRD4S, and MED1 with Lin9 and B-Myb, these are not fully conserved with FOXM1, which is typically required for the transcription of mitotic genes. To avoid confusion, we have nevertheless removed the word “interphase” from the text.

      • I felt that the first part of the manuscript (combination of BRD4 and LOXL2 inhibitors in TNBC) was a bit uncoupled from the functional studies on LOXL2 and its connection to BRD4. The transition between these parts and the final discussion on why the joint control of cell cycle genes by LOXL2/BRD4 may be important for the synergistic effect of LOXL2/BRD4 inhibitors. To this point, the authors' model was not clear to me.

      We really appreciate the reviewer’s comment. To better connect the functional studies with the clinical significance of the proposed combinatorial treatment, we restructured the manuscript. In the revised version, the use of the combinatorial treatment is shown in Figure 6. Moreover, to better explain why we focused all the studies on BRD4 and LOXL2, we also included data from the Cancer Cell Line Encyclopedia (CCLE)-associated chemotherapeutics sensitivity (Fig. 1A and Fig. EV1) showing that LOXL2 expression levels can predict the response to BRD4 inhibition, suggesting a functional interaction between BRD4 and LOXL2 and the possibility of exploiting it for therapeutic purposes. We believe that these data set the rationale to further explore the connection between LOXL2 and BRD4, both at the mechanistic and functional levels.

      Reviewer #2 (Significance (Required)):

      The study by Pascual-Reguant et al. shows that inhibitors of BRD4 and LOXL2 can be combined to achieve better efficacy in reducing proliferation of breast cancer cell lines and breast tumor growth in xenograft models. They provide strong evidence for a functional interaction between LOXL2 and BRD4 and investigate their common transcriptional targets. Intriguingly, some evidence points towards a direct regulation of the DREAM complex and its cell cycle gene targets.

      The findings are novel and can be the basis for further research on TNBC combination therapy using BRD4 and LOXL2 inhibitors. The link to the DREAM complex is preliminary.

      The study is of interest for a basic research audience with some translational aspects.

      I reviewed this manuscript as a researcher in gene regulatory mechanisms, with cell cycle genes as one focus area. I have no expertise in the computational modeling of protein-protein interactions and I am no expert for breast cancer.

      We thank the reviewer for the positive comments. We also would really like to thank the reviewer for their criticism, which, we believe, contributed to a new and improved manuscript version.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      In this manuscript, Laura Pascual-Reguant et al. identified a novel role of the LOXL2 oxidase in sustaining cell cycle progression through a so far uncharacterized gene-activating function that is mediated by the BRD4S epigenetic reader and exerted on key DREAM-target genes in TNBC. Moreover, the authors showed that combinatorial treatment of TNBC with LOXL2- and BRD4-specific inhibitors results in a tremendous anti-tumorigenic effect. For all findings, they leveraged in vitro and in vivo settings as well as high-throughput sequencing approaches. However, the following points should be addressed and explained.

      Major points:

      -The authors on their working hypothesis propose that dual inhibition of BRD4 and LOXL2 is a novel strategy for curing TNBC. For my taste, just because both targets are quite promising for TNBC, the jump to this combinatorial treatment is kind of abrupt. Knowing the difficulty and time-/financial- investment, authors could optionally perform a mass spectrometry analysis on nuclei lysates with LOXL2 pull down to identify physical interactors. Due to the augmented resources and analysis of raw data, authors may necessitate a generous revision period (approx. 4 months for starters). By that, this can provide a more unbiased approach to look at nucleus-specific gene-regulatory functions and particularly at epigenetic readers. It would be also interesting to see if LOXL2 interacts with other members of the BRD family. Selecting BRD4 and no other members of the bromodomain family cannot be the only choice given that other BRD members can also interact with several of these mediator subunits.

      We thank the reviewer for the suggestion and we agree that the rationale for combining BRD4 and LOXL2 inhibitors was not sufficiently argued in the first version of the manuscript. For that reason, in the revised manuscript, we added new data to explain why we explored this topic. In particular, to better explain why we focused all the studies on BRD4 and LOXL2, we included data from the Cancer Cell Line Encyclopedia (CCLE)-associated chemotherapeutics sensitivity (Fig. 1A and Fig. EV1) showing that LOXL2 expression levels can predict the response to BRD4 inhibition (but not to other approved chemotherapeutic drugs), suggesting a functional interaction between BRD4 and LOXL2 and the possibility to exploit it for therapeutic purposes. Moreover, we restructured the manuscript to make the story more linear, explaining first the functionality of the BRD4S-LOXL2 interaction at the molecular and cellular levels, and then presenting the in vivo systems in the last part of the manuscript.

      We agree with the reviewer that it may be interesting to explore whether LOXL2 interacts with other BRD family members. However, given the prominent role of BRD4 in promoting cancer proliferation, we believe that understanding the relevance of the BRD4S-LOXL2 interaction in TNBC is, per se, of great interest and provides a novel mechanistic understanding of how TNBC proliferation is controlled at the transcription level. In the specific case of TNBC, it has been shown that BRD4S has an oncogenic effect, while BRD4L is an oncosuppressor. In the manuscript, we now show that LOXL2 downregulation sensitizes cells to JQ1 treatment (Fig. 1D). Additionally, while the downregulation of BRD4L does not have any additional effect on cells treated with PXS, the downregulation of BRD4S sensitizes them to LOXL2 inhibition (Fig. EV8B). These results, once again, indicate the relevance of studying the functional interaction between BRD4S and LOXL2.

      -LOXL enzymes have been shown to promote collagen and fibronectin assembly, thereby sustaining the pro-survival effect of the ITG5A/FN1/FAK/SRC signaling cascade and shielding TNBC cells against chemotherapy treatment (32415208). Did authors observe if LOXL2 loss or inhibition decreased the active status of FAK and SRC, which are well known to promote G1-S transition (25381661)?

      Probably the cell cycle defects upon LOXL2 loss may also partially arise from the impairment of this cascade.

      We really appreciate the reviewer’s suggestions. In the revised version of the manuscript, we checked FAK and Src activation status in tumor samples from one of our in vivo experiments (Fig. EV10D). We did not observe any difference in phospho-FAK or phospho-Src upon treatment either with PXS, JQ1, or their combinations, suggesting that alterations in the activity of these factors were not driving the observed proliferation defects.

      -Authors exclusively use JQ1 as a BRD4 inhibitor. As JQ1 may have an unspecific effect on BRD2 as well, authors should consider reproducing key experiments with siControl- and siBRD4-treated cells and increasing doses of PXS as well as repeating the JQ1 dose response assay in Figure 1B using siRNA-mediated silencing of LOXL2. Given that both players are part of the same complex, silencing of one and inhibition of the other should sensitize cells compared to their control counterparts.

      We agree with the reviewer and we addressed this comment in the revised manuscript. In particular, we have added two additional experiments:

      • We transduced MDA-MB-231 cells with isoform-specific shBRD4s (shBRD4L and shBRD4S) (Fig. EV5H) and checked cell sensitivity to PXS treatment (Fig. EV8B). As explained also above, we observed that only when the short isoform of BRD4 was downregulated cells displayed higher sensitivity to PXS treatment. This result corroborates that BRD4S and LOXL2 are required for TNBC proliferation.

      • We transduced MDA-MB-231 cells with shLOXL2 and assessed JQ1 sensitivity (Fig. 1D). We showed that upon LOXL2 downregulation, cells became more sensitive to JQ1 treatment, again corroborating the fact that TNBC proliferation requires BRD4S and LOXL2.

      -Moreover, in Figures 1G and S3D the differential sensitivity of low and high LOXL2 cell lines is unclear. Do authors know if any of these growth kinetic lines represent one of the tested cell lines in Figure 1A-B? Authors should provide respective legends. In addition, authors should take advantage of their homemade data given that they have already selected a panel of TNBC cell lines with various LOXL2 expression at basal state (Figure 1A) for which dose response assays have been performed (Figure 1B). Therefore, I would perform an IC50 graph for JQ1 (without PXS treatment) using the existing data from Figure 1B.

      We apologize if our representation was confusing. In the revised manuscript we have changed the sensitivity plots (Fig. 1A and Fig. EV1) to make them easier to grasp. Additionally, in Figure 1A we included the analysis of CCLE cell lines stratified based on their LOXL2 expression levels. This analysis showed that LOXL2 expression levels could overall predict the response to BETi treatment. As suggested by the reviewer, we also plotted the IC50 of the 3 cell lines tested. However, their JQ1 sensitivity curves did not show any difference that could be attributed to their different LOXL2 levels. Our speculation is that only 3 cell lines do not provide a sufficient sample size to reach a meaningful conclusion, which, in contrast, can be achieved by comparing the CCLE BETi sensitivity.

      -In Figure 2D, the pull-down assay is inconclusive, as the molecular weight for each construct is not mentioned. I would probably add this information also in all performed western blots. Also, the overexpression of the BD1/BD2-mutated and especially the BD1/BD2-lacking construct is unclear if it still interacts with LOXL2, probably because of the lack of molecular weight reference of each band. Therefore, the authors should make this pull-down assay more descriptive regarding the size of the bands. Also, BD1 mutagenesis at N140 was shown to dislodge the binding of JQ1 to BRD4 (24497639), which implies that BD1 mutagenesis or overexpression of the BD1-deficient construct should abrogate the interaction of LOXL2 with BRD4, reminiscent to the abrogated interaction of BRD4/LOXL2 upon JQ1 that binds to both BDs (Figure 2F). And, what happens if a BD2-deficient construct is expressed?

      We thank the reviewer for spotting this oversight. We apologize for it, and in the revised version of the manuscript we included molecular weights for all western blots.

      We acknowledge that BD1 mutagenesis displaces JQ1 binding; however, we respectfully disagree that the BD1-N140 mutant should therefore fail to bind LOXL2. Our docking analysis indeed showed that none of the poses is impaired by either BD1 or BD2 mutagenesis (Fig. EV4D). JQ1 disrupts the interaction between BRD4S and LOXL2 (Fig. 2F, G) not because the two compete for the same binding residue, but because of the space occupied by JQ1 inside the AcK binding pocket of either BD1 or BD2, which impedes proper binding to LOXL2. Our pulldown data indeed showed that mutant BD1 and BD2 retain the ability to bind to LOXL2 (Fig. 2C), as predicted by the docking.

      We did not try to express constructs lacking either BD1 or BD2, and we cannot speculate on what could happen to the BRD4S-LOXL2 interaction in this scenario. Even though this experiment could help dissect the interaction between LOXL2 and BRD4S, we decided instead to perform mutagenesis of specific residues that have been predicted to be important for the interaction. The reason why we did not include this study in the current manuscript is that we started a parallel line of investigation aimed at identifying residues fundamental for the interaction that can be exploited in compound screening campaigns to identify molecules able to block the described interaction and thus cancer proliferation. Publishing these preliminary results at this stage could jeopardize the drug discovery campaign and we hope that the reviewer will understand our constraints.

      -If authors support that BRD4S is the predominant isoform driving the expression of DREAM-targets, this means that DREAM-targets are mainly bound by BRD4S, relying on Figure 3E-F. However, based on the author's ChIPseq tracks in Figure 3H, DREAM targets such as EZH2 and HMGB2 are co-occupied by both BRD4 isoforms at the basal state on their promoter region. Also, especially for EZH2 and PLK4, authors should set to 'group auto-scale' both conditions in a smaller scale range for ChIPseq- and RNAseq tracks, although I do not see these two genes as good candidates representing your analysis. Therefore, authors should initially show all genes (e.g in a table format) that enrich the 'DREAM-targets' signature and select for a greater panel of genes (like for AURKB and HMGB2) demonstrating a preferential occupancy of the BRD4S at their promoter region. Finally, authors are recommended to perform a ChIP-qPCR on these genomic regions at basal state (no LOXL2 silencing) to validate the predominant occupancy of BRD4S and the low/absent occupancy of BRD4L at these genomic sites.

      We apologize for the confusion. To make the figure more understandable, we now scaled all the panels to the same scale and highlighted in grey the promoter region of each selected DREAM target gene. As the reviewer can appreciate, none of these genes is bound by BRD4L in basal conditions (Fig. 3F).

      To better characterize the differential binding, following the reviewer’s suggestion, we performed ChIP-qPCR using Ab2 (which recognizes both BRD4 isoforms), in cells either downregulated for BRD4L or BRD4S with isoform-specific shRNAs (Fig. EV5H). Results showed that only the downregulation of BRD4S reduced the binding of Ab2 to the promoter of the selected DREAM target genes (Fig. 3D), corroborating our hypothesis and validating our ChIPseq strategy.

      -Authors in Figure 3G should select an equal-sized population of randomly chosen non-DREAM-target genes, otherwise, the comparison of log2FC difference between these two gene cohorts is unreliable and difficult to make. Mann-Whitney test should also be performed.

      We thank the reviewer for this suggestion, which was added to the revised version of the manuscript (Fig. 3E, lower panel).
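      The comparison requested by the reviewer can be sketched as follows. This is a minimal illustration and not the analysis code behind Fig. 3E: the function names and the log2FC values are hypothetical, and the p-value uses the normal approximation of the Mann-Whitney U statistic without tie or continuity correction.

```python
import math
import random


def equal_sized_control(target_values, background_values, seed=0):
    """Randomly draw as many background genes as there are target genes."""
    rng = random.Random(seed)
    return rng.sample(background_values, len(target_values))


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test, normal approximation.

    Assumption: no tie correction and no continuity correction.
    Returns (U statistic for x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    # U counts the (x_i, y_j) pairs with x_i > y_j; ties count as 0.5.
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p


# Hypothetical log2 fold changes, for illustration only.
dream_lfc = [-1.2, -0.9, -1.5, -0.7, -1.1]
non_dream_lfc = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2, 0.4, -0.3, 0.05, 0.15]

control_lfc = equal_sized_control(dream_lfc, non_dream_lfc)
u_stat, p_value = mann_whitney_u(dream_lfc, control_lfc)
```

      Drawing the control set with a fixed seed keeps the comparison reproducible; in practice the draw could be repeated several times to check that the result does not depend on one particular random subset.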

      -Authors should repeat the cell cycle analysis (Figure 4A) as the number of cells subjected to flow cytometry is quite discrepant between the conditions. Also, it is not clear if the experiment was performed in at least biological triplicates (although in the respective legend, it is stated so). If performed in biological triplicates, authors should make a new graph where each cell cycle phase cell population differs between the two conditions. Moreover, the difference in cell cycle defects in LOXL2-inhibited cells (Figure 4C) is indifferent compared to their control counterpart. Therefore, authors should address these inconsistencies.

      We thank the reviewer for the suggestion. In the revised version of the manuscript, we represent the cell cycle also as a bar plot with statistical analysis (Fig. 4A, C). Even though the number of cells was the same across conditions, the sub-G1 population of the LOXL2 KD cells may have distorted the profile of the cell cycle. To avoid misinterpretations, we repeated the analysis in the revised version of the manuscript. Statistical analysis supports that LOXL2 inhibition or downregulation has a significant effect on cell cycle progression (Fig. 4A, C, right panel).

      -Furthermore, authors should explain what was the rationale for selecting a mediator subunit and specifically MED1 as a possible interacting partner of LOXL2 and BRD4S since MED12 and MED24 were also highly essential (Figure 4F).

      We selected MED1 as a Mediator Complex proxy. In our essentiality analysis MED 1, 9, 10, 12, 15, 16, 19, 23, 24, 25 score as significant, suggesting a functional interaction between LOXL2 and the Mediator Complex, rather than a specific subunit. MED1 has been previously described as a BRD4 partner and it is often used in immunofluorescence to visualize transcriptional foci, which made it the best candidate for follow-up study in our project.

      -Moreover, do authors also observe this functional relationship of LOXL2 and BRD4S in cell cycle progression in other breast cancer subtypes presenting a high proliferation index e.g HER2+?

      Presumably, the author's proposed mechanism applies to a wide panel of breast cancer entities, for which, only key experiments could be performed.

      We thank the reviewer for the suggestion. We hypothesized that other cancer types expressing LOXL2 and BRD4S could also benefit from the combinatorial treatment. Indeed, the CCLE drug sensitivity panel in Fig. 1A comprises cancer cell lines of different origins, not just TNBC, and corroborates that the relationship between LOXL2 expression levels and BRD4 sensitivity exists also beyond TNBC. Even though it is important to experimentally verify this hypothesis, we decided to pursue it in the future to broaden the applicability of the proposed strategy in preclinical settings.

      -Authors in Figure 5H represent LOXL2 and BRD4s as integral chromatin looping factors together with MED1 at promoter and enhancer regions. However, this illustration is an overrepresentation of their finding because authors did not address the differential occupancy of BRD4S upon LOXL2 loss in DREAM-target-specific enhancer regions. If they wish to do so, they may use the RANK ORDERING OF SUPER-ENHANCERS (ROSE) package to call for super-enhancer regions in the proximity of DREAM-targets and confirm similar results as for their TSS-proximal sites.

      We thank the reviewer for the useful suggestion. In the new version of the manuscript, we have simplified the representation, which now does not show super-enhancers. However, following the reviewer’s suggestion, we performed super enhancer analysis using ROSE. Results showed that BRD4S binds to super-enhancers more than BRD4L, including DREAM target gene super-enhancers. Additionally, while LOXL2 KD did not alter the binding of LOXL2 to DREAM target gene super-enhancers, it decreased the binding of BRD4S to them (Fig. EV7D, E). Overall, these data are in agreement with our hypothesis that BRD4S together with LOXL2 controls the expression of DREAM target genes.

      -In the current manuscript, authors did not address the translational relevance of their proposed mechanism in the context of conventional therapies. Knowing that several BRD-specific compounds currently undergo clinical trials, authors should address if LOXL2 low (MDAMB468) and high (BT549) cells demonstrate a differential sensitivity to increasing doses of chemotherapy, in the presence or absence of BRD4. By doing that, LOXL2 apart from being a therapeutic target could be also used as a prognostic marker to stratify patients and achieve better response to standard therapies.

      We really appreciate the reviewer’s suggestion and we think this is a fundamental point. In the new version of the manuscript, we have performed further analysis using a greater panel of chemotherapeutic agents from the CCLE sensitivity database. We now show that LOXL2 low-expressing cells show significantly more sensitivity to BETi treatments, but not to conventional chemotherapeutic agents (e.g. doxorubicin, olaparib, 5-fluorouracil, paclitaxel, etc.) (Fig. 1A and Fig. EV1), which sets the rationale to further explore the functional relationship between BRD4 and LOXL2.

      Minor points:

      -In Figure 1D, the authors should convert the y-axis to a logarithmic scale to better represent the differences between JQ1, PXS, and combo. Also, One-way Anova should be performed between JQ1, PXS and combo.

      We don’t understand the reviewer’s suggestion since Fig. 1D (Fig. 6B, right panel in the revised version) is a tumor picture for which the y-axis cannot be converted to a logarithmic scale.

      -In Figure S6F, authors did not show the sensitivity of LOXL2 low and high cell lines for BRD4 KO. If LOXL2-proficient cells are less sensitive to JQ1, based on Figure 1B, authors should consider showing something similar from the gene essentiality database.

      We agree with the reviewer and we apologize for this mistake. We have included the sensitivity of LOXL2 low and high cell lines for BRD4 KO and also for MYC KO (Fig. EV6G).

      -Authors failed to discuss the work from Ozge Saatci et al (PMID: 32415208) regarding LOXL2 in TNBC and ECM reorganization as well as in other cancer entities (PMID: 35428659) in the context of ECM remodeling. Authors should realize that these published works and the current ones are not conflicting but complement each other.

      We thank the reviewer for the suggestion. In the revised version of the manuscript, we discussed this work.

      Reviewer #3 (Significance (Required)):

      SIGNIFICANCE

      The conception and findings are of enlightening significance for TNBC therapy, especially given the lack of targeted therapies in this particularly aggressive breast cancer subtype. Hence, I posit this work as highly relevant for the cancer epigenetics research community interested in characterizing unknown factors that facilitate the gene-activating function of epigenetic readers in health and disease.

      My field of expertise is to uncover epigenetic vulnerabilities responsible for transcriptional plasticity driving drug tolerance in aggressive forms of breast cancer.

      We would like to take the opportunity to thank the reviewer for the relevant suggestions. We strongly believe the revised version of the manuscript has been substantially improved by addressing the comments the reviewer made.

    1. Author Response

      Many thanks for the detailed and sometimes sharp, yet appropriate criticism of our study. It was an incentive for us to carry out additional analyses and to devote more effort to an elaboration of concepts. The outcome is that the results have changed slightly and that we now give more space to a discussion of concepts. We first address here the points raised by more than one reviewer before responding to comments contributed by individual reviewers.

      The points raised can be divided into three thematic groups, 1) conceptual issues, 2) experimental and analytical questions, and 3) comments challenging the novelty of our results. On the first theme, we think it is essential to make a clear distinction between the conceptual and observational domains. As such, the criteria defining a “mirror neuron” and what is meant by the term "mirror mechanism" belong to the conceptual domain. This understanding of terms requires agreement among scientists, but is not experimentally testable. Unfortunately, there is no agreement on how to define a “mirror neuron” and what is meant by “mirror mechanism”. Thus, for the present work, the only option is to refer to specific definitions or to use our own definitions, which try to capture what others, and here most importantly Rizzolatti and colleagues, probably meant. We have adjusted the introduction in an attempt to convey our understanding and usage of the two terms in a hopefully comprehensible manner. Briefly, we use a definition for "mirror neuron" that we take from the first paragraph of the results section of Gallese et al. (Brain, 1996). We do not consider the "properties of mirror neurons" described in that paper as defining a mirror neuron (MN). Classifying neurons as MNs only on the basis of the presence of a modulation of discharge rate during an executed and an observed action compared with a baseline is a common practice also in other single neuron studies on MNs, consistent with this definition. Regarding "mirror mechanism", we refer to Rizzolatti and Sinigaglia (2016) and make a distinction between a broad and a strict definition.

      Given our finding that there are almost no F5 MNs whose activity during observation is a motor representation according to our strict definition of a mirror mechanism, and also given the problem that the term “mirror mechanism” itself is not uniformly understood, the question arises whether and how the term "mirror neuron" should be used in the future. The answer to this may vary and belongs to the conceptual domain. We briefly address this question at the end of the discussion of the revised manuscript.

      From that understanding of terms, conceptual hypotheses are to be distinguished, which of course must allow experimental predictions, i.e., must be falsifiable. We now distinguish more clearly between a "representation hypothesis" and an "understanding hypothesis". Both hypotheses focus on F5 MNs and are based on the strictly defined mirror mechanism. We test the “representation hypothesis” in our study, and just because it is the basis for the “understanding hypothesis”, falsifying the “representation hypothesis” would allow us to conclude that the “understanding hypothesis” is not valid. In contrast, confirmation of the “representation hypothesis” would not, of course, allow us to conclude that the “understanding hypothesis” holds. That would really be circular reasoning (this conclusion was drawn by some and rightly criticized). However, support for the “representation hypothesis” would be the necessary prerequisite for the “understanding hypothesis” to be true. These two hypotheses take up the original argument that a certain understanding of observed actions could follow from an equality of action-specific F5 MN activity during execution and observation. Because we considered the data on equality of action-specific F5 MN activity to be insufficient, we designed this study. Since our result largely argues against the "representation hypothesis" and thus against the "understanding hypothesis," we now discuss alternative concepts for the function of F5 MNs in more detail. It should be noted here that our fourth concept ("goal-pursuit-by-actor") could well represent the observed action without contradiction to our broad definition of a mirror mechanism, which in principle could also serve a subjective experience (which could be conceived as a kind of understanding). The way we structure the concepts in the discussion of this revised manuscript is, in our opinion, a useful overview of the concepts. The third concept is new in this context.

      We would like to emphasize that we focus on F5 MNs and intentionally avoid a discussion of mirror neurons beyond F5 in this paper. With the data from this study, we cannot say anything about MNs outside of F5.

      Regarding the key question of how the "understanding hypothesis" is testable, or whether it may not be testable at all, we agree, of course, that for the conclusion of whether F5 MNs contribute to perception, only a manipulation of F5 MNs can clarify it. We now say that explicitly in the introduction. We agree with reviewer #2 that "understanding" here is not limited to "action recognition" or "action categorization”, which in principle could be implemented by purely sensory processing. Therefore, we also do not believe that the approach proposed by reviewer #3, which builds on the distinction of actions, would allow for a critical examination of the "understanding hypothesis”. But we disagree that the "understanding hypothesis" is not testable at all. Operationalization is necessary. If we accept that we can measure certain visual or auditory perceptions of an animal by operationalization (e.g., the subjective visual vertical, see for example Khazali et al., PNAS, 2020), then we must also accept that we can, in principle, measure other subjective experiences by operationalization, such as pain or aiming at a goal or even the co-experience of pain. An example of how to approach this is the study by Carrillo et al. (Curr Biol, 2019), which reviewer #2 and colleagues discussed in a recent review article (Bonini et al., TCS, 2022).

      With regard to the second theme, experimental and analytical questions, we noticed while reading the comments that in our first version we did not distinguish clearly enough between statements about single neurons and statements about populations of neurons. Therefore, we now clearly separate single neuron analysis and population code analysis in the structure of the article. In view of the fact that statements about mirror neurons in the literature mostly refer to single neurons, we added extensive single neuron analyses, so that only now statistically reliable statements about single neurons are possible. This has led to the realization that the number of neurons with exclusively shared code is so small that these neurons should be considered a rare exception. Given the small number of time periods with shared code, we additionally tested against a hypothesis already rightly proposed as an alternative explanation by G. Csibra in 2005 (Mirror neurons and action observation: Is simulation involved? In: What do mirror neurons mean? Interdisciplines Web Forum 2005). We were able to reject this hypothesis based on two of three methods for testing for a shared code. This is the second piece of evidence besides the clustering of time periods with shared code already described in the first version that time periods with shared code cannot be considered random.

      We discuss in more detail the question of whether neurons that exhibit a shared code at least at times support the representation hypothesis. To this end, we additionally examined whether certain action segments are more frequently represented with a shared than with a non-shared code, whether neurons with shared code differ from those with non-shared code in anatomical location, and whether population cross-task classifiers restricted to a time bin-wise selection of neurons with shared code can reach the same accuracy as within-task classifiers in the whole population.

      Another issue was how to test for shared code and how to decide if a code has enough sharing. To answer the question, the exact hypothesis we intended to test here is crucial. The representation hypothesis states that the representation of the observed actions in F5 MNs corresponds to the representation as it occurs during the execution of the same actions. Therefore, the relationship between discharge rate and actions that holds during execution should also hold during observation, which is measurable with a classifier trained on execution trials and tested on observation trials. Moreover, the actions should not be more distinguishable during observation with a classifier other than the execution-trained classifier, because if that were so, it would mean that the representation of observed actions is different from that of executed actions. The detection of a cluster of time bins for which both conditions are satisfied confirms that it is possible to discover in this way the shared codes postulated by the representation hypothesis.
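      The two conditions described above can be sketched as follows. This is a minimal illustration and not our analysis code: a nearest-centroid classifier stands in for the linear discriminant classifier, all discharge rates are simulated, and every name in the block is hypothetical.

```python
import random
from statistics import mean


def fit_centroids(rates, labels):
    """Mean discharge-rate vector per action (nearest-centroid stand-in)."""
    groups = {}
    for x, y in zip(rates, labels):
        groups.setdefault(y, []).append(x)
    return {y: [mean(col) for col in zip(*xs)] for y, xs in groups.items()}


def predict(centroids, x):
    """Return the action whose centroid is closest to rate vector x."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sqdist(centroids[y]))


def accuracy(centroids, rates, labels):
    hits = sum(predict(centroids, x) == y for x, y in zip(rates, labels))
    return hits / len(labels)


def cross_task_accuracy(exe_rates, exe_labels, obs_rates, obs_labels):
    """Train on execution trials, test on observation trials."""
    return accuracy(fit_centroids(exe_rates, exe_labels), obs_rates, obs_labels)


def within_task_accuracy(rates, labels):
    """Leave-one-out accuracy of a classifier trained on the same task."""
    hits = 0
    for i in range(len(labels)):
        train_x = rates[:i] + rates[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += predict(fit_centroids(train_x, train_y), rates[i]) == labels[i]
    return hits / len(labels)


def simulate(tuning, n_per_action, rng, sd=0.5):
    """Simulated trials: Gaussian rates around each action's mean rates."""
    rates, labels = [], []
    for action, means in tuning.items():
        for _ in range(n_per_action):
            rates.append([rng.gauss(m, sd) for m in means])
            labels.append(action)
    return rates, labels


rng = random.Random(0)
# Hypothetical mean rates of two neurons for three actions.
tuning = {"A": (10.0, 2.0), "B": (2.0, 10.0), "C": (10.0, 10.0)}
# Non-shared code: same rate patterns, but assigned to different actions.
permuted = {"A": tuning["B"], "B": tuning["C"], "C": tuning["A"]}

exe_x, exe_y = simulate(tuning, 20, rng)
obs_shared_x, obs_shared_y = simulate(tuning, 20, rng)
obs_other_x, obs_other_y = simulate(permuted, 20, rng)

shared_cross = cross_task_accuracy(exe_x, exe_y, obs_shared_x, obs_shared_y)
other_cross = cross_task_accuracy(exe_x, exe_y, obs_other_x, obs_other_y)
other_within = within_task_accuracy(obs_other_x, obs_other_y)
```

      In the simulated shared-code case the execution-trained classifier transfers to observation. In the non-shared case the observed actions remain well discriminated by a classifier trained on observation trials, yet the execution-trained classifier fails, which is the pattern that would speak against the representation hypothesis.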

      With respect to concerns that the monkey may not have used the cue at all when the action was executed, we added a comparison with control trials with a non-informative cue and also compared the duration of the approach phase between the three actions. Regarding oculomotor behavior, we verified that the monkey had actually directed his gaze toward the action during action observation for all three actions.

      On the third issue, concerning the novelty of our results, we have now explained in more detail in the introduction why we felt it necessary to conduct a study we considered fundamental. As a result of our study, it can be clearly stated now that representations of observed actions as predicted by the strictly defined mirror mechanism are rare in F5 MNs, but nevertheless cannot be dismissed as random. This dispels the objection rightly raised by Csibra in 2005 and contradicts the currently prevailing view that such a representation can only be found at a population level. Even if these representations are ultimately explained by a concept other than the strictly defined mirror mechanism, their existence must be accounted for by any theory of the function of F5 neurons. Moreover, it is also shown that the observed actions are well discriminated with a non-shared code, at times even optimally. This contradicts the notion – which has been widespread for a long time since the work of Gallese et al. (Brain, 1996) – that mapping to motor representations in terms of broad congruence is simply not perfect. The applied cross-task decoding approach seems promising to test also in the future for a shared action code. Finally, reconsideration of alternative concepts has led us to highlight the possibility of a representation of a goal pursuit by the observer.

      Reviewer #1 (Public Review):

      The authors set out to investigate the hypothesis that mirror neurons in ventral premotor area F5 code actions in a common motor representation framework. To achieve this, they trained a linear discriminant classifier on the neural discharge of three types of action trials and tested whether the thus trained classifier could decode the same categories of actions when observed. They showed that codes were fully matched for a small subset of neurons during the action epoch, while a wider set of "mirror neurons" showed only poorly matched codes for different epochs.

      This is one of the descriptions of our results, where we realized that in our first version we did not distinguish clearly enough between statements about single neurons and statements about populations of neurons. This prompted us to perform a detailed single neuron analysis.

      The authors controlled for potential visual object confounds by having identical objects be manipulated in three different ways and by having the animal carry out the motor execution in the dark. The main strength of the study lies in the clever decoding approach testing the matched tuning to behavioural categories in a model-free way. The central result is in the identification of the small sub-group of mirror neurons that show true matching during the execution epoch, which can dissociate the three types of action almost perfectly. This aligns well with some previous work while offering a novel avenue to identify and investigate those neurons. The underlying neuronal mechanism and behavioural relevance of these neurons remain an open question. It would have been interesting to understand better whether the specific motor representations at a recording site, for instance identified through microstimulation prior to recording (see Methods), the reaction times on individual trials or the specific gaze targets (object/hand) had a bearing on the decoding performance for a neuron/trial.

      We agree that these are interesting questions.

      In this study, the focus is on testing for a shared code according to a strictly defined mirror mechanism. We have now compared the anatomical locations of neurons with only time bins in which observed actions were discriminated with a shared code (according to one of the methods) to the locations of neurons with only time bins with non-shared code (see last paragraph in Results). We did not find any relevant difference and this is why one cannot expect topographically specific effects of microstimulation.

      We do not expect the reaction time (i.e., the time interval between LED onset and start button release, or the duration of the approach epoch) during execution or observation to have any effect on our results on shared coding as the analysis was based on relative time bins. The observed actions were predominantly distinguished late in the approach epoch, but especially in the manipulation epoch. At this time, reaction time is not expected to have a relevant influence.

      The relationship between gaze/eye position and the activity of mirror neurons, during execution or observation, is an interesting topic in itself. However, for testing for a shared code according to a strictly defined mirror mechanism, it is only relevant that the observing monkey actually observes the action. We have ensured this in our experiment by a fixation window and have now also confirmed that the monkey actually looked into the area of the object during all three actions (see Results, lines 209-219 in the manuscript with tracked changes).

      Ultimately, the uncovered matched mirror representations should in future experiments be tested with causal interventions and linked trial-by-trial to action selection performance.

      The authors put the focus of their discussion on the wider, less well-matched neuronal pool to support an action selection framework, which is of course a valid view and well established in motor representations. From a sensory perspective, sparse coding, as suggested by the small group of "true" mirror neurons identified with the decoding approach, should also be considered as the basis for a possible neuronal mechanism. A particular strength of the paper is that it could give new data and impetus to the important discussion about how motor and sensory coding frameworks come together in cortical processing.

      We have expanded the discussion considerably and also address the possibility of sparse coding.  

      Reviewer #2 (Public Review):

      The paper by Pomper and coworkers is an elegant neurophysiological study, generally sound from a methodological point of view, which presents extremely relevant data of considerable interest for a broad audience of neuroscientists. Indeed, they shed new light on the mirror mechanism in the primate brain, trying to approach its study with a novel paradigm that successfully controls for some important factors that are known to impact mirror neuron response, particularly the target object. In this work, a rotating device is used to present the very same object to the monkey or the experimenter, in different trials, and neurons are recorded while the monkey (motor response) or the experimenter (visual response) performed a different action (twist, shift, lift) cued by a colored LED.

      The results show that there is a small set of neurons with congruent visual and motor selectivity for the observed actions, in line with classical mirror neuron studies, whereas many more cells showed temporally unstable matched or even completely non-matched tuning for the observed and executed actions. Importantly, the population codes allow accurate decoding of both executed and observed actions and, to some extent, even cross-decoding of observed actions based on the coding principles of the executed ones.

      In my view, however, the original hypothesis that an observer understands the actions of others by the activation of his/her motor representations of the observed actions constitutes circular reasoning that cannot be challenged or falsified, as the author may want to claim. Indeed, 1) there is no causal evidence in the paper favoring or ruling out this hypothesis (and there couldn't be), 2) there is no independent definition (neither in this paper nor in the literature) of what "action understanding" should mean (or how it should be measured). Instead, the findings provide important and compelling evidence to the recently proposed hypothesis that observed actions are remapped onto (rather than matched with) motor substrates, and this recruitment may primarily serve, as coherently hypothesized by the authors, to select behavioral responses to others (at least in monkeys).

      1) One of the main problems of this manuscript is, in my view, a theoretical one. The authors follow a misleading, though very influential, proposal, advanced since the discovery of mirror neurons: if there are (mirror) neurons in the brain of a subject with an action tuning that is matched between observation and execution contexts, then the subject "understands" the observed action. This is clearly circular reasoning because the "understanding" hypothesis uniquely derives from the neuron firing features, which are what the hypothesis should explain. In fact, there is no independent, operational definition of the term "understanding". Not surprisingly there is no causal evidence about the role of mirror neurons in the monkey, and the human studies that have claimed to provide causal evidence of "action understanding" ended up using, practically, operational definitions of "recognition", "match-to-sample", "categorization", etc. Thus, "action understanding" is a theoretical flaw, and there is no way "to challenge" a theoretical flaw with any methodologically sound experiment, especially when the flaw consists of circular reasoning. It cannot be falsified, by definition: it must simply be abandoned. On these bases, I strongly encourage the authors to rework the manuscript, from the title to the discussion, by removing any useless attempt to falsify or challenge a circular concept and, instead, constructively shed new light on how mirror neurons may work and which may be their functional role.

      Please see the response to all.

      2) An important point to be stressed, strictly related to the previous one, concerns the definition of "mirror neuron". I premise that I am perfectly fine with the definition used by the authors, which is in line with the very permissive one adopted in most studies of the last 20 years in this field. However, it does not at all fulfill the very restrictive original criteria of the study in which "action understanding" concept was proposed (see Gallese et al. 1996 Brain): no response to object, no response to pantomimed action or tool actions, activation during execution in the dark and during the observation of another's action.

      We do not agree that the enumerated "very restrictive original criteria" emerge from the Gallese et al. (Brain, 1996) study. Except for the first paragraph in the results section, there is no clear statement on how mirror neurons should be defined.

      If the idea (which I strongly disagree with) was to simply challenge a (very restrictive) definition of mirroring (a very out-of-date one, indeed, and different from the additional implication of "action understanding"), the original definition of this concept should be at least rigorously applied. In the absence of additional control conditions, only the example neuron in Figure 2A could be considered a mirror neuron according to Gallese et al. 1996.

      We have the impression that the question does not distinguish clearly enough between the definition of "mirror neuron" and the definition of "mirror mechanism". In defining "mirror mechanism", we refer to the work of Rizzolatti and Sinigaglia (Nat Rev Neurosci, 2016). We do not think that this definition is out-of-date (see for example the 2018 article by Rizzolatti and Rozzi in Handbook of Clinical Neurology). If the term "mirror mechanism" is to be defined differently, then another term should be used for a new definition or an annotation should be added (such as "version 2"). This would be necessary to avoid unnecessary confusion resulting from unclear terms.

      Permissive criteria imply that more "non-mirror" neurons are accepted as "mirror": the fact that they are permissively named "mirror" does not imply that they are mirroring anything as initially hypothesized.

      Even for a neuron that would be classified as a "mirror neuron" according to your previously stated "very restrictive original criteria", it does not follow that it "mirrors" according to a mirror mechanism. And, of course, it is quite possible that more neurons do not "mirror" according to a mirror mechanism if one tests more neurons.

      (Example neuron in Fig 2B, for example, could be related to mouth, rather than hand, movements, since it responds strongly and similarly around the reward delivery also during the observation task, when the monkey should be otherwise still).

      We agree, it is not excluded that this neuron has a relation to mouth movements. However, since the neuron meets the conditions to be classified as a "mirror neuron", an additional relation to mouth movements would not be relevant. If mouth movements are to be an exclusion criterion, then this would have to be included and justified in the definition of a "mirror neuron".

      Clearly, these concerns impact all the action preference analyses. To practically clarify what I mean, it should be sufficient to note that 74% (reported in this study) is the highest percentage ever reported so far in a study of neurons with "mirror" properties in F5 (see Kilner and Lemon 2013, Curr Biol) and it is similar to the 68% recently reported by these same authors (Pomper et al. 2020 J Neurophysiol) with very similar criteria. Clearly, there is a bias in the classification criteria relative to the original studies: again, no surprise if by rendering most of the recorded neurons "mirror by definition" then they don't "mirror" so much. I suggest keeping the authors' definition but removing the pervasive idea to challenge the (misleading) concept of understanding.

      We think that it is very important to clearly separate "mirror neuron" from "mirror mechanism". And the question arises whether one should not include a mirroring criterion, which is derived from a definition of a mirror mechanism, in the definition of mirror neurons. We address this briefly in the discussion. Ultimately, the point of our study is to find out how many of the - if you want to put it that way - "permissively defined" mirror neurons actually “mirror”. And the answer depends on how one defines “mirror mechanism”. We provide an answer by resorting to a “strictly defined mirror mechanism”. We have now also given throughout the results section the percentages of neurons with certain properties with respect to all measured F5 neurons. This is a reference that allows comparisons among studies, provided that no neurons were directly discarded during recording, which we avoided in our study.

      3) It would be useful to provide more information on the task. Panel B in Figure 1 is the unique information concerning the type of actions performed by the monkey and the experimenter. Although I am quite convinced of the generally low visuomotor congruence, there are no kinematics data nor any other evidence of the statement "the experimental monkey was asked to pay attention to the same actions carried out by a human actor". First, although the objects were the same, the same object cannot be grasped or manipulated in the same way by a human and a macaque, even just because of the considerable difference in the size of their hands; this certainly changes the way in which monkeys' and experimenter's hands interact with the same object, and this is a quantifiable (but not quantified) source of visuomotor difference between observed and executed actions and a potential source of reduced congruency.

      We agree, of course, that there are kinematic differences in how a monkey and how a human manipulate the same object. We have not measured the kinematics and thus cannot make a systematic statement about this. We now report in the results section the rather incidental observation that already the reaching trajectories for the three actions differed and show corresponding differences in the timing of the approach epoch. However, for the question of this study, how many neurons are eligible to represent observed actions according to a strictly defined mirror mechanism, the kinematic repertoire of the observed actor is irrelevant. The reference is the F5 mirror neuron activity during the monkey's own action, i.e., how the monkey approaches the object with his hand, how he grasps it, and how he brings it to a certain target position and holds it there. The observed action, according to the strictly defined mirror mechanism, is to be mapped to this reference. Therefore, we did not collect kinematic data. But it is of course a possible explanation for a non-shared code if the strictly defined mirror mechanism does not apply.

      Second, there is little information about monkey's oculomotor behavior in the two conditions, which is known to affect mirror neuron activity when exploratory eye movements are allowed (Maranesi et al. 2013 Eur J Neurosci), potentially influencing the present findings: a ±7 (vertical) and ±5 (horizontal) window at 49 cm implies that the monkey could explore a space larger than 10 cm horizontally and 14 cm vertically, which is fine, but certainly leaves considerable freedom to perform different exploratory eye movements, potentially different among observed actions and hence capable to account for different "attention" paid by the monkey to different conditions and hence a source of neural variability, in addition to action tuning.

      We agree that the relationship between F5 MN activity and eye movements is an interesting topic. And we know from the work of Maranesi et al. (2013) that at least larger eye movements during action observation are related to the activity of F5 MNs. In our study, we ensured that the observing monkey was actually observing the action. For this purpose, we used a fixation window. We now additionally verified that the monkey really looked into the area of the object during all three actions (see Results, lines 209-219 in the manuscript with tracked changes). In our study, the fixation window was so small that the monkey could not see the face of the human actor, in contrast to the study of Maranesi et al. (2013). It was mainly the face that attracted the monkey's attention in that study (measured by gaze position). In our study, the risk that the gaze of the observing monkey left the fixation window was high when he looked at the human actor's hand above the wrist. The execution of the action by the monkey took place in darkness. We did not use a fixation window because the monkey's own execution of the action can be assumed to direct his attention to the action.

      We cannot rule out the possibility that smaller eye movements during observation, larger eye movements during execution in darkness, covert shifts of spatial attention, or more generally attentional fluctuations have an influence on F5 MNs that might have counteracted a shared action code in our study. However, if this were the case, then the investigated hypothesis that the activity of F5 MNs during action observation is a motor representation according to the strictly defined mirror mechanism would also have to be rejected.

      4) Information about error trials and their relationship with action planning. The monkey cannot really "make errors" because, despite the cue, each object can be handled in a unique way. The monkey may not pay attention to the cue and adjust the movement based on what the object permits once grasped, depending on online object feedback. From the behavioral events and the times reported in Table 1, I initially thought that "shift" action was certainly planned in advance, whereas "lift" and "twist" could in principle be obtained by online adjustments based on object feedback; nonetheless, from the Methods section it appears that these times are not at all informative because they seem to depend on an explicit constraint imposed by the experimenters (in a totally unpredictable way). Indeed, it is stated that "to motivate the monkey even more to use the LED in the execution task, another timeout was active in 30% (rarely up to 100%) of trials for the time period between touch of object to start moving the object: 0.15 (rarely 0.1) for a twist and shift, 0.35 (rarely 0.3s) for a lift". This is totally confusing to me; I don't understand 1) why the monkey needed to be motivated, 2) how can the authors be sure/evaluate that the monkeys were actually "motivated" in this way, and 3) what kind of motor errors the monkey could actually do if any. If there is any doubt that the monkeys did actually select and plan the action in advance based on the cue, there is no way to study whether the activity during action execution truly reflects the planned action goal or a variety of other undetermined factors, that may potentially change during the trials. Please clarify.

      It is true that the three actions could in principle be performed without using the LED as an informative cue. While this is unlikely under the assumption that a monkey prefers the easiest and fastest way to get reward, it remains a possibility. For this reason, we introduced time constraints in a part of the trials. The selection of time constraints and the proportion of trials in which they were applied were a pragmatic compromise between a time limit at which the LED must be used as an informative cue for action selection in order to comply with the task, and a time span that allows the task to be completed even when overall motivation is low. The latter takes into account the general experimental experience that a monkey's engagement or motivation in such experiments varies across trials, sessions, and days. To evaluate whether the LED color was, indeed, used as a cue for action planning in the execution task, we randomly interleaved trials with a different LED, non-informative regarding the type of object, as a control in 5% of the trials. We compared the behavioral responses in trials with informative cues and those with a non-informative cue. The behavioral analysis established that both monkeys indeed used the informative cues to guide their choices (see Fig. 1D).

      Further evidence that the monkey used the cue for action selection and planning is the finding that the type of action was encoded before the release of the start button and then further during the approach phase, i.e., much earlier than somatosensory feedback about the manipulability of the object was available (see Fig. 3A and Fig. 6A).

      Regarding the question, which "motor errors" were possible: The answer can be found in the description of the cases in which a trial was aborted (see Material and methods): releasing the start button too early (< 100 ms after turning on the LED), manipulating the object too slowly after touching it (the time constraints mentioned), not holding the object until the reward was given, or not performing the task at all (10 s timeout).

      5) Classification analysis. There seems to be no statistical criterion to establish where and when the decoding is significantly higher than chance: the classifier performance should be formally analyzed statistically. I would expect that, in this way, both the exe-obs and the obs-exe decoding may be significant. Together with the considerations of the previous point 2 about the permissive inclusion criteria for mirror neurons, this is a remarkable (even quite unexpected) result, which would prove somehow contrary to what the authors claim in the title of the paper. The fact that in any classification the "within task" performance is significantly better than the "between task" performance does not appear in any way surprising, considering both the inclusive selection criteria for "mirror neurons" and the unavoidably huge different sources of input (e.g. proprioceptive, tactile, top-down, etc. afferences) between execution and observation. So, please add a statistical criterion to establish and show in the figures when and where the classifications are significantly above chance.

      We have added - in addition to the statistics already performed in the first version (Fig. 3A in the previous version, now Fig. 6A) - a number of analyses including statistics. This mainly concerns the analyses regarding a shared code at the single neuron level, in which we additionally tested against the null hypothesis proposed by Csibra in 2005 using permutation tests. And we have now also calculated confidence intervals for the population classifications that allow the comparison with chance level. We re-performed the classification analyses using eight-fold cross-validation. We also added a statistical analysis to the finding of clustering of time periods with shared code (Fig. 4). In Figure 5, we additionally compared the frequency of action segments with shared and non-shared codes, which is a descriptive, exploratory analysis. For this reason, it does not make sense to perform inferential statistics. Overall, these analyses represent a significant expansion of the analyses in the first version. We have done this primarily to arrive at statistically sound conclusions at the single neuron level.
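      As an illustration of how a permutation test can compare a classification accuracy with chance, the following is a minimal sketch, not the procedure used in the study. It assumes a generic label-shuffling null (the study tested against the null hypothesis proposed by Csibra, which may be constructed differently); the function name `permutation_p_value`, the trial counts, and the number of permutations are hypothetical.

```python
import random


def permutation_p_value(predictions, labels, n_perm=1000, seed=0):
    """Return (observed accuracy, p-value) under a label-shuffling null.

    Assumption: the null distribution is built by shuffling the true
    action labels of the test trials; this is a generic choice and not
    necessarily the null used in the study.
    """
    rng = random.Random(seed)
    n = len(labels)
    observed = sum(p == l for p, l in zip(predictions, labels)) / n
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        acc = sum(p == l for p, l in zip(predictions, shuffled)) / n
        if acc >= observed:
            at_least_as_good += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_good + 1) / (n_perm + 1)


# Hypothetical test set: 4 observation trials for each of the three actions.
labels = ["lift"] * 4 + ["twist"] * 4 + ["shift"] * 4

# A classifier that predicts every trial correctly ...
acc_good, p_good = permutation_p_value(list(labels), labels)

# ... versus one that always answers "lift" (accuracy stuck at chance, 1/3).
acc_flat, p_flat = permutation_p_value(["lift"] * 12, labels)
```

      In this toy case the constant classifier reaches the chance accuracy of one third under every shuffle, so its p-value is 1, whereas the perfect classifier is almost never matched by shuffled labels.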

      Regarding the comparison between within-task classification (o2o) and cross-task classification (e2o), it is important to keep in mind that the goal was to test the hypothesis that the activity of F5 MNs during action observation is a motor representation of the observed action according to the strictly defined mirror mechanism. This hypothesis requires both 1) an above-chance accuracy of the e2o classifier and 2) no better accuracy of the o2o classifier as compared to the e2o classifier. If the o2o classifier were better, then the actions would not be represented as they are executed. And the reference in this hypothesis is the motor representation, that is, the code at execution. Thus, the e2o direction of classification is the crucial one, not the reverse direction (o2e). One explanation for the fact that o2o shows better accuracy in the population may be the different sensory inputs mentioned above. In this case, the tested hypothesis has to be rejected and replaced by another one, which should then have a different name.

      Nevertheless, we also show the result of the o2e cross-task classification in Fig. 6 (yellow curve), which was already included in Fig. 3 of the first version. However, we do not address it in more detail in the main text because it is not relevant for the hypothesis to be tested. It is only a reportable additional result.
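      The logic of comparing e2o with o2o accuracy can be sketched for a single neuron and a single time bin as follows. This is an illustrative sketch under stated assumptions, not the analysis code of the study: the discharge rates are invented, the classifier is a nearest-class-mean rule (which is what a linear discriminant reduces to for one feature with equal priors and a pooled variance), the within-task accuracy uses leave-one-out rather than the eight-fold cross-validation mentioned above, and `shared_code` replaces the statistical comparison with a plain numerical one.

```python
from statistics import mean


def flatten(rates_by_action):
    """Turn {action: [rates]} into parallel lists of rates and labels."""
    rates, labels = [], []
    for action, values in rates_by_action.items():
        rates.extend(values)
        labels.extend([action] * len(values))
    return rates, labels


def fit_class_means(rates, labels):
    """Mean discharge rate per action: the 'code' of this time bin."""
    return {a: mean(r for r, l in zip(rates, labels) if l == a)
            for a in sorted(set(labels))}


def predict(class_means, rate):
    """Assign the action whose mean rate is closest to the given rate."""
    return min(class_means, key=lambda a: abs(class_means[a] - rate))


def cross_task_accuracy(train, test):
    """e2o: train on execution trials, test on observation trials."""
    means = fit_class_means(*flatten(train))
    rates, labels = flatten(test)
    return sum(predict(means, r) == l for r, l in zip(rates, labels)) / len(labels)


def within_task_accuracy(trials):
    """o2o: leave-one-out accuracy within the observation trials."""
    rates, labels = flatten(trials)
    hits = 0
    for i in range(len(rates)):
        means = fit_class_means(rates[:i] + rates[i + 1:], labels[:i] + labels[i + 1:])
        hits += predict(means, rates[i]) == labels[i]
    return hits / len(rates)


def shared_code(e2o, o2o, chance=1 / 3):
    """Both conditions: e2o above chance and o2o not better than e2o."""
    return e2o > chance and o2o <= e2o


# Invented discharge rates (spikes/s) for illustration only.
execution = {"lift": [30, 32, 34, 31], "twist": [20, 22, 19, 21], "shift": [10, 12, 11, 9]}
obs_same_tuning = {"lift": [29, 33, 31, 30], "twist": [21, 18, 22, 20], "shift": [11, 8, 12, 10]}
obs_other_tuning = {"lift": [10, 12, 9, 11], "twist": [30, 31, 33, 29], "shift": [20, 19, 22, 21]}

same = (cross_task_accuracy(execution, obs_same_tuning), within_task_accuracy(obs_same_tuning))
other = (cross_task_accuracy(execution, obs_other_tuning), within_task_accuracy(obs_other_tuning))
```

      In the second toy data set the observed actions are perfectly discriminable within the observation task, yet the execution-trained classifier fails, which corresponds to a non-shared code in the sense used above.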

      6) "As the concept of a mirror mechanism posits that the observation performance can be led back to an activation of a motor representation, we restricted this analytical step to a comparison of the exe-obs and the obs-obs discrimination performance". I don't understand the rationale of this choice. The so-called "concept" of mirror mechanism in classical terms posits that mirror neurons have a motor nature and hence their functioning during observation should follow the same principle as during action execution. But this logical consideration has never been demonstrated directly (it is indeed contested by several papers), and when motor neurons are concerned (e.g. pyramidal tract neurons, see Kraskov et al. 2009) their behavior during action observation is by far more complex (e.g. suppression vs facilitation) than that hypothesized for classical "mirror neurons". Furthermore, when across-task decoding for execution and observation code has been used, both in neurophysiological (e.g. Livi et al. 2019, PNAS) and neuroimaging (Fiave et al. 2018 Neuroimage) data, the visual-to-motor direction typically produces better performance than the opposite one. Thus, I don't see any good reason not to show also (if not even just) the obs-exe results. Furthermore, I wonder whether the possible impact of a rescaling in the single neuron firing rate across contexts has been considered, as the observation response is typically less strong than the execution response in basically all brain areas hosting neurons with mirror properties, and this should not impact on the matching if the tuning for the three actions remains the same (e.g. see Lanzilotto et al. 2020 PNAS). The analysis shown in Figures 4 and 5 is, for the rest, elegant and very convincing - somehow surprising to me, as the total number of "congruent" neurons (7.5%) is even greater than in the original study by Gallese et al. (5.4%).

      As to the rationale of our approach, please see our response to the previous point.

      On the issue of rescaling: the hypothesis tested here requires that the F5 MNs activity on observation is a motor representation of the observed action. Hence, from the activity during observation the action should be just as readable as from the execution-related activity. If we had to use rescaling to find a shared code, then observed actions would not be represented in F5 MNs in the same way as on execution. Additional information on whether the action is being executed or observed would be needed. This would of course be possible in principle, but would contradict the hypothesis. And we then not only have the difficulty of which readout is the physiological one (here we make a parsimonious assumption with a linear readout), but we would have to make an additional assumption about rescaling. For this study, we have now chosen the solution of performing the action preference analysis on a single neuron level in a statistically clean way. This represents a very liberal form of rescaling, as it only tests whether the action with the highest or lowest discharge rate is the same when executed and observed. That is, if the result here is not fundamentally different, which is the case, then it can also be assumed that one does not get qualitatively different results for other forms of rescaling.
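      Why a comparison of action preferences is insensitive to rescaling can be illustrated with a small sketch. The discharge rates below are invented, the helper `preferred_actions` is hypothetical, and the statistical part of the single-neuron analysis is not reproduced here; the sketch only shows that the identity of the actions with the highest and lowest mean rate is unchanged when the observation response is a weaker copy of the execution response.

```python
from statistics import mean


def preferred_actions(rates_by_action):
    """Return (action with highest mean rate, action with lowest mean rate)."""
    means = {action: mean(rates) for action, rates in rates_by_action.items()}
    return max(means, key=means.get), min(means, key=means.get)


# Invented discharge rates (spikes/s); observation is weaker, same ordering.
execution = {"lift": [30, 32, 34], "twist": [20, 22, 19], "shift": [10, 12, 11]}
observation = {"lift": [12, 13, 14], "twist": [8, 9, 7], "shift": [4, 5, 3]}

exe_pref = preferred_actions(execution)
obs_pref = preferred_actions(observation)
same_preference = exe_pref == obs_pref
```

      With these numbers the preferred and least preferred actions agree across tasks, although the absolute rates differ by more than a factor of two, so a readout fitted to the execution rates would not transfer without an additional rescaling assumption.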

      7) The discussion may need quite deep revision depending on the authors' responses and changes following the comments; for sure it should consider more extensively the numerous recent papers on mirror neurons that are relevant to frame this work and are not even mentioned.

      The discussion has been thoroughly revised considering the comments raised and suggestions of this and the other two reviewers.

      Reviewer #3 (Public Review):

      Mirror neurons are a big deal in the neuroscience literature and have been for thirty years. I (and many others) remain skeptical of whether they serve the functions often attributed to them - specifically, whether they are motor planning neurons that contribute to understanding the actions of others. Testing their functions, therefore, is of great interest and importance. The present study, however, is not a cogent or convincing test. I do not think this study helps to answer the questions surrounding mirror neurons. It purports to provide a crucial test, that comes out mostly against the mirror neuron hypothesis, but the test has too many weaknesses to be convincing.

      Thank you for the clear words. We take from it, first of all, that in the first version of the manuscript we failed to convey the relevance of our study for the discussion of mirror neuron function. The concerns of this reviewer are in line with those of the others and are addressed in our response to all three reviewers.

      First, consider that the motor tuning and the visual tuning match "poorly." How poor or good must the match be before the mirror neuron hypothesis is rejected? I do not know, and the study does not help here. Even a "poor" match could contribute significantly to a social perception function.

      The specific hypothesis tested here assumes that an action-specific activity of F5 MNs evoked by observed actions corresponds to an action-specific activity of these actions if executed. The approach taken here to compare cross-task classification accuracy (execution-trained, tested in observation) with within-task classification accuracy (observation-trained, tested in observation) tests this hypothesis. The fact that we found a cluster of time periods of single neurons in which both accuracies are almost equal supports this approach and also the hypothesis for these time periods. In principle, of course, the decision for the presence of a difference or equality is always only a statistical statement and contains assumptions. For example, the assumption that a linear readout has physiological relevance enters here. But this problem exists in all studies that ultimately try to understand biological neuronal networks in order to explain perceptions and behavior. However, it is such studies that attempt to elucidate what information is contained in which neurons that set the stage for experiments that, in the optimal case, manipulate certain neurons in a particular way in order to then measure the behavior of an animal that is just right for those neurons.

      Second, the results remind me in some ways of other multi-modal responses in the brain. For example, in the visual area MST, neurons are tuned to optic flow fields that imply specific directions of self-motion. Many of the same neurons are tuned to vestibular signals that also imply specific directions of self-motion. But the optic flow tuning and the vestibular tuning are not perfectly matched. There is considerable slop and complexity in how the two tunings compare within individual neurons. That complexity is not evidence against multi-modal tuning. Instead, it suggests a hidden-layer complexity that is simply not fully understood yet. Just so here, the fact that the apparent motor tuning and apparent visual tuning match "poorly" is not evidence against both a motor planning and a visual encoding function.

      We hope that it is now clearer, in contrast to the first version, that we tested a specific hypothesis that is only a prerequisite for the hypothesis of a very specific form of understanding. Referring to the example, the hypothesis analogous to ours would be that the representation of self-motion direction due to optic flow ("observation") corresponds to the representation of self-motion direction due to vestibular stimulation ("execution"). If it were then found that the self-motion direction due to optic flow cannot be predicted from a classifier trained on vestibular stimulation, and that another classifier trained on optic flow performs better, then the hypothesis would have to be rejected. This is then a reason to realize that "everything is a bit more complex" and to search for better explanations.

      Third, the animals are massively over-trained in three actions. They perform these actions and see them performed thousands of times toward the same object. Surely, if I were in the place of the monkey, every time I saw the object, I'd mentally imagine all three actions. As I saw a person act on the object, I'd mentally imagine the alternative two actions at the same time. Even if the mirror neuron hypothesis is strictly correct, this experiment might still find a confusion of signals, in which neurons that normally might respond mainly to one action begin to respond in a less predictable way during all three trial types.

      In our study, we tested a specific hypothesis related to the time an action is observed. Here, you suggest an alternative hypothesis. The question is whether this alternative hypothesis better explains the result of our study. The alternative hypothesis can be formulated as follows: the F5 MNs activity elicited by an observed action in this experiment corresponds to a mixture of the activities that occur when the other two actions are executed. This hypothesis is to be rejected because it fails to explain why a shared code occurs in single neurons and why cross-task population classifiers show an accuracy above chance level. A modified alternative hypothesis, which states that what is represented in the experiment during observation is a mixture of all three actions, cannot explain why the three actions are very well represented in the population and are optimally represented exactly when the target position of the object is reached.

      Fourth, the experiment relies on a colored LED that acts as an instructional cue, telling the monkey which action to perform. What is to stop the neurons from developing a cue-sensitive response, as in classic studies from Steve Wise and others in the premotor cortex? Perhaps the neuronal signal that the experimenters are trying to measure is partly obscured by other, complex responses influenced in some manner by the instructional cue?

      In principle, there is the possibility that purely sensory information is also represented in area F5, at least in some neurons or at certain points in time. We take your suggestion and discuss this as one of the alternative concepts (we call it "sensory concept"). However, several findings argue against this concept. For example, neural responses to cues usually represent the subsequent action, but not sensory information of the cue such as the color of the cue. In our study, it is evident from Figure 3A, 6A and 6B that during action execution, actions are discriminated even before the start button is released. Since this discrimination of actions occurs with a time delay after the cue and then increases continuously, this is evidence that the action to be executed is represented, but not the cue itself.

      Fifth, finally, and most importantly, the fundamental problem with this study is that it is correlational. Studies that purport to test the function of a set of neurons, and do so by use of correlational measurements, cannot provide strong answers. There are always half a dozen different interpretations and caveats, such as the ones I raised here. Both sides of a debate can always spin the results, and the arguments are never resolved. To test the mirror neuron hypothesis properly would require a causal study. For example, lesion area F5 and test if the monkey is less able to discriminate the actions of others. Or, electrically microstimulate in area F5 and test if the stimulation interferes (either constructively or destructively) with the task of discriminating the actions of others. Only in this way will it be possible to answer the question: do mirror neurons functionally participate in understanding the actions of others? The present study does not answer that question.

      We would like to reiterate that studies aimed at elucidating what information is contained in which neurons or areas are necessary to understand neural network processes and are a prerequisite for conducting well-considered experiments that measure behavioral effects through specific manipulation of the neural network. Without the work of Gallese, Rizzolatti and colleagues, the idea of associating F5 neurons with action understanding would not have occurred in the first place. The tricky question at present is whether F5 MNs contribute at all and, if so, to which form of understanding, perception, or behavior that uses information about the mental states of another. And for this, it helps to have a clearer idea of what information is contained in F5 MNs during action observation.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Septate junctions provide the barrier function in insect tissues, serving as analogs to the vertebrate tight junctions. Here the authors explore an interesting question: how do epithelial tissues respond to loss of barrier function in vivo? They use a powerful and well-studied system, the Drosophila pupal notum, which allows them to bring powerful genetic tools to bear and use state-of-the-art imaging. Their data are lovely and carefully quantified. Together, they reveal some significant surprises. 1. Disrupting septate junctions leads to elevated accumulation of adherens junction proteins and myosin, and reduced apical area. 2. Disrupting septate junctions led to accumulation of many ESCRT-0-positive vesicles and of enlarged ESCRT-III vesicles. 3. Disrupting septate junctions led to elevated accumulation of Crumbs apically and of integrin-based focal adhesions basally. These observations are well supported by the data and in the results section conclusions are carefully drawn. I had some relatively minor comments outlined below about the results. My only significant suggestion concerns the Abstract and Discussion. The Abstract includes a statement that goes well beyond the data shown, and the Discussion is sometimes hard to follow. With these issues corrected, this will provide important new insights for cell and developmental biologists.

      1. The Abstract states: "We report that the weakening of SJ integrity, caused by the depletion of bi- or tricellular SJ components, reduces ESCRT-III/Vps32/Shrub-dependent degradation and promotes instead Retromer-dependent recycling of SJ components." This is too strong, as the role of the retromer, while plausible, is not directly tested. It's fine to speculate about this in the Discussion but drawing a conclusion like this in the Abstract is unwarranted.
      2. Similarly, the title suggests that "ESCRT-III-dependent adhesive and mechanical changes are triggered by a mechanism sensing paracellular diffusion barrier alteration". They show that knocking down septate junctions alters localization of vesicle trafficking machinery, and that it leads to alterations in apparent recycling of cargo, but do they ever really assess whether these changes are ESCRT-III-dependent? Wouldn't this require knocking down ESCRT-III in cells with defects in septate junctions? There was a lot of data in this paper and perhaps I missed it but was this experiment done? I am not suggesting they do it, but that they temper this conclusion if not.
      3. The authors assessed "poly-ubiquitinylated proteins aggregates appearance, marked using anti-FK2". They need to define FK2: what does it detect?
      4. Fig. 4: is this a clone, and are we far from the boundary? Please make this clearer.
      5. The authors state: "Despite these apparent similarities, we noticed that, in contrast to Shrub depletion, NrxIV did not accumulate in enlarged intracellular compartments upon Cora depletion". Could the authors reference a Figure here?
      6. The authors state: "Hence, if both Shrub and bSJ/tSJ defects lead to Crumb enhanced signals". It might be better to say "altered", as they then point out the differences.
      7. I found the Discussion challenging to follow. Rather than focusing on the core observations, it addresses many, not very well-connected speculative possibilities, and in my opinion, will be challenging for most readers to follow. I would encourage the authors to revisit it from top-to-bottom.

      Referees cross-commenting

      I think we largely agree that the authors present important data, but that certain points need to be better explained or more clearly documented. While Reviewer 1 is correct that adding context about the basolateral polarity proteins would be helpful, I do not feel as strongly about this as a deficit. The authors did not manipulate Scrib, Dlg or Lgl, and I think their polarity functions may be distinct from those of the more "structural" septate junction proteins analyzed here.

      Significance

      Septate junctions provide the barrier function in insect tissues, serving as analogs to the vertebrate tight junctions. Here the authors explore an interesting question: how do epithelial tissues respond to loss of barrier function in vivo? They use a powerful and well-studied system, the Drosophila pupal notum, which allows them to bring powerful genetic tools to bear and use state-of-the-art imaging. Their data are lovely and carefully quantified. Together, they reveal some significant surprises. 1. Disrupting septate junctions leads to elevated accumulation of adherens junction proteins and myosin, and reduced apical area. 2. Disrupting septate junctions led to accumulation of many ESCRT-0-positive vesicles and of enlarged ESCRT-III vesicles. 3. Disrupting septate junctions led to elevated accumulation of Crumbs apically and of integrin-based focal adhesions basally. These observations are well supported by the data and in the results section conclusions are carefully drawn. I had some relatively minor comments outlined below about the results. My only significant suggestion concerns the Abstract and Discussion. The Abstract includes a statement that goes well beyond the data shown, and the Discussion is sometimes hard to follow. With these issues corrected, this will provide important new insights for cell and developmental biologists.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Using the triploid biotype of the planarian Schmidtea polychroa, the first half of this paper presents the results of an analysis of genome structure, and the second half shows that (de novo) mutations in individuals that undergo regeneration are passed on to the next generation.

      While I think this paper contains interesting biological findings, I am skeptical about its novelty. I was convinced by the results and discussion of the analysis of genome structure, but the results and discussion of the analysis of (de novo) mutations were very confusing. This may be due to my lack of knowledge in this field. But even so, the authors need to improve this manuscript so that the general reader will better understand it.

      Major comments:

      1. The authors mention that it is important to note that this study was conducted using a parthenogenetic triploid biotype. However, I think that the parthenogenesis undergone by the triploid biotype of S. polychroa is very unusual. It is not typical apomictic parthenogenesis. Triploid oocytes arise by meiosis from hexaploid oocytes derived from triploid adult somatic stem cells called neoblasts. On the other hand, haploid sperm arise by meiosis from diploid spermatogonia derived from neoblasts. Embryogenesis of triploid eggs then occurs by pseudogamy. Occasional sex is also known to occur even if the offspring's chromosome number remains triploid. I think this background is important information to give the reader. Also, don't the authors need to interpret the results in this paper with this complex phenomenon taken into account?
      2. Fig. 4B-C: Analysis by lineage-specific mutations of parental controls.
         The authors do not specifically mention or discuss this result. What about the accumulation of mutations within such populations in typical parthenogenesis (Daphnia and aphids)? In other words, are the results in Fig. 4B-C due to the special mechanism for parthenogenesis in the triploid S. polychroa as described above?
      3. Throughout this paper, the authors show that regeneration increases de novo mutations in the progeny. The authors conclude that many of the mutations occurred in neoblasts during regeneration. However, I would like you to explain the biological significance of these results in S. polychroa, which naturally does not reproduce by fission and regeneration. There are already reports of mutations accumulating in neoblasts in Dugesia japonica, which reproduces asexually by fission. For these reasons, I do not think this paper presents extremely novel results.
      4. p15, Discussion:
         "Tissue regeneration is best seen in the liver of mammals, and the regrowth of relapsed tumours following surgery can also be considered an example of a regenerative process. Mutagenesis accompanying these processes is relevant to subsequent tumorigenesis or the development of resistance, and the planarian system can provide a useful model for the mutagenic effect of tissue regeneration."

      Isn't it an overstatement to associate the regenerative system of planaria with the liver regeneration of mammals?
      5. p10, Results:
         "We compared the two de novo spectra to the spectrum of germline heterozygous SNPs, present in all animals, and found that the pattern of germline substitutions resembled more closely the de novo spectrum of the control group (Fig 5D, Fig S3), implying that regeneration has a minor contribution to germline mutations in S. polychroa populations."
         p14, Discussion:
         "The high similarity of the spectrum of heterozygous SNPs and de novo mutations of control animals suggests that the species primarily reproduces in a non-regenerative manner. The increased mutation rate and the altered mutation spectrum upon regeneration confirmed our hypothesis that regeneration is a mutagenic process."

      I was very confused by these sentences and it took me some time to understand them. Triploid S. polychroa naturally does not reproduce by fission and regeneration, namely a non-regenerative manner. I do not understand why the author insists on this. Please explain the results for the regenerated case in Fig. 5D (0.88) in a way that is also easy to understand. Also, what is the biological significance of asserting here that de novo mutation by regeneration increases in a species that does not increase by regeneration and division in the first place?

      Minor comments:

      1. The author should add a schematic diagram showing the distribution of reproductive organs in Fig.1 to help the reader understand that the ovaries are not included in the regenerative fragment.
      2. P12, line12: Fig 6D-E, it's F, not E, right?
      3. P9, line 8:
         "these mutations were missing from the original egg but were present in the egg laid by the parent and thus represent the total mutation load of a generation."

      The authors mention that the de novo mutations found in offspring derived from parents that do not undergo regeneration were already present in the eggs, but I can find no evidence of this. Can you rule out the possibility that these mutations occurred between hatching and adulthood?
      4. p10, Results:
         "Interestingly, the majority of mutations were shared in the siblings F4A and F4B. This suggests that the germ cells of these animals were descendants of the same stem cell, which underwent a high number of cell divisions early during the regeneration process prior to oocyte differentiation. The same finding also confirms that the detected clonal filial mutations were present in the respective oocyte and were not generated by embryonic cell divisions."

      The shared de novo mutations detected in the siblings (F4A and F4B) derived from the parent that underwent regeneration in Fig. 5A suggest that the germ cells of these siblings are descended from the same stem cell. The authors say that these mutations occurred in a large number of cell divisions early in the regenerative process prior to oocyte differentiation.

      So why is there no shared de novo mutation in the siblings (Fc4A and Fc4B) derived from the non-regenerating parent in Fig. 5A? As mentioned in Minor comment 3, the authors state that the de novo mutations were already present in the parent-laid eggs, but when did these mutations, which are not shared, arise?
      5. p11, Results:
         "Interestingly, in the case of FR4A-FR4B sibling pair, shared de novo mutations present in both were subclonal in R4 in a proportion comparable to the other samples (7/15 by WGS, 46.7%), while the three unique mutations could not be detected in R4 by the PCR approach, indicating again that the unique mutations, which amounted to approximately 10% of total clonal filial mutations in these two animals, arose late during germ cell regeneration."

      The expression "during germ cell regeneration" is too vague to know which stage you are referring to. In relation to minor comment 4, why not create a new chart to clearly show when the expected mutations occurred?
      6. p12, Results:
         "Altogether 7/30 regenerant mutations were detected in PR animals, and these included those with the highest AF in the regenerants (Fig. 6C). This suggests that parental animals, even before regeneration, contained a diverse set of stem cells, and some of the detected de novo mutations in the filial generation resulted from the expansion of mutation-containing stem cell clones contributing also to germ cells in the regenerant animals."

      If the mutation in the offspring is derived from the parent (PR) prior to the time of tail amputation, wouldn't it be strange to assume that it is a de novo mutation?
      7. p12, Results:
         "The remaining 23/30 R- subclonal mutations may have arisen during regeneration. On average, ~250 dividing neoblasts were detected in cut tails of animals from the same population as the sequenced individuals, as determined by immunofluorescence of phosphorylated H3 histone (Fig 6D-E). However, the high proportions of body cells carrying regenerant-specific mutations suggest that certain stem cells contribute to disproportionately large parts of the regenerated body, including the germline."

      I did not quite understand the relevance of this discussion to the photos shown here of the M period (Fig. 6e).

      Significance

      General assessment: This paper contains important biological information. The finding that mutations in planarian stem cells cause diversity in the next generation under parthenogenesis is very interesting. However, I think that the authors need to carefully explain and revise their argument, for example, that the mutations were caused by regeneration, which does not naturally occur in the species used.

      Advance: The finding that accumulation of mutations is occurring in planarian stem cells has already been reported in Dugesia japonica. Please cite the papers and clarify what is the key finding in this paper.

      Audience: Basic Research_Evolutionary Ecology, Developmental Biology (Stem Cells), Reproductive Biology

      Please define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      My field of study is reproductive biology. I am familiar with the transcriptome but unfamiliar with genome analysis.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the three reviewers for carefully reading our manuscript and for all considerations, ideas, suggestions, and comments. These were all very helpful in strengthening the scientific statements of our manuscript. Please note that all changes are marked in red in the manuscript and supplement. Below you will find our point-by-point responses to all questions and comments.

      Reviewer #1 (Evidence, reproducibility and clarity):

      Overall, this is an exciting work. There are, however, several open questions that the authors could address to facilitate understanding of their work. These points are:

      1.) On page 5, lines 113ff, the authors mention the membrane bulges that they analyse in figure 1. They show these deformations by light (confocal) and electron microscopy. However, the bulges seen by confocal microscopy seem to be bigger than those seen by electron microscopy. The authors could quantify the sizes of the bulges for clarification.

      We quantified the size of the membrane bulges. By confocal microscopy, the identified bulges measured 750nm on average (n=12), with a minimum of 650nm and a maximum of 890nm. By TEM, we measured a mean of ~243nm (n=61), with a range between 62nm and 442nm. These measurements are shown as Figure 1E.

      Please note that measurements of TEM images do not always capture the three-dimensionality of bulges and may show only parts of them. In addition, ultrastructural analysis is more sensitive and can easily detect small membrane changes that we cannot observe with confocal and airyscan microscopy. In contrast, even with our high-quality objective (63x Zeiss Plan Neofluar, Glycerin, 1.3 NA), standard confocal analysis is limited to ~200nm on the XY axis (airyscan ~110nm) and ~450nm on the Z-axis. Therefore, TEM analysis detects smaller bulges than confocal analysis, and consequently, this method detected a large range of bulge lengths between 63nm and 441nm. In contrast, the airyscan method detected a range of bulge lengths between 0.65 and 0.83 µm. However, both confocal and TEM analyses provide evidence of membrane bulges in pio mutant embryos. Please note that we extended our studies and now show membrane bulges in two different pio mutant alleles (17C and 5M) with airyscan microscopy.
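
The resolution argument can be made concrete with a small sketch. It treats each quoted resolution limit as a hard detection threshold, which is a simplification, and it uses only the TEM minimum, mean, and maximum reported in this reply; the names `LIMITS_NM`, `TEM_NM`, `resolvable`, and `detectable_range` are hypothetical.

```python
# Approximate resolution limits quoted in the reply, in nanometres.
LIMITS_NM = {"confocal_xy": 200, "airyscan_xy": 110, "confocal_z": 450}

# TEM bulge sizes reported in the reply: minimum, mean, maximum (nm).
TEM_NM = [62, 243, 442]

def resolvable(size_nm, limit_nm):
    """Simplifying assumption: a feature is seen only if it is at least
    as large as the resolution limit of the method."""
    return size_nm >= limit_nm

def detectable_range(sizes_nm, limit_nm):
    """Return (min, max) of the sizes that pass the limit, or None."""
    kept = [s for s in sizes_nm if resolvable(s, limit_nm)]
    return (min(kept), max(kept)) if kept else None

if __name__ == "__main__":
    for method, limit in LIMITS_NM.items():
        print(method, detectable_range(TEM_NM, limit))
```

Under this assumption, the smallest bulges seen by TEM fall below the light-microscopy limits, so a light-microscopy sample is expected to be shifted toward larger bulges.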

      2.) The subject of the manuscript is rather complicated; presentation of data from Figure 1C and D on lines 113ff and 169ff is confusing.

      We apologize and thank the reviewer for careful reading. We revised both paragraphs (lines 108 – 123 and lines 166 - 174) and are confident that the descriptions are now much more understandable. All changes are marked in red.

      3.) The quality of the sub-images of Figure 2E differs. Especially, the phenotype of the wurst, pio transheterozygous embryo is not well visible.

      We apologize for this. We repeated the experiment with wurst;pio transheterozygotes and generated wurst;pio double mutant embryos to improve the quality. The gas filling assay is shown in Fig. 3, with brightfield microscopy in overview images (10x air objective) and close-ups of the dorsal trunks (25x glycerin objective). Both show the gas-filling defects of dorsal trunk tubes. In a subsequent confocal analysis of chitin stainings in late-stage 17 embryos, we found that tracheal tube lumens are collapsed in the transheterozygous and double mutant embryos.

      4.) Lines 246ff: the protein sizes are given for the mCherry:chimeric proteins; an estimate of the native Pio portions should be given.

      The endogenous Pio protein has a calculated mass of about 50.82 kDa. We now state this in the corresponding legend of Fig. 6.

      5.) In Figure 6A, the appearance of chitin in the wildtype tube is different compared to the Np mutant situation, more filamentous. Can the authors comment on that?

      The reviewer is correct. Chitin cable formation in Np mutant embryos is normal but lacks the condensation process; therefore, the fiber structure of the chitin matrix differs from that of control embryos in late stage 16 and stage 17 (see Drees et al., PLOS Genetics, 2019).

      6.) In the discussion section, I would appreciate if the timing of events was discussed or even shown in a model. The central question is: how are the functions of Pio and Np coordinated in time? As I understand, Np should not cleave Pio before morphogenesis is completed. Is there any example in the literature for how such an interaction could be controlled? The overexpression of Np shows that either the ratio between Np and Pio is important, or the btl promoter expresses Np at the "wrong" time point.

      We thank the reviewer for this interesting comment.

      Of course, we did not measure forces, but it has been published that axial forces appear at the apical cell membrane during stage 16 tube expansion. Our data show that Np cleaves the Pio ZP domain and that the subsequent release of Pio increases during stage 16. The cleaved and released Pio is enriched in the lumen during stage 16, from where it is internalized during stage 17 with the help of Wurst-mediated endocytosis. This is supported by several in vivo studies, video microscopy, antibody stainings and biochemical data, such as the interaction of Pio and Dumpy as well as the identification of different Pio products with and without Np cleavage. Moreover, we found membrane bulges that increase in size during stage 16 and identified a subsequent tear-off of the chitin matrix in Np mutant embryos. Thus, we propose that Np is required to cleave Pio-Dumpy linkages at the membrane-matrix interface when tubes elongate and the postulated forces appear at the cell membrane during tube elongation in stage 16 embryos.

      We stated this in the discussion as follows:

      “The membrane defects observed in both Pio and Np mutants indicate errors in the coupling of the membrane matrix due to the involvement of Pio (Figs. 1,7). ..., the large membrane bulges in Np mutants affect the membrane and the apical matrix (Fig. 7). Since apical Pio is not cleaved in Np mutants (Fig. 7D), the matrix is not uncoupled from the membrane as in pio mutant embryos but is likely more intensely coupled, which leads to tearing of the matrix axially along the membrane bulges (Figs. 7, 9), when the tube expands in length.”

      How could Np be regulated at the membrane? Np is a zymogen that very likely undergoes ectodomain shedding for activation, similar to what has been described for matriptases. Additionally, human matriptase requires transient interaction of the stem region with its cognate inhibitor HAI-2, which Drosophila lacks (see Drees et al., PLOS Gen, 2019). Thus, the regulation of Np activation is not known.

      Further, we observed that Dumpy is not degraded in Np mutant embryos during stage 17. Nevertheless, in a previous publication, we showed that btl-G4 driven Np expression rescues Np mutant phenotypes in a time-specific manner. We used the btl-G4 driver line for these rescue experiments to express Np in tracheal cells. This restored tracheal Dumpy degradation in Np mutant embryos. Thus, btl-G4 driven Np overexpression is able to rescue Np mutant tracheal phenotypes in a time-specific manner, although Gal4 is expressed from early tracheal development onwards. Further, btl-Gal4 driven Np expression mimics the endogenous Np, which is expressed from stage 11 onwards in all tracheal cells throughout embryogenesis (see Drees et al., PLOS Gen, 2019).

      Based on these experiments, we conclude that the btl-G4-driven Np overexpression can cleave Pio ZP domain in stage 16 embryos at the correct time.

      However, the ratio of Np to Pio expression is essential, in that btl-Gal4-driven Np overexpression may cause cleavage of a higher number of Pio proteins and the release of critical Pio-Dumpy linkages at the cell membrane and matrix. Thus, increased Pio shedding into the lumen reduces Pio linkages at the membrane, resulting in a pio mutant-like tracheal overexpansion upon btl-Gal4-driven Np overexpression.

      Finally, we were able to address the reviewer’s question in a new experiment. We used btl-Gal4 driven UAS-Np embryos for Pio antibody staining. This revealed Pio enrichment at the tracheal chitin cable in stage 14 and 15 embryos. In contrast, stage 16 embryos showed numerous Pio puncta appearing across the entire tube lumen, indicating that Np mediates Pio shedding specifically in stage 16 embryos and not before. This Np-controlled Pio release modifies tube length control.

      Therefore, we stated this in the manuscript as follows:

      Results:

      “Our data assumes that Np overexpression may enhance Pio shedding in stage 16 embryos, affecting the Pio-mediated ZP matrix function. Upon breathless (btl)-Gal4-mediated expression of UAS-Np in tracheal cells, we observed a high amount of Pio puncta across the entire tracheal tube lumen, specifically in stage 16 embryos but not in earlier stages (Fig. S13). Consistently tracheal Np overexpression led to tube overexpansion in stage 16 embryos resembling the pio mutant phenotype (Fig. 8A,B). Thus, Np-mediated Pio shedding controls Pio function.”

      Discussion:

      “The btl-Gal4-driven Np expression mimics the endogenous Np from stage 11 onwards in all tracheal cells throughout embryogenesis (Drees et al., 2019), suggesting that Np is not expressed at a wrong time point. However, the ratio between Np and Pio is essential. We assume that Np overexpression increases Pio shedding, resulting in a pio loss-of-function phenotype. Thus, the tube length overexpansion upon Np overexpression indicates that Pio cleavage is required for tube length control.

      Our observation that the membrane deformations are maintained in Np mutant embryos supports our postulated Np function to redistribute and deregulate membrane-matrix associations in stage 16 embryos when tracheal tube length expands. In contrast, Np overexpression potentially uncouples the Pio-Dpy ZP matrix membrane linkages resulting very likely in unbalanced forces causing sinusoidal tubes.”

      7.) Also for the discussion: We have two situations where Pio amounts/density are enhanced at the apical plasma membrane. The wurst experiments on lines 136ff show that Pio amount and density depends on endocytosis; is the wurst phenotype (Figure 2), at least partially, due to over-presentation of Pio? Likewise, in Figure 2C, there is more Pio in Cht2 overexpressing tracheae (but there is overall more Pio in these tracheae) - is actually endocytosis reduced in chitin-less luminal matrices? First: does the Pio signal at the apical plasma membrane correspond to membrane-Pio or free-Pio? Second, as in the case of wurst: would more Pio on the membrane (density) affect tracheal dimensions in Cht2 over expressing tracheae? Or are the consequences of Pio accumulation in the apical plasma membrane different in Cht2 and wurst backgrounds? Maybe cleavage of Pio and its endocytosis are dependent on its interaction with the chitin matrix. These questions connect to the question immediately above: how are the functions of the different players coordinated in space and time? We need a discussion on this issue.

      We thank the reviewer for this very important suggestion to discuss how the functions of the different players are coordinated in space and time, and we apologize for not having done so before.

      As this is an important point, we tried to address all questions raised by the reviewer and discuss them in several new paragraphs in the discussion:

      "Indeed, the anti-Pio antibody, which can detect all different Pio variants, showed a punctuate Pio pattern overlapping with the apical cell membrane marker Uif at the dorsal trunk cells of stage 16 embryos. Additionally, Pio antibody also revealed early tracheal expression from embryonic stage 11 onwards, and due to Pio function in narrow dorsal and ventral branches, strong luminal Pio staining is detectable from early stage 14 until stage 17, when airway protein clearance removes luminal contents.

      We generated mCherry::Pio as a tool for in vivo Pio expression and localization pattern analysis during tube lumen length expansion. The mCherry::Pio resembled the Pio antibody expression pattern from early tracheal development onwards. However, luminal mCherry::Pio enrichment occurs specifically during stage 16, when tubes expand. The stage 16 embryos showed mCherry::Pio puncta accumulating apically in dorsal trunk cells. Moreover, mCherry::Pio puncta partially overlapped with Dpy::YFP and chitin at the taenidial folds, forming at apical cell membranes. Supported by several observations, such as antibody staining, Video monitoring, FRAP experiments, and Western Blot studies (Figs. 4,5), these findings indicate that Pio may play a significant role at the apical cell membrane and matrix in dorsal trunk cells of stage 16 embryos.

      Furthermore, we show that Np mediates Pio ZP domain cleavage for luminal release of the short Pio variant during ongoing tube length expansion. The luminal cleaved mCherry::Pio is enriched at the end of stage 16 and finally internalized by the subsequent airway clearance process during stage 17 after tube length expansion. Such rapid luminal Pio internalization is consistent with a sharp pulse of endocytosis rapidly internalizing the luminal contents during stage 17 (Tsarouhas et al., 2007). Wurst is required to mediate the internalization of proteins in the airways (Behr et al., 2007; Stümpges and Behr, 2011). Consistently, during stage 17, luminal Pio antibody staining fades in control embryos but not in Wurst-deficient embryos.

      Nevertheless, Pio and its endocytosis depend on its interaction with the chitin matrix and the Np-mediated cleavage. In stage 16 wurst and mega mutant embryos, we detect Pio antibody staining at the chitin cable, suggesting that Pio is cleaved and released into the dorsal trunk tube lumen. Also, the Cht2 overexpression did not prevent the luminal release of Pio. However, reduced wurst, mega function, and Cht2 overexpression caused an enrichment of punctate Pio staining at the apical cell membrane and matrix (Figs. 1,2). Although the three proteins are involved in different subcellular requirements, they all contribute to the determination of tube size by affecting either the apical cell membrane or the formation of a well-structured apical extracellular chitin matrix, indicating that changes at the apical cell membrane and matrix in stage 16 embryos affect the Pio pattern at the membrane. It also shows that local Pio linkages at the cell membrane and matrix are still cleaved by the Np function for luminal Pio release, which explains why those mutant embryos do not show pio mutant-like membrane deformations and Np-mutant-like bulges. This is in line with our observations that tracheal Pio overexpression cannot cause tube size defects as the Np function is sufficient to organize local Pio linkages at the membrane and matrix. Therefore, it is unlikely that tracheal tube length defects in wurst and mega mutants as well as in Cht2 misexpression embryos are caused by the apical Pio density enrichment.

      Nevertheless, oversized tube length due to the misregulation of the apical cell membrane and adjacent chitin matrix may cause changes to local Pio set linkages and the need for Np-mediated cleavage. Strikingly, we observe a lack of Pio release in Np mutants. This shows that Pio density at the membrane versus lumen depends predominantly on Np function. The molecular mechanisms that coordinate the Np-mediated Pio cleavage are unknown, and identifying them will be necessary for understanding how tubes resist forces that impact cell membranes and matrices. On the other hand, Pio is required for the extracellular secretion of its interaction partner Dpy. At the same time, Dpy is needed for Pio localization at the cell membrane and its distribution into the tube lumen. Consistently, in vivo, mCherry::Pio and Dpy::eYFP localization patterns overlap at the apical cell surface and within the tube lumen. These observations support our model that Pio and Dpy interact at the cell surface where Np mediates Pio cleavage to support luminal Pio release by the large and stretchable matrix protein Dpy (Fig. 9).

      Taenidial organization prevents the collapse of the tracheal tube. Therefore, cortical (apical) actin organizes into parallel-running bundles that precede the onset of cuticle secretion and correspond precisely to the cuticle's taenidial folds (Matusek et al., 2006; Öztürk-Çolak et al., 2016). Mutant larvae of the F-actin nucleator formin DAAM show mosaic taenidial fold patterns, indicating a failure of alignment with each other and along the tracheal tubes (Matusek et al., 2006). In contrast, pio mutant dorsal tracheal trunks contained increased ring spacing (Fig. 3A). Fusion cells are narrow doughnut-shaped cells where actin accumulates into a spotted pattern. Formins, such as Diaphanous, are essential in organizing the actin cytoskeleton. However, we do not observe dorsal trunk tube fusion defects as found in the presence of the activated diaphanous.

      On the other hand, ectopic expression of DAAM in fusion cells induces changes in apical actin organization but does not cause any phenotypic effects (Matusek et al., 2006). DAAM is associated with the tyrosine kinase Src42A (Nelson et al., 2012), which orients membrane growth in the axial tube dimension (Förster and Luschnig, 2012). The Src42 overexpression elongates tracheal tubes due to flattened axially elongated dorsal trunk cells and AJ remodeling. Although flattened cells and tube overexpansion are similar in pio mutant embryos, we did not observe a mislocalization of AJ components, as found upon constitutive Src42 activation (Förster and Luschnig, 2012). Instead, we detected an unusual stretched appearance of AJs at the fusion cells of pio mutant dorsal trunks, which to our knowledge, has not been observed before and may play a role in regulating axial taenidial fold spacing and tube elongation.

      Self-organizing physical principles govern the regular spacing pattern of the tracheal taenidial folds (Hannezo et al., 2015). The actomyosin cortex and increased actin activity before and turnover at stage 16 drive the regular pattern formation. However, the cell cortex and actomyosin are in frictional contact with a rigid apical ECM. The Src42A mutant embryos contain shortened tube length but increased taenidial fold period pattern due to decreased friction. In contrast, the chitin synthase mutant kkv1 has tube dilation defects and no regular but an aberrant pearling pattern caused by zero friction (Hannezo et al., 2015).

      In contrast, pio mutant embryos do not contain tube dilation defects or shortened tubes but increased tube length (Figs. 1; 8; S1). Furthermore, our cbp and antibody stainings reveal the presence of a luminal chitin cable and a solid aECM structure in pio mutant stage 16 embryos (Figs. 8, S1; S6). In addition, apical actin enrichment in tracheal cells of pio mutant embryos appeared wt-like. Nonetheless, pio mutant embryos show an increased taenidial fold period compared with wt, indicating a decreased friction. Thus, we propose that the lack of Pio reduces friction. Reasons might be subtle defects of actomyosin constriction or chitin matrix, which we have not detected in the pio mutant tracheal cells. Further reasons for lower friction might be the loss of Pio set local linkages between apical cortex and aECM in stage 16 embryos, which are modified by Np, as proposed in our model (Fig. 9).

      Heterozygous and homozygous pio mutant embryos generally do not show tubal collapse. However, the loss of Pio and accompanying lack of Dpy secretion in stage 17 pio mutant embryos led to the loss of a Pio/Dpy matrix, impacting the late embryonic maturation and differentiation of a normal chitin matrix at the apical cell surface. TEM images reveal reduced dense chitin matrix material at taenidial folds and a misarranged taenidial fold pattern (Figs. 1; S2), suggesting that the taenidial function of preventing tube lumen collapse after tube protein clearance is impaired. Wurst knockdown and mutant embryos do not show general tube collapse, but luminal chitin fiber organization is disturbed in stage 17 embryos (Behr et al., 2007). Therefore, transheterozygous wurst;pio mutant embryos may combine both defects and suffer from maturation deficits of the chitin/ZP matrix at the apical cell surface and within the tube lumen, which finally causes a high number of embryos with incomplete gas filling due to tube collapse. These maturation deficits are even more dramatic in the wurst;pio double mutants, which show no gas filling.”

      8.) The sentence on line 242ff should be rephrased: "dynamic" and "elastic" are not opposites.

      We thank the reviewer for the careful reading. We revised the sentence as follows:

      “Our FRAP data suggest that Pio is the dynamic part of the tracheal ZP-matrix, while the static Dpy modulates mechanical tension within the matrix”

      9.) A central question to me is the amounts and the density of factors in different genetic backgrounds as mentioned above. Is there any mechanism adjusting the amounts or the density of the players according to the size of the apical plasma membrane or the tracheal lumen? Pio seemingly responds to these changes.

      We would like to know the molecular mechanisms that control the density of players at the apical membrane. This question is important and could be the starting point for novel scientific investigations. Mechanisms of protein trafficking, such as exocytosis, recycling and endocytosis regulate delivery and internalization of proteins at the apical cell membrane. Furthermore, protein junctions at the lateral membrane may recognize and therefore may respond to low and high mechanical stresses between cells that appear during tube length expansion. However, we did not observe any hint for misregulation of Pio expression levels in the different mutants which affect endocytosis, SJs and luminal ECM. But we observed a shift of Pio levels between apical cell membrane/matrix and lumen in wurst, mega mutants and Cht2 overexpression. This shift is analyzed with diverse ZEN tools and quantified (Fig. 2D-F; Fig. S4B). As discussed in the new paragraph, this shift is very likely caused by changes at the apical cell membrane and chitin matrix which impact Pio shedding. Moreover, we observe the lack of Pio release in Np mutants. This shows that Pio density at the membrane versus lumen depends predominantly on Np-mediated cleavage. As discussed above, how Np is activated at the apical cell membrane to cleave Pio is not known.

      10.) The connection of Pio and taenidia is mentioned in the results section (page 7) but not discussed.

      We appreciate the careful reading and comments of the reviewer very much. We included the connection of Pio and taenidia in the discussion section as follows:

      “Taenidial organization prevents the collapse of the tracheal tube. Therefore, cortical (apical) actin organizes into parallel-running bundles that precede the onset of cuticle secretion and correspond precisely to the cuticle's taenidial folds (Matusek et al., 2006; Öztürk-Çolak et al., 2016). Mutant larvae of the F-actin nucleator formin DAAM show mosaic taenidial fold patterns, indicating a failure of alignment with each other and along the tracheal tubes (Matusek et al., 2006). In contrast, pio mutant dorsal tracheal trunks contained increased ring spacing (Fig. 3A). Fusion cells are narrow doughnut-shaped cells where actin accumulates into a spotted pattern. Formins, such as Diaphanous, are essential in organizing the actin cytoskeleton. However, we do not observe dorsal trunk tube fusion defects as found in the presence of the activated diaphanous.

      On the other hand, ectopic expression of DAAM in fusion cells induces changes in apical actin organization but does not cause any phenotypic effects (Matusek et al., 2006). DAAM is associated with the tyrosine kinase Src42A (Nelson et al., 2012), which orients membrane growth in the axial tube dimension (Förster and Luschnig, 2012). The Src42 overexpression elongates tracheal tubes due to flattened axially elongated dorsal trunk cells and AJ remodeling. Although flattened cells and tube overexpansion are similar in pio mutant embryos, we did not observe a mislocalization of AJ components, as found upon constitutive Src42 activation (Förster and Luschnig, 2012). Instead, we detected an unusual stretched appearance of AJs at the fusion cells of pio mutant dorsal trunks, which to our knowledge, has not been observed before and may play a role in regulating axial taenidial fold spacing and tube elongation.

      Self-organizing physical principles govern the regular spacing pattern of the tracheal taenidial folds (Hannezo et al., 2015). The actomyosin cortex and increased actin activity before and turnover at stage 16 drive the regular pattern formation. However, the cell cortex and actomyosin are in frictional contact with a rigid apical ECM. The Src42A mutant embryos contain shortened tube length but increased taenidial fold period pattern due to decreased friction. In contrast, the chitin synthase mutant kkv1 has tube dilation defects and no regular but an aberrant pearling pattern caused by zero friction (Hannezo et al., 2015).

      In contrast, pio mutant embryos do not contain tube dilation defects or shortened tubes but increased tube length (Figs. 1; 8; S1). Furthermore, our cbp and antibody stainings reveal the presence of a luminal chitin cable and a solid aECM structure in pio mutant stage 16 embryos (Figs. 8, S1; S6). In addition, apical actin enrichment in tracheal cells of pio mutant embryos appeared wt-like. Nonetheless, pio mutant embryos show an increased taenidial fold period compared with wt, indicating a decreased friction. Thus, we propose that the lack of Pio reduces friction. Reasons might be subtle defects of actomyosin constriction or chitin matrix, which we have not detected in the pio mutant tracheal cells. Further reasons for lower friction might also be the loss of Pio set local linkages between apical cortex and aECM in stage 16 embryos, which are modified by Np, as proposed in our model (Fig. 9).

      Heterozygous and homozygous pio mutant embryos generally do not show tubal collapse. However, the loss of Pio and accompanying lack of Dpy secretion in stage 17 pio mutant embryos led to the loss of a Pio/Dpy matrix, impacting the late embryonic maturation and differentiation of a normal chitin matrix at the apical cell surface. TEM images reveal reduced dense chitin matrix material at taenidial folds and a misarranged taenidial fold pattern (Figs. 1; S2), suggesting that the taenidial function of preventing tube lumen collapse after tube protein clearance is impaired. Wurst knockdown and mutant embryos do not show general tube collapse, but luminal chitin fiber organization is disturbed in stage 17 embryos (Behr et al., 2007). Therefore, transheterozygous wurst;pio mutant embryos may combine both defects and suffer from maturation deficits of the chitin/ZP matrix at the apical cell surface and within the tube lumen, which finally causes a high number of embryos with incomplete gas filling due to tube collapse. These maturation deficits are even more dramatic in the wurst;pio double mutants, which show no gas filling.”

      11.) Dp remains cytoplasmic in pio mutant background - is the pio mutant phenotype due to defects by lack of Pio AND Dp function? What is the tracheal phenotype of dp mutants?

      It has been discussed that dumpyolvr and pio mutants show similar phenotypes in early tracheal development (Jazwinska, 2003) and that dumpyolvr mutant embryos compromise tube size in combination with shrub mutants. Additional quantifications of the dumpyolvr mutant showed significantly increased tube length (Dong 2014). We used the dumpyolvr mutant [In(2L)dpyolvr], an X-ray-induced mutation of the dumpy gene locus (Wilkin 2000). dumpyolvr mutants resemble pio null mutant tracheal phenotypes, including detached dorsal and ventral branches and an oversized tracheal dorsal trunk with a curly appearance in late embryos. We included chitin and Uif stainings of stage 16 dumpy mutant embryos (Fig. S10).

      These data suggest that the pio mutant phenotype is due to a lack of both Pio and Dumpy, which supports our model of Pio and Dumpy protein interaction in the extracellular space of the tube lumen.

      In wt embryos, Pio is found predominantly at the luminal chitin cable, whereas in dumpy mutant embryos most Pio is not at the luminal chitin cable. Reduced luminal Pio staining together with apical Pio accumulation in dumpy mutant embryos shows that Dumpy is required for luminal Pio release in stage 16 embryos. This supports our model that the Pio and Dumpy interaction may link membrane and matrix and that this link responds to mechanical stress during tube expansion by Np-mediated cleavage of Pio and its accompanying luminal release through linked Dumpy.

      12.) Lines 374ff: the reduced dorsal trunk in Np mutants is not significant; the respective statement should be formulated carefully. If we believe the statistics (no significance), this would mean that attachment of the apical plasma membrane to the luminal chitin via Pio is needed to restrict axial extension; release of Pio is needed for differentiation (taenidia formation, luminal clearance) beyond morphogenesis.

      We agree with the reviewer that the reduction of the dorsal trunk length in Np mutants is not statistically significant. However, the mean value is clearly below that of WT. Therefore, we revised our statement as follows: “In Np mutant embryos, tracheal dorsal trunk length tends to be reduced compared to wt embryos.” Further, the btlG4-driven overexpression of UAS-Np suggests strong Pio release from the apical membrane and therefore resembles the pio mutant tube length overexpansion (Fig. 8A,B; Fig S13). Thus, our current observations indicate that Np-mediated Pio release at the cell membrane enables precise tube length elongation.

      We thank the referee for discussing that Pio is needed for taenidial fold formation, which would fit our findings in pio null mutant embryos. Pio mutant embryos show taenidial folds in stage 16 embryos (airyscan) and stage 17 embryos (TEM images). However, TEM images also show chitin matrix reduction in pio mutant stage 17 embryos. Further, co-stainings of Pio with Crb and Uif, as well as co-stainings of mCherry::Pio with Dpy-GFP and cbp, confirm that Pio localizes at the apical cell membrane where taenidial folds form in late stage 16 embryos. Thus, our observations suggest that Pio and Dumpy are required at the apical membrane and matrix to stabilize taenidial folds and the tube lumen during stage 17. This also includes the Np-mediated Pio release at the apical cell membrane. As requested by the referee, we summarized Pio function during late tracheal development in our simplified model (see Fig. 9).

      However, it is of note that Np-mediated Pio release increases at late stage 16 (Fig. 5A, 6D; Fig. S13) but is strongly reduced in stage 17 embryos. In contrast, thin taenidial folds form at late stage 16, become thicker and form at fusion points during stage 17, and reach their most mature form when the intraluminal chitin cable is cleared (Öztürk-Colak et al., elife, 2016). Thus, the patterns of Pio release and taenidial fold differentiation do not fully match. Moreover, in preliminary experiments we observe Pio antibody staining in stage 17 embryos at the apical cell membrane of dorsal trunks (data not shown). Furthermore, lumen clearance of Obst-A, Knk, Serp and Verm is not affected in pio mutant embryos, but unknown luminal ECM contents remained (Fig. 1D). Therefore, we will follow this very interesting idea in future experiments.

      Nonetheless, we state in the results that Pio shedding is essential:

      “Our data suggest that Np overexpression may enhance Pio shedding in stage 16 embryos, affecting the Pio-mediated ZP matrix function. Upon breathless (btl)-Gal4-mediated expression of UAS-Np in tracheal cells, we observed a high amount of Pio puncta across the entire tracheal tube lumen, specifically in stage 16 embryos but not in earlier stages (Fig. S13). Consistently, tracheal Np overexpression led to tube overexpansion in stage 16 embryos resembling the pio mutant phenotype (Fig. 8A,B). Thus, Np-mediated Pio shedding controls Pio function.”

      13.) Why don't we see the apical Pio signal in Figure 4B?

      The red arrowhead points to apical mCherry::Pio punctate staining in Fig. 5B (previously Fig. 4B) in the close-up of the “bleached area” before bleaching and 56 min post bleaching. However, in vivo bleaching experiments do not allow additional antibody stainings to precisely detect the apical cell membrane. Further, Dpy::eYFP marks the tube lumen and the apical cell surface. The latter showed adjacent mCherry::Pio punctate staining. However, due to bleaching, the Dpy signal was not detectable in the area.

      14.) The Strep signals in the merges in Figure 7C are not well visible.

      We are not sure which Strep signal the reviewer is referring to in Fig. 7C, which is now Fig. 8C. The top panel shows the Strep signal (right panel) overlapping with GFP in cells that do not express Np or human matriptase. Thus, the TGFB3 ZP domain is not cleaved, and the intracellular GFP and also the extracellular Strep signals are maintained and overlap.

      In contrast, when Np or human matriptase is added, the TGFB3 ZP domain is cleaved and only the intracellular GFP signal is retained, whereas the extracellular Strep signal is released from the cell surface. This explains why the Strep signal is barely detectable in the middle and lower panels of Fig. 8C.

      Reviewer #1 (Significance):

      This work brings together several factors (Pio, Dp, Np, Wst etc) already known to be needed for tracheal morphogenesis and differentiation in the embryo of D. melanogaster. Having worked myself with some of these factors, however, I recognize that the interaction between these factors is novel and very exciting. The experiments strongly indicate a new mechanism of cell-ECM connection that seems to be conserved to some extent (as they provide preliminary data on an example from humans). By integrating the functions of different factors, the work provides ample opportunity for future projects to elucidate this mechanism in detail. Therefore, I expect that it will have a significant impact not only on the field of developmental cell biology but also, due to the conserved proteins involved (ZP proteins, Matriptase), on the field of cell biology of human diseases.

      Reviewer #2 (Evidence, reproducibility and clarity):

      _The figures are clear, and the questions well addressed. However, I find that some of the claims are not completely backed by the data presented and have some suggestions that will hopefully make some points clearer.

      Major comments

      1.) In the abstract and at the end of the introduction the authors claim that they show that Pio, Dpy and Np support the balancing of mechanical stresses during tracheal tube elongation. However, this is not shown in this manuscript, where tension or mechanical stress were not measured and it is therefore speculative._

      As requested by the reviewer, we deleted “support balancing of” at the final sentence of the Introduction. Please, note that we did not use the term balancing of mechanical stresses at the abstract.

      However, we revised the abstract.

      It has been shown previously that forces and mechanical tension rise when the apical membrane expands and that the elastic extracellular matrix, which is anchored to the membrane, balances these forces (Dong et al., 2014). Furthermore, it has been shown that the gigantic and elastic Dumpy protein modulates mechanical tension (Wilkin et al., 2000). Thus, these previous publications state that mechanical tension rises at the apical cell membrane and matrix when tubes expand during stage 16 and that Dpy is part of that molecular process, which we included in the abstract as essential background information.

      “The apical membrane is anchored to the apical extracellular matrix (aECM) and causes expansion forces that elongate the tracheal tubes. The aECM provides a mechanical tension that balances the resulting expansion forces, with Dumpy being an elastic molecule that modulates the mechanical stress on the matrix during tracheal tube expansion.”

      Nonetheless, our results show that Np-mediated Pio cleavage increases during stage 16 in response to tube length expansion, which is accompanied by forces as postulated by others (see above). We further observe that the membrane bulges and the chitin matrix tears off when Pio cleavage does not occur in Np mutant embryos. Our data further show that Pio and Dumpy interact and that Pio release is prevented in Dpy mutant embryos. Altogether, this suggests that the Np-mediated Pio cleavage responds to tube expansion and requires Dpy for luminal Pio release.

      We therefore claim in the final sentence of the introduction that “…ZP domain proteins Pio and Dumpy, as well as the protease Np respond to mechanical stresses when tracheal tubes elongate”. The corresponding changes are marked in red.

      2.) The authors state that all pio CRISPR/Cas9 generated mutants display identical tracheal phenotypes, however these data are not shown. Tracheal phenotypes, in particular DT phenotypes, of all mutants generated should be shown in supplementary materials.

      As requested by the reviewer, we included the data in the supplement. The pio5M and pio11R alleles showed embryonic lethality and a 100% gas filling defect resembling the pio17C allele. Additionally, we extended the tracheal analysis with the pio5M allele and identified tube size defects, irregular pattern of taenidial folds and apical membrane deformation, altogether resembling the pio17C allele. These new data are shown in the supplement Fig. S1.

      We clarify this in the results section as follows:

      “The tracheal phenotypes of pio5M are shown in the supplement (Fig. S1B-F). In all other Figures, we show images of the pio17C allele.”

      3.) At stage 16, pio null mutants display DT overelongation phenotypes (Fig. 1). The authors should quantify this phenotype.

      As requested by the reviewer, we quantified the DT overelongation phenotypes for pio5M (Fig. S1). The quantification of pio17C was shown already in Fig. 6B, now Fig 8B.

      4.) The authors analyse Pio distribution under tubular stress, using mega mutants and Chitinase overexpression. Pio localization changes in these genetic backgrounds and this is shown in Figure 2 only in a qualitative manner. The authors should measure Pio localization at the lumen and at the membrane and provide quantitative data.

      As requested by the referee, we measured Pio localization recognized by the anti-Pio antibody at the lumen and at the membrane to provide quantitative data. These are shown in Fig. 2E.

      All images were taken with a Zeiss Airyscan. For statistical analysis we used the profile tool of the Zeiss ZEN 2.3 black software. This tool allows the measurement and comparison of fluorescence pixel intensities of individual channels. We determined the fluorescence intensity profile across the tube to identify values at the apical membrane and in the tube lumen at a minimum of 10 different positions of DTs (metameres 5 to 6) in two distinct embryos for each genetic background. The ratio of the maximum membrane value to the maximum tube lumen value was calculated and compared between control, mega mutant and Cht2 overexpression embryos. The control embryos showed a ratio below 0.4, the Cht2 overexpression embryos a ratio of 1.2 and the mega mutants a ratio of about 0.9. These quantitative data confirm the statement that Pio localization increases at and near the apical cell membrane with respect to the lumen in mega mutants and in Cht2 overexpression embryos.
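For readers who want to reproduce this kind of membrane-versus-lumen ratio outside ZEN, the calculation can be sketched roughly as below. This is only an illustration under stated assumptions, not the procedure used in ZEN: the function names, the `edge_fraction` parameter (which assumes the membrane peaks lie in the outer quarter of each line profile) and the example profiles are hypothetical and are not data from the study.

```python
from statistics import mean


def membrane_to_lumen_ratio(profile, edge_fraction=0.25):
    """Ratio of the peak intensity near the profile edges (assumed apical
    membrane) to the peak intensity in the central part (assumed lumen)."""
    n = len(profile)
    edge = max(1, int(n * edge_fraction))
    if n < 2 * edge + 1:
        raise ValueError("profile too short for the chosen edge_fraction")
    # Highest value in either edge window of the line profile.
    membrane_peak = max(max(profile[:edge]), max(profile[-edge:]))
    # Highest value in the central window of the line profile.
    lumen_peak = max(profile[edge:n - edge])
    if lumen_peak == 0:
        raise ValueError("lumen peak is zero; ratio undefined")
    return membrane_peak / lumen_peak


def mean_ratio(profiles, edge_fraction=0.25):
    """Average the per-profile ratios, e.g. over >= 10 positions per genotype."""
    return mean(membrane_to_lumen_ratio(p, edge_fraction) for p in profiles)


# Made-up profiles in arbitrary units, for illustration only.
lumen_dominated = [10, 30, 20, 60, 100, 70, 25, 35, 12]
membrane_enriched = [15, 90, 40, 50, 75, 45, 30, 85, 20]

print(membrane_to_lumen_ratio(lumen_dominated))
print(membrane_to_lumen_ratio(membrane_enriched))
```

A ratio well below 1 corresponds to a predominantly luminal signal, and a ratio near or above 1 to enrichment at or near the apical membrane. In practice the membrane and lumen windows would be set from a membrane marker channel rather than from a fixed fraction of the profile.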

      5.) Surprisingly and interestingly, wurst;pio transheterozygotes display very strong tracheal defects. The authors say they observe gas filling defects; however it is not clear from figure 2E if this indeed the case. From the panel in the figure, it looks like these embryos suffer from strong tracheal morphogenetic defects. It would be necessary to have a better analysis of these embryos. What is the penetrance of this phenotype. If this is 100% penetrant, one would expect it to be lethal. Therefore, double mutant balanced stocks are not viable? Having analyzed the phenotypes and confirmed which morphogenetic defects the transheterozygote embryos present, how does this genetic interaction fit with the model presented?

      We are thankful to the reviewer for this interesting point of view suggesting that the wurst;pio embryos display tracheal morphogenetic defects. First, our data show that only 11.6% of the wurst;pio transheterozygous embryos completed gas filling and survived until adulthood. In contrast, 88.4% of transheterozygous wurst;pio mutant embryos did not complete gas filling, which is now presented in Fig. 3B. The corresponding quantification is presented in Fig. 3D. Importantly, the 88.4% of wurst;pio transheterozygous embryos that show gas filling defects do not hatch as larvae and die.

      As requested, we performed a better morphogenetic analysis, which is presented in Fig. 3C. Analysis of the gas filling defects with light microscopy was repeated with a better objective (Zeiss Apochromat 25x Gly; 0.8 NA). Indeed, this analysis revealed a strongly compromised tube lumen morphology with an irregular tube lumen pattern, as if tubes twist and bend. This tube lumen deformation was further confirmed with the confocal analysis of chitin staining (cbp). The tube lumen of stage 17 transheterozygous wurst;pio mutant embryos showed an irregular lumen pattern with unusual twists and even partially collapsed tubes.

      Furthermore, as asked by the referee, we generated the wurst,pio double mutation. All wurst,pio double mutant embryos lacked gas filling. In a more in-depth analysis of the tube lumen with a high-performance objective we could not identify any normal tube lumen in stage 17 embryos. Instead the double mutant embryos revealed completely collapsed tracheal tubes. This was confirmed by the chitin staining and confocal analysis. All new data are presented in the supplement.

      As shown in our manuscript and in previous publications, neither pio nor wurst mutant embryos affect cell polarity or gross organization of the actin and tubulin cytoskeleton. However, we found that wurst mutant embryos showed irregular apical membrane expansion at the tube lumen (Behr et al., 2007; legend Fig. 4), irregular chitin fiber organization and, to some extent, collapsed tube lumen. In pio mutant embryos we found deformed apical membranes of DTs, an irregular pattern of taenidial folds and, to some extent, collapsed tube lumen. Thus, the apical membrane is the common target of both proteins in late embryonic development, suggesting that pio provides stability and wurst mediates the internalization of proteins at the apical membrane.

      We discussed it as follows:

      “Nevertheless, Pio and its endocytosis depend on its interaction with the chitin matrix and the Np-mediated cleavage. In stage 16 wurst and mega mutant embryos, we detect Pio antibody staining at the chitin cable, suggesting that Pio is cleaved and released into the dorsal trunk tube lumen. Also, the Cht2 overexpression did not prevent the luminal release of Pio. However, reduced wurst, mega function, and Cht2 overexpression caused an enrichment of punctuate Pio staining at the apical cell membrane and matrix (Figs. 1,2). Although the three proteins are involved in different subcellular requirements, they all contribute to the determination of tube size by affecting either the apical cell membrane or the formation of a well-structured apical extracellular chitin matrix, indicating that changes at the apical cell membrane and matrix in stage 16 embryos affect the Pio pattern at the membrane. It also shows that local Pio linkages at the cell membrane and matrix are still cleaved by the Np function for luminal Pio release, which explains why those mutant embryos do not show pio mutant-like membrane deformations and Np-mutant-like bulges. This is in line with our observations that tracheal Pio overexpression cannot cause tube size defects as the Np function is sufficient to organize local Pio linkages at the membrane and matrix. Therefore, it is unlikely that tracheal tube length defects in wurst and mega mutants as well as in Cht2 misexpression embryos are caused by the apical Pio density enrichment.”

      “Heterozygous and homozygous pio mutant embryos generally do not show tubal collapse. However, the loss of Pio and accompanying lack of Dpy secretion in stage 17 pio mutant embryos led to the loss of a Pio/Dpy matrix, impacting the late embryonic maturation and differentiation of a normal chitin matrix at the apical cell surface. TEM images reveal reduced dense chitin matrix material at taenidial folds and misarranged taenidial fold pattern (Figs. 1; S2), suggesting impaired taenidial function prevents tube lumen from collapsing after tube protein clearance. Wurst knockdown and mutant embryos do not show general tube collapse, but luminal chitin fiber organization is disturbed in stage 17 embryos (Behr et al., 2007). Therefore, transheterozygous wurst;pio mutant embryos may combine both defects and suffer from maturation deficits of the chitin/ZP matrix at the apical cell surface and within the tube lumen, which finally causes a high number of embryos with incomplete gas filling due to tube collapse. These maturation deficits are even more dramatic in the wurst;pio double mutants, which show no gas filling.”

      6.) mCherry::Pio Dpy::eYFP time lapse analysis and FRAP experiments are very interesting. However, it is not clear to which degree bleaching occurs in the tracheal lumen. The authors claim that recovery is very fast and can be seen from minute 2, however, frame-by-frame analysis of Movie S2 does not show a clear difference between luminal Pio from minute 0 to minute 2. Rough comparison with the luminal area surrounding the bleached area does not show a clear difference in luminal Pio before and after photobleaching. To claim fast recovery of luminal Pio after photobleaching, the authors should quantify luminal Pio before and after bleaching.

      We agree with the reviewer and deleted “fast”. Video S2 shows intracellular mCherry::Pio recovery within 2 min after photobleaching. It shows extracellular (luminal) recovery within 6 min after photobleaching, when the first large mCherry::Pio puncta appear at the apical surface of the bleached area. Nonetheless, mCherry::Pio puncta appear in the lumen, indicating recovery, whereas Dpy::eYFP does not recover.

      We state this in the Results section as follows:

      “In stage 16 embryos mCherry::Pio puncta reappeared in tracheal cells within 2 minutes of bleaching and in the tubular lumen within 6 minutes.”

      In addition, what does the normalized mCherry::Pio fluorescence in the graph in Figure 4D refer to? Intracellular Pio?

      Figure 4D, now 5D, shows Western blot signals. We assume that you refer to Fig. 4B, which is now Fig. 5B.

      We apologize for the confusion and have now named it Fig. 5B’.

      We state in the Materials section:

      “The bleaching was performed with 405 nm full laser power (50 mW) at the ROI for 20 seconds. A Z-stack covering the whole depth of the tracheal tubes in the ROI was taken at each imaging step. Fluorescence intensity in the bleached ROIs was measured after correction for embryonic movements using Fiji.”

      Thus, to clarify this point, we added to the legends:

      “Fluorescence intensities refer to the bleached ROIs as indicated by the frame in the corresponding Movie S2 and were measured after correction for embryonic movements.”

      7.) When mCherry::Pio Dpy::eYFP time lapse analysis and FRAP experiments were done in an Np mutant background, the authors describe lack of Pio recovery within the lumen (Movie S3). However, when comparing control and Np mutant background embryos, Pio is not properly released into the lumen of Np mutants (as stated by the authors and seen by comparing movies S1 and S4). Furthermore, on minute 0 of the FRAP experiment in Np embryos, there is no detectable Pio in the DT lumen. Therefore, recovery was not expected in Np mutants and should not be claimed as a conclusion for this experiment.

      We thank the reviewer for the careful reading and apologize for our incorrect description. We changed it accordingly as follows:

      “In contrast to the control, extracellular mCherry::Pio is not released into the tube lumen within 56 min after bleaching in Np mutant embryos (Fig. 6C, Video S3).”

      8.) Brodu et al (Dev Cell 2010) have shown that Pio is important for cytoskeletal modulation during tracheal maturation. Pio is important for non-centrosomal microtubule (MT) arrays anchored at the tracheal cell apical membranes. In addition, MT disruption in tracheal cells leads to lumen formation defects (Brodu et al, Dev Cell 2010). In the absence of Pio, the tracheal cytoskeleton is altered, and this could explain some of the results observed. Ideally, the work should be complemented with a basic cytoskeletal analysis, but if this is not possible, the authors should discuss some of the phenotypes in light of this Pio function.

      We thank the reviewer for this great idea. We analyzed F-actin with Phalloidin and beta tubulin (E7 antibody, DSHB) in the dorsal trunk cells of stage 16 control and pio mutant embryos. However, tracheal cells are tiny and only gross irregularities can be detected. Confocal Z-stack analysis of the stainings did not show gross differences between control and pio mutant embryos. We observe the expected apical subcortical accumulation of the actin and tubulin cytoskeleton in dorsal trunk cells of stage 16 pio mutant embryos, which has also been shown for wt embryos elsewhere. These new data are presented in the supplement Fig. S7.

      Minor comments<br /> The model should not be in supplementary materials and should be moved to the main manuscript.

      We thank the reviewer for this suggestion and moved the model to the main part (now Fig. 9). As requested by reviewer 1, we extended the model to show the timing of events in Pio function.

      Throughout the manuscript embryonic stages are described using different nomenclature (stage X, stX and st X). Either way is correct, but the same nomenclature should be used throughout.

      We apologize for the inconsistent nomenclature. We now use "stage X" in the manuscript and "stX" in the figures for space reasons. Legend 1 clarifies the abbreviation.

      In Fig. S1 B and C the authors should specify which pio allele is being analysed (as in Fig. 7). The same should be done in the text.

      That is a good point. To be clear from the beginning, we now state the following in the first paragraph of the results:

      “The tracheal phenotypes of pio5m are shown in the supplement (Fig. S1B-F). In all other Figures, we show phenotypes of the pio17c allele.”

      Line 131, it is not correct to say that WGA visualizes cell membranes. WGA marks/stains cell membranes.

      Thanks for finding this mistake, it’s now corrected.

      Line 165 "leads to excessive tube dilation and length expansion due to strongly reduced luminal chitin" is not correct. Chitin reduction leads to excessive tube dilation but not to length expansion, as reported in the papers cited at the end of the sentence.

      Thanks very much for careful reading, we deleted “and length expansion” from the sentence.

      Line 220-221, what do authors refer to as "stage 16 wt-like control embryos"?

      Thanks for finding this mistake. We corrected it as follows:

      “In stage 16 embryos mCherry::Pio puncta….”

      Line 221, "some minutes" should be replaced by a specific number of minutes. According to Movie S2 reappearance of tracheal cell Pio happens from minute 16.

      We agree with the reviewer that the time at which mCherry::Pio puncta reappear should be stated. We observe the first large puncta within two minutes after bleaching in tracheal cells at the ROI (Video S2, lower cell row in the movie). We further observe the reappearance of the first large puncta at the ROI within 6 minutes in the tracheal tube lumen.

      We corrected it as follows: “In stage 16 embryos mCherry::Pio puncta reappeared in tracheal cells within 2 minutes of bleaching and in the tubular lumen within 6 minutes.”

      Line 291 "time laps" should be lapse.

      Thanks for finding the typo, it is corrected now.

      Line 302, "Pio was not shedded into the lumen but remained at the cell" should be "Pio was not shed into the lumen but remained in the cell".

      Thanks for finding the typo, it is corrected now.

      _Referees cross-commenting

      I agree. Taken together, all the comments will improve the quality of the work and of a future manuscript. Also, everything seems quite doable and will not present any problems._

      Reviewer #2 (Significance):

      _The findings shown in this manuscript shed light on the regulation of tubulogenesis by ZP proteins and how their interaction with the ECM can be regulated by proteolysis. It was known that Pio is involved in tracheal development, is secreted into the lumen, regulating tube elongation (Jaźwińska et al., Nat.Cell Biol., 2003) and anchoring MTs to the apical membrane during tubulogenesis (Brodu et al, Dev. Cell 2010). This work provides additional molecular insights into Pio dynamics and regulation during tube maturation.<br /> This work will be of interest to a broad cell and developmental biology community as they provide a mechanistic advance in ZP proteins involved in morphogenesis. It is of specific interest to the specialized field of tubulogenesis and tracheal morphogenesis.

      Field of expertise:<br /> Drosophila, morphogenesis, tracheal tubulogenesis, cytoskeleton_

      Reviewer #3 (Evidence, reproducibility and clarity):

      _Summary<br /> In this manuscript, Drees and colleagues analysed, during the formation and growth of tubular systems, how cells combine forces at the cell membranes while maintaining tubular network integrity. A fundamental question is to understand how cells manage to integrate the axial forces to stabilise the cell membrane and the apical extracellular matrix (aECM).<br /> To address this question, the authors study the formation of the tracheal system in Drosophila embryos, a well-established and detailed model system to investigate formation of tubular networks. In particular, they focused on the formation of the larger tube of the tracheal network, the dorsal trunk. The formation of this tube depends in part on axial extension along the antero-posterior axis.<br /> They concentrated their work on the function of Piopio (Pio), a Zona-Pellucida (ZP)-domain protein. They showed that Pio together with the protease Notopleural (Np) contribute to sensing and supporting mechanical stresses when tracheal tubes elongate, thus ensuring normal membrane-aECM morphology.

      Major Comments

      In a previous work, Drees et al. (PLOS Genetics 2019) showed that the matriptase-prostasin proteolytic cascade (MPPC) is conserved and essential for both Drosophila ECM morphogenesis and physiology.<br /> The functionally conserved components of the MPPC mediate cleavage of zona pellucida-domain (ZP-domain) proteins, which play crucial roles in organizing apical structures of the ECM in both vertebrates and invertebrates. They showed that ZP-proteins are molecular targets of the conserved MPPC and that cleavage within the ZP-domains is a conserved mechanism of ECM development and differentiation.<br /> Here, Drees et al. investigate further how the coupling between membrane and matrix takes place to ensure proper tube growth.<br /> Pio distribution and phenotypes<br /> They first focused on the tracheal phenotypes observed in a pio null mutant context. So far, the only pio mutant characterised was a point mutation in the ZP domain. Using CRISPR/Cas9, they generated new alleles of pio which are lack of function alleles. In this context, Drees and colleagues observed over-elongated dorsal trunk tubes, with bulges appearing at stage 16 between the apical domain of tracheal cells and adjacent extra-luminal matrix.<br /> Additionally, pio mutant embryos showed impaired tube lumen clearance of some of the aECM components, which prevents gas-filling of the airways.<br /> To detect Pio distribution, the authors used either an anti-Pio antibody directed toward a short stretch within the Pio ZP domain or generated a CRISPR/Cas9 piomCherry::pio line.

      _

      1.) The Pio antibody shows a strong luminal staining as already published. But the authors reported an apical membrane signal in tracheal cells. I find this apical membrane signal really difficult to observe in panel Fig. 2B. The overlap between the Pio dots and the apical membrane labelled with Uif shown in Fig 2C can be due to the 3D projection. It is only when endocytosis is impaired (Suppl Fig. 2) that data are more convincing.

      We thank the reviewer for this important point. We apologize for the unconvincing presentation and appreciate the chance to improve it.

      We show the 3D image of Pio puncta as voxels overlapping with Uif at the apical cell membrane. The amount of Pio voxels overlapping with the Uif-marked apical cell membrane increased in mega mutant embryos and upon tracheal Cht2 overexpression. This result is indicated by a representative region (frame) and white arrows and is now shown in Fig. 2C.

      We further used orthogonal projections across the tracheal tube of the Airyscan Z-stacks. Randomly selected projections confirmed that puncta of Pio antibody staining overlap with Uif at the tube lumen. We observed overlap in controls, and increased overlap in mega mutant and Cht2 overexpressing embryos. This result is now shown in Fig. 2E.

      However, to avoid any misinterpretation of projections, we used single images of the original Airyscan Z-stacks for co-localization analysis with the Zeiss ZEN software (black, 2.3, sp1). We used two available and independent standard methods to compare fluorescence pixel intensities of different channels, namely the ZEN co-localization tool and the ZEN profile tool. Both are described in the Materials section.

      a.) With the co-localization tool we directly compared fluorescence pixel intensities of Pio and Uif. Pixels with the highest overlap of intensities, shown in the ZEN tool as the third quadrant, were set to white for better visualization in the images. These new images are included as Fig. 2D and show recurrent overlap of Pio and Uif antibody stainings (punctate pattern) along the apical cell membrane at the dorsal trunk of stage 16 control embryos. This overlap pattern increased in mega mutant and Cht2 overexpression embryos.

      b.) A second approach for comparing fluorescence intensities is the ZEN “profile” tool. Drawing a line across the tube allowed us to compare peak fluorescence pixel intensities of the different channels at distinct regions, such as the apical cell membrane and the tube lumen including the cbp-marked chitin cable. This tool detected overlap of peak fluorescence intensities of Uif and Pio antibody stainings, confirming that Pio is located together with Uif at the apical membrane of dorsal trunk tracheal cells. These new intensity profiles and the corresponding images are presented in the supplement as Fig. S4B-D. Quantifications with this method, comparing the ratio of Pio peak intensities between the apical cell membrane and the tube lumen, are presented as Fig. 2F (as requested by Reviewer 2).

      2.) When the authors used their CRISPR/Cas9 piomCherry::pio line to characterise Pio distribution (Fig.4), Pio is localised at the apical plasma membrane before stage 16. Only at stage 16, Pio is detected within the lumen. This timing of Pio release in the lumen is critical for the model proposed by Drees et al. This is an important point to assess the difference between the use of the antibody (which mostly labels the lumen) while the piomCherry::pio line is mostly at the membrane.

      We agree with the reviewer that the Pio antibody shows a different pattern within the tube lumen of earlier stages. The Pio antibody shows intense extracellular staining from early stage 12 onwards, presumably due to its early function at dorsal and ventral branches, as shown by Anna Jaźwińska (Jaźwińska et al., 2003). The intense luminal Pio antibody staining, predominantly at the chitin cable, persists until its disappearance due to airway protein clearance during stage 17. Unfortunately, this strong luminal Pio staining made it impossible to examine the Pio distribution pattern in more detail during stage 16. Nevertheless, Np overexpression experiments indicate that luminal Pio release occurs specifically in stage 16 embryos (Fig. S13), which was tested with the Pio antibody; see results, second to last paragraph:

      “Our data assumes that Np overexpression may enhance Pio shedding in stage 16 embryos, affecting the Pio-mediated ZP matrix function. Upon breathless (btl)-Gal4-mediated expression of UAS-Np in tracheal cells, we observed a high amount of Pio puncta across the entire tracheal tube lumen, specifically in stage 16 embryos but not in earlier stages (Fig. S13).”

      We further agree with the reviewer that mCherry::Pio was used to characterize in vivo Pio distribution within the dorsal trunk cells and tube lumen during stage 16. Fig. 5A shows the apical mCherry::Pio distribution pattern in early and late stage 16 embryos. Importantly, the appearance of luminal mCherry::Pio increased during stage 16 and was mainly enriched at late stage 16. See Figure 5A, where red arrowheads point to apical Pio and red arrows to luminal Pio staining.

      Furthermore, as discussed above and shown by different ZEN tools, such as the co-localization and fluorescence intensity profile tools, Pio antibody stainings revealed a punctate pattern at the apical cell membrane of dorsal trunk cells in stage 16 embryos, which is also reflected by the appearance of apical mCherry::Pio puncta at the membrane surface. Additionally, we observed mCherry::Pio puncta within the tube lumen (see the new Figures S4B & S8). Thus, subcellular Pio distribution at the apical cell membrane and lumen was observed for both the Pio antibody staining and the mCherry::Pio pattern.

      Nonetheless, the luminal appearance differs between the Pio antibody staining and mCherry::Pio. The Pio antibody detects a short stretch at the ZP domain and thus detects all possible Pio variants, uncleaved and cleaved. Due to early tracheal Pio function, Pio enriches within the tube lumen in an intense core-like structure, which is recognized by the Pio antibody and is comparable with the Dpy::eYFP pattern. mCherry::Pio also labels all Pio variants, uncleaved and cleaved. The spatiotemporal mCherry::Pio expression pattern (Fig. S5) is comparable with the Pio antibody pattern and the staining at the membrane in stage 16 embryos. However, mCherry::Pio did not enrich in the lumen in a core-like structure, but nonetheless shows overlap with luminal Dpy::eYFP.

      Jaźwińska showed that Pio antibody staining is intracellular in the trachea of stage 11 pio2R-16 point mutation embryos (Jaźwińska et al., 2003; Fig 2d). To understand more about the specificity of the antibody, we performed stainings in the null mutant embryos. In contrast to the high number of intracellular Pio puncta in pio2R-16 point mutation embryos, Pio stainings were much more reduced in pio5m and pio17c mutants, but a low number of Pio puncta were still detectable in the embryos (Fig. S1G,H). It is of note that dpy mutants also showed strongly reduced Pio antibody staining (Fig. S10E). Thus, discussing the underlying causes of enriched (Pio antibody) versus non-enriched (mCherry::Pio) luminal staining is speculative. However, observations by Jaźwińska et al. (2003) and our new observations, investigating the Pio antibody stainings in pio null mutants, dpy mutants, eYFP::Dpy embryos and Np overexpression, may hint at the possibility of cross-reactivity of the Pio antibody with other ZP domains, which may intensify the appearance of luminal Pio antibody staining in control embryos.

      In any case, we clarify the difference in luminal Pio pattern in the discussion as follows:

      “Indeed, the anti-Pio antibody, which detects all different Pio variants, showed a punctuate Pio pattern overlapping with the apical cell membrane markers Crb and Uif at the dorsal trunk cells of stage 16 embryos (Fig. 2; Fig. S3,S4). Additionally, Pio antibody also revealed early tracheal expression from embryonic stage 11 onwards, and due to Pio function in narrow dorsal and ventral branches, strong luminal Pio antibody staining is detectable from early stage 14 until stage 17, when airway protein clearance removes luminal contents. In the pio5m and pio17c mutants Pio stainings were strongly reduced although some puncta were still detectable in the trachea (Fig. S1G,H). Similarly, Pio antibody staining is intracellular in the trachea of stage 11 pio2R-16 point mutation embryos (Jaźwińska et al., 2003). Interestingly, also dpy mutants showed strongly reduced and intracellular Pio antibody staining (Fig. S10E).

      We generated mCherry::Pio as a tool for in vivo Pio expression and localization pattern analysis during tube lumen length expansion. The mCherry::Pio resembled the Pio antibody expression pattern from early tracheal development onwards. However, luminal mCherry::Pio enrichment occurs specifically during stage 16, when tubes expand. The stage 16 embryos showed mCherry::Pio puncta accumulating apically in dorsal trunk cells. Moreover, mCherry::Pio puncta partially overlapped with Dpy::YFP and chitin at the taenidial folds, forming at apical cell membranes. Supported by several observations, such as antibody staining, Video monitoring, FRAP experiments, and Western Blot studies (Figs. 4,5), these findings indicate that Pio may play a significant role at the apical cell membrane and matrix in dorsal trunk cells of stage 16 embryos.”

      3.) Another important point is to explain the discrepancy between the pio mutant alleles. The allele containing a point mutation in the ZP domain shows no over-elongated tubes (Dong et al 2014, Jaźwińska et al. 2003) while the lack of function alleles do.

      The reviewer is correct that the pio2R-16 mutation shows only a disintegration phenotype whereas our pio null mutations additionally show tube length defects. However, Dong et al. showed significantly increased dorsal trunk length in shrub; pio2R-16 double mutant embryos when compared with shrub mutant embryos (Supplemental Fig. S4A). Also, the shrub;dpyolvR double mutant embryos revealed increased tube length expansion when compared with shrub mutant embryos. Moreover, their quantifications show that dpyolvR mutant embryos also revealed significantly increased tube expansion when compared with wt. Altogether, these previous findings suggest that Pio and Dpy are involved in tube length control during stage 16.

      Furthermore, we generated three independent pio null mutation alleles, which lost all the essential Pio protein domains and all caused embryonic lethality, gas-filling defects, the branch disintegration phenotype and tube length defects (quantifications are shown in Figs. 9 and S1). In addition, pio null mutations prevent Dpy::eYFP secretion. Thus, we are confident that the observed tube length defects as well as the air-filling defects are due to the loss of Pio, particularly since these defects could be rescued by Pio expression in the pio null mutation background, as shown in Fig. 3B.

      So, what could make the difference?

      The described pio2R-16 mutation allele contains an X-ray-induced single point mutation that led to an amino acid replacement (V159D) in the ZP domain. It is not clear how the amino acid exchange affects the protein and the ZP domain. It may hamper pio function, and this amino acid replacement may be problematic for the early tracheal function but not during stage 16. As stated by Jaźwińska et al. 2003 (Fig. 2 legend), Pio antibody staining is intracellular in the mutants and extracellular in the trachea of wt at stage 13.

      They further speculate that the mutant Pio protein may be retained in the secretory pathway, but this is not confirmed with co-markers. As luminal Pio function is required to provide a barrier for autocellular AJ formation, this fails in the pio2R-16 mutant. In contrast, it is still possible that Pio interacts with Dpy and supports its secretion in the pio2R-16 mutant, and it is additionally conceivable that intracellular Pio may reach the apical cell membrane to some extent in stage 16 pio2R-16 mutant embryos and thus can support tube size control. But these assumptions are speculations.

      Nevertheless, to clarify this point we explain the discrepancy between the pio2R-16 mutation and the pio null mutation alleles as follows:

      “Using CRISPR/Cas9, we generated three pio lack of function alleles (Fig. S1A), all exhibiting embryonic lethality and identical tracheal mutant phenotypes. The tracheal phenotypes of pio5m are shown in the supplement (Fig. S1B-F). In all other Figures, we show images of the pio17c allele. The pio17c and pio5m null mutant embryos revealed the dorsal and ventral branch disintegration phenotype known from a previously described pio2R-16 mutation allele which contains a X-ray induced single point mutation that led to an amino acid replacement (V159D) in the ZP domain (Jaźwińska et al., 2003). Additionally, the late stage 16 pio17c and pio5m null mutant embryos showed over-elongated tracheal dorsal trunk tubes (see below).”

      4.) A minor point, the author should provide hypothesis to explain why only the clearance of CBP, Obstructor-A and Knickkopf are affected in a pio mutant background and not Serpentine and Vermiform.

      We thank the reviewer for the careful reading and the comment on this point. We would be happy to see such a scenario, which could give us a hint of Pio interaction partners at the chitinous matrix. However, we stated that luminal material, such as Obst-A and Knk, is removed from the lumen (see Fig. S5A). We further describe that in pio mutant embryos, luminal Serp and Verm staining appeared reduced but showed wt-like distribution (see Fig. S6) in stage 16 embryos. We do not show Serp and Verm in stage 17 embryos, but they are removed from the tube lumen (not shown). These data were obtained from immunostainings and confocal analysis.

      Nevertheless, we also state that TEM analysis of pio mutant embryos revealed lumen clearance defects, with undefined material remaining in the tube lumen (see Fig. 1D and Fig. S2B).

      To clarify this point we state in the results as follows:

      “Fourth, ultrastructure TEM images revealed aECM remnants in the airway lumen of pio mutant stage 17 embryos, while control embryos cleared their airways (Fig. S2B). Consistently, the in vivo analysis of airways in stage 17 pio mutant embryos revealed lack of tracheal air-filling (Fig 3B). The pan-tracheal expression of Pio in pio mutant embryos rescued the lack of gas filling (Fig 3B). Thus, TEM images suggest that pio mutant embryos showed impaired tube lumen clearance of aECM, which prevented subsequent airway gas-filling.”

      And

      “Also, the pio mutant embryos showed tracheal lumen clearance defects of chitin fibers in ultrastructure (TEM) analysis (Figs. 1D, S2B). In contrast, confocal analysis revealed that well-known chitin matrix proteins, such as Obstructor-A (Obst-A) and Knickkopf (Knk), are removed from the lumen of pio mutants (Fig. S5A). These results suggest that the Pio function did not affect airway clearance of Obst-A and Knk and therefore did not play a central role in airway clearance like Wurst. Nevertheless, airway clearance defects observed in TEM images in pio null mutant embryos and, in addition, defective tube lumen morphology in wurst;pio transheterozygous mutant embryos explain the occurrence of airway gas filling defects.”

      5.) Pio and Dumpy. Dumpy (Dpy) is another ZP domain protein secreted by the tracheal cells and detected in the lumen. To follow Dpy distribution, Drees and colleagues used a Dpy::eYFP protein trap line, the same used in Dong et al. However, in this latter paper, Dong et al. stated, using a Crb staining, that Dpy is not at the apical cell surface but only in the lumen. However, Drees and colleagues reported (line 227 and Fig. 4C) that Dpy appears both at the apical cell surface and in the lumen of the tracheal system. But they did not show a co-localisation with an apical marker. Furthermore, in their previous work (Drees et al. 2019), they called the apical staining a "peripheral shell" layer. In addition, in S2R+ cell culture, it is only when Pio and Dpy co-express that Dpy is detected at the cell membrane. The in vivo localisation of Dpy is an important point that needs to be clarified as it is of importance for the final model proposed in Supp Fig. 9.<br /> Drees et al. also performed FRAP experiments on Dpy::eYFP protein trap embryos. As expected as already shown by Dong et al.

      The referee is correct, we state “In stage 16 embryos Dpy::eYFP (Lye et al., 2014) appears at the tracheal apical cell surface and predominantly within the lumen (Fig. 4C).” The corresponding Fig. 4C reveals Dumpy::eYFP staining overlapping with chitin at two subcellular regions: Dpy is enriched as a core-like structure within the lumen overlapping with the chitin cable of the control embryos. Additionally, Dpy::eYFP overlaps with the chitin part that might be part of the apical cell surface. But this observation is hard to see in images in Fig. 4C and we apologize it. We therefore repeated the Dpy::eYFP localization analysis and analyzed in more detail with the ZEN profile tools, which shows peak fluorescence pixel intensities of different channels and provides the possibility to prove, if they overlap in XY axis.

      We asked first, if cbp (chitin) appears at the apical surface of dorsal trunk cells, when Pio becomes cleaved and released. In mid stage 16 embryos cbp staining appeared in the luminal chitin cable and additionally in a distinctive pattern, which fits to the pattern of taenidial folds that start to form. We therefore used the apical cell membrane marker Crumbs to co-stain cbp. Airycsan microscopy fluorescence intensity profile analysis and corresponding close ups images confirmed the overlap of Crb and cbp stainings at this distinctive pattern indicating this shows the chitin matrix at the apical cell surface (Fig. S8A). But there was no overlap of cbp and Crb at the chitin cable structure. Thus, knowing the localization of the apical cell surface chitin matrix, we performed co-stainings of cbp with mCherry::Pio (RFP antibody). This revealed, as expected, overlap of cbp and RFP antibody staining at the apical cell surface chitin matrix (distinct pattern) and with the luminal chitin-cable (Fig. S8B,C). Finally we repeated the stainings and analysis with cbp, mCherry::Pio (RFP antibody) and Dpy::eYFP (GFP antibody). First, these results revealed overlap of Dpy::eYFP and cbp at the apical cell surface and in the tube lumen (Fig. S8D) and second, overlap of punctuate staining of Dpy::eYFP, cbp and mCherry::Pio at the apical cell surface chitin matrix and also at the luminal chitin cable (Fig. S8E).

      The images and Z-projection in Fig. 4C clearly show the lack of extracellular Dpy::eYFP staining in pio mutant embryos. Dpy::eYFP was enriched intracellularly, and thus the Dpy::eYFP mis-expression caused by the pio mutation fits well with our results from S2R+ cell culture. As the reviewer notes, it is only when Pio and Dpy are co-expressed that Dpy is detected at the cell membrane.

      Altogether, Fig. 4C, the cell culture experiments and our new stainings support our model that Pio and Dumpy interact and are co-secreted at the apical cell membrane/surface, where Np mediates Pio cleavage. As requested by reviewer 2, we moved the model to Fig. 9. As requested by reviewer 1, we extended the model to include the timing of events.

      A minor point, the Dpy::eYFP protein trap line used in this study is not listed in the Materials and Methods section of the supplementary data.

      Thanks, we included it in the List of sources (Supplement). This YFP-trap line (one of the CPTI lines) was published by Claire M. Lye et al., Development, 141, 2014. We cite it in our manuscript.

      6.) The serine protease Np and Pio release. Drees and colleagues have previously shown, by performing in vitro studies, that the protease Notopleural (Np) cleaves the Pio ZP domain (Drees et al. 2019). Here the authors went a step further in demonstrating that this is also true in vivo at stage 17. In addition, they showed that, in Np mutant embryos, mCherry::Pio is mostly detected within tracheal cells and the luminal staining is strongly reduced. In this mutant context, the authors conducted FRAP experiments on the mCherry::Pio signal, even though it is very weak in the lumen. They showed hardly any recovery after photobleaching.<br /> In Drosophila S2 cells, Drees and colleagues showed that co-expression of the catalytically inactive NpS990A with mCherry::Pio yielded the 90 kDa mCherry::Pio variant as a prominent signal in the cell lysate (Fig. 5B), and live imaging revealed mCherry::Pio localisation at the cell surface (Fig. S6B). However, in the context of this inactive form, a strong signal is also detected at 60 kDa, corresponding to a cleaved form of the Pio ZP domain (Fig. 5B), and Pio localisation at the cell surface appears weaker than in controls. The authors did not consider that another protease could be at play.<br /> On the other hand, in their previous work, Drees et al. identified a mutant form of Pio (PioR196A) which is resistant to Np cleavage in vitro. It would be a step forward to establish by CRISPR/Cas9, as the authors seem to be successful with this technique, a mutant line carrying this point mutation. It will be important to determine whether the observed phenotype resembles that of an Np mutant.<br /> In their previous work (PLOS Genetics 2019), in Np mutant embryos, Drees et al. did not report "bulge-like" deformations from stage 16 onwards leading to the detachment of the tracheal cells from their adjacent aECM. Either the alleles or the allelic combination differ between the two studies, which could explain this difference, or it is a new phenotype that has not been previously described. In the latter case, it becomes important to quantify the proportion of segments showing these bulges. Is this a rare phenotype to observe?

      We thank the reviewer for the very interesting comments, the careful reading of our manuscript and the very useful suggestions. We agree that we cannot exclude the possibility that another protease is involved in the cleavage of Pio. Therefore, we included this important point in the discussion section as follows:

      “Unknown proteases may likely be involved in Pio processing since cleaved mCherry::Pio is also detectable in inactive NpS990A cells.”

      We think the generation of the pioR196A mutant to address Pio localization and tracheal phenotypes is a great idea, which we would like to pursue in future experiments. Unfortunately, the production of a fly line with such a specific point mutation at this position will take several months, not including the subsequent evaluation and phenotypic analysis of this fly line and mutants. Therefore, we apologize that we cannot address this question experimentally here. Nevertheless, mentioning the possibility of and the need for such an experiment is important, and we discuss it as follows:

      “Previously we identified a mutation at the Pio ZP domain (R196A) resistant to NP cleavage in cell culture experiments (Drees et al., 2019). Establishing a corresponding mutant fly line would be essential in determining whether the observed phenotype resembles the phenotype of the Np mutant embryos.”

      However, since we are not able to provide a new mutant fly line to evaluate the formation of the dorsal tube when an NP non-cleavable form of Pio is expressed, we used an alternative approach and overexpressed Np in the trachea with btl-Gal4. This shows a clear association of Np overexpression with Pio release, specifically in the stage 16 dorsal trunk, and with tube overexpansion.

      Finally, the reviewer is correct: we did not mention the appearance of bulges in Np mutant tracheal dorsal trunk cells in our previous publication. We used the same Np alleles in 2019, and a closer look at the 2019 publication likewise shows the appearance of bulges in Np mutant embryos, e.g. Fig. 1B (red-dextran, left part of the tracheal lumen shows bulges), and even the Dpy::YFP matrix tearing off at the site of bulges (Fig. 4F’’, above the arrowhead). However, at the time we did not know of the link with Pio and Dumpy.

      However, we agree that it is important to characterize the appearance of the phenotype by means of quantification. The quantification of bulges per dorsal trunk (n=16) is shown in Fig. 7B.

      7.) Minor point: I don't understand what the authors are trying to show in supplementary Figure 8. Tracheal cells detach and are found in the lumen?

      We are sorry for the unclear description in the legend. We corrected it as follows in the legend of Fig. S12:

      “This indicates disintegration of apical cell membrane at bulges and subsequent leaking of cellular content into the lumen.”

      8.) Np function conserved matriptase.<br /> In this work, Drees and colleagues showed that Np controls in vivo the cleavage of the Pio ZP domain.<br /> Dumpy and Piopio are not conserved in vertebrates but they both contain a ZP domain which is conserved. The authors tested if other ZP proteins can be cleaved by Np or the human homolog Matriptase. The authors tested in cell culture the ability of the type III Transforming growth factor-β receptor, which contains a ZP domain, to be cleaved either by Np or Matriptase.<br /> This could be a general mechanism that needs to be extended to other ZP domain proteins and that could be at play to structure the matrix and give it its physical properties.<br /> However, as it is all speculative, I find the discussion section related to these data far too long, and it does not help to better understand the work done on the formation of the tracheal tubes of the Drosophila embryo.

      We show that Np mediates cleavage of the Pio ZP domain in vitro and in vivo in Drosophila embryos. We further show that the human matriptase is also able to cleave the Pio ZP domain. To understand whether this is a more general mechanism, we extended our studies to the human TβRIII and its ZP domain. These data show that both Drosophila and human matriptases are able to cleave ZP domains of different proteins from different species, and suggest that Matriptase-mediated ZP domain cleavage is not a Drosophila-specific mechanism. We cannot follow the referee's argument that this is all speculative. Nevertheless, we agree that follow-up studies will be needed to show that the mechanism extends beyond two species and two ZP domain proteins. As requested by the referee, we deleted the following sentences of the paragraph, since they are speculative in the context of our manuscript and do not directly describe a potential matriptase and ZP domain function:

      “Matriptase degrades receptors and ECM in pulmonary fibrinogenesis in squamous cell carcinoma (Bardou et al., 2016; Martin and List, 2019). TβRIII is a membrane-bound proteoglycan that generates a soluble form upon shedding (López-Casillas et al., 1991), a potent neutralizing agent of TGF-β. Expression of the soluble TβRIII inhibits tumor growth due to the inhibition of angiogenesis (Bandyopadhyay et al., 2002). Idiopathic pulmonary fibrosis (IPF) is associated with a progressive loss of lung function due to fibroblast accumulation and relentless ECM deposition (King et al., 2011; Loomis-King et al., 2013).”

      However, the comparison between the bulging membrane phenotype in the tubular organ and the aortic aneurysm appears to us to be an important element of the article. In both cases, the cell membrane loses its integrity and can break in tubular networks. Thus, with our findings on the modification of extracellular ZP proteins, we offer a potential new molecular approach even for clinical investigation.

      9.) Minor points: Pio and cytoskeleton organisation.<br /> Line 78-79, the authors wrongly quoted a work from Brodu et al (2010). Pio does not anchor the microtubule severing enzyme Spastin. Instead, Spastin releases the microtubule-organising centre from its centrosomal location, then Pio contributes to its apical membrane anchoring. It can therefore be assumed that the organisation of the microtubule network is affected in a pio null mutant. In addition, ZP proteins have been shown to link the aECM to the actin cytoskeleton. Therefore, it would be interesting to look at the organisation of the actin and microtubule cytoskeletons in a pio mutant context in which enlarged apical cell surface area are observed.

      We are very thankful to the reviewer for finding this mistake in the introduction. We corrected it as follows:

      “Further, Pio is involved in relocating microtubule organizing center components γ-TuRC (γ-tubulin and Grips; gamma-tubulin ring proteins). This requires Spastin-mediated release from the centrosome and Pio-mediated γ-TuRC anchoring in the apical membrane.”

      Studying the cytoskeleton in pio mutant embryos is a helpful idea. We therefore analyzed F-actin with Phalloidin and beta tubulin (E7 antibody, DSHB) in the dorsal trunk cells of stage 16 control and pio mutant embryos. However, tracheal cells are tiny and only gross changes can be detected. The confocal Z-stack analysis of the stainings did not show gross differences between control and pio mutant embryos. We observe the expected apical subcortical accumulation of the actin and tubulin cytoskeleton in dorsal trunk cells of stage 16 pio mutant embryos, which has also been shown for wt embryos elsewhere. These new data are presented in supplementary Fig. S7.

      _Referees cross-commenting

      I have just read the comments of the other two reviewers, who like me are specialists in the formation of the tracheal system in the drosophila embryo.<br /> I find the comments very fair and balanced. They are in the same spirit as my comments and are very complementary. I hope that all our comments will be constructive for the authors and will improve the quality of their work._

      Reviewer #3 (Significance):

      _Overall, the methodology is sound, the quality of the data is good and the paper is very well written. Authors combine in vivo, in vitro studies as well as a cell culture approach. Using CRISPR/Cas9, they generated a large number of new tools allowing in vivo studies.<br /> Drees and colleagues generated new alleles of pio which are lack of function alleles. They described a new phenotype for pio mutant embryos, namely over-elongated tubes. But the authors do not comment on why these new alleles reveal a new phenotype. Furthermore, using their piomCherry::pio line, the authors state that Pio is localised to the plasma membrane. This location is very difficult to assess. Both new results require clarification.<br /> The authors had already demonstrated that Np cleaves the ZP domain of Pio in vitro. Here they demonstrate this in vivo. It appears important to evaluate the formation of the dorsal tube when an NP non-cleavable form of Pio is expressed.<br /> Finally, the model proposing a coupling between the extracellular matrix and the membrane of tracheal cells is very interesting. The demonstration that cleavage of Pio by Np could participate in this coupling is very interesting for those interested in the integration of mechanical stress and cellular deformation. However, such a model has already been discussed in Dong et al (2014). In this article, Dong et al. proposed that a "coupling of the apical membrane and Dpy matrix core is essential for tube length regulation".

      The audience for this article should be specialised and oriented towards basic research. It may be of interest to people working on tubular systems or working on ZP proteins.

      My field of expertise is cell biology and developmental biology in drosophila and formation of tubular networks._

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their constructive criticism, which helped us to improve the paper. We modified Fig.6I and Fig.7, replaced Fig.8, and added supplementary Figs. 3-5 and supplementary Tables S1-2. The manuscript was extensively re-written. A new paragraph was added to the Discussion section, in which relative adhesiveness is related to absolute adhesion strength and the cadherin knockdown result is related to earlier findings.

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary: This work examines the relationship between cell-cell contacts and pericellular matrix in Xenopus chordamesoderm, which is a tissue actively involved in convergent extension during gastrulation. By lanthanum staining of pericellular materials, the authors found that different types of pericellular matrix are present in cell-cell contacts in the chordamesoderm, which may mediate cell-cell adhesion. Knockdown of C-cadherin, Syndecan-4, fibronectin, and hyaluronic acid leads to the reduced abundance of cell contacts and cell packing density, but this does not seem to affect convergent extension. Based on these observations, the authors propose a model in which cell-cell contacts involve the interdigitation of distinct pericellular matrix units.<br /> Major points:

      1. Knockdown of adhesion molecules separates cells and leads to wide contacts with large interstitial spaces. Data in figure 1 show loosely packed morphant chordamesoderm cells. Intuitively, these should reduce cell-cell adhesion. However, a main conclusion from this manuscript is that reduced abundance of narrower contacts does not decrease adhesiveness. Although depletion of adhesion molecules modifies but does not abolish a contact, non-attached free surfaces increase significantly in morphant cells. It is therefore not easy to understand how reduced cell contacts have no effect on cell adhesion.

      We added a section to the Discussion to address this issue (p.11ff). We show in the Results section (modified Fig.7) that relative adhesiveness is indeed significantly reduced in the morphants (Syn-4 always being the exception) when compared within the contact width range of normal chordamesoderm. However, contact width is strongly increased in the morphants, and adhesiveness increases linearly with width. We argue that these effects compensate for the initial lowering of adhesiveness. In other words, adhesive contacts become shorter (more gap surface) but wider (see Fig.6I), and the wider they become, the more adhesive they are. As in the original version of this paper, we then propose a model that explains the empirically observed increase of adhesiveness with width. How the abundance of cell-cell contacts is reduced is as yet less clear. Pericellular matrix deployment and structure are strongly affected by adhesion factor knockdown, and contact types are altered. Some contact types seem to widen but remain adhesive, others become non-adhesive, and still others may disappear without being replaced (see last paragraph of Discussion). Adding detail to these notions and clarifying this important issue satisfactorily will require future research.

      Importantly, the adhesiveness was not experimentally tested.

      Due to external circumstances, we were unable to perform additional experiments. However, we used our previously published quantitative data on adhesion in gastrula tissues including the chordamesoderm to interpret our present results for normal and C-cad-depleted chordamesoderm, and to relate relative adhesiveness to absolute adhesion strength, in a new section of the Discussion (p.11ff).

      2. It is surprising that reduced cell contacts, at least narrower cell contacts, do not affect convergent extension. Does this mean that active cell behavior changes in the chordamesoderm, which are required for convergent extension, are independent of cell contact types?

      We actually claimed that all treatments inhibited convergent extension, except for Syn-4 (Barua et al. 2021, and this manuscript, p.3, Fig.1B,C). Syn-4 knockdown had a dramatic effect on cell contacts, cell density and cell shape but none on convergent extension, at least up to the middle gastrula stage. This is surprising and does not fit easily with current views of cell intercalation during convergent extension, but analysing the underlying cell behaviors is beyond the scope of this article.

      3. Although the formation and localization of pericellular materials are differentially affected after knockdown of adhesion molecules, there is no clear evidence showing that different types of pericellular matrix mediate cell-cell adhesion in the chordamesoderm. It is possible that the disrupted distribution of pericellular materials in morphants only represents a secondary consequence of changed cell contacts. This may be supported by the fact that knockdown of adhesion molecules reduces narrow contacts and increases LSM-free gaps.
      4. The relationship between contact width spectra and LSM is also very elusive. Again, changes in contact width or abundance and distribution of LSM may be indirectly caused by loss of adhesion molecules. Therefore, although knockdown of adhesion molecules leads to changes of LSM localization, it cannot be concluded that cell-cell contacts in chordamesoderm are mediated by different types of pericellular matrix.

      We find it difficult to interpret, for example, Fig.5A-F other than by assuming an adhesive role for the pericellular matrix, in this case LSM, in normal and morphant tissue. What else would hold two cells together here between two gaps? The contacts are often much too wide for cadherin-cadherin binding. We do believe that changes in contact width or abundance are caused by the loss of adhesion molecules, directly or indirectly. Our LSM images show that, remarkably, modified contacts (e.g. Fig.3D,F; Fig.5B,C) are still able to keep cells together over some distance between interstitial gaps, and our quantitative data similarly indicate that e.g. contact widening is consistent with continued adhesion. However, some of the contacts may become non-adhesive, or be lost without being replaced, increasing non-adhesive gap surface. This is now discussed on p.11, middle paragraph.

      5. In contrast to the present observations, works by others using the same morpholinos have shown that Cadherin-dependent cell adhesion, fibronectin-rich extracellular matrix, and Syndecan-4-regulated non-canonical Wnt signaling are required for convergent extension. These discrepancies need to be appropriately addressed.

      As mentioned above, we found that all treatments affected convergent extension, as expected from the work of others and our own, except for Syn-4 depletion. We noticed that in the paper by Munoz et al. on Syn-4 overexpression and knockdown, only late gastrula/early neurula stages were evaluated. Syn-4 knockdown produced moderately strong axis defects, perhaps in part related to impaired neural plate closure. Unfortunately, we did not follow our morphants to these later stages to see whether defects developed then. But our main interest here is cell-cell contacts.

      6. If LSM and LSM-free contacts are similarly adhesive, what is the role of LSM in cell adhesion and how is cell adhesion established in these LSM-free contacts?

      We now discuss more explicitly the notion that gastrula non-epithelial cell adhesion is mediated by a mosaic of pericellular matrix patches of different composition, some containing LSM in different configurations, others not, but each similarly adhesive.

      Minor points:<br /> 1. It may be helpful to clearly define the pericellular matrix in this particular context and its relationship with LSM. It is also necessary to clarify whether the adhesion molecules examined in this work are considered as components of the pericellular matrix.

      We explain the use of these terms at the end of the first paragraph of the Introduction. The most general term is pericellular matrix; part of it is La3+ labeled – LSM; and some of the LSM can be compared to structures which in other systems are termed glycocalyx. We consider the adhesion molecules examined to be part of the pericellular matrix but are aware of other putative functions, like in cell signaling, which may indirectly affect contacts and thus contribute nevertheless to the phenomena studied here.

      2. In figure 1B, it appears that the Cadherin morphant has defects in chordamesoderm elongation and archenteron formation, suggesting impaired convergent extension.

      We find, in agreement with the work of others, that C-cad knockdown impairs convergent extension, and mention this when we describe Fig.1B.

      3. In figure 1C, the Syndecan-4 morphant gastrula clearly shows enhanced anteroposterior elongation of chordamesoderm and archenteron in comparison with the wild-type embryo. This seems to suggest that loss of Syndecan-4 promotes the movements of convergent extension. However, previous studies indicate that both gain and loss of Syndecan-4 impair convergent extension.

      As mentioned above, late gastrula/early neurula stages were evaluated in the Munoz et al. paper, mid-gastrula stages in our work. One possible explanation would be that mild axis defects develop later, partly in connection with neural tube elongation and closure.

      4. Ideally, in knockdown experiments, control embryos should be injected with corresponding mismatch morpholinos.

      We explain in the Methods section that we only used morpholinos that were extensively characterized in previous publications.

      5. In figure 1E, it is unclear what type of cell contacts the light green arrowheads indicate.

      This is explained now in the figure legend.

      6. Figure 1 legend, "(wt) is from Barua et al. 2021". I am not sure it is appropriate to use previously published data.

      The present data were derived by further evaluations of the same samples and TEM sections as used in Barua et al. 2021. We show the previously published data (acknowledged in the legends) here for easy comparison (instead of citing the previous paper).

      7. There is no light blue arrowhead in figure 2, and in figure 3B and 3I, it seems that the same colored arrows are used to indicate different structures.

      This has been corrected.

      8. Triple-layered contacts are not clearly defined.

      We define this term now repeatedly, as consisting of two LSM layers enclosing a non-labeled layer between them.

      9. Page 2, "based on driven by" should be either "based on" or "driven by".

      Has been corrected.

      10. Page 8, "selectin" should be "selecting".

      Has been corrected.

      Reviewer #1 (Significance):

      Strengths:<br /> Demonstrated the effects of several adhesion molecules on the formation of cell contacts and pericellular matrix in Xenopus chordamesoderm.<br /> Limitations:<br /> The significance of chordamesoderm cell contact changes in convergent extension or gastrulation is not clear;

      Effects on gastrulation of PCM or membrane adhesion molecule depletion have very often been described as mediated by effects on cell signaling. Without excluding such possibilities, we wished to redirect attention here to other putative mechanisms by describing basic effects of treatments on cell-cell contacts, including PCM deployment and structure. Future work must relate the specific, often dramatic, contact changes upon depletion of a specific factor to cell behavior during convergent extension and other tissue movements.

      there is no direct evidence showing the functional link between pericellular matrix, cell contacts and cell adhesion;

      Please see our response to main points 3 and 4 above.

      the absence of effects on convergent extension after depletion of several adhesion molecules is not fully consistent with previous reports.

      Please see our response to main points 2 and 5 and minor point 3 above.

      Advance: This work likely provides some fundamental and methodological advances for studying cell-cell adhesion. It shows promise for elucidating mechanisms underlying the regulation of cell contact changes in tissues involved in morphogenetic movements.<br /> Audience:<br /> This work likely interests readership studying embryonic cell adhesion in the field of developmental biology and cell biology. It may be also potentially interesting for people working on glycocalyx pericellular matrix in adult tissues.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary: During gastrulation, cells within vertebrate embryos require the ability to both adhere to one another and rearrange with their neighbors to shape the emerging body plan. These authors posit that such flexible adhesive contacts are mediated in part by the pericellular matrix (PCM), including multiple types of glycocalyces containing molecules such as fibronectin, hyaluronic acid, and syndecans, which they previously characterized in multiple embryonic tissues (Barua et al, PNAS, 2021). Here, in a follow-up to their 2021 study, the authors use electron microscopy to characterize the pericellular matrix within the chordamesoderm of Xenopus gastrulae. They identify several types of adhesive contacts within the chordamesoderm and assess how they are altered in the absence of key PCM molecules via morpholino knock-down. They conclude that syndecan-4 and hyaluronic acid comprise and promote assembly of PCM plaques whereas fibronectin and C-cadherin anchor them to cell surfaces. Cell packing density is decreased upon loss of all 4 of these molecules, which the authors attribute to a decrease in the number of cell contacts without affecting the strength of the remaining contacts. They further conclude that adhesiveness increases linearly with contact width, and that this relationship is unaffected by loss of any aforementioned adhesive/ PCM molecules.

      Major comments:<br /> Many conclusions in this manuscript are based on measurements of cell contact angles, which indicate the reduction of tension at cell contacts vs. free cell surfaces and thus relative adhesive strength. While this lab previously applied the same approach to live tissues (David et al, 2014), it is not clear to what extent such measurements accurately reflect adhesive strength in fixed tissues and/or electron micrographs. Especially given the issue of random sectioning planes, which cause distortion of contact angles. Although a correction was applied, the authors note this is not theoretically derived because the heterogeneity of gap sizes made such calculations too difficult. Indeed, it appears that the large gaps between cells within morphant embryos affect contact angle measurements, but if this is corrected for in any way, it is not mentioned.

      Geometrically determined contact angle distortion should affect the angle or relative adhesiveness distributions similarly in all conditions or treatments, and thus should have little or no effect on comparisons of distribution peaks, averages, etc. Beyond this effect of random sectioning planes, we don’t see how a large contact width should by itself affect measurements of angles.
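As an illustration of how a contact angle translates into relative adhesiveness, the following minimal sketch assumes the commonly used tension balance at a gap: the tension per cell at the contact equals the free-surface tension times cos θ, where θ is half the angle between the two free cell surfaces, so that relative adhesiveness is 1 − cos θ. This definition, the function name and the example angles are assumptions made for illustration; this is not the analysis code used for the manuscript, and it applies no correction for sectioning plane.

```python
import math


def relative_adhesiveness(contact_angle_deg: float) -> float:
    """Return 1 - cos(theta), with theta taken as half the full contact angle.

    Assumption (hypothetical helper, not from the manuscript): the tension
    per cell at the contact is the free-surface tension times cos(theta).
    """
    if not 0.0 <= contact_angle_deg <= 180.0:
        raise ValueError("contact angle must lie between 0 and 180 degrees")
    theta = math.radians(contact_angle_deg / 2.0)
    return 1.0 - math.cos(theta)


if __name__ == "__main__":
    # Hypothetical example angles, not measured values.
    for angle in (0.0, 60.0, 120.0, 180.0):
        print(f"{angle:5.1f} deg -> {relative_adhesiveness(angle):.3f}")
```

Because the mapping is monotonic over this range, a distortion that shifts angles similarly in all conditions shifts the derived adhesiveness values in the same direction in all of them.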

      Because this is the sole measure of cell adhesion provided in the study, this reviewer is not convinced of the conclusion that loss of PCM components does not affect adhesive strength.

      In response to this criticism, we re-evaluated our adhesiveness-width data (Fig.7A-E). We noticed that there is indeed a reduction of relative adhesiveness when morphants are compared to normal chordamesoderm within the width range of the latter. However, the larger contact widths in the morphants, together with the linear increase of adhesiveness with width, compensated or overcompensated for the initial reduction of adhesiveness.

      Could such measurements not be made from live cells/tissues after manipulating PCM components, as the lab has done previously? Because the lab already has the necessary reagents and expertise for such experiments, the time and resources needed for such measurements shouldn't be prohibitive.

      Due to circumstances, we were unable to perform additional experiments. However, we used our previously published quantitative data on adhesion in gastrula tissues including the chordamesoderm to analyze our present results for normal and C-cad-depleted chordamesoderm, and to relate relative adhesiveness to absolute adhesion strength, in a section added to the Discussion (p.11ff).

      • As mentioned above, these authors previously measured adhesive strength in live Xenopus cells and tissues (David et al, 2014). In that study, they found that C-cadherin MO reduced relative adhesiveness whereas the current study found that relative adhesiveness actually increases in this condition. What explains this discrepancy?

      We explain now in the new Discussion section (p.11ff) and with the help of supplementary Figure S5 how adhesion strength and relative adhesiveness are related overall (tissue surface vs. cell contacts) and at gaps within a tissue (gap free cell surface vs. cell contacts). In the previous study (David et al, 2014), we discussed relative adhesiveness in relation to overall adhesion strength, and both are decreased upon C-cad knockdown. Here we examined these parameters at interstitial gaps, where we find a small increase of relative adhesiveness, due to overcompensation caused by a strong increase of adhesiveness with contact width. Using our David et al, 2014 data we quantitated the effects. We previously found a similar increase of relative adhesiveness at gaps in C-cad morphant ectoderm (Barua et al. 2017) which we could not explain at the time, but explain now by analogy to our chordamesoderm results.

      • No control morpholinos are used, and for the morpholinos that are used, the doses are very large. An equally high dose of control MO should be used to ensure that all observed phenotypes are specific.

      We detail in the Methods section that we used here and in previous publications only previously characterized morpholinos.

      • It appears that all the images analyzed were collected in the sagittal plane, and the analyses don't seem to consider the intrinsic polarity of the chordamesoderm. For example: cells in different positions within the tissue (basal vs. apical), or that WT chordamesoderm cells are mediolaterally polarized and actively intercalating whereas disruption of PCM components like fibronectin disrupts cell intercalation and randomizes cell polarity. It is possible that 1) cell-matrix (in basal cells) and 2) cell-cell (during intercalation) interactions may affect the measurements made in this study. In other words, that cell contacts could differ by position within the embryo and intercalation/polarity status... have such effects been accounted for in the current analysis?

      Here we only analyzed cell contacts deep in the chordamesoderm. Basal contacts were examined to some extent in Barua and Winklbauer, 2022, apical contacts not yet. Our present analysis is based on sagittal sections. The cells in the chordamesoderm are elongated and aligned mediolaterally but not in register, i.e. they are randomly wedged between each other. Thus, all mediolateral positions in cells should be present in our samples. Nevertheless, trends in the occurrence of contacts related to medial-to-lateral positions on cells (e.g. recognizable in spindle-shaped cells as wide vs narrow cell cross-sections) may have escaped our attention, and in particular, the protrusion-bearing medial and lateral ends of cells may develop special contacts. However, our goal in this study was to analyse basic properties of cell-cell contacts in this tissue, as a foundation for further detailed studies.

      • In this study, the authors state that chordamesoderm movements are preserved in syndecan-4 morphants, and in their 2021 article (Barua et al) they state that convergent extension movements are accelerated. But another study describing this MO found that it causes severe convergent extension defects (Munoz et al, NCB, 2006). What explains this discrepancy?

      In their knockdown experiments, Munoz et al. find relatively mild axis defects in late gastrula/early neurula stage embryos while we studied the mid-gastrula. Perhaps defects develop during later stages in Syn-4 morphant embryos.

      Also, the syn-4 morphant shown in Fig. 1 appears more developmentally advanced than the other embryo... if the embryos are not stage matched it could affect the measurements and conclusions drawn from them.

      Stage matching was not possible: C-cad and FN morphants did not involute or engage in convergent extension (i.e. were arrested at the initial gastrula stage), whereas Syn-4 morphants appeared to gastrulate faster than normal. Therefore, embryos were strictly time matched. A remaining limitation is that the time course of cell contact development over gastrulation was considered low priority in this initial study and was thus not determined.

      • In figure 7, the authors plot relative adhesion (measured from contact angles) vs. contact width, then fit regression lines to the lower boundaries of these scatter plots. It is not clear why this analysis is focused only on the lower boundaries rather than considering the full spread of the data. Particularly for syn-4 morphants, whose values do not appear to be concentrated along the lower boundary. This analysis is further confused by the introduction of alpha*, which represents relative adhesiveness relative to the regression.

      The lower boundary line is most convenient to extract (Fig.7A’-E’). But we agree that the “interior” of the scatter plot distribution should also be analyzed. Using average adhesiveness gives rise to artifacts since the density of data points decreases strongly with contact width but also with distance from the lower boundary, leading to the preferential disappearance of large adhesiveness values for higher widths. Instead, we constructed a line tracing the highest density in the scatter plot near the lower boundary (Fig.7B’’-E’’), by determining the positions of adhesiveness distribution peaks in consecutive width brackets (new Fig.8, Fig.S3). We abstained from introducing alpha*.
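      The peak-tracing step described above (finding the position of the adhesiveness distribution peak in consecutive width brackets) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function name `trace_density_ridge`, the bracket edges, and the number of adhesiveness bins are all hypothetical choices made for illustration, and a plain histogram mode is assumed as the peak estimate.

```python
def trace_density_ridge(widths, alphas, width_edges, alpha_bins=20):
    """Return (bracket midpoint, alpha at histogram peak) for each width bracket.

    widths, alphas : paired measurements (contact width, relative adhesiveness)
    width_edges    : ascending bracket boundaries, e.g. [0, 20, 40]
    alpha_bins     : number of equal-sized adhesiveness bins (assumed value)
    """
    a_min, a_max = min(alphas), max(alphas)
    if a_max == a_min:
        raise ValueError("all adhesiveness values are identical; no peak to trace")
    bin_size = (a_max - a_min) / alpha_bins

    ridge = []
    for lo, hi in zip(width_edges[:-1], width_edges[1:]):
        counts = [0] * alpha_bins
        for w, a in zip(widths, alphas):
            if lo <= w < hi:
                # The largest value falls on the top edge; keep it in the last bin.
                idx = min(int((a - a_min) / bin_size), alpha_bins - 1)
                counts[idx] += 1
        if sum(counts) == 0:
            continue  # no contacts in this width bracket
        # On ties, the lowest-adhesiveness bin is reported.
        peak = counts.index(max(counts))
        ridge.append(((lo + hi) / 2, a_min + (peak + 0.5) * bin_size))
    return ridge


# Toy data: two width brackets whose adhesiveness peaks differ.
toy_widths = [10, 10, 10, 10, 10, 30, 30, 30, 30, 30]
toy_alphas = [0.15, 0.16, 0.17, 0.55, 0.90, 0.10, 0.56, 0.57, 0.58, 0.85]
toy_ridge = trace_density_ridge(toy_widths, toy_alphas, [0, 20, 40], alpha_bins=4)
```

      Unlike a per-bracket average, the per-bracket mode is not pulled around by the sparse, widely scattered points at large widths, which is the artifact the response describes.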

      • Based on these regression lines alone, the authors conclude that all 4 conditions are similar enough to pool the data for further analysis. If these contacts have different properties, which the data in Figures 1-6 suggest they do, it seems inappropriate to pool them together.

      We no longer pooled the data, except in supplementary Fig.S4 where we consider angle distortion. Instead, we show in Fig.8 relative-adhesiveness frequency distributions for different treatments and width brackets. This emphasizes differences between the different adhesion factor depletions and shows that adhesiveness is not simply normally or log-normally distributed, in agreement with different contact types contributing differently though similarly to overall adhesion. It also allows the main peaks to be followed as they shift position with width, roughly in proportion to the lower surface boundary.

      Based on this pooling, the authors then conclude that relative adhesiveness increases linearly with contact width over the entire width range, regardless of adhesion factor depletion. This again assumes that all contacts (morphant and WT) are functionally equivalent, and that what is observed in morphant embryos in very wide contacts would also hold true in WT contacts. But because WT contacts occupy only a small portion of the width range, we cannot know how they would behave if scaled to be wider, and I am not convinced that very wide morphant contacts are representative of or functionally equivalent to WT. In other words, we cannot know that contact width is the only factor increasing their relative adhesion, given the experimental manipulations that structurally alter these contacts.

      Although differences between contact types are apparent, we think that the contacts function very similarly. We still hold that relative adhesiveness increases with contact width, as seen in each of the separate plots for wt and adhesion factor depletions. But re-evaluating the alpha-width scatter plots now we show that in the narrow width range of normal chordamesoderm, C-cad, FN and Has depletions show similar, significantly decreased relative adhesiveness (Fig.7A-E). With alpha proportional to width, and width strongly increased in morphants, this initial decrease is compensated in total adhesiveness averages. The relative independence of adhesiveness from contact type could hint at non-specific PCM-PCM adhesion (Winklbauer, 2019). We think that although adhesion factor depletion leads to the loss of some contact types or renders others non-adhesive (thus lowering contact abundances), it modifies some contact types (e.g. by widening them) while only moderately lowering their adhesiveness per unit interaction surface.

      Minor comments

      • In their descriptions of PCM in different experimental conditions, the authors overstate some conclusions drawn from EM data. For example, that type I glycocalyces are absent in chordamesoderm (although this signal is only reduced),

      We qualified the statement.

      or that because the Has2 morphant phenotype is intermediate between C-cad and fibronectin morphants this indicates an adhesive role for hyaluronic acid.

      Overall, Has2MO increases the abundance of gaps, i.e. HA normally reduces gaps between cells, strongly suggesting an adhesive role of HA. HA is also required for the formation of 10-20 nm gaps, again suggesting a direct or at least indirect adhesion-promoting role.

      • The authors state of the data in figure 1 that "All treatments significantly increase the size of non-adhesive gaps", but they don't show a quantification of the gaps size (they show the abundance).

      Has been corrected.

      • The authors state that LSM contacts exist as 10-20 and 20-50 nm subtypes. It is not clear what about the data suggest this division.

      In the LSM width difference spectra, CadMO and SynMO both increase the abundances of ≤ 20 nm contacts and decrease those of 20-50 nm contacts (Fig.4). The different response suggests at least two differently reacting subtypes.

      • In the same paragraph, the authors state that "C-cad and Syn-4... favor LSM width between 20-50 nm." What is meant by "favor"? Given that the number of 20 nm contacts is increased and 50 nm contacts is decreased in both conditions, this statement is unclear.

      The whole paragraph has been reworded.

      • On page 7, the authors say that the size of LSM structures is "consistent with larger plaques being assembled from small units", but if that were the case, wouldn't the plaque sizes be multiples of the size of a single unit? I.e. 100, 200, and 300 nm peaks? Because this is not the case, the data seem more consistent with a continuous range of LSM plaque sizes than with discrete units.

      The size of the units has a peak at 100 nm but a long tail (Fig.6F-H). Moreover, we discuss lateral compression (piling up of PCM material) or active stretching of plaques (to separate units for interdigitation), all factors that would blur plaque length patterns, i.e. we did not expect plaque sizes to be multiples of 100 nm.

      • On page 8, the authors refer repeatedly to LSM volume. Given that these measurements are made from TEM sections, how is volume being measured?

      This is explained now (p.7).

      • The authors present a model in which PCM interdigitates within cell contacts, but this is based on measurements from static tissues alone. Could the measurements of contact width instead be explained by compression of the PCM or some other mechanism? The data as presented don't rule out such possibilities.

      The model is in agreement with the linear increase of relative adhesiveness with contact width, with LSM height at gap surfaces not adding up to adjacent contact width, with visible interdigitation of glycocalyx units (“bushes”) described previously for prechordal mesoderm (Barua et al. 2021), and with the good agreement of calculated unit size with the size of measured LSM units. In addition, it agrees with literature data on endothelial glycocalyx plaques being composed of 100 nm units and of complete interpenetration of glycocalyces during blood cell adhesion.

      Some terms used are not clear, for example: "partial LSM", "triple layer contact", "random removal [of LSM plaques]".

      We point out the meaning of the terms now more clearly. That “partial LSM” is identical with “triple layer contact” (but shorter, for use in figure) is explained in the legend to fig.6.

      • In figure 5, the graphs depict negative "abundance". Recommend "difference in abundance" instead.

      Done. For shortness, Δ Abundance.

      • Statistics: In figure 1I, it is not clear what the asterisk in this graph means or if statistical differences between these groups was determined. And in figure 6, some groups are marked as n.s., but P values for groups that are statistically different are not presented.

      The asterisk in fig.1I was meant to indicate that this column is from Debanjan et al. 2021, but this is indicated by different shading and mentioned in the legend. The non-used n.s. marks were removed.

      Reviewer #2 (Significance):

      This detailed electron microscopy study advances our understanding of pericellular matrix within vertebrate embryos and how loss of its constituent molecules affects cell interactions. It further addresses the relationship between structurally distinct pericellular matrices and their adhesive properties, although this analysis is less convincing. This study adds to a body of literature in which cell-cell and cell-matrix adhesion are known to regulate morphogenetic cell movements, but how such contacts are remodeled as cells rearrange is poorly understood. Previous work has also used measurements from live cells, embryos, and tissues to infer physical forces within embryos such as adhesive strength, cortical tension, and viscosity. This work follows up directly on a previous study from this group that characterized glycocalyces within various tissues within Xenopus gastrulae by electron microscopy. The hypothesis that pericellular matrix enables flexible/fluid adhesion within highly dynamic embryonic tissues is exciting, and is likely to be of interest to developmental biologists - particularly those who apply mechanical concepts to embryos. However, additional evidence, preferably from live tissues and embryos, is needed to support this hypothesis. This assessment is based on over 15 years' experience studying gastrulation morphogenesis in multiple vertebrate species.

    1. Reviewer #3 (Public Review):

      The authors previously showed that expressing formate dehydrogenase, rubisco, carbonic anhydrase, and phosphoribulokinase in Escherichia coli, followed by experimental evolution, led to the generation of strains that can metabolise CO2. Using two rounds of experimental evolution, the authors identify mutations in three genes - pgi, rpoB, and crp - that allow cells to metabolise CO2 in their engineered strain background. The authors make a strong case that mutations in pgi are loss-of-function mutations that prevent metabolic efflux from the reductive pentose phosphate autocatalytic cycle. The authors also argue that mutations in crp and rpoB lead to an increase in the NADH/NAD+ ratio, which would increase the concentration of the electron donor for carbon fixation. While this may explain the role of the crp and rpoB mutations, there is good reason to think that the two mutations have independent effects, and that the change in NADH/NAD+ ratio may not be the major reason for their importance in the CO2-metabolising strain.

      Specific comments:

      1. Deleting pgi rather than using a point mutation would allow the authors to more rigorously test whether loss-of-function mutants are being selected for in their experimental evolution pipeline. The same argument applies to crp.

      2. Page 10, lines 10-11, the authors state "Since Crp and RpoB are known to physically interact in the cell (26-28), we address them as one unit, as it is hard to decouple the effect of one from the other". CRP and RpoB are connected, but the authors' description of them is misleading. CRP activates transcription by interacting with RNA polymerase holoenzyme, of which the Beta subunit (encoded by rpoB) is a part. The specific interaction of CRP is with a different RNA polymerase subunit. The functions of CRP and RpoB, while both related to transcription, are otherwise very different. The mutations in crp and rpoB are unlikely to be directly functionally connected. Hence, they should be considered separately.

      3. A Beta-galactosidase assay would provide a very simple test of CRP H22N activity. There are also simple in vivo and in vitro assays for transcription activation (two different modes of activation) and DNA-binding. H22 is not near the DNA-binding domain, but may impact overall protein structure.

      4. There are many high-resolution structures of both CRP and RpoB (in the context of RNA polymerase). The authors should compare the position of the sites of mutation of these proteins to known functional regions, assuming H22N is not a loss-of-function mutation in crp.

      5. RNA-seq would provide a simple assay for the effects of the crp and rpoB mutations. While the precise effect of the rpoB mutation on RNA polymerase function may be hard to discern, the overall impact on gene expression would likely be informative.

  6. Jul 2023
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Please find our point-to-point response to the reviewer’s comments below, where we marked all changes implemented in the manuscript in italics.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      With the emergence and spread of resistance to Artemisinin (ART), a key component of current frontline malaria combination therapies, there is a growing effort to understand the mechanisms that lead to ART resistance. Previous work has shown that ART resistant parasites harbour mutations in the Kelch13 protein, which in turn leads to reduced endocytosis of host haemoglobin. The digestion of haemoglobin is thought to be critical for the activation of the artemisinin endoperoxide bridge, leading to the production of free radicals and parasite death. However, the mechanisms by which the parasites endocytose host cell haemoglobin remain poorly understood.

      Previous work by the authors identified several proteins in the proximity of K13 using proximity-based labelling (BioID) (Birnbaum et al. 2020). The authors then went on to characterise several of these proteins, showing that when proteins including EPS15, AP2mu, UBP1 and KIC7 are disrupted, this leads to ART resistance and defects in endocytosis leading to the hypothesis that these two processes are inextricably linked.

      In this manuscript, Schmidt et al. set themselves the task of characterising more K13 component candidates identified in their previous work (Birnbaum et al. 2020) that were not previously validated or characterised. They chose 10 candidates and investigated their localisations, and colocalisation with K13, and their involvement in endocytosis and in vitro ART resistance, 2 processes mediated by K13 and some members of the K13 compartments.

      The authors show that of their 10 candidates, only 4 can be co-localised with K13. Then, using a combination of targeted gene disruption (TGD) as well as knock sideways (KS), they characterised these 4 proteins found in the K13 compartment. They show that MyoF and KIC12 are involved in endocytosis and are important for parasite growth, however their disruption does not lead to a change in ART sensitivity. The authors also confirm the findings of their previous publication (Birnbaum et al. 2020), using a slightly different TGD

      (note from the authors: we apologise if this has not properly transpired from the manuscript but the difference between the TGDs is substantial and relevant: one has less than 3% of the protein left and hence can be considered to fully inactivate MCA2 and has a growth defect whereas the other contains about two thirds of the protein (1344 amino acids/~66% are left), has no growth defect, although it lacks the MCA2 domain (hence that domain can not be critical for the growth defect)),

      that MCA2 is involved in ART resistance, however they did not check whether its disruption impacts haemoglobin uptake. They also show that KIC11 is not involved in mediating haemoglobin uptake or ART resistance. To finish, the authors used AlphaFold to identify new domains in the proteins of the K13 compartment. This led them to the conclusion that vesicle trafficking domains are enriched in proteins of the K13 compartment involved in endocytosis and in vitro ART resistance.

      The majority of the experiments conducted by the authors are performed to a good standard in biological and technical replicates, with the correct controls. Their findings provide confirmation that their 4 candidate genes seem to be important for parasite growth, and show that some of their candidates are involved in endocytosis. While the KD and KS approaches employed by the authors to study their candidate genes each have their own advantages and can be excellent tools for studying large sets of genes, this manuscript highlights the many limitations of these approaches. For example, the large tag used for the KS approach can mislocalise proteins or disrupt their function (as is the case for MyoF), resulting in spurious results, or indeed the inability to generate the tagged line (as is the case for MCA2). The KS approach also makes the results of a protein with a dual localisation, like KIC12, extremely difficult to interpret.

      We thank the reviewer for this thorough and insightful review.

      The limitations mentioned above were addressed in the response to the main points and a general detailed response in regards to the systems used for this research are added at the end of this rebuttal. Briefly summarised here: while we agree that there are limitations of the system used, we are convinced that

      • the advantages of using a large tag in most cases outweigh the drawbacks, as it permits tracking the inactivation of the target, if need be at the individual cell level

      • while not optimal for MyoF, the partial inactivation actually helps in its functional study as detailed in major point 23&28 or reviewer#3 major point 11: it shows a consistent correlation of the phenotype with different causes and degrees of inactivation (this is now better illustrated in Figure 1L1M). Further, regarding the concern of the large tag: the effect of the tag based on localisation was overestimated in the review by what seems to have been a mix up comparing numbers from MyoF with a number from MCA2 (there is a difference, but it is only small) (see reviewer#1 major point #23).

      • KS is the optimal method for most of the assays in this work (e.g. bloated food vacuole assays and RSAs); these assays would be impossible or difficult to use with other inactivation systems currently used in P. falciparum research (see details in the response to the specific points and after the rebuttal)

      Regarding the difficulty of interpreting the KIC12 data: this is only true for measuring absolute essentiality; for everything else we believe we actually have the optimal method. If not KS, which method targets a specific pool of a protein with a dual localisation? Again, our assays targeting the K13 pool and revealing the specific function would have been difficult or impossible with any other system.

      Ultimately the question is whether any other system would have resulted in a different conclusion on the function of the proteins studied. At present we are confident this would not be the case and other systems probably would not have delivered the specific functional data shown in this work. Clearly, more in depth work will provide more nuanced and detailed insights into the proteins analysed in this work and this likely will also include the use of other systems for specific aspects they are most suitable for. However, this (e.g. different complementations in a diCre cKO) is complex and therefore beyond what fits into this work which had the goal to assess which proteins are true positives for the K13 compartment and to place them into functional groups in regards to endocytosis.

      Moreover, the manuscript is disjointed at times, with the authors choosing to conduct certain experiments for only a subset of genes, but not for others. For example, considering that the aim of this paper was to identify more proteins involved in ART resistance and endocytosis, it is confusing why the authors do not perform the endocytosis assays for all their selected proteins, and why they do not do this for the proteins they identify in their domain search. There is significant room for improvement for this manuscript, and a generally interesting question.

      The reviewer remarks that not every experiment was done for every target. Based on the rebuttal we tried to amend this but also note that there was some sentiment by the reviewers to better stick to the point and not make the manuscript more disjointed. We attempted to balance that as much as possible and hope we were able to honour both aspects (amendments were done as detailed in the point by point response below).

      In regards to endocytosis and choice of targets: We did do endocytosis assays for all proteins that showed a growth phenotype upon inactivation in this work. We therefore assume the reviewer here refers to major point #40 asking for endocytosis assays with KIC4 and KIC5 (which were not studied in this manuscript) as well as MCA2 (point 17). We fully agree with the reviewer that this would fill a gap in the work on K13 compartment proteins but such assays are difficult with TGDs (there are issues with non-comparable samples and compensatory effects) and proteins that are not essential (and hence likely have a smaller impact on endocytosis when truncated). We nevertheless now carried them out but, given the limitations of doing this with these lines, we would be hesitant to draw definite conclusions (see major point 17 and 40 for details and outcomes).

      But in its current format, other than confirming that MCA2 is involved in ART resistance (which was already known from the Birnbaum paper), the authors do not further expand our understanding of the link between ART resistance and endocytosis in this manuscript.

      We would like to point out that the importance of the K13 compartment and endocytosis goes beyond ART resistance (see e.g. also newly published papers on the K13 compartment in Toxoplasma, (Wan et al., 2023; Koreny et al., 2023)). Endocytosis is an essential and prominent process in blood stages. However, in contrast to processes such as invasion, our understanding about endocytosis is only rudimentary. Hence, this manuscript provides important insights on an emerging topic that in our opinion deserves more attention:

      • it identifies novel proteins at the K13 compartment and provides 2 new proteins in endocytosis (MyoF and KIC12); getting an as complete as possible list of proteins involved in the process will be critical to study and understand it

      • it leads to the realisation that not all growth-relevant proteins detected at the K13 compartment are needed for endocytosis

      • it provides domains and stage specificity of function for several K13 compartment proteins, overall bolstering the model of endocytosis in ART resistance and providing a framework critical to direct future studies on endocytosis and their detailed mechanistic function at the cytostome

      • the identified vesicle trafficking domains (for instance now also found in UBP1) are expected to strengthen the support for the role of endocytosis of the K13 compartment; this and also the above points are important as (based on the current literature) there still seems to be prominent sentiment in the field that (in part due to the involvement of UBP1 and K13) the cause of ART resistance is due to various unclearly defined stress response pathways

      • with MyoF it also shows the first protein in connection with the K13 compartment that acts downstream of the generation of hemoglobin-filled containers in the parasite and provides the first protein that explains the suspected involvement of actin in endocytosis (so far this was only based on CytD studies)

      Overall we therefore believe this manuscript contains critical information and a framework for future studies on endocytosis and the K13 compartment. We hope the relevance of endocytosis as one of the most prominent and essential processes in the parasites and the connection to various aspects linked with many commercial drugs (in addition to the role of endocytosis in ART resistance), is adequately explained in the introduction. We also would like to mention that the main focus of the work is reflected in the title of the manuscript which does not mention ART susceptibility.

      Major Comments

      1) line 31: please change defined to characterised - defined suggests that novel proteins were identified in this study, which is not the case.

      We apologise, but we do not fully understand this comment. We did identify novel proteins not before known to be at the K13 compartment (MCA2 (admittedly this one was likely but had not previously been verified), MyoF, KIC11 and KIC12). In our view "further defining the composition of the K13 compartment" therefore is an accurate statement. Additionally, the identification of previously not-discovered domains, the stage-specificity and function of these proteins helped to further define the K13 compartment.

      If the reviewer is referring to the fact that the proteins analysed in this study were taken from a previously generated list of hits, we would like to stress that presence in such a list (obtained from a BioID, but also if from an IP etc) cannot be equated with being a true positive; the hits are merely candidates that still need to be experimentally validated. This is what we did in this work to find out which further proteins from the list can be classified as K13 compartment proteins (for hits with lower FDRs this is even more relevant as illustrated by the fact that 6 of the here analysed hits were not at the K13 compartment). In an attempt to address this comment in the manuscript, we changed the wording of this sentence to (line 31): "Here we further defined the composition of the K13 compartment by analysing more hits from a previous BioID, showing that MyoF and MCA2 as well as Kelch13 interaction candidate (KIC) 11 and 12 are found at this site."

      2) line 37: please change 'second' to "another". As explained further below, the authors identified 3 classes of proteins (confer ART resistance + involved in HCCU, involved in HCCU only, or involved in neither).

      We realized that the description of the groups wasn’t clear in the abstract. Please see response to major comment #41 for a detailed answer to this (endocytosis is an overarching criterion, ART resistance is a subgroup and applies only to those proteins with a function in endocytosis in ring stages). To clarify this (see also major point #8) we added an explanation on the influence of stage-specificity of endocytosis on ART susceptibility to the introduction (line 76): “In contrast to K13 which is only needed for endocytosis in ring stages (the stage relevant for in vitro ART resistance), some of these proteins (AP2µ and UBP1) are also needed for endocytosis in later stage parasites (Birnbaum et al., 2020). At least in the case of UBP1, this is associated with a higher fitness cost but lower resistance compared to K13 mutations (Behrens et al., 2021; Behrens et al., 2023). Hence, the stage-specificity of endocytosis functions is relevant for in vitro ART resistance: proteins influencing endocytosis in trophozoites are expected to have a high fitness cost whereas proteins not needed for endocytosis in rings would not be expected to influence resistance.” The abstract was changed in response to this and other comments, and we hope it is now clearer in regards to the groups.

      3) Line 40: You define KIC11 as essential but according to your data some parasites are still alive and replicating 2 cycles after induction of the knock sideways. Please consider changing "essential" to "important for asexual parasite growth".

      We fully agree with the reviewer, we reworded the sentence as suggested.

      4) Line 40: please change 'second group' to 'this group'

      We reworded this part of the abstract and it now reads (line 38): “While this strengthened the link of the K13 compartment to endocytosis, many proteins of this group showed unusual domain combinations and large parasite-specific regions, indicating a high level of taxon-specific adaptation of this process.”

      5) line 41: state here that despite it being essential, it is unknown what it is involved in.

      With the newly added data we show that this protein either has a function in invasion or very early ring development although we did not see any evidence for the latter. We therefore changed the sentence to (line 43): “We here identified the first protein of this group that is important for asexual blood stage development and showed that it likely is involved in invasion.”

      6) Line 50: the authors should state here that there is actually a reversal in this trend over the last few years.

      Done as suggested.

      7) Line 54: please separate out the references for each of the two statements made in this line (a: that ART resistance is widespread in SEA, and b: that ART resistance is now in Africa) Reference 14 also seems to reference ART resistance in Amazonia - which is not covered by the statement made by the authors (in which case the authors should state ART is now present in Africa and South America). The authors should also reference PMID: 34279219 for their statement that ART resistance is now found in Africa (albeit a different mutation to the one found in SEA).

      Done as suggested.

      8) Line 65: it is also worth mentioning here that there are other mutations in proteins other than K13, such as AP2mu and UBP1 (PMID: 24994911;24270944) that can lead to ART resistance.

      As suggested by the reviewer, we included a sentence about non-K13 mutations linked with reduced ART susceptibility in the introduction (line 74): “Beside K13 mutations in other genes, such as Coronin (Demas et al., 2018) UBP1 (Borrmann et al., 2013; Henrici et al., 2020b; Birnbaum et al., 2020; Simwela et al., 2020) or AP2µ (Henriques et al., 2014; Henrici et al., 2020b) have also been linked with reduced ART susceptibility.”

      We here also added data on fitness cost that is related to this and is also relevant for the issue of proteins with a stage-specific function in endocytosis, making a transition for this statement which might help clarify the grouping of K13 compartment proteins (see also major point #2).

      9) Line 80, 86: ref 43 is misused. Reference 43 refers to Maurer's clefts trafficking which takes place in the erythrocyte cytosol and is not involved in haemoglobin uptake as far as I know. Please replace ref 43 with one showing the role of actin in haemoglobin uptake.

      We thank the reviewer for pointing this out, Ref 43 was removed from the manuscript.

      10) Line 98: the authors state here that they 'identified' further candidates from the K13 proxiome. This suggests that they identified new proteins in this paper, when in fact the list was already generated in ref 26. All they did was characterise proteins from that list that were not previously characterised. The authors should therefore remove identified from this statement.

      We agree with the reviewer that we did not identify further candidates, we identified new K13 compartment proteins from the list of potential K13 compartment proteins. We therefore changed “identified further candidates” into “identified further K13 compartment proteins” (line 116). Please see also response to major comment #1.

      11) Line 107-108: it is not clear from this sentence why these proteins were left out of the initial analysis in Ref 26. A sentence here explaining this would be valuable for the reader.

      This is a good point. One reason why we did not analyse more in our previous publication was that we had to stop somewhere and adding more would have been very difficult to fit into what was already a packed paper. However, as shown in this work, the list does contain further interesting candidates (e.g. K13 compartment proteins that are involved in endocytosis).

      We altered the relevant part of the introduction to highlight that we previously analysed the top hits, clarifying that the 'remaining' hits analysed in this work were further down in the list. This now reads: (line 113)“We reasoned that due to the high number of proteins that turned out to belong to the K13 compartment when validating the top hits of the K13 BioID (Birnbaum et al., 2020), the remaining hits of these experiments might contain further proteins belonging to the K13 compartment.” We hope this clarifies that we simply moved further down in the candidate list.

      12) Line 117-123: The authors say that PF3D7_0204300, PF3D7_1117900 and PF3D7_1016200 were not studied because they were not in the top 10 hits. However, the current organisation of Supplementary Table 1 shows all 3 proteins among the top 10 hits (MyoF, KIC12, UIS14 and 0907200 being after them). I think the authors should reorganise their table. It is also unclear according to what the proteins in the table are ranked. Could the authors indicate the metric used for the ranking?

      We thank the reviewer for alerting us to this. The issue here is that the 3 non-analysed proteins belong to a 'lower stringency' group comprising hits significant with FDRThe information about ranking is now also included as “Table legend” in the revised manuscript and the Table heading has been changed to: “List of putative K13 compartment proteins, proteins selected for further characterization in this manuscript are highlighted.”

      13) Line 129-141: Can the authors be clearer with their explanations of the identification of mutation Y1344Stop? One dataset (ref 61) shows that 52% of African parasites have a mutation in MCA2 in position 1344 leading to a STOP codon. But another dataset (ref 62) shows that the next base is also mutated, reverting the stop codon. That should have been seen in the first dataset as well. Could the authors please clarify.

      This mutation was first spotted in the MalariaGEN database (https://www.malariagen.net) (MalariaGEN et al., 2021), which allows online access to the data through the “variant catalogue” tool. This tool presents the data as a table of frequencies rather than in a sequence context. Hence, it only became evident to us after further research, when looking at individual MCA2 sequences from patient samples (Wichers et al., 2021b), that this mutation does not occur alone. We hope this is accurately reflected in our results section.

      14) Line 147: the authors say that MCA2 is expressed throughout the intraerythrocytic cycle as shown by live cell imaging. In Birnbaum et al 2020 fig 4I, the authors show that MCA2 is mainly expressed between 4 and 16hpi. But in Figure 1B of this manuscript there is a clear multiplication of MCA2 signal between trophozoite and schizont. How do the authors explain this discrepancy? Could expression of the truncated MCA2 be different than the full length? This cannot be assessed as expression and localisation of the full-length HA tag MCA2 is not shown in Schizonts.

      The key difference lies in transcription vs protein expression (usually protein levels peak after mRNA levels peak and, depending on turnover, protein levels can stay high even after mRNA levels have declined). Figure 4 of the Birnbaum et al paper presents transcriptomic data, but with a peak in trophozoites (the axis label in Fig. 4l of that publication is a bit confusing, as hour 0 is at the top, 48 h at the bottom; it is clearer in Fig. S13 of that paper), which would fit very well with the multiplication of the signal between trophozoites and schizonts mentioned by the reviewer. So, overall, the temporal peaks of transcript and protein levels for MCA2 fit well.

      For the signal in rings: Likely the protein has a turnover rate that is sufficiently low for some protein to be taken into the new cycle after re-invasion. In addition, different transcriptomic datasets available on PlasmoDB, e.g. (Otto et al., 2010; Wichers et al., 2019; Subudhi et al., 2020), show some mRNA present across the complete asexual development cycle, with each dataset showing the maximum peak at a slightly different stage.

      Even when located in foci and hence aiding detection of small amounts of protein (as is the case for MCA2-Y1344-GFP), the MCA2 signal in rings is not strong. For MCA2-TGD, the GFP signal is dispersed and therefore likely below our detection limit, while the same amount of protein concentrated at the K13 compartment is visible as foci in the MCA2-Y1344 cell line. Please note that MCA2-TGD has only 2.8% of the protein left whereas MCA2-Y1344 has 66.5% left and based on our manuscript is almost fully functional, hence fitting the different locations between the two versions.

      Overall we believe this shows that there are actually no significant discrepancies of the expression of the different MCA2 versions.

      15) Line 158: would it not have been more useful for the authors to have episomally expressed MCA2-3xHA in their MCA2Y1344STOP-GFPENDO line to make sure that the truncated protein is indeed going to the correct compartment? The experiments done by the authors suggests that the MCA2Y1344STOP goes to the right location but does not really confirm it.

      We appreciate the reviewer's caution here. However, considering that MCA2Y1344STOP-GFPendo co-locates with mCherryK13 and endogenously HA-tagged full length MCA2 does the same to a similar extent, there is in our opinion little doubt that MCA2 is found at the K13 compartment and that this is similar with both constructs. If there are minor differences, these might as well occur if MCA2 is episomally (as suggested in the comment) instead of endogenously expressed. Given the limited insight, we therefore decided against the episomal overexpression (which due to its size of > 6000bp may also be somewhat less straightforward than it may sound).

      16) Line 191: it is stated that MCA2 confers resistance independently of the MCA domain, however in both the MCA2-TGD and MCA2Y1344STOP-GFPENDO parasites, the MCA domain is deleted, and for both parasites, there is resistance (albeit to a lower level in the MCA2Y1344STOP-GFPENDO line). Therefore, how can the authors state that the ART resistance is independent of the MCA domain? This statement should be that resistance is dependent on the loss of the MCA domain.

      We agree that this can’t be categorically excluded. However, a ~5-fold difference in ART sensitivity was observed between the parasites with MCA2 truncated at amino acid 57 and those with MCA2 truncated at amino acid 1344, even though neither contains the MCA domain. Hence, at least this difference is not dependent on the MCA domain. The larger construct missing the MCA domain shows only a very moderate reduction in RSA survival, again suggesting the MCA domain is not the main factor. We amended our statement in an attempt to more accurately reflect the data (line 487): “This considerable reduction in ART susceptibility in the parasites with the truncation at MCA2 position 57 compared to the parasites still expressing 1344 amino acids of MCA2, despite both versions of the protein lacking the MCA domain, indicates that the influence on ART resistance is not, or only partially due to the MCA domain.” We would be hesitant to state the reviewer's conclusion that “resistance is dependent on the loss of the MCA domain”, as the larger construct missing the MCA domain has a milder RSA effect compared to MCA2-TGD, which suggests the reduction in ART susceptibility is independent of the MCA domain. These considerations also agree with the fact that the parasites with the longer MCA2 version (in contrast to the MCA2-TGD) do not have any detectable growth defect, which indicates that the protein can fulfil its function without the MCA domain.

      17) Line 192: Why did the authors not check if MCA2 is involved in endocytosis? They state later on in the manuscript that they did not do endocytosis assays with TGD lines, however if the authors include the correct controls, this could be easily done. It would also be really interesting to see whether endocytosis gets progressively worse going from WT to MCA2Y1344STOP to MCA2TGD. This experiment (as well as doing endocytosis assays for KIC4 and KIC5 TGD lines) would drastically increase the impact of this study. These experiments would not take more than 3 weeks to perform, and would not require the generation of new lines.

      So far we were very hesitant to do bloated FV assays with TGDs (even though TGDs were available for the genes encoding MCA2 and KIC4 and KIC5). The reasons for this were:

      1. the fact that these proteins could be disrupted indicated either redundancy or only a partial effect on endocytosis which might lead to only small effects that likely are difficult to pick up in an assay scoring for the rather absolute phenotype of bloated vs non-bloated. Using the refined assay measuring FV size could partly amend this but we note that also FV without hemoglobin have a certain size, reducing the relative effect if there are smaller differences.
      2. a TGD line does not permit tightly controlled inactivation of the target which makes comparing the outcome of bloated food vacuole assays difficult if there are smaller growth and stage differences to the 3D7 control.
      3. in contrast to conditional inactivation parasites, the TGD lines had ample time to adapt to loss of the target protein (compensatory mechanisms are well known for endocytosis, for instance in clathrin mediated endocytosis loss of individual components can be compensated (Chen and Schmid, 2020)).

      We nevertheless see the reviewer's point that this should at least be attempted and now conducted these assays (see also major point 40). For MCA2 (as requested in this point), the data is shown in Figure S5C-E. This assay showed that in MCA2-TGD and MCA2Y1344STOP-GFPendo parasites (similar to the 3D7 control) >95% of parasites developed bloated food vacuoles. Additionally, we also measured the parasite and food vacuole size of individual cells in an attempt to solve some of the problems with TGDs in such assays. In order to specifically solve problem 2 mentioned above, we analysed the food vacuoles of similarly sized parasites; however, they were indistinguishable between the three lines. Of note, in agreement with the reduced parasite proliferation rate (Birnbaum et al., 2020), a general effect on parasite and food vacuole size was observed for MCA2-TGD parasites, indicating reduced development speed in these parasites. Hence, it is possible that a potential endocytosis reduction was accompanied by slowed growth, and the comparison of similarly sized parasites may have obscured the effect. It is therefore uncertain whether there indeed is no endocytosis phenotype, although we can exclude a strong effect in trophozoites.

      Based on the RSA results at least rings can be expected to have a reduced endocytosis in the MCA2-TGD. Apart from options 1-3 mentioned above, it is therefore possible there is an effect restricted to rings, although in that case the reduced growth in trophozoites would be due to other functions of MCA2. Overall, we can conclude that the MCA2-TGD parasites do not have a strongly reduced endocytosis, but given the fact that the parasites are viable, this is not surprising. Whether the MCA2-TGD has no effect at all on endocytosis we would be very hesitant to postulate based on these results.

      18) The authors should consider re-organising the MCA2 section, first showing that the 3xHA tagged line colocalises with K13, then performing the new truncation.

      We attempted to re-organise as suggested but because we now included additional fluorescence microscopy images of schizont and merozoites (in response to reviewer 2 major comment 3) the main figure would become even larger. To prevent this, we kept the 3xHA data in the supplement.

      19) Line 197: Once again ref 43 is not correct to illustrate that actin/myosin is involved in endocytosis

      We thank the reviewer for pointing this out – we removed Ref 43.

      20) Line 202: the authors state that MyoF localises near the food vacuole from ring stage/trophs onwards. However, how can this statement be made in schizonts based on these images (Fig. 2A), where it doesn't look like MyoF is anywhere near the FV? This statement can only be made for schizonts if co-localised with a FV marker (which is done in Fig. 2B), however, based on the number of MyoF foci, it appears that this was not done for schizonts. Please either remove the statement that MyoF is near the food vacuole from trophs onwards (because it is only seen near the FV up until trophs) or show the data in Fig. 2B of schizonts to substantiate these claims.

      This is a valid point. We originally did not focus on schizonts because most markers end up in some focal area in the forming merozoite but other proteins (such as e.g. K13) also have one or more additional foci at the FV, making interpretation unclear, particularly if the schizont is still organizing to become fully segmented. This is why we generally focused the K13 co-localisations on the trophozoite stage to obtain the clearest information on endocytosis. However, given the fact that this manuscript gives the first localization of MyoF in P. falciparum parasites, we now provide a comprehensive time course (Figure 1C, S1A) including schizonts, which show quite a complex pattern: while the MyoF-GFP localization in trophozoites appeared as multiple foci close to K13 and also the FV, the MyoF-GFP pattern changes in late schizonts (fully segmented) and merozoites, appearing as elongated foci no longer close to K13 or the FV. Of note, this pattern has been previously reported for MyoE in P. berghei (Wall et al., 2019).

      We therefore revised the statement about MyoF localization in schizonts to better reflect the observed localization (line 175): “In late schizonts and merozoite the MyoF-GFP signal was not associated with K13, but showed elongated GFP foci (Figure 1C, S2A) reminiscent of the MyoE signal previously reported in P. berghei schizonts (Wall et al., 2019).”

      21) Line 204-206: what does this statement bring to the paper? Is it to show that it is the real localisation of MyoF because 2 tag cell line show the same localisation? I don't think this is needed, especially as later in the manuscript an HA-tag MyoF line is used and show similar localisation.

      We see the reviewer's point, but prefer to keep this data included in the supplement, particularly because potential differences in the location of tagged MyoF were a major concern.

      Related to the tag issue: in order to get a better understanding of the effect of C-terminal tagging with different sized tags, we now performed a more detailed analysis of the MyoF-3xHA cell line (Figure S2F-G). This cell line shows a growth rate similar to the 3D7 wild type parasites and has fewer vesicles than the 2x-FKBP-GFP-2xFKBP cell line, although still slightly but significantly more than 3D7 parasites. Overall, this indicates that the smaller 3xHA tag has less effect on the parasite than the larger 2x-FKBP-GFP-2xFKBP tag (see also new Figure 1L, showing a correlation of level of inactivation and the endocytosis phenotype for MyoF).

      22) Line 212: The overlap of K13 with MyoF in Figure 2C 3rd panel (1st trophozoite panel) is not obvious, especially as the MyoF signal seems nonexistent. I would advise the authors to replace with a better image. Also, why are there no images of schizonts shown in Figure 2C?

      As suggested, we exchanged the trophozoite image of panel Figure 2C (now Figure 1C) and expanded this panel with images covering the complete asexual development cycle including schizonts in response to this and the previous points. As indicated above (point 20), schizont stages are complex to interpret. While late schizonts likely are not very relevant for endocytosis, this is the first description of the location of the protein in this parasite and we therefore now provide a more thorough representation of the MyoF location across asexual stages in Figure 1C and S2A.

      23) Line 217: the spatial association of MyoF with K13 is very different when it is tagged with GFP and when it is tagged with 3xHA. The way the authors word it here, it seems that there is agreement with the two datasets, when this is not in fact the case (59% overlap for MyoF-GFP and only 16% overlap with MyoF-3xHA). These data suggest that the GFP and the multiple FKBP tags are doing something to the protein and therefore maybe the ensuing results using this line should not be trusted or be taken with a pinch of salt.

      We agree with the reviewer that the location of this MyoF-GFP in the cell might differ due to the partial inactivation but in contrast to this comment, the data does not indicate any large differences. It seems the reviewer mixed something up (the 59% mentioned might come from the MCA2 figure?). The data with the two lines with differently tagged MyoF co-localised with K13 are actually quite comparable: GFP-tagged vs HA-tagged MyoF overlapping with K13 was 8% vs 16% full overlap, 12% vs 19% partially overlapping foci, 36% vs 63% foci that were touching but not overlapping (compare what now is Figure 1D and Figure S2C). Only in the 'no overlap' category is there a much smaller proportion in the HA-tagged line. However, given that these are IFAs, which on the one hand are more sensitive to see small protein pools but on the other hand also have pitfalls due to fixing of the cells (e.g. a tiny increase in focus size due to fixing could increase the number of touching foci that in live cells might be close but did not touch), some variation compared to the live cells can be expected. We agree though that the partly reduced functionality of MyoF might be the reason for the consistent tendency of a lower overlap even though the difference is much less than indicated in the comment. We added "with a tendency for higher overlap with K13 which might be due to the partial inactivation of the GFP-tagged MyoF" to the sentence "IFA confirmed the focal localisation of MyoF and its spatial association with mCherry-K13 foci".

      While we expect the fact that the difference between these parasites is only small somewhat reduces the "pinch of salt" with the MyoF line, we do agree that the partial functional inactivation of the GFP-tagged MyoF line may have some impact. However, we do not think that this means the results with the MyoF-GFP line are untrustworthy. On the contrary, it provides insights into its function that in some ways is equivalent to a knock down or TGD. Overall all the MyoF lines show: few vesicles occur in the MyoF-HA-line, more in the MyoF-GFP line and even more after knock sideways of MyoF-GFP. Importantly the severity of this phenotype correlates with the growth rates in these lines. Hence, together with the bloated food vacuole assays, this provides consistent data indicating that MyoF has a role in the transport of HCC to the FV and its level of activity correlates with the number of vesicles and growth. To better highlight this, it is now summarised in Figure 1M.

      24) Line 219: the authors state here that they could not detect MyoF-GFP in rings, when in Figure 2C they show MyoF-GFP in rings, and also show that they could detect MyoF in Sup Fig. 3B with the 3xHA tagged line. Is this a labelling mistake in Figure 2C? If the authors could indeed not see MyoF-GFP in rings, this statement should have been made when Figure 2A was presented, and not so late in the manuscript, which causes confusion.

      We thank the reviewer for pointing this out. We now provide a detailed time course (see also previous points) which shows that there is no detectable MyoF-GFP signal during ring stage development until the stage where the parasites start the transition to trophozoites (i.e. MyoF-GFP signal could only be observed in parasites already containing hemozoin). In addition to the extended time course in Figure 1C (previously 2C) we included a panel of example ring stage images below to further highlight this. We also changed the labelling of the parasite with MyoF-GFP signal the reviewer mentions in Figure 1C to “late ring stage” (it already contains hemozoin) to clarify this.

      The description of Figure 1A is now changed to (line 153): “The tagged MyoF was detectable as foci close to the food vacuole from the stage parasites turned from late rings to young trophozoite stage onwards, while in schizonts multiple MyoF foci were visible (Figure 1A, S2A).”

      Please see our answer to major comment #45 where we provide an explanation for the difference between MyoF-3xHA and MyoF-GFP signal in ring stage parasites.

      [Figure MyoF]

      25) Line 237: Showing a DNA marker (DAPI, Hoechst) for Figure 2E, and subsequent figures using mislocalisation to the nucleus, would help the reader assess efficiency of the mislocalisation.

      Please see response to major comment #64 for a detailed answer on why we did not include DNA staining in the imaging used to assess mislocalization upon knock-sideways.

      26) Line 254-256: authors should show the results of the bloating assay for parental 3D7 parasites (+ and - rapalog) to see whether the MyoF line - rapalog has increased baseline bloating. This applies to all subsequent FV bloating assays.

      We did do several controls for bloated assays (including +/- rapalog of an irrelevant knock sideways line as well as using a chemical insult for which the control was 3D7 without treatment) in previous work (Birnbaum et al., 2020), which indicated that there is no effect of rapalog to reduce bloating. Although these controls are more stringent, we nevertheless did a 3D7 +/- rapalog control and added this to the manuscript (Figure S2I). As it is not possible to do this side by side with the assays that are already in the manuscript and the +/- rapalog 3D7 cells consistently showed no or very low numbers of cells without bloating (and stringent controls in the past equally did not show an effect), we believe adding this control once suffices.

      27) Line 254-257: The authors say that because fewer parasites show a bloated food vacuole upon inactivation of MyoF it means that less hemoglobin reached the food vacuole. I understand the authors statement, however, shouldn't they look at the size of the food vacuole, instead of the number of parasites with bloated FV, to make such a statement? This has been done for KIC12 so why not doing it for MyoF?

      This was now done and is provided as Figure 1J-K, S2J. The results confirm the assessment scoring bloated vs non-bloated food vacuoles.

      28) Line 259-261: these results would be difficult to interpret namely because the authors have dying parasites, which is exacerbated with the protein being knocked sideways. The authors should mention the pitfalls their knock sideways and tagging design here. Line 260-261: RSA is an assay relying on measuring parasite growth 1 cycle after a challenge with ART for 6 hours.

      Fortunately, this concern is unfounded, as the survival (measured by parasitemia after one cycle) of the same sample + and - DHA is assessed, isolating the DHA effect independent of potential growth defects which are cancelled out. Hence, if there were parasites dying in the MyoF line (please note that they might not actually die, but simply grow more slowly), this factor applies for both the + and - ART condition. As we are testing for a decreased susceptibility to ART which would manifest as an increased survival in RSA surfacing above 1%, antagonistic effects of reduced MyoF function and ART treatment would not result in detectable differences as without effect, the RSA survival is always close to zero.

      The same applies for the knock sideways where we assess the survival of +rapalog between +ART and -ART. If the reduced MyoF activity of the knock sideways leads to a decreased survival, this applies to both +ART and -ART. Please also note that rapalog was lifted after the DHA pulse (see e.g. Figure S2K).
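      The cancellation argument in the two paragraphs above can be illustrated with a minimal numerical sketch. It assumes a simplified model in which a growth defect scales the parasitemia of the +DHA and -DHA samples by the same factor; the function name and all numbers are hypothetical and are not taken from the manuscript data.

```python
def rsa_survival(parasitemia_dha: float, parasitemia_control: float) -> float:
    """Percent RSA survival: parasitemia of the DHA-pulsed sample
    relative to the untreated control of the same culture."""
    if parasitemia_control <= 0:
        raise ValueError("control parasitemia must be positive")
    return 100.0 * parasitemia_dha / parasitemia_control


# Hypothetical parasitemia values (in %) one cycle after the pulse.
control = 4.0      # -DHA sample
dha_pulsed = 0.02  # +DHA sample

# Hypothetical growth defect (assumption): both samples grow half as well.
growth_factor = 0.5

normal = rsa_survival(dha_pulsed, control)
slowed = rsa_survival(dha_pulsed * growth_factor, control * growth_factor)

# The common growth factor cancels in the ratio, so survival is unchanged.
print(f"survival without growth defect: {normal:.2f}%")
print(f"survival with growth defect:    {slowed:.2f}%")
```

      Under this assumption the readout only changes if the treatment alters the ratio between the two conditions, not if it slows both to the same degree.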

      That effects on growth are cancelled out is nicely illustrated for proteins where there is a stronger and more rapid effect on growth upon their conditional inactivation. For instance, when KIC7 is knocked aside, there is a considerable increase in RSA survival, even though continued inactivation of KIC7 would cause a severe growth defect (Birnbaum et al., 2020). Vice versa, a growth defect alone does not result in reduced RSA susceptibility, as evident from knock sideways of an unrelated protein or using a chemical insult (Figure 4H in Birnbaum et al., 2020) or simply slowing the ring stage by e.g. reducing EXP1 levels (Mesén-Ramírez et al., 2019). Hence, a growth reduction is not expected to alter the RSA outcome. And even if it did, it would only lead to an underestimation of the readout if growth is too severely affected (which would be obvious in the + rapalog without DHA sample, which was not the case).

      In that respect it is valuable to have the rapid kinetics of knock sideways, which permit inactivation of a protein before severe growth defects occur (although the only partial responsiveness of MyoF clearly is not optimal). In contrast, the absolute loss of a gene (as is the case if diCre is used) prevents using this system in these experiments, or at least makes it extremely difficult, as the timing would need to exactly hit sufficient protein reduction without killing the parasite until the end of the RSA (again see Mesén-Ramírez et al., 2021, where in an EXP1 diCre-based knockout RSA was only possible because we complemented with a lowly, episomally expressed EXP1 copy to have parasites with only a partial phenotype to do this assay).

      29) Line 261-263: the authors state that MyoF has a function in endocytosis but at a different step compared to K13 compartment proteins. I am not sure what they mean here. Can this be clarified?

      The different steps in endocytosis are explained in the introduction and we now tried to further clarify this (line 98). So far VPS45 (Jonscher et al., 2019), Rbsn5 (Sabitzki et al., 2023), Rab5b (Sabitzki et al., 2023), the phosphoinositide-binding protein PX1 (Mukherjee et al., 2022), the host enzyme peroxiredoxin 6 (Wagner et al., 2022) and K13 and some of its compartment proteins (Eps15, AP2µ, KIC7, UBP1) (Birnbaum et al., 2020) have been reported to act at different steps in the endocytic uptake pathway of hemoglobin. While inactivation of VPS45, Rbsn5, Rab5b, PX1 or actin resulted in an accumulation of hemoglobin filled vesicles (Lazarus et al., 2008; Jonscher et al., 2019; Mukherjee et al., 2022; Sabitzki et al., 2023), indicative of a block during endosomal transport (late steps in endocytosis), no such vesicles were observed upon inactivation of K13 and its compartment proteins (Birnbaum et al., 2020), suggesting a role of these proteins during initiation of endocytosis (early steps in endocytosis).

      VPS45 has no apparent spatial connection to the K13 compartment, but the fact that MyoF does (and that its inactivation also results in vesicle accumulation) indicates that it is downstream of vesicle initiation, providing the first connection from the initiation phase to the transport phase. More evidence for these different steps of endocytosis has been published in a recent preprint from our lab, where we simultaneously inactivated a protein of both “endocytosis steps” (Sabitzki et al., 2023).

      To clarify this in the results as requested, we changed the statement to (line 256): “Overall, our results indicate a close association of MyoF foci with the K13 compartment and a role of MyoF in endocytosis albeit not in rings and at a step in the endocytosis pathway when hemoglobin-filled vesicles had already formed and hence is subsequent to the function of the other so far known K13 compartment proteins.”

      30) Do the authors mean that it is involved in endocytosis but not in ART resistance? If so, this is a very difficult statement to make since the parasites are dying. Is there any evidence of point mutations in MyoF in the field?

      We split this point to address all issues raised here. Please see response to point 29 which clarifies that this was meant in a different way and our response to point 28 which explains why the dying parasite issue is not expected to affect the RSA (please also note that we do not have evidence of actually dying parasites in the MyoF-2xFKBP-GFP-2xFKBP line, most likely the growth is slowed).

      The mutation issue is interesting. In fact evidence exists that MyoF mutations may be associated with resistance (Cerqueira et al., 2017) (please note that there it is still called MyoC) but in a recent preprint from our lab we did not find any evidence for a significantly changed RSA survival in 12 tested mutations in the corresponding gene (Behrens et al., 2023).

      To clarify this we added the following statement to the discussion (line 709): "Of note, mutations in myoF have previously been found to be associated with reduced ART susceptibility (Cerqueira et al., 2017), but 12 mutations tested in the laboratory strain 3D7 did not result in increased RSA survival (Behrens et al., 2023)."

      31) Line 298: the authors state that there is no growth defect in the first cycle when rapalog is added to the KIC11 line, however based on Figure 3D, there is evidently a 25% reduction in growth compared to - rapalog at day 1 post treatment, and a 60% reduction by day 2, which is still within the 1st growth cycle. The authors should either revise their statement or provide an explanation for these findings. The authors should also explain why their Giemsa data in Fig. 3E is not in accordance with their FACS data.

      We think there is a misunderstanding here, caused by a figure legend that was not detailed enough; we apologise if this was misleading. The growth effect is restricted to invasion or possibly the first hours of ring stage development (see points 4 and 5, reviewer 2). In asynchronous cultures this takes effect more rapidly, as the culture also contains schizonts that immediately generate cells that attempt to re-invade but cannot, because KIC11 is already inactivated (the knock sideways acts rapidly). In contrast, in highly synchronous cultures this effect only becomes evident once the parasites have reached the schizont stage (starting with rings, this takes close to 2 days). We now clarify that Figure 2E (previously Figure 3D) shows growth data obtained with an asynchronous parasite culture, while in Figure 2F the growth assay is performed with tightly synchronized (4h window) parasites, as stated in the Figure legend.

      We now explicitly state in each Figure legend and for each growth experiment throughout the manuscript whether we used asynchronous or synchronized parasites for growth assays.

      Related to this, the incorrect y-axis label of what is now Figure 2E mentioned in major comment #58 is now corrected.

      32) Line 301: KIC11 could also be important very early for establishment of the ring stage for example for establishment of the PV. Also, was mislocalisation assessed in rapalog-treated parasites at 72 hours or in cycle 3?

      This is a valid point and this has now been addressed. We performed an invasion/egress assay revealing similar schizont rupture rates, but significantly reduced numbers of newly formed ring stage parasites (Figure 2H, S3G), indicating an effect of KIC11 inactivation either on invasion or possibly the first hours of ring stage development. A very similar point was raised by Reviewer 2; please see reviewer 2, major comment #4. This is now also reflected in line 302, which now reads: “… indicating an invasion defect or an effect on parasite viability in merozoites or early rings but no effect on other parasite stages (Figure 2F-H, Figure S3F-G).”

      We further included an assessment of mislocalization 80 hours after the induction of knock-sideways by addition of rapalog in Figure S3E which showed mislocalization of KIC11 to the nucleus.

      33) Line 311: the authors should change the sentence from 'not related to endocytosis' to 'not related to endocytosis or ART resistance'.

      Done as suggested.

      34) Line 323-325: Authors say that a nuclear GFP signal can be observed in early schizonts for KIC12. According to the pictures provided in Figure 4A and Figure S5A it is not very obvious. Also faint cytoplasmic GFP signal could only be background as we can see that exposure is higher for schizont pictures

      We changed the sentence (line 339) to: “…nuclear signal and a faint uniform cytoplasmic GFP signal was detected in late trophozoites and early schizonts and these signals were absent in later schizonts and merozoites (Figure 3A, Figure S4A,B).” in order to emphasize that the nuclear signal disappears early during schizont development.

      35) Line 326-328: The authors say that kic12 transcriptional profile indicate mRNA levels peak (no s at peak) in merozoites. Should they show live cell imaging of merozoites then? Because from the Figure 4A schizont pictures where schizonts are almost fully segmented no signal can be observed.

      The observation that mRNA levels of early ring stage expressed proteins tend to increase already in mature schizonts and merozoites is well established (e.g. (Bozdech et al., 2003)). A very good example of this is the exported proteins, most of which show a transcription peak in schizonts although the proteins are only detected in rings (see e.g. (Marti et al., 2004)). Hence, our observation for KIC12 is quite typical.

      We originally did not include merozoites, as in the last row of Figure 3B fully developed merozoites within a schizont with already ruptured PVM are shown and no GFP signal can be detected in these parasites. We now provide images of free merozoites in Figure S4A-B showing again no detectable GFP signal.

      We thank the reviewer for pointing out the typo, "peak" has been corrected.

      36) Line 347: The authors state that using the Lyn mislocaliser the nuclear pool of KIC12 is inactivated by mislocalisation to the PPM. This tends to suggest that only the nuclear pool of KIC12 is mislocalised. How is it possible that only the nuclear pool is mislocalised?

      The Lyn mislocaliser is at the PPM which is continuous with the cytostomal neck where the K13 compartment likely is found. The effect of the Lyn mislocalizer on the KIC12 protein pool localizing at the K13 compartment is therefore somewhat unclear. For this reason we already had the following statement in the original submission (line 400): “Foci were still detected in the parasite periphery and it is unclear whether these remained with the K13 compartment or were also in some way affected by the Lyn-mislocaliser.” We would like to stress here that the same does not apply to the nuclear mislocaliser, which is only a trafficking signal delivering KIC12 to the nucleus and hence likely does not affect the nuclear pool of KIC12, only the K13 compartment pool (the main interest of this manuscript).

      We realised that the statement towards the end of this paragraph was unnecessarily ambiguous in regard to the K13 compartment pool of KIC12, which might have caused some confusion about the function of this pool of KIC12, and therefore modified it to (line 374): "Due to the possible influence on the K13 compartment located foci of KIC12 with the Lyn mislocaliser, a clear interpretation in regard to the functional importance of the nuclear pool of KIC12 other than that it confirms the importance of this protein for asexual blood stages is not possible. In contrast, the results with the nuclear mislocaliser indicate that the K13 located pool of KIC12 is important for efficient parasite growth." It is also important to note that this limitation does not apply to the NLS knock sideways in regard to the K13 compartment, and that the endocytosis function of this pool of KIC12 seems solid, which this statement reinforces.

      37) Line 368-369: Effect was also only partial for MyoF. Why didn't you measure the same metrics for MyoF?

      This was now done and is provided as Figure 1J-K, S2J, confirming our previous interpretation, see also point #27 which raises the same point.

      38) Line 379: you don't know if all proteins acting later in endocytosis will have an increased number of vesicles as a phenotype

      This is based on our current definition as stated in the introduction. It assumes a directional vesicular transport of hemoglobin to the food vacuole where inhibition of early stages will prevent transport before HCC-filled autonomous vesicular containers have formed and entered the cell. In contrast, later inhibition stops such containers from further transport, leading to their accumulation. Such an accumulation is visible after inactivation of VPS45 and other proteins (Jonscher et al., 2019; Mukherjee et al., 2022; Sabitzki et al., 2023) or treatment with cytochalasin D (Lazarus et al., 2008). While it is possible that there may be smaller intermediates formed at the K13 compartment that later on unite or fuse with the compartment evident after VPS45 inactivation, and these might be missed due to small size (i.e. inhibition of a step between K13 compartment and an early endosome or equivalent), this would still be upstream of the VPS45 induced containers and hence would be earlier. We therefore believe that, based on the framework given in the introduction (see also (Spielmann et al., 2020)), it is reasonable to assume that a phenotype manifesting as reduced food vacuole bloating without formation of detectable vesicles likely signifies inhibition early in the process, whereas reduced bloating with vesicles signifies inhibition later in the process.

      39) Line 413-414: The authors state that no growth defect was observed upon KS of 1365800. Is growth alone enough to say that there is no impact on endocytosis?

      This is an interesting point. The endocytosis proteins we studied so far indicate that efficient impairment of endocytosis manifests as a severe growth defect. Hence, lack of a growth defect can be assumed to be an indicator for absence of an important role in endocytosis (or any other growth relevant process). Clearly there is a gradual response, such as seen in the different MyoF versions resulting in proportional growth and vesicle appearance phenotypes. Hence, a protein with a minor role might have escaped our attention, but then it probably is also not a very important protein in endocytosis.

      To further strengthen our assessment of PF3D7_1365800 importance for asexual blood stage development, we now also generated a cell line expressing the PPM Mislocalizer, enabling knock sideways to the PPM. This was done because this protein consistently has a focus at the nucleus that may be within the nucleus. Again this revealed no growth defect upon inactivation (Figure S7D).

      40) Line 432: in this section, the authors state that KIC4 and KIC5 seem to have domains that may suggest these proteins are involved in endocytosis, based on the alpha fold data that is publicly available. Considering the authors have TGD-SLI versions of these lines (Birnbaum et al. 2020) and have already confirmed in this previous publication that they confer resistance to ART; it would make sense to look at endocytosis for these genes. This would be a relatively simple and straightforward experiment, taking no longer than two to three weeks, and would require no additional reagents or line generation. Doing these experiments would add a lot more weight to this final section. The authors later state that KIC4 and 5 are TGD lines, so not the best for endocytosis assays. It is unclear why this would be difficult to do if an adequate control is contained in the experiment (such as parental 3D7). It explains why they did not perform the MCA2 endocytosis assays further up, but in my opinion, an attempt at doing these assays is important and would significantly increase the impact of this paper. Identical as major comment #17.

      As stated in the manuscript and above, we were originally hesitant to do these assays due to the fact that we can't induce inactivation which is less ideal than comparing the identical parasite population split into plus and minus and is further complicated by the likely smaller effect as the TGDs still permitted growth. However, we see the point of the reviewer and now performed these assays using 3D7 as controls and taking extra care to account for stage differences between the TGD lines and 3D7. However, there was no significant difference in the bloated food vacuole assays with these cell lines. Due to the reasons mentioned in major point 17, we are not sure this indeed means these proteins have no role in endocytosis. One possible reason why we were able to obtain these TGDs may have been because the effect on endocytosis is less than in the essential proteins (or is ring stage specific) and in a TGD an endocytosis defect may therefore not be detectable with our assays (see details and further possible explanations in response to point 17).

      In an attempt to address the TGD issue, we generated knock sideways cell lines for KIC4 and KIC5. Unfortunately, the mislocalization of KIC5 to the nucleus was inefficient (see figure below). As this did not result in a growth defect (in contrast to the clear KIC5-TGD growth defect (Birnbaum et al., 2020)), this line is not suitable to study a potential role of this protein in endocytosis. Therefore, we performed the bloated food vacuole assay only with KIC4-2xFKBP-GFP-2xFKBPendo+1xNLSmislocaliser parasites. However, this revealed no effect on HCC uptake, which is in line with the normal growth of KIC4-TGD parasites (Birnbaum et al., 2020) and suggests that this protein could only have a minor or redundant role in endocytosis (it is the line that shows the smallest effect in RSA). As the KIC4 and KIC5 knock sideways lines did not permit any conclusions, we did not include them in the revised manuscript but they can be found here:

      [Figure KIC4 knock sideways & KIC5 knocksideways]

      Figure legend: (A) Live-cell microscopy of knock sideways (+ rapalog) and control (without rapalog) KIC4-2xFKBP-GFP-2xFKBPendo+1xNLSmislocaliser parasites 4 and 20 hours after the induction of knock-sideways by addition of rapalog. Scale bar, 5 µm. Relative growth of asynchronous KIC4-2xFKBP-GFP-2xFKBPendo+1xNLSmislocaliser plus rapalog compared with control parasites over five days. Three independent experiments were performed. Growth of knock sideways (+ rapalog) compared to control (without rapalog) KIC4-2xFKBP-GFP-2xFKBPendo+1xNLSmislocaliser (blue) or KIC5-2xFKBP-GFP-2xFKBPendo+1xNLSmislocaliser (red) parasites over five days. Mean relative parasitemia ± SD is shown. (B) Live-cell microscopy of knock sideways (+ rapalog) and control (without rapalog) KIC5-2xFKBP-GFP-2xFKBPendo+1xNLSmislocaliser parasites 4 and 20 hours after the induction of knock-sideways by addition of rapalog. Scale bar, 5 µm. Growth of asynchronous KIC5-2xFKBP-GFP-2xFKBPendo+1xNLSmislocaliser plus rapalog compared with control parasites over five days. Four independent experiments were performed. (C) Bloated food vacuole assay with KIC4-2xFKBP-GFP-2xFKBPendo+1xNLSmislocaliser parasites 8 hours after inactivation of KIC4 (+rapalog). Cells were categorized as with ‘bloated FV’ or ‘non-bloated FV’ and percentage of cells with bloated FV is displayed; n = 3 independent experiments with each n=19-30 (mean 21.4) parasites analysed per condition. Representative DIC are displayed. Area of the FV, area of the parasite and area of FV divided by area of the corresponding parasites were determined. Mean of each independent experiment indicated by coloured symbols, individual datapoints by grey dots. Data presented according to SuperPlot guidelines (Lord et al., 2020); Error bars represent mean ± SD. P-value determined by paired t-test. Area of FV of individual cells plotted versus the area of the corresponding parasite. Line represents linear regression with error indicated by dashed line.
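
      For readers who want to follow the quantification described in panel (C), the following is a minimal sketch of a SuperPlot-style analysis: per-cell FV area is divided by the area of the corresponding parasite, the cells of each independent experiment are averaged, and the per-experiment means of control and rapalog samples are compared with a paired t-test. All function names and all numbers below are hypothetical and chosen only for illustration; this is not the analysis code used for the figure.

      ```python
      import math
      import statistics


      def replicate_mean_ratio(cells):
          """Mean FV/parasite area ratio for one experiment.

          `cells` is a list of (fv_area, parasite_area) tuples, one per cell.
          """
          return statistics.mean(fv / parasite for fv, parasite in cells)


      def paired_t_statistic(control_means, treated_means):
          """Paired t statistic and degrees of freedom.

          Each position in the two lists must come from the same
          independent experiment (control vs. + rapalog).
          """
          diffs = [t - c for c, t in zip(control_means, treated_means)]
          n = len(diffs)
          mean_diff = statistics.mean(diffs)
          sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
          return mean_diff / (sd_diff / math.sqrt(n)), n - 1


      # Hypothetical measurements (arbitrary area units), three experiments,
      # a few cells each; real assays score many more cells per condition.
      control = [
          [(4.0, 20.0), (5.0, 22.0), (4.5, 21.0)],
          [(4.2, 19.0), (4.8, 23.0), (5.1, 24.0)],
          [(3.9, 18.0), (4.4, 20.0), (4.6, 21.5)],
      ]
      rapalog = [
          [(4.1, 20.5), (4.7, 21.0), (4.4, 22.0)],
          [(4.0, 19.5), (5.0, 23.5), (4.9, 22.5)],
          [(4.2, 18.5), (4.3, 21.0), (4.5, 20.0)],
      ]

      control_means = [replicate_mean_ratio(exp) for exp in control]
      rapalog_means = [replicate_mean_ratio(exp) for exp in rapalog]
      t_stat, dof = paired_t_statistic(control_means, rapalog_means)
      print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
      ```

      The test is run on the per-experiment means rather than on the pooled cells, so the sample size is the number of independent experiments and not the number of cells. Converting the t statistic into a P-value requires the t distribution, for example via `scipy.stats.ttest_rel` applied to the same two lists of means.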

      41) Line 490-493: the authors state that the K13 compartment proteins fall in two groups, some that are involved in ART resistance AND endocytosis, and some that have different functions. However, in this manuscript the authors have demonstrated 3 flavours that K13 compartment proteins can come in: • Some that confer ART resistance and are involved in HCCU (MCA2) • Some that are involved in HCCU but not ART resistance (MyoF & KIC12) • Some that are involved in neither (KIC11) The authors should therefore revise this statement.

      We agree that this was not well phrased. To account for the fact that not all endocytosis proteins confer increased RSA survival to the parasites when inactivated we changed this statement (line 604): "This analysis suggests that proteins detected at the K13 compartment can be classified into at least two groups of which one comprises proteins involved in endocytosis or in vitro ART resistance whereas the other group might have different functions yet to be discovered."

      Generally, we believe that endocytosis is the overarching criterion and we therefore would like to keep the definitions of the main groups (endocytosis or not). As indicated by the title, the focus of the manuscript is on the K13 compartment for which so far endocytosis is the only experimentally associated function. That this group contains proteins that do not confer reduced ART susceptibility when conditionally inactivated (KIC12 and MyoF) is explained by their stage-specificity, making this a subgroup of the overarching endocytosis group.

      We realise that with the endocytosis data on the KIC4, KIC5 and MCA2 TGD there is now also a subgroup for which we were unable to demonstrate an endocytosis effect in trophozoites although they show changes in RSA survival. However, as indicated above, we would be hesitant to fully exclude some role of these proteins in endocytosis in rings, particularly as a comparably small reduction in endocytosis protein activity or abundance is sufficient to increase RSA survival (Behrens et al., 2023). A principal classification of "endocytosis or ART resistance" or "neither endocytosis nor ART resistance" still accounts for this and therefore seems to us to be the most useful, particularly also in light of our domain identification that then can be linked with one or the other group.

      42) Line 508: the authors state that they expanded the repertoire of K13 compartments, when in fact they functionally analysed them - they did not do another BioID to identify more candidates.

      We respectfully disagree with the reviewer on this point; we did expand the repertoire of known K13 compartment proteins. Only independently experimentally validated proteins from proximity biotinylation experiments can be considered part of the K13 compartment (or any other cellular site or complex). Without validation of the location, the identified proteins can only be considered candidates. This is highlighted in this manuscript by the finding that several proteins of the list did not localize at the K13 compartment.

      43) Line 570-572: has anyone ever tested whether CytoD or JAS treatment in rings, is sufficient to mediate ART resistance? Something similar to what was done in PMID 21709259 with protease inhibitors. If not this would be a pretty interesting experiment for the authors to do that could shed more light on the MyoF data. It would take maybe 2 weeks to do and not require the generation of any new lines. This would clarify whether other Myosins other than MyoF are involved in endocytosis, as is suggested by previous publications (PMID: 17944961).

      We now included this experiment. In agreement with MyoF not being needed in rings and having no effect on RSA survival, there was no increased survival of the parasites in RSA (neither on 3D7 nor on K13 C580Y parasites) after cytD treatment (new part in Figure 1M). We thank the reviewer for pointing out that this experiment might also inform on whether other myosins influence endocytosis in ring stages. We added (line 250): “Similarly, also incubation with the actin destabilising agent Cytochalasin D (Casella et al., 1981), had no effect on RSA survival in 3D7 or K13C580Y (Birnbaum et al., 2020) parasites, indicating an actin/myosin independent endocytosis pathway in ring stage parasites (Figure 1M) and speaking against other myosins taking over the MyoF endocytosis function in rings.”

      44) Line 608: inhibitors targeting the metacaspase domain of MCA2 may inadvertently inactivate other essential parts of the protein. The authors should acknowledge this possibility in the text.

      The inhibitors used in the cited studies (Kumari et al., 2018) are validated metacaspase inhibitors, such as Z-FA-FMK (Lopez-Hernandez et al., 2003). Activity against the other parts of PfMCA2 - which apart from the MCA domain shows no homology to other proteins - is therefore unlikely.

      45) Line 624-625: the authors state that MyoF is 'lowly expressed in rings' - indeed this is the case in their MyoF-2xFKBP-GFP-2xFKBP line which the authors established has defects due to the tag, but it appears from their MyoF-3xHA tagged line that it is expressed in rings. The authors should therefore revise their statement, and be careful of making claims based on their defective line and using fluorescence imaging as their only metric. If they do want to make the statement that it is not there in rings, they should also do a western blot, which is much more sensitive since it amplifies the signal compared to an image of one parasite.

      This comment is related to major point #24. We also would like to stress that while the MyoF-GFP line already shows a phenotype, the impression of defectiveness based on its location is due to a mix up (see major point #23).

      We now provide a comprehensive time course of the MyoF-GFP signal (Figure 1C, S2A) showing that there is no detectable MyoF-GFP signal until the transition from ring to trophozoite stage. As this is all under the endogenous promoter, we do not think the partial functional inactivation caused by the tagging is the reason for the absence of the signal. If anything, we would have expected adding a stably folded structure such as GFP to increase the stability of the protein. The main reason for the discrepancy of MyoF signal in rings between the GFP-tagged line (of note, there is also no detectable MyoF-GFP signal in MyoF-2xFKBP-GFP ring stage parasites (Figure S2B)) and the HA-tagged line likely is that IFA is much more sensitive than live GFP detection (similar to the high sensitivity the reviewer mentions in regard to WB). This discrepancy therefore is likely due to the fact that the lowly expressed MyoF only becomes apparent with the HA-tagged line due to the IFA. We therefore believe that 'lowly expressed in rings' is an appropriate description of our results obtained with three different cell lines (MyoF-2xFKBP-GFP-2xFKBP, MyoF-2xFKBP-GFP and MyoF-3xHA). We hope this is sufficiently well reflected in the manuscript, where we write ‘a low level of expression of MyoF in ring stage parasites’, not that it is ‘not there in rings’ (line 174).

      46) Line 635: arguably this is the 3rd variety and not the 2nd (the authors already mentioned 2 types - ones that are involved in HCCU AND ART and those involved in HCCU only). See comment for line 490-493 above.

      See the response to major comment #41; we now consistently use "or" instead of "and". See the response to line 490-493 for how this was resolved for what was previously line 635.

      47) Line 785: Bloated food vacuole assay/E64 hemoglobin uptake assay method specify that a concentration of 33 mM E64 protease inhibitor was used. However, in reference 44, cited in the manuscript, a concentration of 33 µM E64 was used. Please confirm if this is just a typo or if 1000x E64 concentration was used which renders the experiment invalid.

      We thank the reviewer for pointing this out, we corrected this typo and will look out for symbol font conversion errors for the resubmission.

      48) Line 788: it is unclear from this section what is considered a bloated food vacuole - is there an area above which the FV is considered bloated? Do the authors do these measurements manually or use an addon in FIJI/ImageJ? What is the cutoff for if a FV is bloated? Please clarify. Additionally, for the representative images + rapalog for Figures 2H and 4H, it would be useful to see where the authors delineate the FV (add a white circle showing what is actually measured).

      The bloated FV assay is well established (Jonscher et al., 2019; Birnbaum et al., 2020; Sabitzki et al., 2023). Although the bloating of the FV is a human judgment call, it is actually quite obvious: bloating appears as an easily spotted bulging of the FV in DIC. As minor bloating is also scored as 'bloated', it is a very conservative assay. Using an add-on to measure this is not straightforward. It is unclear how this bulging effect of the FV in DIC could be spotted by software and, due to the obviousness to human operators, potentially lengthy and complicated efforts to design appropriate machine learning options were not undertaken. The situation faced by the scorer of the assay is evident from Figure S4F-G, which contains close to 50 "on rapalog" cells and close to 50 control cells, giving representative cells from all replicas of bloated FV assays with KIC12. Please note that these images show the most complicated situation as far as bloated assays go, because the phenotype is not 100% (see Figure 3F) compared to e.g. KIC7 inactivation, which leads to lack of bloating in almost all cells (see (Birnbaum et al., 2020) Figure 3E), but nevertheless the difference is still obvious. We are aware that in such situations (less than absolute inhibition) this assay scoring of "yes" or "no" is a surrogate for the actual level of inhibition and may be more subjective. This is why in this case we also did the FV size measurements (which are less dependent on human judgment) to further support this and give a better quantifiable measure. Of note, the bloated food vacuole judgments are done "blinded", i.e. the examiner does not know which sample they are looking at.

      In response to this reviewer's point we now also added the FV size refinement of the assay for MyoF inactivation which is one of the cases where inhibition of bloating is not in 100% of the cells (see major comment #27). Please also note here the advantage of the rapidly acting knock sideways technique for these assays which shows the sum of effect 8 h after initiating inactivation and for which we carefully control size of the cells which shows that there is no significant growth reduction over the assay time, excluding secondary effects due to a generally reduced viability. Compared to slower acting systems suggested to have been used instead (see introductory part and significance of this review), the rapid speed of knock sideways reduces the risk of potential pleiotropic or compensatory effects due to the time needed for proteins to be depleted if the gene or mRNA is targeted instead.

      The suggestion to include a ‘white circle’ (raised also as minor comment #27) is useful as an aid to see the food vacuole. However, in contrast to the Figures in (Birnbaum et al., 2020) (where we did add such a circle), we here included the DHE staining images in the figure, labelling the parasite cytosol, which readily shows the FV (the FV corresponds to the region where there is no DHE staining). As this shows the position of the FV, we would prefer not to obscure the DIC images with additional features, to permit the reader to see the difference between bloated and non-bloated food vacuoles and to keep the image as natural as possible.

      49) Line 863-864: this sentence seems to be out of place.

      We thank the reviewer for pointing this out, the details of nucleus staining were moved to the correct part.

      50) Line 875: the authors state that there is a light blue wedge, when the circle consists of grey and black wedges. Please revise this.

      This has been corrected.

      51) Line 1059-1061: it is unclear whether the individual growth curves are different clones or whether they are just the same experiment repeated? If it is the latter, then why are they not combined, as is traditionally done?

      These are the individual replicates of the growth curves shown in Figure 1G of the same cell lines done on a different occasion. We always try to show as much of the primary data as possible and believe that showing individual data points from the different experiments is better than only the combined values which obscure the actual course of each experiment.

      52) Line 919-924: the authors mention a blue and red line, but there is only a black line in figure 3D. Moreover, the experiment of using the LYN mislocaliser was only done for KIC12 according to the manuscript. Additionally, the y axis of the figure states relative growth day 4[%] compared to rapalog, but then on the x axis there are several days. In the text it says there is no growth defect until the second cycle, but from this graph it appears the growth defect is evident as early as 1 day post rapalog treatment. Can the authors please clarify and correct the issues pointed out.

      We thank the reviewer for pointing this out, this was due to a copy & paste error in the figure legend that was now amended. We also fixed the incorrect axis label. For the last part (growth defect) please see detailed answer to Major comment#31 raising the same concern for KIC11 (in synchronous parasites the defect only takes effect once the cells reached the relevant stage whereas in asynchronous cultures there are always cells in the relevant stage that due to the rapid effect of the knock sideways already have a growth phenotype).

      53) Figure 1 panel B & C: the label of the figure where the signal from MCA2Y1344STOP-GFP is shown with the DAPI signal overlayed is deceptive since it suggests that this is the signal of full length MCA2. Please change the label of this panel from MAC2/DAPI to MCA2Y1344STOP/DAPI. The same is true for Panel C for the image labeled MCA2/K13 - please change this to MCA2Y1344STOP/K13.

      Done as requested.

      54) Figure 2B: what stages are these parasites? Please state this in the figure. Based on the MyoF pattern, it looks like rings in the upper panel and trophs in the bottom panel. Why were schizonts not shown?

      Both are trophozoites (early trophozoite in top panel and late trophozoite in bottom panel). This is now labelled in what now is Figure 1B. As stated above, schizont stages are less relevant for the topic of this manuscript and, in order to prevent the manuscript from getting more disjointed and to keep it more focussed on the main topic, we decided to not include a schizont in the manuscript. Nevertheless, we included an example image below.

      [Figure MyoF_p40px schizont]

      55) Figure 2D&F: it is not very meaningful when growth assays are shown as a final bar after 4 days of growth. It is much more useful and informative to see a growth curve instead (as is shown in the supplementary), since it shows if the defect is apparent in the first growth cycle or later. With the way the data is currently shown, this is not apparent. I would advise the authors to switch the graph in 2F out of a combined graph of all the biological replicates growth curves for S3D - showing error bars.

      While we in principle fully agree with the reviewer in showing the course of the full experiment (which is available in Figure S2E), the key here is to show the overall difference. Hence, we would like to keep this comparison of the overall effect on growth in what now is Figure 1E and G. It is part of our response to the doubts this reviewer raises about the function of MyoF (mainly in the overall assessment and the significance statement) to show that the phenotype is actually very consistent (partial inactivation through tagging or further inactivation using knock sideways increases endocytosis phenotypes, correlating with parasite viability).

      Please also note that the growth curves upon knock sideways shown in Figure 1G, S2E were obtained with asynchronous parasite cultures, which does not allow us to draw direct conclusions about growth cycle effects.

      Nevertheless, we now also included the suggested combined data representation in Figure S2E.

      56) Figure 3: why were the calculation of FV area, parasite area and FV/parasite area only done for KIC12 and not done for MyoF? It would be interesting to see if any of these values are different for MyoF - whether the parasites are smaller in area and therefore FV smaller. Please present them Figure 2. Images should be already available and would not require further experiments to be done, only the analysis.

      This now has been done (confirming our results) and is included as Figure 1J-K, S2J. This point was also raised as major comment #37, please also see detailed answer there.

      57) Figure 3B: why is there no spatial association assessment for KIC11 and K13 as was done for the MCA2 and MyoF? The authors should show a pie chart showing the degree of association here as was done for the other proteins.

      This is now included in Figure 2C.

      58) Figure 3D: The y axis of the figure states relative growth day 4[%] compared to rapalog, but then on the x axis the experiment takes place over several days. Is this a typo in the y axis? Additionally, the authors state in line 287-290 that the growth defect upon addition of rapalog is only seen in the second cycle, but from this graph it appears the growth defect is already evident 1 day post rapalog addition. The figure legend also does not make sense for this figure since it mentions a blue and a red line, when there is only a black line present. The legend also mentions the LYN mislocaliser which was used for KIC12 not KIC 11 (see above).

      We apologise for the inadequate legend and colour issues, this was amended. This point was also raised in major comment #31 and #52, please find detailed answer there.

      59) Figure 3E: the colour for Control and Rapalog 4 hpi are very similar and very hard to discern. Please choose an alternative colour or add a pattern to one of the samples. The y axis is also missing a label. Is this supposed to be parasitemia (%)?

      We thank the reviewer for pointing this out, the missing label is now included and the colour has been adapted to make them better distinguishable.

      60) Figure 4A: the ring shown in this figure does not appear to be a ring (it is far too large and appears to have multiple nuclei?). Do the authors have any other representative images to show instead?

      This is in fact a ring, but we realize that we accidentally included an incorrect size bar in the ring image of Figure 4A (now Figure 3A) (the size bar for the 63x objective instead of the correct one for the 100x objective); we apologise for this oversight. We do not think this parasite has multiple nuclei; instead, the Hoechst signal shows the often elongated nucleus seen in rings, which can appear as two foci in Giemsa-stained smears and gives rise to the typical diagnostic appearance of P. falciparum rings. In order to exclude any doubts about the nuclear localization of KIC12 in rings, we have attached a panel with more examples of KIC12-2xFKBP-GFP-2xFKBP ring stage parasites.

      [Figure KIC12]

      61) Figure 4B: why is there no spatial association assessment for KIC12 and K13 as was done for the MCA2 and MyoF? The authors should show a pie chart showing the degree of association here as was done for the other proteins. This should be done for the different life cycle stages considering the changing localisation of KIC12.

      This is now provided in Figure S4A. As suggested by the reviewer, we independently quantified the association for the ring, early trophozoite and late trophozoite stages. As there is no KIC12 signal in schizonts, we did not include a quantification for this stage.

      62) Figures 4C&E: it is extremely important to show the DNA stain in both these samples considering that a portion of KIC12 is in the nucleus! Please add the DAPI signal for these figures (as for all other figures!).

      Please see major comment #64 for a detailed answer why we did not include DNA staining in the imaging used to assess mislocalization upon knock-sideways.

      63) Figure 4E: this figure should be presented before 4D (considering the line being presented in 4E is used in an experiment in 4D). The authors should switch the order of these two.

      We see the point the reviewer is raising here, Figure 4D (now Figure 3D) also contains the data with the Lyn mislocaliser while we first talk about the NLS mislocaliser. This permits a better comparison between the two mislocaliser lines. However, first explaining the Lyn-mislocaliser and then going back to the NLS would make it rather complicated for the reader to follow the storyline and therefore we would like to keep the order as it is. We realise that this means the reader has to go back one figure part for seeing the Lyn growth data, but believe this is worth the benefit that the data is there compared to the NLS result.

      64) It is unclear why in many of the fluorescence images the authors do not show the DAPI signal - particularly when colocalising with K13 and when doing the knock sideways experiments. Please add these images to the figures - I would assume they have already been taken, so this would simply involve adding the images to the panel.

      We did not include DNA staining (DAPI or Hoechst) for any of the images used to assess the efficacy of mislocalization, as we would prefer to keep the parasites as representative of viable parasites in culture as possible. Hence they were imaged without DNA stain (these stains are toxic). We would like to point out that a DNA stain is not necessary, as the mislocaliser already marks the nucleus (in the case of the NLS mislocaliser), actually even somewhat more accurately, as it fills the entire nuclear space rather than only the DNA which is marked by DAPI or Hoechst.

      For LYN this admittedly is not the case, there the mislocaliser marks the plasma membrane. However, we think the proper control for efficient mislocalisation is the comparison between the GFP-tagged protein of interest and the mCherry mislocaliser to show mislocalisation, as previously done in our lab (e.g. (Birnbaum et al., 2017; Jonscher et al., 2019; Birnbaum et al., 2020)).

      Due to their toxicity, we also avoided nuclear staining in some other parts of the manuscript when we were of the opinion that a nucleus signal was not necessary.

      65) Throughout the manuscript, there is no western blot confirming the correct size of their modified proteins. This should be provided.

      We did perform Western blot analysis for both MCA2 cell lines. MCA2 is the only gene-product for which we generated a disruption for this work, and together with the severe truncation from previous work, we provided a Western blot-based confirmation of the correct size.

      The MCA2 disruptions are at least partially dispensable for in vitro parasite growth, hence if degradation occurred, this might not have been noticed. In that case we considered it relevant to show that the truncations were of the expected size. The other proteins in the main figures are essential for growth. Hence, if the tagging approach would lead to unexpected changes in protein integrity (which we assume is what was intended by this concern to be assessed with a Western blot), the parasites expressing the tagged MyoF, KIC11 and KIC12 would - due to their importance for asexual blood stage development - not have been obtained. Hence, we can assume the integrity of the tagged protein is very unlikely to have been affected in a functionally relevant way.

      66) None of the figures are appropriate for individuals with colour blindness, limiting their accessibility to the paper. Please change the colour schemes for all fluorescent images using magenta/green or an alternative colour combination appropriate for colourblind individuals.

      We thank the reviewer for this comment. This has now been amended, individual channels of fluorescence microscopy images are now shown in greyscale, while the overlay was changed to green/magenta.

      Minor Comments

      1) line 29: remove 'are'.

      Done.

      2) Line 29: the text says "HCCU is critical for parasite survival but is poorly understood, with the K13 compartment proteins are among the few proteins so far functionally linked to this process." The sentence should be: 'HCCU is critical for parasite survival but is poorly understood, with the K13 compartment proteins among the few proteins so far functionally linked to this process."

      Done.

      3) line 44: remove 'the'

      Done.

      4) Line 48: consider mentioning here that malaria is caused by the parasite Plasmodium - otherwise the first mention of parasite in line 52 is confusing for the non-specialist reader.

      Done.

      5) Line 49: estimated malaria-related death and case numbers are from the 2021 WHO World malaria report. You cite the 2020 WHO World malaria report.

      We now cite the newest WHO report.

      6) Line 53: please insert the word 'have' between now and also.

      Done.

      7) Line 54: please change 'was linked' to is linked

      Done.

      8) Line 72: I would specify that free heme is toxic to the parasite. Especially as you mention that hemozoin is nontoxic.

      Sentence would be "where digestion results in the generation of free heme, toxic to the parasite, which is further converted into nontoxic hemozoin"

      Done.

      9) Line 90: authors should either say "in previous works" or "in a previous work"

      The text has been altered to say: “ in a previous work”.

      10) Line 91: "We designated these proteins as K13 interaction candidates (KICs)"

      Done.

      11) Line 95: please change 'rate' to number

      Done.

      12) Line 109: Please include a comma before (ii).

      Done.

      13) Line 112: as shown by Rudlaff et al in the paper you are citing, PPP8 is actually associated with the basal complex. You can say that "(ii) were either linked or had been shown to localise to the inner membrane complex (IMC) or the basal complex (PF3D7...).

      Done.

      14) Line 114: Protein PF3D7_1141300 is called APR1 in the manuscript but ARP1 in Supplementary Table 1. Please correct.

      Done.

      15) Line 131: please define SNP - this is the first use of the acronym.

      Done.

      16) Line 133-134: South-East Asia instead of "South Asia"

      Done.

      17) Line 135: please explain what TGD is - it is referred to over and over again in the manuscript without ever being explained.

      We apologise for this oversight. We now explain what is meant by TGD at the suggested point of the manuscript.

      18) Line 145: change 'Western blot' to western blot - only Southern blot is capitalised since it is named after an individual, while the other techniques are not.

      To the best of our knowledge this issue has not been resolved; some journals capitalize the “W” (e.g. Science), while others don’t (e.g. Nature). We would prefer to continue to capitalize the “W”, as this is consistent with the original publication (Burnette, 1981), but if there are strong objections, we would be happy to change this.

      19) Line 152: add "the" between 'and spatial'

      Done.

      20) Line 158: please define SLI as selected linked integration, since it is the first use of the acronym.

      Done.

      21) Line 178: introduce a comma after protein. Sentence should be "Proliferation assays with the MCAY1344STOP-GFPendo parasites which express a larger portion of this protein, yet still lacking the MCA domain (Figure 1), indicated no growth ...

      Done.

      22) Line 195: the authors could mention that MyoF was previously called MyoC in the Birnbaum 2020 paper. I wanted to check back in the Birnbaum 2020 paper and could not find MyoF

      Good point, this was done.

      23) Line 200: "Expression and localisation of the fusion protein was analysed by fluorescent microscopy". Why was expression not also analysed by western blot, as was done for MCA2?

      Please see major comment #64 for a detailed answer.

      24) Line 204: I could not find any mention of MyoF (Pf3D7_1329100) in reference 65. Please remove reference 65 if not correct. Also reference 66 looks at Plasmodium chabaudi transcriptomes so I would specify that "This expression pattern is in agreement with the transcriptional profile of its Plasmodium chabaudi orthologue"

      Reference 65 (Wichers et al., 2019) provides an RNAseq transcriptome dataset for asexual blood stage development of 3D7 (originating from the same source as the 3D7 used in this study). While Ref 66 (Subudhi et al., 2020) indeed contains transcriptomic data from P. chabaudi, the authors also provide a nice 2h window RNAseq transcriptome dataset for asexual blood stage development of Plasmodium falciparum. Both datasets are therefore suitable as references for the statement about the myoF transcription pattern. Both datasets are also easily accessible and show the pattern in a graph in PlasmoDB.

      25) Line 208: Please indicate a reference for P40 being a marker of the food vacuole

      Done.

      26) Line 220-224: The authors should consider changing to " Taken together these results show that MyoF is in foci that are mainly close to K13 and, at times, overlapping, indicating that MyoF is found in a regular close spatial association with the K13 compartment."

      The suggested wording introduces "mainly" for "frequently" and likely was in part motivated by the discrepancy in location between cell lines that we hope we now could clarify to be only minor (see major point #23). We therefore think the original wording appropriately summarises the findings (line 178): “Taken together these results show that MyoF is in foci that are frequently close or overlapping with K13, indicating that MyoF is found in a regular close spatial association with the K13 compartment and at times overlaps with that compartment.”

      27) Line 255: In Figure 2H, and subsequent figures showing bloated FV assay, I would delineate the food vacuole with dashed line as in Birnbaum et al. 2020 to help the reader understanding where the food vacuole is.

      In contrast to the Figures in Birnbaum et al. 2020, we here included the DHE staining (parasite cytosol) in images of bloated FV assays which visualizes the FV. We therefore decided to avoid any further marking, to keep the image as unprocessed as possible (see also major point 48).

      28) Line 265-266: Here the title says that KIC11 is a K13 compartment associated protein, but the title of Figure 3 says KIC11 is a K13 compartment protein. I noticed that you make the difference between K13 compartment protein and K13 compartment associated protein, for example for MyoF, which is not clearly associated with the K13 compartment. Which one is it for KIC11?

      The interpretation of the reviewer is correct, we indeed graded this subconsciously based on level of overlap. Based on the newly added quantification shown in Figure 2C, we describe KIC11 now as K13 compartment protein.

      29) Line 309-310: indicate a reference for your statement "which is in contrast to previously characterised essential K13 compartment proteins".

      Done, we now included Birnbaum et al. 2020 as reference for this.

      30) Line 377: Figure 4I, please correct 1st panel Y axis legend

      Done.

      31) Line 404: replace "dispensability" with dispensable

      Done.

      32) Line 416: can the authors provide any speculation as to why they observed these proteins as hits in the BioID experiments?

      As some of these proteins were less well or less consistently enriched, they could be background of the experiment. Alternatively, some could be proteins that only transiently interact with the K13 compartment.

      33) Line 451: Where does the statement "97% of proteins containing these domains also contain an Adaptin_N domain and function in vesicle adaptor complexes as subunit a" come from? Do you have a reference?

      The statement now includes references and reads (with small changes to original submission): "More than 97% of proteins containing these domains also contain an Adaptin_N (IPR002553) domain (Blum et al., 2021) and in this combination typically function in vesicle adaptor complexes as subunit α (Hirst and Robinson, 1998; Traub et al., 1999) (Figure 5D) but no such domain was detectable in KIC5."

      34) Line 465-467: the same could be said for KIC4 as it also has a VHS domain.

      The critical issue is the combination of domains and their position within the protein. While KIC4 also contains a VHS domain, the VHS domain in KIC4 is N-terminal, not in a central position, and it is also not the first structural domain to be identified in KIC4. The similarity to adaptin domains was already described ((Birnbaum et al., 2020) and annotated in PlasmoDB) and these domains are also involved in vesicle formation and trafficking. These aspects of the statement can therefore not be extended to KIC4. With regards to VHS domains being involved in vesicle trafficking, this is already stated in line 538: «KIC4 contained an N-terminal VHS domain (IPR002014), followed by a GAT domain (IPR004152) and an Ig-like clathrin adaptor α/β/γ adaptin appendage domain (IPR008152) (Figure 5A-C, Figure S8). This is an arrangement typical for GGAs (Golgi-localised gamma ear-containing Arf-binding proteins) which are vesicle adaptors first found to function at the trans-Golgi (Dell’Angelica et al., 2000; Hirst et al., 2000)».

      35) Line 477-479: Can be rephrased to "However, we found this protein as being likely dispensable for intra-erythrocytic parasite development and no colocalisation with K13 could be demonstrated, suggesting a limited role for PF3D7_1365800 in endocytosis. Or something like that. Makes it clearer.

      We rephrased this sentence and it now reads (line 592): “However, we found this protein as being likely dispensable for intra-erythrocytic parasite development and no colocalisation with K13 was observed, suggesting PF3D7_1365800 is not needed for endocytosis”.

      36) Line 535: Have AP-2a or AP-2b been shown to be at the K13 compartment?

      AP2µ is at the K13 compartment (Birnbaum et al., 2020). Adaptor complexes are heterotetramers and their subunits do not typically function on their own, and this is conserved across evolutionarily distant organisms. In agreement with this also being the case in P. falciparum, Henrici et al. (Henrici et al., 2020a) showed that both AP-2a and AP-2b were present in an AP2µ Co-IP, indicating that the AP2 complex consists of the ‘classical’ subunits in P. falciparum. Therefore, the presence of all subunits at the K13 compartment is very likely, although this has only been experimentally confirmed for AP2µ. Of note, for Toxoplasma gondii the presence of AP-2a and AP-2b at the micropore has been experimentally confirmed (Wan et al., 2023; Koreny et al., 2023) and interaction suggested by presence in the same IP as DRPC (Heredero-Bermejo et al., 2019).

      37) Line 569: reference 43 is wrong

      We thank the reviewer for pointing this out; we removed Ref 43.

      38) Line 746: typo "ot" instead of or.

      Changed.

      39) Line 801: method for Domain Identification using AlphaFold specify that RMSDs of under 5Å over more than 60 amino acids are listed in the results. However, there is a typo in Figure 5B for KIC5 where it says "RMSD 4.0 Å over 8 aa". Please correct.

      Done. In addition, we have now applied a more stringent cut-off of 4 Å over more than 60 amino acids to ensure a higher reliability of our hits. This decision was based on results from our preprint (Behrens and Spielmann, 2023). Because of this the phosphatase domain in KIC12 is no longer included in this manuscript and accordingly the following sentence has been deleted: “In KIC12 we identified a potential purple acid phosphatase (PAP) domain. However, with the high RMSD of 4.9 Å, the domain might also be a divergent similar fold, such as a C2 domain, which targets proteins to membranes.”

      40) Line 856: In Figure 1E, please use the same Y axis legend as in Figure 2D "relative growth at day 4 [%] compared with 3D7"

      Done.

      41) Figure S1: Some PCR gels check for integration are presented as 5', 3' and ori whereas other gels are presented as ori, 5' and 3'. This is confusing.

      We agree that ideally the order of sample loading should be consistent and we apologise for this. The explanation for this is that these gels were run by different people at different times before we were able to better standardize the loading scheme. However, in the interest of not unnecessarily using resources for something that has a similar meaning, we would prefer not to repeat these PCRs and re-run them only for consistency reasons (as the conclusion is not affected by the different loading schemes).

      42) Figure S1: Why was the expression of only MCA2 verified by Western blot? What about the other proteins?

      See response to major comment 56.

      43) Line 493: Considering KIC11 was not involved in HCCU or ART resistance it might be worth mentioning in this section that it is of note that there are no domains detected that would be involved in endocytosis.

      We agree that this is the case, however it is also the case for all other proteins that either are not involved in endocytosis and/or lowered susceptibility to ART. We therefore now added a summary statement addressing this in line 602: “In contrast, the K13 compartment proteins where no role in ART resistance (based on RSA) or endocytosis was detected, KIC1, KIC2, KIC6, KIC8, KIC9 and KIC11, do not contain such domains (Figure 5E).” We did not add this at the suggested part of the manuscript as at that point the domain search results are not yet introduced and doing this each time for all the individual proteins would disconnect the flow of the manuscript.

      44) Line 503-506: is it wise to generate more drugs that target a pathway that is already highly susceptible to mutations? The authors should add a statement explaining how this might be avoided.

      The only protein for which mutations do not have a large fitness cost is K13 (see also our preprint on the fitness cost of the ubp1 mutation (Behrens et al., 2023)), and even with K13 the level of resistance seems to be limited by amino acid deprivation when endocytosis is reduced (Mesén-Ramírez et al., 2021). We therefore do not think that this pathway is particularly prone to mutations. Further, the number of commercial drugs targeting the "endproduct" of endocytosis (hemoglobin digestion and detoxification of heme) highlights it as the most prominent vulnerability for drug-based intervention if we go by the number of commercially available drugs acting on steps associated with a single process.

      45) Throughout, scale bars are stated in the figure legends at the end of the legend. This is a slightly confusing format. The authors should consider stating the scale bar for each sub-legend where a fluorescence image is taken.

      Done.

      **Referees cross-commenting**

      After reading reviewer 2 and 3's comments, I think there are significant overlaps in the key points raised in terms of questions about fusion proteins and their potential partial mis-localisation, better description of results and target selection. Overall I think we agree that the work has potential, but in its current form does not represent a major advance. It would be immensely helpful if the manuscript would be carefully edited for a better flow and linear description of results.

      We now rearranged the manuscript for better flow but would like to highlight that the many requests for smaller experimental issues (and "better description of results") worked somewhat in the opposite way of a more linear description. We hope the rearranged version acceptably balances these two issues. The issues raised in regards to target selection and potential partial mis-localisation are addressed in our responses mainly to this reviewer. Please also see comments on systems used at the end of the rebuttal.

      Reviewer #1 (Significance (Required)):

      The authors set out to test whether other proteins that are in the vicinity of K13 are involved in mediating ART resistance and endocytosis. This is an interesting question. However, other than MCA2 which was already known to be involved in mediating ART resistance (and was not tested for its involvement in endocytosis), none of their candidate proteins seem to be involved in mediating both these functions. The authors show that the other proteins tested appear important for parasite growth, with KIC12 and MyoF involved in mediating endocytosis. While these findings are novel, the KS approach used by the authors casts some doubt over the findings, and would mean that these findings would have to be re-tested with a more reliable approach, such as the GlmS system or generating a conditional knockout using the DiCre system. Despite not advancing our understanding of ART resistance, or identifying further players involved in this process, this manuscript provides two candidates that are involved in mediating endocytosis and a further candidate that appears to be important for parasite growth. Further work on these proteins will be required to understand their exact roles. As stated above, there is currently limited interest for these results (limited to researchers working on endocytosis in apicomplexan parasites and possibly the wider endocytosis field from an evolutionary perspective), however with further work, this could increase the impact and interest of this work substantially.

      The authors do not describe any novel methods/approaches within this work.

      In the significance statement the reviewer indicates that other systems would have been more reliable for the work here. This is addressed in our response above and in detailed considerations on the properties of conditional inactivation systems at the end of the rebuttal. The systems used in this work were not only chosen because they permit rapid targeting of many different proteins, but because they have merits that are beneficial for our assays. In fact many of the functional assays in this manuscript are difficult or impossible to carry out with the suggested conditional inactivation systems (please note that we have extensive experience with the systems considered preferable:

      • DiCre (Birnbaum et al., 2017; Mesén-Ramírez et al., 2019; Mesén-Ramírez et al., 2021; Wichers et al., 2022; Kimmel et al., 2023)

      • glmS (Wichers et al., 2021c; Wichers et al., 2021a; Wichers et al., 2022; Wichers-Misterek et al., 2023)).

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In a previous publication the Spielmann lab identified the molecular mechanism of ART resistance in P. falciparum by connecting reduced levels of the protein K13 to decreased endocytosis (uptake of hemoglobin from the RBC cytosol), which results in reduced ART susceptibility. Using quantitative BioID the authors further identified proteins belonging to a K13 compartment, highlighting an unusual endocytosis mechanism.

      In the present manuscript the authors follow up on this work and closely examine ten more proteins of the K13/Eps15-related "proxiome". They successfully link MCA2 to ART resistance in vitro, while the proteins MyoF and KIC12 are involved in endocytosis but do not confer in vitro ART resistance when impaired. They further characterize one candidate (KIC11) that partially colocalizes with K13 in trophozoites but to a lesser degree in schizonts. Growth assays suggest an important function for KIC11 in late stages of the intraerythrocytic developmental cycle. Five analyzed proteins however do not colocalize with the K13 compartment, while a sixth was refractory to endogenous tagging.

      Using AlphaFold predictions of the KIC protein structures the authors identify domains in most constituents of the K13 compartment, highlighting vesicle trafficking-related features that were not identified on primary sequence level before.

      The combination of functional data together with structure predictions leads them to propose a refinement of the K13 compartment as being divided into proteins participating in endocytosis and proteins that have an unknown function.

      We thank the reviewer for the assessment of the manuscript and the constructive comments.

      Major comments:

      1) -Table 1 is missing

      We apologise for this mistake; Table 1 is now included.

      2) -Lines 117-123: Given the total list of uncharacterized candidates encompasses 13 proteins, can the authors give the reason why only the top 10 and not all 13 were characterized in this study?

      A similar point has been raised by Reviewer 1 in major comment #12, please see our response there for an explanation why we chose which targets.

      3) -Line 174: 20% of observed MCA2 foci show no overlap with K13 and 21% only partially overlap. Can the authors confirm that the observed MCA2 foci in schizonts are the ones that co-localize with K13? (Addition of a schizont stage image in Fig 1C would be sufficient.)

      We now extended Figure 4C with images of MCA2-Y1344STOP-GFP+mCherryK13 parasites covering the schizont and merozoite stage, showing that the majority of the MCA2 foci in schizonts are also mCherry-K13 positive.

      4) -The localization and observed phenotype of KIC11 is interesting but unfortunately the authors do not explore it further. Does KIC11 localize with markers of e.g. the secretory organelles (micronemes or rhoptries) in schizonts and could therefore be involved in RBC invasion?

      While we intended to focus mainly on the endocytosis aspect of these proteins, we see the reviewer's point and now generated new cell lines enabling assessment of spatial association of KIC11 with markers for rhoptry (ARO), micronemes (AMA1), and inner membrane complex (IMC1c). This revealed that the KIC11-GFP signal in schizonts does not overlap with apical organelle markers and the signal does not resemble a typical apical localization. In addition, we assessed all three organelle markers after inactivating KIC11 by knock sideways which showed that KIC11 inactivation has no apparent effect on the appearance of these markers, suggesting no major alterations in schizont morphology in respect to apical markers. These results are now presented as Figure S3A and in line 304 of the results.

      5) Can the author distinguish if KIC11 is involved in RBC invasion or in establishment of the ring-stage parasite?

      In order to look into this, we performed egress/invasion assays, quantifying schizont and ring stage parasites in tightly synchronized parasites at two different time points (pre-egress: 38-42 hpi & post-egress: 46-50 hpi). This revealed a significant decrease in newly formed ring stage parasites per ruptured schizont in parasites with inactivated KIC11, while the egress efficacy remained unaffected. This indicated an invasion or very early ring stage development defect (new Figure 2H, Figure S3G). Determining at which point exactly the phenotype occurs (i.e. during invasion or early after invasion) would require extensive experimentation that goes beyond the scope of this study (e.g. invasion assays using video microscopy with a representative number of parasites or sophisticated flow-based quantification assays). We hope that excluding egress defects and gross changes of apical organelles, together with the reduced number of early rings (indicating an invasion or a very early ring-establishment phenotype), will sufficiently narrow down the phenotype for labs interested in invasion to answer this question more definitively.

      Minor comments:

      1) Table S1: Please add the criterion for the order of proteins (abundance in "proxiome"?) in the table as a separate column. I would also suggest adding a new column that highlights the 10 proteins investigated in this study as I found the color-coding slightly confusing.

      Done as suggested: we now include the “average log2 Ratio normalized Kelch13” values from the four DiQ-BioID experiments performed with K13 in (Birnbaum et al., 2020), as well as the suggested column to highlight the investigated proteins. Please also see reviewer 1 major point # 12 for additional information on the selection criteria and how this was added to the manuscript.

      2) -154-155: There is a discrepancy between the text and Fig1C regarding the % of partial overlapping and non-overlapping foci.

      We thank the reviewer for pointing this out, this was corrected.

      3) -The y-axis label is missing in Fig 3E

      Done.

      4) -Fig 4I left graph, the superscript 2 is missing in μm2

      We thank the reviewer for pointing this out, this is now changed.

      5) -Did the authors colocalize KIC11 in schizonts with other proteins found in the K13 compartment group of proteins not involved in endocytosis/ART resistance? This may help to further subgroup these proteins.

      This is an interesting point but would actually be technically challenging to do. For this we would need to generate a KIC11endo parasite line for each of these KICs and then do co-localisation in schizonts. However, the outcome of this likely would not be very clear. The reason for this is as follows. There are foci of KIC11 that do overlap with K13 in schizonts. One can expect that these foci show KIC11 at the K13 compartment and that the other KICs would overlap with KIC11 in these K13 foci in schizonts. Hence, we would also need to see K13 to find the non-K13 compartment KIC11 foci and see if these contained the KIC of interest. This is technically challenging because it would mean we would need a third fluorescent protein which is not that trivial to do. Due to the difficulty to do this and the large amount of work involved and the already considerable amount of data in this manuscript, we believe this will be better suited for a different study.

      6) -As a general comment: to make the beautiful IFAs more accessible to a broader readership, I would encourage the authors to switch the color-coding to green/magenta/blue or an equivalent color system or add grayscale images.

      This was done as suggested: all fluorescence images are now provided as greyscale images and the overlays are shown in magenta/green.

      Reviewer #2 (Significance (Required)):

      Characterizing the molecular components involved in Plasmodium endocytosis will not only reveal interesting biology in these highly adapted parasites, but will more importantly lead to a better understanding and potentially open new avenues for intervention of ART resistance. The here presented manuscript is a carefully executed follow-up on previous work done in Dr. Spielmann's lab focusing on the K13 compartment. The authors use established assays to characterize novel components and reveal three new players in endocytosis with one mediating ART resistance in vitro. The proposition that parts of the K13 compartment have a function other than endocytosis is interesting, but will have to await more data from future studies. Taken together this manuscript adds significantly to our understanding of endocytosis in P. falciparum.

      This work is of interest for cell and molecular biologists working on Apicomplexa, but especially for the Plasmodium community.

      We thank the reviewer for this positive assessment.

      I am a cell and molecular biologist working on Toxoplasma gondii

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: The authors characterized 4 proteins from P. falciparum via cellular (co-)localization, endocytosis, parasite growth, and artemisinin resistance assays. These proteins have been identified as candidates for Kelch13 compartment and a possible role in endocytosis in their previous work with quantitative BioID for potential proximity to K13 and Eps15 (Birnbaum et al. 2020). In the current work, additional 6 proteins were not confirmed as being associated to the K13 compartment. This experimental work was complemented by an in-silico analysis of protein domains based on AlphaFold algorithm. For this protein structure evaluation all proteins were chosen, which were experimentally confirmed to be linked to the K13 compartment in the current publication and previous work. With the work 3 novel proteins linked to artemisinin resistance or endocytosis could be functionally described (KIC12, MCA2, and MyoF) and a number of hypotheses were generated.

      We thank the reviewer for the assessment of the manuscript and the constructive comments.

      Major comments:

      The quality of the presented work is solid, the experimental design is adequate, and methods are presented clearly. The publication contains a lot of results both presented in text and in the figures and it is not always straight forward for the reader to follow the descriptions due to many details presented and a lack of context for some of these experiments.

      We thank the reviewer for this overall positive assessment.

      We have now reordered the results section in an attempt to improve the flow of the manuscript. We also made changes to improve the context for the results. Given the further (very valid) requests for data on schizonts and invasion, there was an increased danger of a less linear manuscript, which we hope to have acceptably managed with the rearrangement.

      Specific suggestions for consideration by the authors to improve the manuscript. Abstract: 1) R 31: Mention how the 4 proteins were identified as candidates, you need to refer to previous work to clarify this

      To clarify this the sentence was changed to (line 31): "Here we further defined the composition of the K13 compartment by analysing more hits from a previous BioID, showing that MyoF and MCA2 as well as Kelch13 interaction candidate (KIC) 11 and 12 are found at this site."

      2) R38: "Second group of proteins" is confusing - different from the 4 mentioned above? Significance to endocytosis unclear. Please unify terminology in the manuscript, see also comment below on proxiome.

      We changed the wording to clarify the group issue in the abstract as follows (line 34): "Functional analyses, tests for ART susceptibility as well as comparisons of structural similarities using AlphaFold2 predictions of these and previously identified proteins showed that canonical vesicle trafficking and endocytosis domains were frequent in proteins involved in resistance or endocytosis (or both), comprising one group of K13 compartment proteins. While this strengthened the link of the K13 compartment to endocytosis, many proteins of this group showed unusual domain combinations and large parasite-specific regions, indicating a high level of taxon-specific adaptation of this process. Another group of K13 compartment proteins did not influence endocytosis or ART susceptibility and lacked detectable vesicle trafficking domains. We here identified the first protein of this group that is important for asexual blood stage development and showed that it likely is involved in invasion."

      3) Abstract can only be understood after reading the full publication

      We attempted to amend this by expanding the abstract, particularly the changes highlighted in the previous two points.

      Results: 4) Table 1 is missing from the submitted materials

      We apologise for this mistake. Table 1 is now included.

      5) Consider to shorten and stratify the result section to focus on the significant data

      We rearranged the results in an attempt to streamline this section and are now starting with MyoF in the revised manuscript. However, as highlighted by the requests from reviewer 1, many details need to be available to support our conclusions. For instance, the fact that GFP-tagging partially inactivated MyoF called for further data to support our conclusion (an HA-tagged version, showing that the location of the GFP-tagged version was consistent with the HA-tagged version, and showing to what extent the different constructs affected growth and correlated with number of vesicles and bloating, see new figure 1M), as did the fact that KIC12 has two locations. Overall, we are therefore hesitant to remove data or description from the results section.

      6) Unclear how the localization and functionalization assays might be impaired by the fusion proteins Significance of ART resistance assay is not clear, in presence of strong growth effects due to inactivation or truncation of genes/proteins

      As also indicated in the example given in the previous point (point 5 of this reviewer), the use of different cell lines (GFP-tagged live cells and small epitope tag in IFA) for targets with an indication for an effect of the tagging confirms that the location we assigned is reasonable. In the case of MyoF, the HA-tagged line, the partial inactivation due to GFP and the further inactivation in the GFP-tagged line by knock sideways show a plausible stepwise increase of phenotypes (vesicle accumulation and bloated FV assays). The GFP-tagged line can thereby be seen as a partial inactivation line that further supports our conclusions, and overall this paints a consistent picture of the function of this protein in endocytosis (see new Figure 1M, which illustrates this better). Please note that the difference in location shown by this line compared to the HA-tagged proteins is only small (see also reviewer 1 major point 23ff). See also the general discussion on tags at the end of this rebuttal.

      Significance of ART resistance assay: The ‘ART resistance assay’ is done comparing +/- ART (DHA) in identical parasites (originating from the same culture and the same condition). Hence, any growth effects are cancelled out and effects in reducing ART susceptibility would - if at all - be underestimated (see more detailed response to point 28, reviewer 1 and controls in Birnbaum et al., 2020 where we tested an unrelated essential protein, unrelated chemical insult and rapalog on 3D7 and did not detect any effect on RSA survival).
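
      The cancellation argument can be sketched as a ratio. This is only an illustration under a simplifying assumption that is ours for this sketch and not a measured quantity: that inactivation of the target scales parasite growth by the same factor $g$ in both the drug-treated and the untreated arm. The symbols $S$, $S'$, $P_{+\mathrm{DHA}}$, $P_{-\mathrm{DHA}}$ and $g$ are introduced here for illustration only.

      ```latex
      % S  : RSA survival without inactivation of the target
      % S' : RSA survival with inactivation of the target
      % P  : parasitemia at read-out in the DHA-treated (+) or untreated (-) arm
      % g  : hypothetical growth factor caused by inactivation (0 < g <= 1)
      \[
        S = \frac{P_{+\mathrm{DHA}}}{P_{-\mathrm{DHA}}},
        \qquad
        S' = \frac{g \, P_{+\mathrm{DHA}}}{g \, P_{-\mathrm{DHA}}} = S .
      \]
      ```

      Under this assumption a pure growth defect leaves the survival ratio unchanged, so a change in RSA survival would have to come from an altered response to the drug pulse rather than from slower growth alone.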

      MCA 7) Stratify results, order by significance of findings, it appears to be described in chronological order, improve readability/flow, eg ART resistance is mentioned in r138, but only reported in r183ff

      We attempted to stratify, but the reason for generating the partial MCA2 disruption parasite line then appears arbitrary and would leave the reader wondering why we truncated the protein at two thirds of its length at all. Hence, we do not see a way around this chronological reporting. However, this part is no longer at the start of the experimental results section, which possibly makes it a bit more palatable overall.

      MyoF 8) R195 to 197 - consider moving to discussion as it is distracting here

      This was shortened, and additional information (asked for by reviewer 1, major point 22) was added to clarify that MyoF was previously called MyoC (line 147): "The presence of MyosinF (MyoF; PF3D7_1329100 previously also MyoC), in the K13 proxiome could indicate an involvement of actin/myosin in endocytosis in malaria parasites."

      9) Term proxiome is introduced above, but not used in result section - suggest to unify language, eg r195 uses "K13 compartment DiQ-BioIDs" instead, which is not very convenient for the reader

      We carefully reviewed this and made this more consistent.

      10) What is the enrichment factor? Please provide for this and the following proteins, eg in Table 1

      The enrichment factor is log2 enrichment over control and this is now provided in table S1 (see also detailed answer for Reviewer 1 major point 12).

      11) R225 to 243 - overall significance of the growth experiments with mislocaliser is not clear, consider removing from manuscript or explain relevance more clearly

      See also point 28, reviewer 1: This experiment is actually quite important. It shows that if we conditionally inactivate the GFP-tagged MyoF, the growth is further reduced, as stated in line 208. It might have been confusing that the mislocalisation is only partial, but this is equivalent to a partial knock down and hence is useful. This becomes even more relevant with the specific assays following in the next paragraph: while the tagging of MyoF already resulted in vesicles, conditional inactivation with KS generated even more vesicles, showing that the same phenotype was rapidly increased when MyoF was further inactivated by a different means and this also correlated with growth. Hence, this is actually a very consistent phenotype that despite some shortcomings of the tools available to analyse this protein (due to the partial inactivation by the GFP tag) in our eyes looks very convincing. We now added a graph showing the correlation of growth and phenotypes to illustrate this (Figure 1L).

      We also tried to make this clearer by changing line 200 to: “Hence, conditional inactivation of MyoF further reduced growth despite the fact that the tag on MyoF already led to a substantial growth defect, indicating an important role for MyoF during asexual blood stage development.” And line 208 to: “This was even more pronounced upon conditional inactivation of MyoF by KS (Figure 1H), suggesting this is due to a reduced function of MyoF.”

      12) KIC11/KIC12 Enrichment factor?

      The enrichment (‘average log2 Ratio normalized Kelch13’ from Birnbaum et al. 2020) is 1.65 for KIC11 and 1.32 for KIC12, which is now also explicitly shown in column D of Table S1.

      **Referees cross-commenting**

      I would like to applaud reviewer #1 for a great, very thorough review and lots of detailed suggestions. I agree with the conclusions mentioned in the significance evaluation from reviewer #1 and #2: the work presented does not contain novel methods and the scope is rather narrow with the current results. (I am working on clinical studies with novel antimalarial agents)

      Reviewer #3 (Significance (Required)):

      On the one hand side, the authors have wrapped up some of the remaining protein candidates of the K13 compartment and could verify 4 of 10 proteins. The work is of interest for the scientific community working on endocytosis and malaria drug resistance mechanisms. Overall, the conclusions and findings from the previous work, Birnbaum et al. 2020, could be confirmed and extended mainly using the methods previously described. On the other hand, the authors made use of progress in protein structure predictions and identified domains linking the K13 compartment proteins to putative functions. The overlaid protein folds of the newly identified domains in figure 5 look convincing, but I can't comment on the technical details or cut-off used for this in-silico analysis.

      Extended general remarks on the systems used for this work:

      Mainly reviewer 1 suggests (in the general comments and the significance statement) that other systems, namely glmS and diCre, would have been better suited for this work, and also has concerns about the large tag, which is seconded by a comment of reviewer 3. In light of this we here provide some extended considerations on the properties of conditional systems and tagging with regard to the goals of this work.

      We would like to point out that we do have experience with the systems considered better-suited by the reviewer (one of the first authors has extensively used glmS (Wichers et al., 2021c; Wichers et al., 2021a; Wichers et al., 2022; Wichers-Misterek et al., 2023) and our lab was one of the first to adopt the diCre system in P. falciparum parasites and we regularly use it (Birnbaum et al., 2017; Mesén-Ramírez et al., 2019; Kimmel et al., 2023)). Clearly, these methods have a lot of strengths but there are a number of issues to be considered for the assays we use in this work (see the next section on conditional inactivation systems). In a nutshell, we believe diCre would give a more reliable readout of the absolute level of "essentiality" (i.e. importance for growth) but is unsuitable or at least difficult to use for the assays that reveal the function of interest in this work. GlmS basically combines the drawbacks of diCre and knock sideways and hence for most targets is not expected to give a better readout of the level of "essentiality" but is similarly difficult to use for our specific assays. The fact that both of these systems can be used without adding a tag to the target may be an advantage, but without a tag one loses some very important features that can be critical to understand the outcome with a given system (see considerations on the tag further below).

      Conditional inactivation systems:

      1. __Speed of inactivation:__ glmS acts on mRNA and diCre on the gene level, which makes them slower than techniques acting directly on the protein such as DD or KS. With diCre, mRNA and protein are still left, even if the gene is very rapidly excised. For instance, for Kelch13 it takes 3-4 days after excising the gene until protein levels have waned enough that this manifests in reduced growth (Birnbaum et al., 2017). While in some instances diCre permits same cycle analyses if the protein has a very rapid turn-over (e.g. Rab5a, (Birnbaum et al., 2017)), control within a few hours is still difficult. For vesicle accumulation and bloated food vacuole assays, which are done over comparably short time frames and with specific stages, it is rather challenging to hit the correct time of induction so that all the cells are at the correct stage with sufficiently (and uniformly, i.e. in all cells) reduced target protein levels during the assay time. Slow acting systems are also more prone to secondary effects. The more immediate the inactivation, the closer it is to the core of the affected function. With vesicle trafficking processes this is particularly relevant as all vesicle trafficking in a cell is interconnected and there are always recycling pathways that maintain the membrane and protein homeostasis of individual compartments. Particularly for endocytosis there seem to be compensatory capacities at least in other organisms (see e.g. (Chen and Schmid, 2020)). One reason why knock sideways was developed is that it made it possible to avoid compensatory changes when vesicle adaptors are inactivated (Robinson et al., 2010).

      The comparably short time frame for malaria parasites to go through different stages during blood stage development also is an issue relevant for inactivation speed. The advantage of speed and the danger of obscured phenotypes is highlighted by our work on VPS45 which showed that in trophozoites this protein is involved in the transport of hemoglobin to the FV whereas in late stages it also has a role in secretory processes. Both of these functions we were able to specifically assess in the same growth cycle using KS to rapidly inactivate the protein (Bisio et al., 2020) but with a slower system would have been more complicated to dissect.

      Speed of effect with glmS: unless the KS does not work well, glmS is slower acting than KS (it does not target the already synthesised protein which can remain in the cell) and also often suffers from only partial inactivation, hence the benefit of using it here is unclear. The option to have an untagged protein is a plus; however, it is also a minus, as assessing the efficiency of inactivation is critical to ensure the phenotype manifests at the stage of interest (particularly in live cells, e.g. for bloated food vacuole assays, a fluorescent tag is the only direct option to assess inactivation of the target).

      lethality/absolute phenotypic effects are detrimental to some assays to study the functions we are interested in for this work: no RSA can be conducted, if the gene is lost and the parasites die. Again, with diCre, one could attempt to hit the point when the parasites have lost sufficient amounts of the target protein when they are placed under ART but then the parasites need to continue growing for ~3 days, which is not possible if the cKO is lethal except for very slowly turning over proteins. However, in that latter case, the parasites likely still had full functionality of the target protein at the beginning of the RSA, when the drug pulse happens and there would be no effect. Knock sideways solves these problems by permitting knock sideways inactivation only under ART (or with a few hours pre-incubation depending on the inactivation speed) to not yet affect growth in a severe manner but inhibiting the process the protein is involved in. It may be possible to use glmS for RSAs, but the slow speed would complicate it (it would not permit control of target protein levels in a matter of a few hours to inactivate the target protein and then re-install it).

      Non-absolute inactivation is also a strength for some functional assays. While we really like using diCre, in the case of EXP1 it made it necessary to complement the exp1 cKO parasites with low levels of EXP1 to be able to do functional assays without killing the parasites (Mesén-Ramírez et al., 2019; Mesén-Ramírez et al., 2021). While the lethality issue does not apply to glmS (like knock sideways, it also can be tuned), it is unclear what would be gained over knock sideways. Knockdown levels with glmS vary from gene to gene and cannot be predicted, and it is in most cases considerably slower than KS. It also requires glucosamine, which becomes toxic at higher concentrations and might introduce off-target effects, and tracking protein levels during the assay would equally need GFP tagging.

      Integration of properties of conditional systems

      Given the properties discussed above, several factors have to be considered to be able to use a system for a given assay. Stage-specific transcription is one example. For diCre, a protein not expressed in e.g. rings makes it possible to remove the gene so that the protein is never made in that parasite development cycle. We exploited this for instance for two proteins only expressed from the trophozoite stage onwards (Kimmel et al., 2023). However, if lethal (absolute effect problem), this also means one can only see the phenotype at the onset of expression of the target (e.g. if in mitosis, the first nuclear division in case the protein is absolutely essential for the process). This is just one example of such issues. Expression timing, turnover of the protein and homogeneity of stage-specific loss of protein will all influence how clearly the phenotype can be determined. All this will decide the exact time of loss/inactivation of the target protein to levels generating a phenotype, which ideally therefore can be monitored during an assay (see considerations on tagging).

      For these reasons vesicle accumulation or bloated food vacuole assays are difficult with slow systems as ideally the target should rapidly be inactivated at the trophozoite stage and the result monitored before the cells have moved to the schizont stage. For this a well responding knock sideways is ideal as the protein can be rapidly taken away (sometimes within seconds) to visualise the immediate, direct effect in the cell.

      As shown for KIC11, there is also no disadvantage of using KS for proteins with other assays or proteins that result in different phenotypes. It permits stage-specific same cycle inactivation without having to worry about the turnover of mRNA and protein (Fig. 2F,G). Thus, besides the advantages of knock sideways for endocytosis related assays and RSAs, we also see no disadvantage of using knock sideways for the functional study of KIC11 which has a role other than endocytosis. KS also permits to specifically target the K13 pool of KIC12, something impossible or very difficult to do with other systems. Hence, we are of the opinion that the system for inactivation was adequate for most of the proteins analysed in this manuscript.

      Large tag: we agree that GFP-tagging can be a disadvantage but in our opinion its benefits often outweigh the drawbacks because it permits easy and immediate (on individual cell level, if need be) monitoring of the presence/location of the target protein (e.g. after KS, but given the discrepancy of the timing between gene excision and protein loss, it might be even more important for techniques such as diCre). No fixing/permeabilisation (prone to artifacts, prevents immediate view of cells) to detect a target with specific antibodies or via a small tag is needed with GFP. Similarly, the use of Western blots to do this is time consuming and impractical if monitoring of left-over protein in the course of an assay such as a bloated food vacuole assay is needed.

      In many cases, adding GFP has no negative effect. In addition, if a target tolerates the bulky folded structure of GFP, it usually also tolerates the 2 to 4 FKBP domains (12 kDa each) in our standard tag. We also typically add a linker. This approach has worked for a large number of different proteins, including many essential ones for which we would not otherwise have obtained the integration cell lines (Birnbaum et al., 2017; Jonscher et al., 2019; Hoeijmakers et al., 2019; Birnbaum et al., 2020; Kimmel et al., 2023; Sabitzki et al., 2023). Hence, whenever a cell line is obtained with it, this tag in most cases is not a disadvantage. Admittedly an exception to this is MyoF and to some extent maybe MCA2 (we would like to stress that in the case of MCA2 the reason for not being able to obtain the full length tagged cell line is unclear: the protein can be severely truncated to less than 3% of its amino acid sequence and a GFP-tag is tolerated on the version with two thirds of the protein left, which gives no good reason why the full length was not obtained; a potential reason could be a dominant negative effect). However, we obtained the full length with a small tag detected by IFA for both MyoF and MCA2, and the location of these agreed well with the GFP-tagged versions, indicating that the GFP-tagged versions are useful to show the location of these proteins in live cells.

      There are also tricks to attempt monitoring the effect of e.g. diCre without tagging the target. For instance, a fluorescent protein can be connected to excision without actually being fused to the target (i.e. excision of the gene leads to expression of e.g. GFP), which would avoid adding a tag to the target itself. However, the problem with this is that expression of GFP only shows excision, while mRNA producing the target protein and left-over target protein may still be present in the cell. All in all, the GFP-tag on the target, while it has some drawbacks, is still our preferred method to monitor the target protein in the cell (in principle permitting quantification of ablation efficiency on the individual cell level).

      Conclusion on these considerations for this manuscript

      Based on these considerations we do not see an immediate benefit of changing the system for the conclusions drawn from this study and are unsure if the other systems are indeed better suited for this work as suggested. While a more exact readout of "essentiality" might be possible with the diCre system, we are of the opinion this is less important than learning the function of a protein, which, as outlined above, we believe to be considerably more difficult with diCre and even more so with glmS considering our target functions. The same applies to targeting specific cellular pools of a protein as done here for KIC12. Clearly MyoF is one example where the employed system shows limitations, but with the new figure panel showing consistency in phenotype with degree of inactivation (importantly with two different forms of inactivation) and the clarification that the locations of the GFP-tagged and HA-tagged versions are actually quite similar, we do not think employing an extra system is warranted for the conclusions of this work. Admittedly, the apparent lack of need in ring stages might give an opening to attack MyoF using diCre (by excision before its major expression peak), but depending on lethality this might preclude extended analyses (possibly vesicle assays, for sure not RSAs).

      In the end the question is whether our approach reveals the function of the targets analysed in this work, and based on the data in our manuscript and the arguments in the rebuttal, we are reasonably confident that this is the case. It is not very likely that the other mentioned techniques would result in a different conclusion on the function of the proteins studied here. In fact, we expect other commonly used techniques to be less suitable for the key assays in this work.

      References used in our responses to the reviewers’ comments:

      Behrens, H.M., Schmidt, S., Peigney, D., Sabitzki, R., Henshall, I., May, J., et al. (2023) Impact of different mutations on Kelch13 protein levels, ART resistance and fitness cost in Plasmodium falciparum parasites. bioRxiv 2022.05.13.491767.

      Behrens, H.M., Schmidt, S., and Spielmann, T. (2021) The newly discovered role of endocytosis in artemisinin resistance. Med Res Rev med.21848.

      Behrens, H.M., and Spielmann, T. (2023) Identification of domains in Plasmodium falciparum proteins of unknown function using DALI search on Alphafold predictions. bioRxiv 2023.06.05.543710.

      Birnbaum, J., Flemming, S., Reichard, N., Soares, A.B., Mesén-Ramírez, P., Jonscher, E., et al. (2017) A genetic system to study Plasmodium falciparum protein function. Nat Methods 14: 450–456.

      Birnbaum, J., Scharf, S., Schmidt, S., Jonscher, E., Hoeijmakers, W.A.M., Flemming, S., et al. (2020) A Kelch13-defined endocytosis pathway mediates artemisinin resistance in malaria parasites. Science 367: 51–59.

      Bisio, H., Chaabene, R. Ben, Sabitzki, R., Maco, B., Baptiste Marq, J., Gilberger, T.W., et al. (2020) The zip code of vesicle trafficking in apicomplexa: Sec1/munc18 and snare proteins. MBio 11: 1–21.

      Blum, M., Chang, H.Y., Chuguransky, S., Grego, T., Kandasaamy, S., Mitchell, A., et al. (2021) The InterPro protein families and domains database: 20 years on. Nucleic Acids Res 49: D344–D354.

      Borrmann, S., Straimer, J., Mwai, L., Abdi, A., Rippert, A., Okombo, J., et al. (2013) Genome-wide screen identifies new candidate genes associated with artemisinin susceptibility in Plasmodium falciparum in Kenya. Sci Rep 3.

      Bozdech, Z., Llinás, M., Pulliam, B.L., Wong, E.D., Zhu, J., and DeRisi, J.L. (2003) The transcriptome of the intraerythrocytic developmental cycle of Plasmodium falciparum. PLoS Biol 1: e5.

      Burnette, W.N. (1981) “Western Blotting”: Electrophoretic transfer of proteins from sodium dodecyl sulfate-polyacrylamide gels to unmodified nitrocellulose and radiographic detection with antibody and radioiodinated protein A. Anal Biochem 112: 195–203.

      Casella, J.F., Flanagan, M.D., and Lin, S. (1981) Cytochalasin D inhibits actin polymerization and induces depolymerization of actin filaments formed during platelet shape change. Nature 293: 302–305.

      Cerqueira, G.C., Cheeseman, I.H., Schaffner, S.F., Nair, S., McDew-White, M., Phyo, A.P., et al. (2017) Longitudinal genomic surveillance of Plasmodium falciparum malaria parasites reveals complex genomic architecture of emerging artemisinin resistance. Genome Biol 18: 78.

      Chen, Z., and Schmid, S.L. (2020) Evolving models for assembling and shaping clathrin-coated pits. J Cell Biol 219.

      Dell’Angelica, E.C., Puertollano, R., Mullins, C., Aguilar, R.C., Vargas, J.D., Hartnell, L.M., and Bonifacino, J.S. (2000) GGAs: A family of ADP ribosylation factor-binding proteins related to adaptors and associated with the Golgi complex. J Cell Biol 149: 81–93.

      Demas, A.R., Sharma, A.I., Wong, W., Early, A.M., Redmond, S., Bopp, S., et al. (2018) Mutations in Plasmodium falciparum actin-binding protein coronin confer reduced artemisinin susceptibility. Proc Natl Acad Sci 201812317.

      Henrici, R.C., Edwards, R.L., Zoltner, M., Schalkwyk, D.A. van, Hart, M.N., Mohring, F., et al. (2020a) The plasmodium falciparum artemisinin susceptibility-associated ap-2 adaptin μ subunit is clathrin independent and essential for schizont maturation. MBio 11.

      Henrici, R.C., Schalkwyk, D.A. van, and Sutherland, C.J. (2020b) Modification of pfap2μ and pfubp1 Markedly Reduces Ring-Stage Susceptibility of Plasmodium falciparum to Artemisinin in Vitro. Antimicrob Agents Chemother 64.

      Henriques, G., Hallett, R.L., Beshir, K.B., Gadalla, N.B., Johnson, R.E., Burrow, R., et al. (2014) Directional selection at the pfmdr1, pfcrt, pfubp1, and pfap2mu loci of Plasmodium falciparum in Kenyan children treated with ACT. J Infect Dis 210: 2001–2008.

      Heredero-Bermejo, I., Varberg, J.M., Charvat, R., Jacobs, K., Garbuz, T., Sullivan, W.J., and Arrizabalaga, G. (2019) TgDrpC, an atypical dynamin-related protein in Toxoplasma gondii, is associated with vesicular transport factors and parasite division. Mol Microbiol 111: 46–64.

      Hirst, J., Lui, W.W.Y., Bright, N.A., Totty, N., Seaman, M.N.J., and Robinson, M.S. (2000) A family of proteins with γ-adaptin and VHS domains that facilitate trafficking between the trans-golgi network and the vacuole/lysosome. J Cell Biol 149: 67–79.

      Hirst, J., and Robinson, M.S. (1998) Clathrin and adaptors. Biochim Biophys Acta - Mol Cell Res 1404: 173–193.

      Hoeijmakers, W.A.M., Miao, J., Schmidt, S., Toenhake, C.G., Shrestha, S., Venhuizen, J., et al. (2019) Epigenetic reader complexes of the human malaria parasite, Plasmodium falciparum. Nucleic Acids Res 47: 11574–11588.

      Jonscher, E., Flemming, S., Schmitt, M., Sabitzki, R., Reichard, N., Birnbaum, J., et al. (2019) PfVPS45 Is Required for Host Cell Cytosol Uptake by Malaria Blood Stage Parasites. Cell Host Microbe 25: 166-173.e5.

      Kimmel, J., Schmitt, M., Sinner, A., Jansen, P.W.T.C., Mainye, S., Ramón-Zamorano, G., et al. (2023) Gene-by-gene screen of the unknown proteins encoded on Plasmodium falciparum chromosome 3. Cell Syst 14: 9-23.e7.

      Koreny, L., Mercado-Saavedra, B.N., Klinger, C.M., Barylyuk, K., Butterworth, S., Hirst, J., et al. (2023) Stable endocytic structures navigate the complex pellicle of apicomplexan parasites. Nat Commun 14: 2167.

      Kumari, V., Singh, A.P., Singh, J., Sharma, R., Akhter, M., Mishra, P.K., et al. (2018) Biochemical characterization of unusual cysteine protease of P. falciparum, metacaspase-2 (MCA-2). Mol Biochem Parasitol 220: 28–41.

      Lazarus, M.D., Schneider, T.G., and Taraschi, T.F. (2008) A new model for hemoglobin ingestion and transport by the human malaria parasite Plasmodium falciparum. J Cell Sci 121: 1937–1949.

      Lopez-Hernandez, F.J., Ortiz, M.A., Bayon, Y., and Piedrafita, F.J. (2003) Z-FA-fmk inhibits effector caspases but not initiator caspases 8 and 10, and demonstrates that novel anticancer retinoid-related molecules induce apoptosis via the intrinsic pathway. Mol Cancer Ther 2: 255–263.

      Lord, S.J., Velle, K.B., Mullins, R.D., and Fritz-Laylin, L.K. (2020) SuperPlots: Communicating reproducibility and variability in cell biology. J Cell Biol 219.

      MalariaGEN, Ahouidi, A., Ali, M., Almagro-Garcia, J., Amambua-Ngwa, A., Amaratunga, C., et al. (2021) An open dataset of Plasmodium falciparum genome variation in 7,000 worldwide samples. Wellcome Open Res 6: 42.

      Marti, M., Good, R.T., Rug, M., Knuepfer, E., and Cowman, A.F. (2004) Targeting malaria virulence and remodeling proteins to the host erythrocyte. Science 306: 1930–3.

      Mesén-Ramírez, P., Bergmann, B., Elhabiri, M., Zhu, L., Thien, H. von, Castro-Peña, C., et al. (2021) The parasitophorous vacuole nutrient pore is critical for drug access in malaria parasites and modulates the fitness cost of artemisinin resistance. Cell Host Microbe 0: 283.

      Mesén-Ramírez, P., Bergmann, B., Tran, T.T., Garten, M., Stäcker, J., Naranjo-Prado, I., et al. (2019) EXP1 is critical for nutrient uptake across the parasitophorous vacuole membrane of malaria parasites. PLoS Biol 17: e3000473.

      Mukherjee, A., Crochetière, M.-È., Sergerie, A., Amiar, S., Thompson, L.A., Ebrahimzadeh, Z., et al. (2022) A Phosphoinositide-Binding Protein Acts in the Trafficking Pathway of Hemoglobin in the Malaria Parasite Plasmodium falciparum. MBio 13.

      Otto, T.D., Wilinski, D., Assefa, S., Keane, T.M., Sarry, L.R., Böhme, U., et al. (2010) New insights into the blood-stage transcriptome of Plasmodium falciparum using RNA-Seq. Mol Microbiol 76: 12–24.

      Robinson, M.S., Sahlender, D.A., and Foster, S.D. (2010) Rapid Inactivation of Proteins by Rapamycin-Induced Rerouting to Mitochondria. Dev Cell 18: 324–331.

      Sabitzki, R., Schmitt, M., Flemming, S., Jonscher, E., Hoehn, K., Froehlke, U., and Spielmann, T. (2023) Identification of a Rabenosyn-5 like protein and Rab5b in host cell cytosol uptake reveals conservation of endosomal transport in malaria parasites. bioRxiv 2023.04.05.535711.

      Simwela, N.V., Hughes, K.R., Roberts, A.B., Rennie, M.T., Barrett, M.P., and Waters, A.P. (2020) Experimentally engineered mutations in a ubiquitin hydrolase, UBP-1, modulate in vivo susceptibility to artemisinin and chloroquine in Plasmodium berghei. Antimicrob Agents Chemother 64.

      Spielmann, T., Gras, S., Sabitzki, R., and Meissner, M. (2020) Endocytosis in Plasmodium and Toxoplasma Parasites. Trends Parasitol 36: 520–532.

      Subudhi, A.K., O’Donnell, A.J., Ramaprasad, A., Abkallo, H.M., Kaushik, A., Ansari, H.R., et al. (2020) Malaria parasites regulate intra-erythrocytic development duration via serpentine receptor 10 to coordinate with host rhythms. Nat Commun 11.

      Traub, L.M., Downs, M.A., Westrich, J.L., and Fremont, D.H. (1999) Crystal structure of the α appendage of AP-2 reveals a recruitment platform for clathrin-coat assembly. Proc Natl Acad Sci U S A 96: 8907–8912.

      Wagner, M.P., Formaglio, P., Gorgette, O., Dziekan, J.M., Huon, C., Berneburg, I., et al. (2022) Human peroxiredoxin 6 is essential for malaria parasites and provides a host-based drug target. Cell Rep 39: 110923.

      Wall, R.J., Zeeshan, M., Katris, N.J., Limenitakis, R., Rea, E., Stock, J., et al. (2019) Systematic analysis of Plasmodium myosins reveals differential expression, localisation, and function in invasive and proliferative parasite stages. Cell Microbiol 21.

      Wan, W., Dong, H., Lai, D.-H., Yang, J., He, K., Tang, X., et al. (2023) The Toxoplasma micropore mediates endocytosis for selective nutrient salvage from host cell compartments. Nat Commun 14: 977.

      Wichers-Misterek, J.S., Binder, A.M., Mesén-Ramírez, P., Dorner, L.P., Safavi, S., Fuchs, G., et al. (2023) A Microtubule-Associated Protein Is Essential for Malaria Parasite Transmission. MBio.

      Wichers, J.S., Gelder, C. van, Fuchs, G., Ruge, J.M., Pietsch, E., Ferreira, J.L., et al. (2021a) Characterization of Apicomplexan Amino Acid Transporters (ApiATs) in the Malaria Parasite Plasmodium falciparum. mSphere 6.

      Wichers, J.S., Mesén-Ramírez, P., Fuchs, G., Yu-Strzelczyk, J., Stäcker, J., Thien, H. von, et al. (2022) PMRT1, a Plasmodium-Specific Parasite Plasma Membrane Transporter, Is Essential for Asexual and Sexual Blood Stage Development. MBio 13.

      Wichers, J.S., Scholz, J.A.M., Strauss, J., Witt, S., Lill, A., Ehnold, L.-I., et al. (2019) Dissecting the Gene Expression, Localization, Membrane Topology, and Function of the Plasmodium falciparum STEVOR Protein Family. MBio 10: e01500-19.

      Wichers, J.S., Tonkin-Hill, G., Thye, T., Krumkamp, R., Kreuels, B., Strauss, J., et al. (2021b) Common virulence gene expression in adult first-time infected malaria patients and severe cases. Elife 10.

      Wichers, J.S., Wunderlich, J., Heincke, D., Pazicky, S., Strauss, J., Schmitt, M., et al. (2021c) Identification of novel inner membrane complex and apical annuli proteins of the malaria parasite Plasmodium falciparum. Cell Microbiol 23: e13341.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      With the emergence and spread of resistance to Artemisinin (ART), a key component of current frontline malaria combination therapies, there is a growing effort to understand the mechanisms that lead to ART resistance. Previous work has shown that ART resistant parasites harbour mutations in the Kelch13 protein, which in turn leads to reduced endocytosis of host haemoglobin. The digestion of haemoglobin is thought to be critical for the activation of the artemisinin endoperoxide bridge, leading to the production of free radicals and parasite death. However, the mechanisms by which the parasites endocytose host cell haemoglobin remain poorly understood.

      Previous work by the authors identified several proteins in the proximity of K13 using proximity-based labelling (BioID) (Birnbaum et al. 2020). The authors then went on to characterise several of these proteins, showing that when proteins including EPS15, AP2mu, UBP1 and KIC7 are disrupted, this leads to ART resistance and defects in endocytosis leading to the hypothesis that these two processes are inextricably linked.

      In this manuscript, Schmidt et al. set themselves the task of characterising more K13 component candidates identified in their previous work (Birnbaum et al. 2020) that were not previously validated or characterised. They chose 10 candidates and investigated their localisation, their colocalisation with K13, and their involvement in endocytosis and in vitro ART resistance, two processes mediated by K13 and some members of the K13 compartment.

      The authors show that of their 10 candidates, only 4 can be co-localised with K13. Then, using a combination of targeted gene disruption (TGD) as well as knock sideways (KS), they characterised these 4 proteins found in the K13 compartment. They show that MyoF and KIC12 are involved in endocytosis and are important for parasite growth, however their disruption does not lead to a change in ART sensitivity. The authors also confirm the findings of their previous publication (Birnbaum et al. 2020), using a slightly different TGD, that MCA2 is involved in ART resistance, however they did not check whether its disruption impacts haemoglobin uptake. They also show that KIC11 is not involved in mediating haemoglobin uptake or ART resistance. To finish, the authors used AlphaFold to identify new domains in the proteins of the K13 compartment. This led them to the conclusion that vesicle trafficking domains are enriched in proteins of the K13 compartment involved in endocytosis and in vitro ART resistance.

      The majority of the experiments conducted by the authors are performed to a good standard in biological and technical replicates, with the correct controls. Their findings provide confirmation that their 4 candidate genes seem to be important for parasite growth, and show that some of their candidates are involved in endocytosis. While the KD and KS approaches employed by the authors to study their candidate genes each have their own advantages and can be excellent tools for studying large sets of genes, this manuscript highlights the many limitations of these approaches. For example, the large tag used for the KS approach can mislocalise proteins or disrupt their function (as is the case for MyoF), resulting in spurious results, or indeed the inability to generate the tagged line (as is the case for MCA2). The KS approach also makes the results of a protein with a dual localisation, like KIC12, extremely difficult to interpret.

      Moreover, the manuscript is disjointed at times, with the authors choosing to conduct certain experiments for only a subset of genes, but not for others. For example, considering that the aim of this paper was to identify more proteins involved in ART resistance and endocytosis, it is confusing why the authors do not perform the endocytosis assays for all their selected proteins, and why they do not do this for the proteins they identify in their domain search. There is significant room for improvement for this manuscript, and it addresses a generally interesting question. But in its current format, other than confirming that MCA2 is involved in ART resistance (which was already known from the Birnbaum paper), the authors do not further expand our understanding of the link between ART resistance and endocytosis in this manuscript.

      Major Comments

      line 31: please change defined to characterised - defined suggests that novel proteins were identified in this study, which is not the case.

      line 37: please change 'second' to "another". As explained further below, the authors identified 3 classes of proteins (confer ART resistance + involved in HCCU, involved in HCCU only, or involved in neither).

      Line 40: You define KIC11 as essential but according to your data some parasites are still alive and replicating 2 cycles after induction of the knock sideways. Please consider changing "essential" to "important for asexual parasite growth"

      Line 40: please change 'second group' to 'this group'

      line 41: state here that despite it being essential, it is unknown what it is involved in.

      Line 50: the authors should state here that there is actually a reversal in this trend over the last few years.

      Line 54: please separate out the references for each of the two statements made in this line (a: that ART resistance is widespread in SEA, and b: that ART resistance is now in Africa). Reference 14 also seems to report ART resistance in Amazonia, which is not covered by the statement made by the authors (in which case the authors should state that ART resistance is now present in Africa and South America). The authors should also reference PMID: 34279219 for their statement that ART resistance is now found in Africa (albeit a different mutation to the one found in SEA).

      Line 65: it is also worth mentioning here that there are other mutations in proteins other than K13, such as AP2mu and UBP1 (PMID: 24994911;24270944) that can lead to ART resistance.

      Line 80, 86: ref 43 is misused. Reference 43 refers to Maurer's clefts trafficking which takes place in the erythrocyte cytosol and is not involved in haemoglobin uptake as far as I know. Please replace ref 43 with one showing the role of actin in haemoglobin uptake.

      Line 98: the authors state here that they 'identified' further candidates from the K13 proxiome. This suggests that they identified new proteins in this paper, when in fact the list was already generated in ref 26. All they did was characterise proteins from that list that were not previously characterised. The authors should therefore remove identified from this statement.

      Line 107-108: it is not clear from this sentence why these proteins were left out of the initial analysis in Ref 26. A sentence here explaining this would be valuable for the reader.

      Line 117-123: The authors say that PF3D7_0204300, PF3D7_1117900 and PF3D7_1016200 were not studied because they were not in the top 10 hits. However, the current organisation of Supplementary Table 1 shows all 3 proteins among the top 10 hits (MyoF, KIC12, UIS14 and 0907200 being after them). I think the authors should reorganise their table. It is also unclear according to what the proteins in the table are ranked. Could the authors indicate the metric used for the ranking?

      Line 129-141: Can the authors be clearer with their explanations of the identification of mutation Y1344Stop? One dataset (ref 61) shows that 52% of African parasites have a mutation in MCA2 in position 1344 leading to a STOP codon. But another dataset (ref 62) shows that the next base is also mutated, reverting the stop codon. That should have been seen in the first dataset as well. Could the authors please clarify.

      Line 147: the authors say that MCA2 is expressed throughout the intraerythrocytic cycle as shown by live cell imaging. In Birnbaum et al 2020 fig 4I, the authors show that MCA2 is mainly expressed between 4 and 16hpi. But in Figure 1B of this manuscript there is a clear multiplication of MCA2 signal between trophozoite and schizont. How do the authors explain this discrepancy? Could expression of the truncated MCA2 differ from that of the full-length protein? This cannot be assessed, as expression and localisation of the full-length HA-tagged MCA2 is not shown in schizonts. MCA2 expression also seems different for the MCA2TGD-GFP, with no expression in rings.

      Line 158: would it not have been more useful for the authors to have episomally expressed MCA2-3xHA in their MCA2Y1344STOP-GFPENDO line to make sure that the truncated protein is indeed going to the correct compartment? The experiments done by the authors suggest that the MCA2Y1344STOP goes to the right location but do not really confirm it.

      Line 191: it is stated that MCA2 confers resistance independently of the MCA domain, however in both the MCA2-TGD and MCA2Y1344STOP-GFPENDO parasites, the MCA domain is deleted, and for both parasites, there is resistance (albeit to a lower level in the MCA2Y1344STOP-GFPENDO line). Therefore, how can the authors state that the ART resistance is independent of the MCA domain? This statement should be that resistance is dependent on the loss of the MCA domain.

      Line 192: Why did the authors not check if MCA2 is involved in endocytosis? They state later on in the manuscript that they did not do endocytosis assays with TGD lines, however if the authors include the correct controls, this could be easily done. It would also be really interesting to see whether endocytosis gets progressively worse going from WT to MCA2Y1344STOP to MCA2-TGD. This experiment (as well as doing endocytosis assays for KIC4 and KIC5 TGD lines) would drastically increase the impact of this study. These experiments would not take more than 3 weeks to perform, and would not require the generation of new lines.

      The authors should consider re-organising the MCA2 section, first showing that the 3xHA tagged line colocalises with K13, then performing the new truncation.

      Line 197: Once again ref 43 is not correct to illustrate that actin/myosin is involved in endocytosis

      Line 202: the authors state that MyoF localises near the food vacuole from ring stage/trophs onwards. However, how can this statement be made in schizonts based on these images (Fig. 2A), where it doesn't look like MyoF is anywhere near the FV? This statement can only be made for schizonts if co-localised with a FV marker (which is done in Fig. 2B), however, based on the number of MyoF foci, it appears that this was not done for schizonts. Please either remove the statement that MyoF is near the food vacuole from trophs onwards (because it is only seen near the FV up until trophs) or show the data in Fig. 2B of schizonts to substantiate these claims.

      Line 204-206: what does this statement bring to the paper? Is it to show that this is the real localisation of MyoF because two differently tagged cell lines show the same localisation? I don't think this is needed, especially as later in the manuscript an HA-tagged MyoF line is used and shows a similar localisation.

      Line 212: The overlap of K13 with MyoF in Fig 2C 3rd panel (1st trophozoite panel) is not obvious, especially as the MyoF signal seems to be absent. I would advise the authors to replace it with a better image. Also, why are there no images of schizonts shown in Figure 2C?

      Line 217: the spatial association of MyoF with K13 is very different when it is tagged with GFP and when it is tagged with 3xHA. The way the authors word it here, it seems that there is agreement with the two datasets, when this is not in fact the case (59% overlap for MyoF-GFP and only 16% overlap with MyoF-3xHA). These data suggest that the GFP and the multiple FKBP tags are doing something to the protein and therefore maybe the ensuing results using this line should not be trusted or be taken with a pinch of salt.

      Line 219: the authors state here that they could not detect MyoF-GFP in rings, when in Figure 2C they show MyoF-GFP in rings, and also show that they could detect MyoF in Sup Fig. 3B with the 3xHA tagged line. Is this a labelling mistake in Figure 2C? If the authors could indeed not see MyoF-GFP in rings, this statement should have been made when Figure 2A was presented, and not so late in the manuscript, which causes confusion.

      Line 237: Showing a DNA marker (DAPI, Hoechst) for Figure 2E, and subsequent figures using mislocalisation to the nucleus, would help the reader assess efficiency of the mislocalisation.

      Line 254-256: authors should show the results of the bloating assay for parental 3D7 parasites (+ and - rapalog) to see whether the MyoF line - rapalog has increased baseline bloating. This applies to all subsequent FV bloating assays.

      Line 254-257: The authors say that because fewer parasites show a bloated food vacuole upon inactivation of MyoF it means that less hemoglobin reached the food vacuole. I understand the authors' statement, however, shouldn't they look at the size of the food vacuole, instead of the number of parasites with bloated FV, to make such a statement? This has been done for KIC12, so why not do it for MyoF?

      Line 259-261: these results would be difficult to interpret, namely because the authors have dying parasites, which is exacerbated with the protein being knocked sideways. The authors should mention the pitfalls of their knock sideways and tagging design here.

      Line 260-261: RSA is an assay relying on measuring parasite growth 1 cycle after a challenge with ART for 6 hours.

      Line 261-263: the authors state that MyoF has a function in endocytosis but at a different step compared to K13 compartment proteins. I am not sure what they mean here. Can this be clarified? Do the authors mean that it is involved in endocytosis but not in ART resistance? If so, this is a very difficult statement to make since the parasites are dying. Is there any evidence of point mutations in MyoF in the field?

      Line 298: the authors state that there is no growth defect in the first cycle when rapalog is added to the KIC11 line, however based on Figure 3D, there is evidently a 25% reduction in growth compared to - rapalog at day 1 post treatment, and a 60% reduction by day 2, which is still within the 1st growth cycle. The authors should either revise their statement or provide an explanation for these findings. The authors should also explain why their Giemsa data in Fig. 3E is not in accordance with their FACS data.

      Line 301: KIC11 could also be important very early for establishment of the ring stage for example for establishment of the PV. Also, was mislocalisation assessed in rapalog-treated parasites at 72 hours or in cycle 3?

      Line 311: the authors should change the sentence from 'not related to endocytosis' to 'not related to endocytosis or ART resistance'.

      Line 323-325: The authors say that a nuclear GFP signal can be observed in early schizonts for KIC12. According to the pictures provided in Figure 4A and Figure S5A, this is not very obvious. Also, the faint cytoplasmic GFP signal could be only background, as we can see that exposure is higher for the schizont pictures.

      Line 326-328: The authors say that the kic12 transcriptional profile indicates that mRNA levels peak (no s at peak) in merozoites. Should they then show live cell imaging of merozoites? In the Figure 4A schizont pictures, where schizonts are almost fully segmented, no signal can be observed.

      Line 347: The authors state that using the Lyn mislocaliser the nuclear pool of KIC12 is inactivated by mislocalisation to the PPM. This tends to suggest that only the nuclear pool of KIC12 is mislocalised. How is it possible that only the nuclear pool is mislocalised?

      Line 368-369: Effect was also only partial for MyoF. Why didn't you measure the same metrics for MyoF?

      Line 379: you don't know if all proteins acting later in endocytosis will have an increased number of vesicles as a phenotype.

      Line 413-414: The authors state that no growth defect was observed upon KS of 1365800. Is growth alone enough to say that there is no impact on endocytosis?

      Line 432: in this section, the authors state that KIC4 and KIC5 seem to have domains that may suggest these proteins are involved in endocytosis, based on the alpha fold data that is publicly available. Considering the authors have TGD-SLI versions of these lines (Birnbaum et al. 2020) and have already confirmed in this previous publication that they confer resistance to ART; it would make sense to look at endocytosis for these genes. This would be a relatively simple and straightforward experiment, taking no longer than two to three weeks, and would require no additional reagents or line generation. Doing these experiments would add a lot more weight to this final section. The authors later state that KIC4 and 5 are TGD lines, so not the best for endocytosis assays. It is unclear why this would be difficult to do if an adequate control is contained in the experiment (such as parental 3D7). It explains why they did not perform the MCA2 endocytosis assays further up, but in my opinion, an attempt at doing these assays is important and would significantly increase the impact of this paper.

      Line 490-493: the authors state that the K13 compartment proteins fall in two groups, some that are involved in ART resistance AND endocytosis, and some that have different functions. However, in this manuscript the authors have demonstrated 3 flavours that K13 compartment proteins can come in:

      • Some that confer ART resistance and are involved in HCCU (MCA2)
      • Some that are involved in HCCU but not ART resistance (MyoF & KIC12)
      • Some that are involved in neither (KIC11)

      The authors should therefore revise this statement.

      Line 508: the authors state that they expanded the repertoire of K13 compartments, when in fact they functionally analysed them - they did not do another BioID to identify more candidates.

      Line 570-572: has anyone ever tested whether CytoD or JAS treatment in rings is sufficient to mediate ART resistance? Something similar to what was done in PMID 21709259 with protease inhibitors. If not, this would be a pretty interesting experiment for the authors to do that could shed more light on the MyoF data. It would take maybe 2 weeks to do and not require the generation of any new lines. This would clarify whether myosins other than MyoF are involved in endocytosis, as is suggested by previous publications (PMID: 17944961).

      Line 608: inhibitors targeting the metacaspase domain of MCA2 may inadvertently inactivate other essential parts of the protein. The authors should acknowledge this possibility in the text.

      Line 624-625: the authors state that MyoF is 'lowly expressed in rings' - indeed this is the case in their MyoF-2xFKBP-GFP-2xFKBP line which the authors established has defects due to the tag, but it appears from their MyoF-3xHA tagged line that it is expressed in rings. The authors should therefore revise their statement, and be careful of making claims based on their defective line and using fluorescence imaging as their only metric. If they do want to make the statement that it is not there in rings, they should also do a western blot, which is much more sensitive since it amplifies the signal compared to an image of one parasite.

      Line 635: arguably this is the 3rd variety and not the 2nd (the authors already mentioned 2 types - ones that are involved in HCCU AND ART and those involved in HCCU only). See comment for line 490-493 above.

      Line 785: The bloated food vacuole assay/E64 hemoglobin uptake assay method specifies that a concentration of 33 mM E64 protease inhibitor was used. However, in reference 44, cited in the manuscript, a concentration of 33 µM E64 was used. Please confirm whether this is just a typo or whether a 1000x higher E64 concentration was used, which would render the experiment invalid.

      Line 788: it is unclear from this section what is considered a bloated food vacuole - is there an area above which the FV is considered bloated? Do the authors do these measurements manually or use an addon in FIJI/ImageJ? What is the cutoff for if a FV is bloated? Please clarify. Additionally, for the representative images + rapalog for Figures 2H and 4H, it would be useful to see where the authors delineate the FV (add a white circle showing what is actually measured).

      Line 863-864: this sentence seems to be out of place.

      Line 875: the authors state that there is a light blue wedge, when the circle consists of grey and black wedges. Please revise this.

      Line 1059-1061: it is unclear whether the individual growth curves are different clones or whether they are just the same experiment repeated? If it is the latter, then why are they not combined, as is traditionally done?

      Line 919-924: the authors mention a blue and red line, but there is only a black line in figure 3D. Moreover, the experiment of using the LYN mislocaliser was only done for KIC12 according to the manuscript. Additionally, the y axis of the figure states relative growth day 4[%] compared to rapalog, but then on the x axis there are several days. In the text it says there is no growth defect until the second cycle, but from this graph it appears the growth defect is evident as early as 1 day post rapalog treatment. Can the authors please clarify and correct the issues pointed out.

      Figure 1 panel B & C: the label of the figure where the signal from MCA2Y1344STOP-GFP is shown with the DAPI signal overlaid is deceptive since it suggests that this is the signal of full length MCA2. Please change the label of this panel from MCA2/DAPI to MCA2Y1344STOP/DAPI. The same is true for Panel C for the image labeled MCA2/K13 - please change this to MCA2Y1344STOP/K13.

      Figure 2B: what stages are these parasites? Please state this in the figure. Based on the MyoF pattern, it looks like rings in the upper panel and trophs in the bottom panel. Why were schizonts not shown?

      Figure 2D&F: it is not very meaningful when growth assays are shown as a final bar after 4 days of growth. It is much more useful and informative to see a growth curve instead (as is shown in the supplementary), since it shows if the defect is apparent in the first growth cycle or later. With the way the data is currently shown, this is not apparent. I would advise the authors to replace the graph in 2F with a combined graph of all the biological replicate growth curves from S3D, showing error bars.

      Figure 3: why were the calculations of FV area, parasite area and FV/parasite area only done for KIC12 and not for MyoF? It would be interesting to see if any of these values are different for MyoF - whether the parasites are smaller in area and therefore the FV smaller. Please present them in Figure 2. Images should be already available and would not require further experiments to be done, only the analysis.

      Figure 3B: why is there no spatial association assessment for KIC11 and K13 as was done for the MCA2 and MyoF? The authors should show a pie chart showing the degree of association here as was done for the other proteins.

      Figure 3D: The y axis of the figure states relative growth day 4[%] compared to rapalog, but then on the x axis the experiment takes place over several days. Is this a typo in the y axis? Additionally, the authors state in line 287-290 that the growth defect upon addition of rapalog is only seen in the second cycle, but from this graph it appears the growth defect is already evident 1 day post rapalog addition. The figure legend also does not make sense for this figure since it mentions a blue and a red line, when there is only a black line present. The legend also mentions the LYN mislocaliser, which was used for KIC12, not KIC11 (see above).

      Figure 3E: the colour for Control and Rapalog 4 hpi are very similar and very hard to discern. Please choose an alternative colour or add a pattern to one of the samples. The y axis is also missing a label. Is this supposed to be parasitemia (%)?

      Figure 4A: the ring shown in this figure does not appear to be a ring (it is far too large and appears to have multiple nuclei?). Do the authors have any other representative images to show instead?

      Figure 4B: why is there no spatial association assessment for KIC12 and K13 as was done for the MCA2 and MyoF? The authors should show a pie chart showing the degree of association here as was done for the other proteins. This should be done for the different life cycle stages considering the changing localisation of KIC12.

      Figures 4C&E: it is extremely important to show the DNA stain in both these samples considering that a portion of KIC12 is in the nucleus! Please add the DAPI signal for these figures (as for all other figures!).

      Figure 4E: this figure should be presented before 4D (considering the line being presented in 4E is used in an experiment in 4D). The authors should switch the order of these two.

      It is unclear why in many of the fluorescence images the authors do not show the DAPI signal - particularly when colocalising with K13 and when doing the knock sideways experiments. Please add these images to the figures - I would assume they have already been taken, so this would simply involve adding the images to the panel.

      Throughout the manuscript, there is no western blot confirming the correct size of their modified proteins. This should be provided.

      None of the figures are appropriate for individuals with colour blindness, limiting their accessibility to the paper. Please change the colour schemes for all fluorescent images using magenta/green or an alternative colour combination appropriate for colourblind individuals.

      Minor Comments

      line 29: remove 'are'.

      Line 29: the text says "HCCU is critical for parasite survival but is poorly understood, with the K13 compartment proteins are among the few proteins so far functionally linked to this process." The sentence should be: "HCCU is critical for parasite survival but is poorly understood, with the K13 compartment proteins among the few proteins so far functionally linked to this process."

      line 44: remove 'the'

      Line 48: consider mentioning here that malaria is caused by the parasite Plasmodium - otherwise the first mention of parasite in line 52 is confusing for the non-specialist reader.

      Line 49: estimated malaria-related death and case numbers are from the 2021 WHO World malaria report. You cite the 2020 WHO World malaria report.

      Line 53: please insert the word 'have' between now and also.

      Line 54: please change 'was linked' to is linked

      Line 72: I would specify that free heme is toxic to the parasite. Especially as you mention that hemozoin is nontoxic. Sentence would be "where digestion results in the generation of free heme, toxic to the parasite, which is further converted into nontoxic hemozoin"

      Line 90: authors should either say "in previous works" or "in a previous work"

      Line 91: "We designated these proteins as K13 interaction candidates (KICs)"

      Line 95: please change 'rate' to number

      Line 109: Please include a comma before (ii).

      Line 112: as shown by Rudlaff et al in the paper you are citing, PPP8 is actually associated with the basal complex. You can say that "(ii) were either linked or had been shown to localise to the inner membrane complex (IMC) or the basal complex (PF3D7...).

      Line 114: Protein PF3D7_1141300 is called APR1 in the manuscript but ARP1 in Supplementary Table 1. Please correct.

      Line 131: please define SNP - this is the first use of the acronym.

      Line 133-134: South-East Asia instead of "South Asia"

      Line 135: please explain what TGD is - it is referred to over and over again in the manuscript without ever being explained.

      Line 145: change 'Western blot' to western blot - only Southern blot is capitalised since it is named after an individual, while the other techniques are not.

      Line 152: add "the" between 'and spatial'

      Line 158: please define SLI as selected linked integration, since it is the first use of the acronym.

      Line 178: introduce a comma after protein. Sentence should be "Proliferation assays with the MCAY1344STOP-GFPendo parasites which express a larger portion of this protein, yet still lacking the MCA domain (Figure 1), indicated no growth ..."

      Line 195: the authors could mention that MyoF was previously called MyoC in the Birnbaum 2020 paper. I wanted to check back in the Birnbaum 2020 paper and could not find MyoF

      Line 200: "Expression and localisation of the fusion protein was analysed by fluorescent microscopy". Why was expression not also analysed by western blot, as was done for MCA2?

      Line 204: I could not find any mention of MyoF (Pf3D7_1329100) in reference 65. Please remove reference 65 if not correct. Also, reference 66 looks at Plasmodium chabaudi transcriptomes, so I would specify that "This expression pattern is in agreement with the transcriptional profile of its Plasmodium chabaudi orthologue".

      Line 208: Please indicate a reference for P40 being a marker of the food vacuole

      Line 220-224: The authors should consider changing to " Taken together these results show that MyoF is in foci that are mainly close to K13 and, at times, overlapping, indicating that MyoF is found in a regular close spatial association with the K13 compartment."

      Line 255: In Figure 2H, and subsequent figures showing bloated FV assay, I would delineate the food vacuole with dashed line as in Birnbaum et al. 2020 to help the reader understanding where the food vacuole is.

      Line 265-266: Here the title says that KIC11 is a K13 compartment associated protein, but the title of Figure 3 says KIC11 is a K13 compartment protein. I noticed that you make a distinction between K13 compartment protein and K13 compartment associated protein, for example for MyoF, which is not clearly associated with the K13 compartment. Which one is it for KIC11?

      Line 309-310: indicate a reference for your statement "which is in contrast to previously characterised essential K13 compartment proteins".

      Line 377: Figure 4I, please correct 1st panel Y axis legend

      Line 404: replace "dispensability" with dispensable

      Line 416: can the authors provide any speculation as to why they observed these proteins as hits in the BioID experiments?

      Line 451: Where does the statement "97% of proteins containing these domains also contain an Adaptin_N domain and function in vesicle adaptor complexes as subunit" come from? Do you have a reference?

      Line 465-467: the same could be said for KIC4 as it also has a VHS domain.

      Line 477-479: Can be rephrased to "However, we found this protein as being likely dispensable for intra-erythrocytic parasite development and no colocalisation with K13 could be demonstrated, suggesting a limited role for PF3D7_1365800 in endocytosis." Or something like that. Makes it clearer.

      Line 535: Have AP-2 or AP-2 been shown to be at the K13 compartment?

      Line 569: reference 43 is wrong

      Line 746: typo "ot" instead of or.

      Line 801: the method for Domain Identification using AlphaFold specifies that RMSDs of under 5Å over more than 60 amino acids are listed in the results. However, there is a typo in Figure 5B for KIC5 where it says "RMSD 4.0 Å over 8 aa". Please correct.

      Line 856: In Figure 1E, please use the same Y axis legend as in Figure 2D "relative growth at day 4 [%] compared with 3D7"

      Figure S1: Some PCR gels checking for integration are presented as 5', 3' and ori whereas other gels are presented as ori, 5' and 3'. This is confusing. Figure S1: Why was the expression of only MCA2 verified by western blot? What about the other proteins?

      Line 493: Considering KIC11 was not involved in HCCU or ART resistance it might be worth mentioning in this section that it is of note that there are no domains detected that would be involved in endocytosis.

      Line 503-506: is it wise to generate more drugs that target a pathway that is already highly susceptible to mutations? The authors should add a statement explaining how this might be avoided.

      Throughout, scale bars are stated in the figure legends at the end of the legend. This is a slightly confusing format. The authors should consider stating the scale bar for each sub-legend where a fluorescence image is taken.

      Referees cross-commenting

      After reading reviewer 2 and 3's comments, I think there are significant overlaps in the key points raised in terms of questions about fusion proteins and their potential partial mis-localisation, better description of results and target selection. Overall I think we agree that the work has potential, but in its current form does not represent a major advance. It would be immensely helpful if the manuscript were carefully edited for a better flow and linear description of results.

      Significance

      The authors set out to test whether other proteins that are in the vicinity of K13 are involved in mediating ART resistance and endocytosis. This is an interesting question. However, other than MCA2 which was already known to be involved in mediating ART resistance (and was not tested for its involvement in endocytosis), none of their candidate proteins seem to be involved in mediating both these functions. The authors show that the other proteins tested appear important for parasite growth, with KIC12 and MyoF involved in mediating endocytosis. While these findings are novel, the KS approach used by the authors casts some doubt over the findings, and would mean that these findings would have to be re-tested with a more reliable approach, such as the GlmS system or generating a conditional knockout using the DiCre system. Despite not advancing our understanding of ART resistance, or identifying further players involved in this process, this manuscript provides two candidates that are involved in mediating endocytosis and a further candidate that appears to be important for parasite growth. Further work on these proteins will be required to understand their exact roles. As stated above, there is currently limited interest for these results (limited to researchers working on endocytosis in apicomplexan parasites and possibly the wider endocytosis field from an evolutionary perspective), however with further work, this could increase the impact and interest of this work substantially.

      The authors do not describe any novel methods/approaches within this work.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1. General Statements

      We thank all four reviewers for their helpful and constructive comments. We have gone through each and every comment and proposed how we would address each point raised by the reviewers. We are confident our proposed revisions are feasible within a reasonable and expected time frame. Some of the comments regarding minor typo/aesthetics and extra references have already been addressed in the transferred manuscript. The changes are highlighted in yellow in the transferred manuscript.

      2. Description of the planned revisions

      Reviewer #1

      Major points:

      1. The presented work itself (Figures 1-4) does not need significant adjustments prior to publication, in my view, with only a few points to address. However, the work in Figure 5 doesn't really support the claims the authors make on its own, and would require some additional experiments or at the very least discussion of the caveats to its current form.

      We thank the reviewer for these comments and will follow the reviewer’s suggestion by discussing the caveats regarding the interpretation of Figure 5. We will also add to the discussion to suggest future research approaches beyond the scope of this manuscript that would address the functional importance of localised mRNA translation. We will briefly mention in the discussion methods such as the quantification of the mRNA foci and the disruption of the mRNA localisation signals to disrupt localised translation and the use of techniques such as Sun-Tag (Tanenbaum et al, 2014) and FLARIM (Richer et al, 2021) to visualise local translation directly.

      Tanenbaum et al, 2014 DOI: 10.1016/j.cell.2014.09.039

      Richer et al, 2021 DOI: 10.1101/2021.08.13.456301

      1. Localized glia transcripts, are they "glial/CNS/PNS" significant or are they similar to other known datasets of protrusion transcriptomes? The authors compared their 4801 "total" localized to a local transcriptome dataset from the Chekulaeva lab finding that a significant fraction are localized in both. As the authors note, this is in good agreement with a recent paper from the Taliaferro lab showing conservation of localization of mRNAs across different cell types. What the authors haven't done here is further test this by looking at other non-neuronal projection transcriptomic datasets (for example Mardakheh Developmental Cell 2015, among others). If the predicted glia-localized processes are similar to non-neuronal processes transcriptomes, this would further strengthen this claim and rule out some level of CNS/PNS derived lineage driving the similarities between glia and neuronal localized transcripts.

      This is a good point and we thank the reviewer for pointing out this interesting cancer data set. We will do as the reviewer suggests and intersect our data with Mardakheh Dev Cell 2015 to test the further generality of localisation in neurons and glia, in other cell types. Specifically, we plan to intersect both the glial (this study) and neuronal (von Kuegelgen & Chekulaeva, 2020) datasets with protrusive breast cancer cells (Mardakheh et al, 2015).

      von Kuegelgen & Chekulaeva, 2020 DOI: 10.1002/wrna.1590

      Mardakheh et al, 2015 DOI: 10.1016/j.devcel.2015.10.005

      1. The presentation/discussion around Figure 3 is a bit weaker than other parts of the manuscript, and it doesn't really contribute to the story in its current form. Notably there is no discussion about the significance of glia in neurological disorders until the very end of the manuscript (page 21), meaning when it's first brought up, it just sits there as a one-off side point. The authors might consider strengthening/tightening up the discussion here, if they really want to keep it as a solo main figure rather than integrating it somewhere else/putting it into supplemental. In my view, Figures 2 & 3 should be merged into something a bit more streamlined.

      This is a good point. We plan to strengthen the presentation of Figure 3 and discussion of the significance of glia in neurological disorders by adding a description of the Figure in the Results section and highlighting the significance of glia in nervous system disorders in the Discussion section.

      1. Why aren't there more examples of different mRNAs in Figure 4? Seems a waste to kick them all to supplemental.

      We agree that it could be helpful to show different expression patterns in the main figure. To address this point we will add Pdi (Fig. S4D), which shows mRNA expression in both the glia and the surrounding muscle cell. This pattern is in contrast to Gs2, which is highly specific to glial cells. We will also note that although pdi mRNA is present in both the glia and muscle, Pdi protein is only abundant in the glia, suggesting that translation of pdi mRNA to protein is regulated in a cell-specific manner.

      1. The plasticity experiments, while creative, I think need to be approached far more cautiously in their interpretation. Given that the siRNAs will completely deplete these mRNAs, it really needs to be stressed that any/all of the effects seen could just be the result of "defective" or "altered" states in this glial population, which has spillover effects on plasticity at the NMJ. Without directly visualizing if these mRNAs are locally translated in these processes and assessing if their translation is modulated by their plasticity paradigm, all these experiments can say is that these RNAs are needed in glia to modulate ghost bouton formation in axons. This represents the weakest part of this manuscript, and the part that I feel does not actually back up the claims currently being made. Without any experiments to A. quantify how much of these transcripts are localized vs in the cell body of these glia, B. visualize/quantify the translation of these mRNAs during baseline and during plasticity; the authors cannot use these data to claim that localized mRNAs are required for synaptic plasticity.

      We are grateful to the reviewer for pointing out that we were not precise enough in defining our interpretation of the structural plasticity assay. We did not intend to claim that our results show that local translation of these transcripts is necessary for plasticity, only that these transcripts are localized and are required in the glia for plasticity in the adjacent neuron (in which the transcript levels are not disrupted in the experiment). Definitively proving that these transcripts are required locally and translated in response to synaptic activity would require genetic/chemical perturbations and imaging assays that would require a year or more to complete, so are beyond the scope of this manuscript. To address this point, we will clarify that the results do not show that localized transcripts are required, only that the transcripts are required somewhere specifically in the glial cell (without affecting the neuron level), and we can indeed show in an independent experiment that there are localized transcripts.

      Reviewer #2

      Major points:

      1. The authors analyse the 1700 shortlisted genes for Gene Ontology and associations with autism spectrum disorder, leading to interesting results. However, it is not clear to what extent the enrichments they observe are driven by their presumptive localization or if the associations are driven to a significant extent by the presence of these genes in the selected cell types in the Fly Cell Atlas. One way to address this would be to perform the GO and SFARI analysis on genes that are expressed in the same cells in the Fly Cell Atlas but were not shortlisted from the mammalian cell datasets - the results could then be compared to those obtained with the 1700 localized transcripts.

      This is a fair point raised by the reviewer as genes involved in neurological disease such as Autism Spectrum Disorder may be enriched in CNS/PNS cell types. We will follow the reviewer’s suggestion to perform GO and SFARI gene enrichment analysis in genes that were not shortlisted for presumptive glial localisation.

      1. Although the authors attempt to justify its inclusion, I'm not convinced why it was important to use the whole cell transcriptome of perisynaptic Schwann cells as part of the selection process for localizing transcripts. Including this dataset may reduce the power of the pipeline by including mRNAs that are not localized to protrusions. How many of the shortlisted 1700 genes, and how many of the 11 glial localized mRNAs in Table 5, would be lost if the whole cell transcriptome were excluded? More generally, what is the distribution of the 11 validated localizing transcripts in each dataset in Table 4? This information might be valuable for determining which dataset(s), if any, has the best predictive power in this context.

      We thank the reviewer for raising this point, which we will address with further analysis and adding to the discussion. We propose to address the criticism by running our analysis pipeline without the inclusion of the dataset using Perisynaptic Schwann Cells (PSCs) and then intersect with the PSCs-expressed genes, since their functional similarity with polarised Drosophila glial cells is highly relevant. We also agree with the reviewer that it would be a useful control for us to assess the ‘predictive power’ of each glial dataset by calculating their contribution to the shortlisted 1,700 glial localised transcripts and to the 11 experimentally validated transcripts via in situ hybridisation. To address this point, we plan to add this information in the revised manuscript.

      1. Did the authors check if any of the RNAi constructs are reducing levels of the target mRNA or protein? Doing so would strengthen the confidence in these important results significantly. In any case, the authors should also mention the caveat of potential off-target effects of RNAi.

      We thank the reviewer for their useful comment and agree that the extent to which the RNAi expression reduces the levels of mRNA is not specifically known. We will add a FISH experiment on lac, pdi and gs2 RNAi showing very strong reduction in mRNA levels. We will also add an explanation of the caveats of the use of the RNAi system to the discussion.

      1. Methods: what is the justification for assuming that if the RNAi cross caused embryonic or larval lethality then the 'next most suitable' RNAi line is reporting on a phenotype specific to the gene? If the authors want to claim the effect is associated with different degrees of knockdown they should show this experimentally. An alternative explanation is that the line used for phenotypic analysis in glia is associated with an off-target effect.

      We thank the reviewer for this comment. We agree that off target effects cannot in principle be completely ruled out without considerable additional experimental analysis beyond the scope of this manuscript. To address the criticism we will remove the expression data of the lines that cause lethality and revise the discussion to explain that the level of knockdown in each line is unknown, and would require further experimental exploration.

      Minor points:

      1. It would be helpful to have in the Introduction (rather than the Results, as is currently the case) an operational definition of mRNA localization in the context of the study. And is it known whether or not localization in protrusions is the norm in mammalian glia or the Drosophila larval glia? I ask because it may be that almost all mRNAs diffuse into the protrusion, so this is not a selective process. One interesting approach to test this idea might be to test if the 1700 shortlisted transcripts have a significant underrepresentation of 'housekeeping' functions.

      We thank the reviewer for this excellent suggestion. To address the comment, we will move our explanation of the operational definition of mRNA localization to the Introduction. We will also perform enrichment analysis of housekeeping genes within 1,700 shortlisted transcripts compared to the transcriptome background, as the reviewer suggested.

      Reviewer #3

      Major points:

      1. The authors have pooled data from different studies across different type of glial cells performed from in vitro to in vivo. While pooling datasets may reveal common transcripts enriched in processes, this may not be the best approach considering these are completely different types of glial cells with distinct function in neuronal physiology.

      We thank the reviewer for highlighting the need for us to further justify why we pooled datasets. We will revise the manuscript to better emphasise that the overarching goal of our study was to try to discern a common set of localised transcripts shared between the cells. The problem with analysing and comparing individual data sets is that much of the variation may be due to differences in the methods used and amount of material, rather than differences in the type of cells used. We will revise the discussion to make this point and plan to explain that our approach corresponds well with a previous publication pooling localised mRNA datasets in neurons (von Kuegelgen & Chekulaeva, 2020).

      von Kuegelgen & Chekulaeva, 2020 DOI: 10.1002/wrna.1590

      1. It is important to note the limitations of the study. For example, DeSeq2 is biased for highly expressed transcripts. How robust was the prediction for low abundance transcripts?

      The presented 1,700 transcripts were shortlisted based on their presence and expression level (TPM) in glial protrusions rather than their relative enrichment. Nevertheless, the reviewer makes a valid criticism of our use of DESeq2, where we compared enriched transcripts in glial and neuronal protrusions in Figure 1D. To address this point we will discuss this caveat in the relevant section.

      The issue raised regarding low abundance transcript prediction raises an important question: does the likelihood of localisation to cell extremities correlate with mRNA abundance? We have already partially addressed this point, since our analysis of the fraction of localised transcripts per expression level quantiles shows only limited correlation. To address this comment, we will add these results in the revised manuscript as a supplementary figure.

      1. The authors identify 1,700 transcripts that they classify as "predicted to be present" in the projections of the Drosophila PNS glia. This was based on the comparison to all the mammalian glial transcripts. Since the authors have access to a transcriptomic study from Perisynaptic Schwann cells (PSCs), the nonmyelinating glia associated with the NMJ isolated from mice; it would be more convincing to then validate the extent of overlap between Drosophila peripheral glial with the mammalian PSCs. This may reveal conserved features of localized transcripts in the PNS, particularly associated with the NMJ function.

      Thank you for the valuable suggestion. A similar point was also raised by [Reviewer #2 - Major point 2] to re-run our pipeline excluding the PSCs dataset and intersect with the PSC transcriptome post-hoc. Please see the above section for our detailed response.

      1. Fig 2: What is the extent of overlap between the translating fractions versus the localized fraction? It will be informative to perform the functional annotation of the translating glial transcripts as identified from Fig 1D.

      This is an interesting question. To address this point, we plan to: (i) compare transcripts that are translated vs. localised in glial protrusions, and (ii) perform functional annotation enrichment analysis on the translated fraction of genes.

      1. "We conclude predicted group of 1,700 are highly likely to be peripherally localized in Drosophila cytoplasmic glial projections". To validate their predictions, the authors test some of these candidates in only one glial cell type. It might be worthy to extend this for other differentially expressed genes localized in another glial type as well.

      The presented in vivo analyses made use of the repo-GAL4 driver, which is active in all glial subtypes, including subperineurial, perineurial and wrapping glia that make distal projections to the larval neuromuscular junction. We agree that subtype-specific analysis would be highly informative, but we believe this is outside the scope of the current work where we aimed to identify conserved localised transcriptomes across all glial subtypes. Nevertheless, to address the comment, we plan to further clarify our use of the pan-glial repo-GAL4 driver in the Results and Methods sections of the revised manuscript.

      1. Figure 5: The authors perform KD of candidate transcripts to test the effect on synapse formation. However, these are KD with RNAi that spans across the entire cell. To make the claim about the importance of "target" RNA localization in glia stronger, ideally, they should disrupt the enrichment specifically in the glial protrusions and test the impact on bouton formation. Do these three RNAs have any putative localization elements?

      We agree with the reviewer that we would ideally test the effect of disruption of mRNA localization (and therefore localised translation). However, we feel these experiments are beyond the scope of this current study, as they will require a long road of defining localisation signals that are small enough to disrupt without affecting other functions. To address this comment we will revise the Discussion section to mention those difficulties explicitly, and clarify the limitations of the approach used in our study for greater transparency.

      Reviewer #4

      Major points:

      1. The authors use FISH to validate the glial expression of their target genes, though these experiments are not quantified, and no controls are shown. The authors should provide a supplemental figure with "no probe" controls, and/or validate the specificity of the probe via glial knockdown of the target gene (see point 2). Furthermore, these data should be quantified (e.g. number of puncta colocalized with NMJ glia membranes).

      Thank you for requesting further information regarding the YFP smFISH probes. We have validated the specificity and sensitivity of the YFP probe in our recent publication (Titlow et al, 2023, Figure 1 and S1). Specifically, we demonstrated the lack of YFP probe signal from wild-type untagged biosamples and showed colocalization of YFP spots with additional probes targeting the endogenous exon of the transcript. Nevertheless, we will address this comment by adding control image panels of smFISH in wild-type (OrR) neuromuscular junction preparations.

      Titlow et al, 2023 DOI: 10.1083/jcb.202205129

      1. For the most part, the authors only use one RNAi line for their functional studies, and they only show data for one line, even if multiple were used. To rule out potential false negatives, the authors should leverage their FISH probes to show the efficacy of their knockdowns in glia. This would serve the dual purpose of validating the new probes (see point 1).

      Thank you for the suggestion. This point was also raised by [Reviewer #2 - Major point 3]. Please see above for our detailed response.

      1. In Figure 5 E, given the severe reduction in size in the stimulated Pdi KD animals, the authors should show images of the unstimulated nerve as well. Do the nerve terminals actually shrink in size in these animals following stimulation, rather than expand? The NMJ looks substantially smaller than a normal L3 NMJ, though their quantification of neurite size in F suggests they're normal until stimulation.

      We share the same interpretation of the data with the reviewer that the neurite area is reduced post-potassium stimulation in pdi knockdown animals. We will follow the reviewer’s suggestion and add an image showing unstimulated neuromuscular junctions.

      Minor points:

      1. The authors claim that there is an enrichment of ASD-related genes in their final list of ~1400 genes that are enriched in glial processes. It is well-appreciated that synaptically-localized mRNAs are generally linked to ASDs. Can the authors comment on whether the transcripts localized to glial processes are even more linked to ASDs and neurological disorders than transcripts known to be localized to neuronal processes?

      This is an interesting point. To address the comment, we will add a comparison of the degree of enrichment of ASD-related genes in neurite vs. glial protrusions in the revised manuscript.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1

      1. The use of blue/green or blue/green/magenta is difficult to resolve in some places. Swapping blue for cyan would greatly aid in visualizing their data.

      This comment is much appreciated. We have swapped blue for cyan in Figures 4 and S4. We have also changed Figure S1 to increase contrast and visibility as per reviewer’s comment.

      1. Make the colouring/formatting of the tables more consistent, it's distracting when it's constantly changing (also there is no need for a blue background... just use a basic white table).

      This comment is much appreciated. We have applied a consistent colour palette to the Tables without background colourings and made the formatting uniform.

      Reviewer #2

      1. Introduction: 'Asymmetric mRNA localization is likely to be as important in glia, as it is in neurons,...'. Remove commas

      Thank you for pointing this mistake out. We have made the corresponding edits.

      Reviewer #3

      1. RNA localization in oligodendrocytes has been well studied and characterized. The authors should cite and discuss those papers (PMID: 18442491; PMID: 9281585).

      We thank the reviewer for this useful suggestion. We have added these references to the paper.

      Reviewer #4

      1. In Figure 5D, the authors should include a label to indicate that these images are from an unstimulated condition.

      We thank the reviewer for pointing this out. We have added the label as requested.

      1. The authors are missing a number of key citations for studies that have explored the functional significance of mRNA trafficking in glia, and those that have validated activity-dependent translation:

      - https://pubmed.ncbi.nlm.nih.gov/18490510/

      - https://pubmed.ncbi.nlm.nih.gov/7691830/

      - https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001053

      - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7450274/

      - https://pubmed.ncbi.nlm.nih.gov/36261025/

      We thank the reviewer for the comment. We have added these references to the text.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #4

      Evidence, reproducibility and clarity

      This article examines the cellular processes that predispose cells to nuclear blebbing and DNA damage in response to lamin and chromatin perturbations. The authors show key differences in these two types of perturbation and demonstrate a role for actin contractility. The experiments are well controlled and the data analysis generally rigorous. However, prior to acceptance, a number of issues must be fixed to improve the manuscript. I do not know the field sufficiently well to judge the novelty of the data.

      Major issues:

      • page 7, bottom: The authors state that measuring nuclear height gives an indication of confinement and force balance. But, if the nuclear mechanical properties have changed, then the nuclear height could change without any change in contractility. So, the authors would need to also verify that the level of contractility hasn't changed and that the mechanical properties haven't changed to really confirm that the cell height is a good measure of confinement. The level of contractility can be assessed by staining for pMLC. The nuclear mechanical properties may have been measured by others.
      • In general, are the changes in contractility resulting from drug treatments sufficiently large to deform the nucleus? Can the authors show a time course of nuclear height in response to a treatment, for WT for example? This would allow contractility to be linked to nuclear height.
      • Page 9: The authors do not find any change in nuclear shape. Can they measure shape pre/post treatment on the same cells? It could be that the effect is lost in variability unless you do paired measurements?
      • Page 11: the authors find nuclear ruptures unchanged in LMNA-/- even when there is no contractility. They then state: "We hypothesized that LMNA-/- nuclei do not show bleb-based behaviors because this perturbation cannot, due to reported disrupted nuclear-actin connections". I do not understand this sentence.
      • To characterise actin contractility better, it would be good to present images of the actin cables in each condition and pre/post treatments. This would allow a visual assessment of whether the morphology of the F-actin cytoskeleton has changed. This is one of the main topics of the study and as such it should be examined.
      • On all bar charts, the authors should indicate: the number of independent experiments, the number of cells examined.
      • I find the diagrams on Fig 1A, 2A etc do not help to illustrate what the authors think is happening. Can they redraw them in a more informative way?
      • The abstract, introduction, and discussion are overly long and lack focus. These should be rewritten succinctly.

      Minor issues:

      • page 4: inhibitors of Rho-kinase will also modulate actin polymerisation indirectly through the action on Lim-kinase and cofilin.
      • page 5, second paragraph: the authors should state that they are measuring the frequency of ruptures. At first, I thought this might be a mechanical strain.
      • Page 7: In general, it may be useful to discuss the temporal evolution of the c/n and the circularity side by side. The change in circularity over time could be an indicator of mechanical strain, while the c/n would report on any transient loss of integrity of the nuclear membrane.
      • Fig 1B: it would be nice to present the time course of the c/n as well.
      • Fig S1: it might be interesting to characterise the dynamics/amplitude of the c/n for the different conditions. There doesn't appear to be any difference between the nuclear blebbing rupture and the non blebbing rupture. This suggests that the two phenomena (nuclear blebbing and nuclear rupture) are independent: i.e. rupture is not causally linked to blebbing.

      Significance

      This article examines the cellular processes that predispose cells to nuclear blebbing and DNA damage in response to lamin and chromatin perturbations. The authors show key differences in these two types of perturbation and demonstrate a role for actin contractility. The experiments are well controlled and the data analysis generally rigorous. However, prior to acceptance, a number of issues must be fixed to improve the manuscript. I do not know the field sufficiently well to judge the novelty of the data.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary

      While DNA sequence divergence, differential expression, and differential methylation analysis have been conducted between humans and the great apes to study changes that "make us human", the role of lncRNAs and their impact on the human genome and biology has not been fully explored. In this study, the authors computationally predict HSlncRNAs as well as their DNA Binding sites using a method they have developed previously and then examine these predicted regions with different types of enrichment analyses. Broadly, the analysis is straightforward and after identifying these regions/HSlncRNAs the authors examined their effects using different external datasets.

      Strengths/weaknesses

      By and large, the analysis performed is dependent on their ability to identify HSlncRNAs and their DBS. I think that they have done a good job of showing the performance metrics of their methods in previous publications. Thereafter, they perform a series of enrichment-type analyses that have been used in the field for quite a while now to look at tissue-specific enrichment, or region-specific enrichment, or functional enrichment, and I think these have been carried out well. The authors achieved the aims of their work. I think one of the biggest contributions that this paper brings to the field is their annotation of these HSlncRNAs. Thus a major revisionary effort could be spent on applying their method to the latest genomes that have been released so that the community could get a clean annotation of newly identified HSlncRNAs (see comment 2).

      Comments

      1) Though some of their results about certain HSlncRNAs having DBSs in all genes is rather surprising/suspicious, I think that broadly their process to identify and validate DBSs is robust, they have multiple lines of checks to identify such regions, including functional validation. These predictions are bound to have some level of false positive/negative rate and it might be nice to restate those here and on what experiment/validation data these were conducted. However, the rest of their analysis comprises different types of enrichment analysis which shouldn't be affected by outlier HSlncRNAs if indeed their FPR/FNR are low.

      2) There are now several new genomes available as part of the Zoonomia consortium and 240 Primate consortium papers released. These papers have re-examined some annotations such as Human Accelerated Regions (HARs) and found with a larger dataset as well as better reference genomes, that a large fraction of HARs were actually incorrectly annotated - that is that they were also seen in other lineages outside of just the great apes. If these papers have not already examined HSlncRNAs, the authors should try and re-run the computational predictions with this updated set and then identify HSlncRNAs there. This might help to clarify their signal and remove lncRNAs that might be present in other primates but are somehow missing in the great apes. This might also help to mitigate some results that they see in section 3 of their paper in comparing DBS distances between archaics and humans.

      3) The differences between the archaic hominins in their DBS distances to modern humans are a bit concerning. At some level, we expect these to be roughly similar when examining African modern humans and perhaps the Denisovan being larger when examining Europeans and Asians, but they seem to have distances that aren't expected given the demography. In addition, from their text for section 3, they begin by stating that they are computing two types of distances but then I lost track of which distance they were discussing in paragraph 3 of section 3. Explicitly stating which of the two distances in the text would be helpful for the reader.

      (1) According to Figure 1A (which follows Meyer et al., 2012, Prüfer et al., 2013, and Prüfer et al., 2017), the phylogenetic distance from modern humans to the Denisovan is shorter than the distance to the Altai Neanderthal. However, according to the same studies, the Denisovan branch is more remote from modern humans than the Altai Neanderthal branch. Thus, it is not unreasonable to find that 2514 and 1256 DBSs have distances > 0.034 in genes in Denisovans and Altai Neanderthals, respectively. Both the phylogenetic distances and the DBS distances probably depend considerably on the sampled Altai and Denisovan genomes, and these populations existed for a long time. When new samples are obtained, these distances may change somewhat.

      (2) Regarding “they are computing two types of distances but then I lost track of which distance they were discussing in paragraph 3 of section 3”, it is the second type of distance that is discussed in section 3; the distances computed in the first way were not analyzed further because “This defect may be caused by that the human ancestor was built using six primates without archaic humans”.

      4) Isn't the correct control to examine whether eQTLs are more enriched in HSlncRNA DBSs a set of transcription factor binding sites? I don't think using just promoter regions is a reasonable control here. This does not take away from the broader point however that eQTLs are found in DBSs and I think they can perform this alternate test.

      Indeed, the TF-TFBS and lncRNA-DBS relationships are comparable, and which of the two contains more QTLs is an interesting question. In this sense, it is reasonable to use TFBSs as the control. However, we did not perform this comparison, for three reasons. First, most TFBSs are predicted by varied methods, which raises concerns about the reliability of comparing two sets of predictions. Second, most QTLs in DBSs are mQTLs, whereas most QTLs in TFBSs are eQTLs. Third, a greater portion of TFBSs than of DBSs probably lie outside promoters, and the running time of LongTarget made us unable to predict DBSs truly genome-wide. Nevertheless, this is an interesting question that deserves further exploration.

      5) In the discussion, they highlight the evolution of sugar intake, which I'm not sure is appropriate. This comes not from GO enrichment but rather from a few genes that are found at the tail of their distribution. While these signals may be real, the evolution of traits is often highly polygenic and they don't see this signal in their functional enrichment. I suggest removing that line. Moreover, HSlncRNAs are ones that are unique across a much longer time frame than the transition to agriculture, which is when sugar intake rose greatly. Thus, it's unlikely that something that arose in the past 6000-7000 years would show enrichment in an annotation that is designed to detect human-chimp or human-Neanderthal level divergence.

      Multiple sugar metabolism-related pathways, including “glucose homeostasis” and “glucose metabolic process”, are found to be enriched only in Altai Neanderthal but not in chimpanzees (Figure 2). Indeed, HS lncRNAs span a much longer time frame than the transition to agriculture. However, given that apes and monkeys know how to pick ripe, sugar-rich fruits at the right time and place, we conjecture that archaic humans, as hunter-gatherers, could effectively exploit natural sugars.

      Reviewer #2 (Public Review):

      Lin et al attempt to examine the role of lncRNAs in human evolution in this manuscript. They apply a suite of population genetics and functional genomics analyses that leverage existing data sets and public tools, some of which were previously built by the authors, who clearly have experience with lncRNA binding prediction. However, I worry that there is a lack of suitable methods and/or relevant controls at many points and that the interpretation is too quick to infer selection. While I don't doubt that lncRNAs contribute to the evolution of modern humans, and certainly agree that this is a question worth asking, I think this paper would benefit from a more rigorous approach to tackling it.

      At this point, my suggestions are mostly focused on tightening and strengthening the methods; it is hard for me to predict the consequence of these changes on the results or their interpretation, but as a general rule I also encourage the authors to not over-interpret their conclusions in terms of what phenotype was selected for when as they do at certain points (eg glucose metabolism).

      I note some specific points that I think would benefit from more rigorous approaches, and suggest possible ways forward for these.

      1) Much of this work is focused on comparing DNA binding domains in human-unique long-noncoding RNAs and DNA binding sites across the promoters of genes in the human genome, and I think the authors can afford to be a bit more methodical/selective in their processing and filtering steps here. The article begins by searching for orthologues of human lncRNAs to arrive at a set of 66 human-specific lncRNAs, which are then characterised further through the rest of the manuscript. Line 99 describes a binding affinity metric used to separate strong DBS from weak DBS; the methods (line 432) describe this as being the product of the DBS or lncRNA length times the average Identity of the underlying TTSs. This multiplication, in fact, undoes the standardising value of averaging and introduces a clear relationship between the length of a region being tested and its overall score, which in turn is likely to bias all downstream inference, since a long lncRNA with poor average affinity can end up with a higher score than a short one with higher average affinity, and it's not quite clear to me what the biological interpretation of that should be. Why was this metric defined in this way?

      Length is an important metric of a DBS, but on its own it has a defect: a triplex of 100 bp may have 50% or 70% of its nucleotides bound, and the binding affinity between DBD and DBS differs greatly between these two situations.
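To make the reviewer's concern concrete, the following is a minimal sketch of a score defined as length times average identity, as the metric is described in the comment above. The function name `affinity_score` and all numbers are hypothetical and are not taken from the manuscript; the sketch only illustrates how such a product scales with length.

```python
def affinity_score(length_bp: int, mean_identity: float) -> float:
    """Hypothetical score: region length multiplied by average identity."""
    return length_bp * mean_identity


# The response's example: a 100 bp triplex with 50% or 70% of nucleotides bound.
half_bound = affinity_score(100, 0.5)
mostly_bound = affinity_score(100, 0.7)

# The reviewer's example: a long, weakly matching region compared with
# a short, strongly matching one (illustrative values only).
long_weak = affinity_score(200, 0.5)
short_strong = affinity_score(100, 0.9)

if __name__ == "__main__":
    print("same length, different identity:", half_bound, mostly_bound)
    print("long/weak vs short/strong:", long_weak, short_strong)
```

With these illustrative values the long, weakly matching region receives the higher score, which is the length dependence the reviewer asks the authors to justify; at a fixed length, the score separates the 50% and 70% cases as the response intends.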

      2) There is also a strong assumption that identified sites will always be bound (line 100), which I disagree is well-supported by additional evidence (lines 109-125). The authors show that predicted NEAT1 and MALAT1 DBS overlap experimentally validated sites for NEAT1, MALAT1, and MEG3, but this is not done systematically, or genome-wide, so it's hard to know if the examples shown are representative, or a best-case scenario.

      More details are given in Wen et al. 2022. We will add the sites to the Supplementary Tables in the revised version.

      It's also not quite clear how overlapping promoters or TSS are treated - are these collapsed into a single instance when calculating genome-wide significance? If, eg, a gene has five isoforms, and these differ in the 3' UTR but their promoter region contains a DBS, is this counted five times, or one? Since the interaction between the lncRNA and the DBS happens at the DNA level, it seems like not correcting for this uneven distribution of transcripts is likely to skew results, especially when testing against genome-wide distributions, eg in the results presented in sections 5 and 6. I do not think that comparing genes and transcripts putatively bound by the 40 HS lncRNAs to a random draw of 10,000 lncRNA/gene pairs drawn from the remaining ~13500 lncRNAs that are not HS is a fair comparison. Rather, it would be better to do many draws of 40 non-HS lncRNAs and determine an empirical null distribution that way, if possible actively controlling for the overall number of transcripts (also see the following point).

      (1) If, say, three transcripts of a gene share the same promoter region (i.e., they have the same TSS) and differ only in the 3’UTR, the promoter region was used to predict DBSs only once. If instead the three transcripts have different TSSs, each of the three promoter regions was used to predict DBSs.

      (2) A gene may have many DBSs if it has many transcripts, or only a few if it has just a few transcripts. We did not correct for this uneven distribution of transcripts because our GTEx analysis was at the transcript level; it is well recognized that transcripts of the same gene can be expressed in different tissues.

      (3) We randomly sampled a non-HS lncRNA and a transcript 10,000 times (i.e., 10,000 pairs). It is a fair point that multiple draws of 40 non-HS lncRNAs should be made to make the statistics more robust.
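The resampling scheme suggested by the reviewer (many draws of 40 non-HS lncRNAs to build an empirical null) could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function name `empirical_p_value`, the use of a per-lncRNA summary statistic, the comparison of set means, and the synthetic background values are all hypothetical.

```python
import random


def empirical_p_value(observed, background, set_size=40, n_draws=10_000, seed=0):
    """Fraction of random background sets whose mean is at least `observed`.

    `background` holds one summary value per non-HS lncRNA (hypothetical),
    and `observed` is the mean of the same statistic over the HS set.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(background, set_size)  # sample without replacement
        if sum(draw) / set_size >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_draws + 1)


if __name__ == "__main__":
    # Synthetic stand-in for ~13,500 non-HS lncRNAs; not real data.
    rng = random.Random(1)
    background = [rng.gauss(0.0, 1.0) for _ in range(13_500)]
    print(empirical_p_value(observed=0.5, background=background))
```

Each draw has the same size as the set under test, so the null distribution reflects the variability expected from a set of 40 lncRNAs rather than from individual lncRNA/transcript pairs. Controlling for the number of transcripts per gene, as the reviewer also suggests, would require stratified sampling and is not shown here.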

      3) Thresholds for statistical testing are not consistent, or always well justified. For instance, in line 142 GO testing is performed on the top 2000 genes (according to different rankings), but there's no description of the background regions used as controls anywhere, or of why 2000 genes were chosen as a good number to test? Why not 1000, or 500? Are the results overall robust to these (and other) thresholds? Then line 190 the threshold for downstream testing is now the top 20% of genes, etc. I am not opposed to different thresholds in principle, but they should be justified.

      The over-representation analysis using g:Profiler was performed with the whole genome as the background. Analyzing more DBSs (especially weak DBSs) would generate more results, but those results could be less reliable. There is thus a trade-off between analyzing fewer DBSs with relatively high reliability and analyzing more DBSs with relatively low reliability. Inevitably, the handling of this trade-off is somewhat subjective, and a careful comparison of the two classes of DBSs could be an independent question. Although weak DBSs were not systematically analyzed, the results from the strong DBSs undoubtedly suggest that HS lncRNAs have contributed greatly to human evolution.

      Likewise, comparing Tajima's D values near promoters to genome-wide values is unfair, because promoters are known to be under strong evolutionary constraints relative to background regions; as such it is not surprising that the results of this comparison are significant. A fairer comparison would attempt to better match controls (eg to promoters without HS lncRNA DBS, which I realise may be nearly impossible), or generate empirical p-values via permutation or simulation.

      We examined Tajima’s D in DBSs (Supplementary Figure 9) and in HS lncRNA genes (Supplementary Figure 18). In both cases, we compared the Tajima’s D values with the genome-wide background.

      4) There are huge differences in the comparisons between the Vindija and Altai Neanderthal genomes that to me suggest some sort of technical bias or the like is at play here. For example, line 190 reports 1256 genes to have a high distance between the Altai Neanderthal and modern humans, but only 134 Vindija genes reach the same cutoff of 0.034. The temporal separation between the two specimens does not seem sufficient to explain this difference, nor the difference between the Altai Denisovan and Neanderthal results (2514 genes for Denisovan), which makes me wonder if it is a technical artefact relating to the quality of the genome builds? It would be worth checking.

      We used the same workflow (and the same cutoff of 0.034) to analyze the Vindija and Altai Neanderthals and the Denisovan. If a smaller cutoff were used, more Vindija genes would be seen. Again, there is a trade-off. Analyzing the epigenome and epigenetic regulation in archaic genomes is an interesting direction, and many more studies are needed before the related parameters and cutoffs can be set more reasonably.

      5) Inferring evolution: There are some points of the manuscript where the authors are quick to infer positive selection. I would caution that GTEx contains a lot of different brain tissues, thus finding a brain eQTL is a lot easier than finding a liver eQTL, just because there are more opportunities for it. Likewise, claims in the text and in Tables 1 and 2 about the evolutionary pressures underlying specific genes should be more carefully stated. The same is true when the authors observe high Fst between groups (line 515), which is only one possible cause of high Fst - population differentiation and drift are just as capable of giving rise to it, especially at small sample sizes.

    2. Reviewer #1 (Public Review):

      Summary

      While DNA sequence divergence, differential expression, and differential methylation analysis have been conducted between humans and the great apes to study changes that "make us human", the role of lncRNAs and their impact on the human genome and biology has not been fully explored. In this study, the authors computationally predict HSlncRNAs as well as their DNA Binding sites using a method they have developed previously and then examine these predicted regions with different types of enrichment analyses. Broadly, the analysis is straightforward and after identifying these regions/HSlncRNAs the authors examined their effects using different external datasets.

      Strengths/weaknesses

      By and large, the analysis performed is dependent on their ability to identify HSlncRNAs and their DBS. I think that they have done a good job of showing the performance metrics of their methods in previous publications. Thereafter, they perform a series of enrichment-type analyses that have been used in the field for quite a while now to look at tissue-specific enrichment, or region-specific enrichment, or functional enrichment, and I think these have been carried out well. The authors achieved the aims of their work. I think one of the biggest contributions that this paper brings to the field is their annotation of these HSlncRNAs. Thus a major revisionary effort could be spent on applying their method to the latest genomes that have been released so that the community could get a clean annotation of newly identified HSlncRNAs (see comment 2).

      Comments

      1) Though some of their results about certain HSlncRNAs having DBSs in all genes is rather surprising/suspicious, I think that broadly their process to identify and validate DBSs is robust, they have multiple lines of checks to identify such regions, including functional validation. These predictions are bound to have some level of false positive/negative rate and it might be nice to restate those here and on what experiment/validation data these were conducted. However, the rest of their analysis comprises different types of enrichment analysis which shouldn't be affected by outlier HSlncRNAs if indeed their FPR/FNR are low.

      2) There are now several new genomes available as part of the Zoonomia consortium and 240 Primate consortium papers released. These papers have re-examined some annotations such as Human Accelerated Regions (HARs) and found with a larger dataset as well as better reference genomes, that a large fraction of HARs were actually incorrectly annotated - that is that they were also seen in other lineages outside of just the great apes. If these papers have not already examined HSlncRNAs, the authors should try and re-run the computational predictions with this updated set and then identify HSlncRNAs there. This might help to clarify their signal and remove lncRNAs that might be present in other primates but are somehow missing in the great apes. This might also help to mitigate some results that they see in section 3 of their paper in comparing DBS distances between archaics and humans.

      3) The differences between the archaic hominins in their DBS distances to modern humans are a bit concerning. At some level, we expect these to be roughly similar when examining African modern humans and perhaps the Denisovan being larger when examining Europeans and Asians, but they seem to have distances that aren't expected given the demography. In addition, from their text for section 3, they begin by stating that they are computing two types of distances but then I lost track of which distance they were discussing in paragraph 3 of section 3. Explicitly stating which of the two distances in the text would be helpful for the reader.

      4) Isn't the correct control to examine whether eQTLs are more enriched in HSlncRNA DBSs a set of transcription factor binding sites? I don't think using just promoter regions is a reasonable control here. This does not take away from the broader point however that eQTLs are found in DBSs and I think they can perform this alternate test.

      5) In the discussion, they highlight the evolution of sugar intake, which I'm not sure is appropriate. This comes not from GO enrichment but rather from a few genes that are found at the tail of their distribution. While these signals may be real, the evolution of traits is often highly polygenic and they don't see this signal in their functional enrichment. I suggest removing that line. Moreover, HSlncRNAs are ones that are unique across a much longer time frame than the transition to agriculture, which is when sugar intake rose greatly. Thus, it's unlikely that something that arose in the past 6000-7000 years would show enrichment in an annotation that is designed to detect human-chimp or human-Neanderthal level divergence.

    1. Author Response

      The primary concern of Reviewer 1 is that Ne might affect gBGC and hence GC, and this might act as a confounding effect. The reviewer suggests that we should investigate how gBGC (with GC presumably as its proxy) might affect CAIS, and to what extent any relationship here could explain the relationship between CAIS and body mass. We believe that we have already dealt with this both in Supplementary Figure S5A (where we regret having inserted the wrong figure panel, a mistake we will correct), and its PIC-corrected counterpart in S5B. These two panels show (or will show) that CAIS is not correlated with GC. Note that we expect our genomic-GC-based codon usage expectations to reflect unchecked gBGC in an average genomic region, independently of whether that species has high or low Ne. Our working model is that mutation biases, including but not limited to the strength of gBGC, vary among species, and that they rather than selection determine each species’ genome-wide %GC. By correcting for genome-wide %GC, our CAIS thus corrects for mutation bias, in order to isolate the effects of selection.
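One simple way to picture a codon usage expectation based on genome-wide %GC is sketched below. It assumes that the three positions of a codon are independent and that G and C (and likewise A and T) are equally frequent; this is an illustrative simplification and not necessarily the calculation used for CAIS. The function name `expected_synonymous_freqs` is hypothetical.

```python
def expected_synonymous_freqs(codons, gc):
    """Expected relative usage of synonymous codons given genome-wide GC.

    Assumes independent positions with P(G) = P(C) = gc / 2 and
    P(A) = P(T) = (1 - gc) / 2 (an illustrative simplification).
    """
    p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    raw = {c: p[c[0]] * p[c[1]] * p[c[2]] for c in codons}
    total = sum(raw.values())
    # Normalise within the synonymous family so the frequencies sum to 1.
    return {c: v / total for c, v in raw.items()}


if __name__ == "__main__":
    glycine = ["GGA", "GGC", "GGG", "GGT"]
    for gc in (0.40, 0.50, 0.60):
        print(gc, expected_synonymous_freqs(glycine, gc))
```

Under this kind of expectation, a GC-rich genome is expected to favour GC-ending codons even without selection, so codon adaptation would be scored as the departure of observed usage from the GC-based expectation rather than from uniform usage.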

      Reviewer 1 also suggests that we examine the relationship between gene expression and GC corrected RSCU, as we would expect codon adaptation to be stronger in more highly expressed genes, as was previously shown in the non-GC corrected CAI metric (Sharp et al 1987). Correlations with gene expression are outside the scope of the current work, which is focused on producing a single value of codon adaptation per species. It is indeed possible that our general approach could be useful in future work investigating differences among genes.

      One key difference between our work and that of Galtier et al. 2018 is that our approach does not rely on identifying specific codon preferences per species. Our approach thus remains appropriate even for scenarios e.g. where different cell types, different environmental conditions, and/or different genes have different codon preferences (Gingold et al. 2014 https://doi.org/10.1016/j.cell.2014.08.011). At a high level, our results are in broad agreement with those of Galtier et al., 2018, who found that gBGC affected all animal species, regardless of Ne, and who like us, found that the degree of selection on codon usage depended on Ne. Through use of a more sensitive methodology, we believe we have expanded our ability to detect codon adaptation into animals of somewhat higher Ne than in previous work.

      We thank Reviewer 2 for explicitly laying out the math that was implicit in our Figures 1 and 2. In our revisions, we will more clearly acknowledge that the per-site codon adaptation bias depicted in Figure 1 has limited sensitivity to s*Ne. We believe our approach worked despite this because the phenomenon is driven by what is shown in Figure 2. I.e., where Ne makes a difference is by determining the proteome-wide fraction of codons subject to significant codon adaptation, rather than by determining the strength of codon adaptation at any particular site or gene.

      Simulated datasets would be great, but we consider them a nice addition rather than a must-have, in particular because we are skeptical about whether our understanding of all relevant processes is good enough for simulations to add much to our more heuristic argument along the lines of Figure 2. E.g. we believe the complications documented by Gingold et al. 2014 cited above are pertinent, but incorporating them into simulations would require a complex set of assumptions.

      In response to the final comment of reviewer 2, the reason that we hard-coded genome-wide %GC values is that we took them from the previous study of James et al. (2023) https://doi.org/10.1093/molbev/msad073. As summarized in the manuscript, genome-wide %GC was a byproduct of a scan conducted in that work, of all six reading frames across genic and intergenic sequences available from NCBI with access dates between May and July 2019. The code used in the current work to calculate the intergenic %GC, as well as that used to calculate amino acid frequencies, is located at https://github.com/MaselLab/Codon-Adaptation-Index-of-Species. We agree that more user-friendly tools would be useful, but producing robust tools falls outside the scope of the current manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      “Liu et al present a very interesting manuscript investigating whether there are distinct mechanisms of learning in children with ASD. What they found was that children with ASD showed comparable learning to typically developing children, but that there was a difference in learning strategy, with less plasticity and more stable learning representations in children with ASD. In other words, children with ASD showed similar learning performance to typically developing children but were more likely to use different learning rules to get there. Interestingly greater fMRI-measured brain plasticity was associated with learning gains in typically developing children, whereas more stable (less plasticity) neural patterns were associated with learning gains in autistic children. This was mediated by insistence on sameness (from the RRIB) in the ASD group. This is a good paper, well reasoned and with strong methods.”

      We appreciate the positive comments from the reviewer.

      1.1) “The biggest issue is related to subject numbers...With n=35 it is only possible to make a generalized statement about autism.”

      Thank you for this comment. Although the sample size in the current study was modest, we would like to note that acquiring high-quality behavioral and brain imaging data at multiple time points is a challenge in children with ASD. The current training study with unique longitudinal behavioral and brain imaging data provides an unprecedented opportunity to investigate the potentially atypical training-induced learning and brain plasticity in children with ASD relative to TD peers. To our knowledge, the present longitudinal sample is the largest of its kind in studies of neurocognitive function in children with ASD. We have acknowledged these points in the revised Discussion section (Page 15), including the following statement:

      “First, larger sample sizes are required to further characterize heterogeneous patterns of atypical learning and whether the findings can be generalized to a broader ASD population.” (Page 15)

      1.2) “[Another] issue is related to [heterogeneity of autism-related findings]. For example, take the following statement from the results: "while most TD children used the memory-based strategy most frequently following training, nearly half of the children with ASD used rule-based strategies most frequently for trained problems." Is this the heterogeneity of autism at play, or the noisiness of the task and measures?”

      We hypothesize that group differences in changes in strategy use following training are due to atypical learning style or high level of inter-individual differences, i.e., greater heterogeneity, in autism, rather than noisiness of the measures. This hypothesis is based on the fact that we used the same tasks before and after training and a standardized training protocol across the two groups, which (i) allowed us to systemically examine atypical learning of these tasks in children with ASD compared to TD children and (ii) provided ecologically valid measures. This design minimized potential differences in measurement error between the two groups. We have clarified these points in the revised Introduction section (Page 4), including the following statement: “Crucially, we employed identical tasks before and after training and a standardized training protocol across the two groups. This approach enabled systemic analysis of learning in children with ASD relative to TD children.” (Page 4)

      1.3) “Conceptually, is it realistic to expect a unitary learning strategy in all of autism?”

      We agree with the sentiment expressed by the reviewer, and indeed this notion led to the hypothesis that our study set out to test. We hypothesized that children with ASD would not show a unitary learning strategy at the stage of development examined. Our results reveal that a disproportionate number of children with ASD use a rule-based strategy, reflecting atypical learning styles.

      1.4) “Lastly, the task itself can only be solved in a subset of autistic children and therefore presents a limited view of the condition.”

      We thank the reviewer for this important point and agree that additional studies tailored to more severely affected children with ASD are required for a more comprehensive characterization of learning in children with autism.

      Reviewer #2 (Public Review):

      “Overall, the authors sought to determine whether children with autism spectrum disorder (ASD) or typical development (TD) would both benefit from a 5-day intervention designed to improve numerical problem-solving. They were particularly interested in how learning across training would be associated with pre-post intervention changes in brain activity, measured with functional magnetic resonance imaging (fMRI). They also examined whether brain-behavior associations driven by learning might be moderated by a classic cognitive inflexibility symptom in ASD ("insistence on sameness"). The study is reasonably well-powered, uses a 5-day evidence-based intervention, and uses a multivariate correlation-based metric for examining neuroplastic changes that may be less susceptible to random variation over time than conventional mass univariate fMRI analyses. The study did have some weaknesses that draw into question the specific claims made based on the present set of analyses, as well as limit the generalizability of the findings to the significant proportion of individuals with ASD that are outside of the normative range of general cognitive functioning. The study also found minimal evidence for transfer between trained and untrained mathematical problems, limiting enthusiasm for the intervention itself. The majority of the authors' claims were rooted in the data and the team was generally able to accomplish their aims. I am sensitive to the fact that one of the main limitations I noted would have significant ethical implications-i.e. NOT offering potentially beneficial numerical training to children randomized to a sham or control group. I think the authors' work will represent a welcome addition to a growing corpus of studies showing similar neuropsychological test performance across several cognitive domains (e.g. learning, memory, proactive cognitive control, etc.) in ASD and TD. 
However, these relatively preserved cognitive functions still appear to be implemented by unique neural systems and demonstrate unique correlations to clinical symptoms in youth with ASD relative to TD, which may have implications for both educational and clinical contexts.”

      We thank the reviewer for the positive feedback and helpful suggestions.

      Reviewer #3 (Public Review):

      “Liu and colleagues examined learning and brain plasticity in neurotypical children and children with autism. The main findings include autistic children relying more on rule-based versus memory-based learning strategies, altered associations between learning gains and brain plasticity in children with autism, and insistence on sameness as a moderator between brain plasticity and learning in autism. Although the sample size is limited in this study, the findings provide a significant contribution to the field. The major strengths of this paper include an extensive pre and post training protocol, a detailed methods section, rationale behind the study, investigation of a potential moderator of learning gains and neural plasticity, and investigation of "neural plasticity" in association to learning in autism. Weaknesses of the study include a small sample size, and some missing information/analyses from the study. The authors laid out four clear aims of the study. They investigated these aims and the analytic approaches were appropriate. The paper included significant findings toward better understanding the mechanisms underlying differences in learning strategies and behavior in children diagnosed with autism spectrum disorder. This holds significant value in educational and classroom settings. Further, the investigation of a potential moderator of learning gains and neural plasticity provides a potential mechanism to improve the relationship. Overall, this is a significant contribution to the field. The autism literature is limited in understanding differences in learning styles and the underlying neural mechanisms of these differences.”

      We thank the reviewer for the positive comments and detailed suggestions.

    1. While it may be obvious that there are specific technologies for those with different abilities that help them engage with their learning, never forget that how we choose existing learning technologies is probably the first step in ensuring access to our learners, and potentially presenting barriers to their learning. Learning Management Systems (LMSs) like Moodle, Canvas, Blackboard Learn, D2L Brightspace, Google Classroom and other technologies should have accessibility features built in as well – if they don’t, these foundational systems will present barriers for our learners. If we’re choosing to use ad-hoc or additional technologies that sit outside what our institutions have set up for us (e.g., Kahoot, Canva, etc.) it’s up to us to assess what technologies we use for accessibility.

      The key takeaway I think

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      1. General Statements

      We were naturally pleased to read the enthusiasm coming from both reviewers. Both mentioned that an extension to experimentation in cells would increase the impact of the study, even though both recognize that the biophysical and biochemical experiments constitute a study that is significant and interesting to a broad readership.

      2. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This manuscript by Bryan et al., describes the use of Hydrogen/Deuterium-exchange Mass Spectrometry (HXMS) as a powerful tool to identify key amino acid residues and associated interactions driving liquid-liquid demixing. They have particularly focused on the Chromosomal Passenger Complex (CPC), an important regulator of chromosome segregation, which has recently been shown to undergo liquid-liquid demixing in vitro. Their work presented here allowed them to identify a few key electrostatic interactions as molecular determinants driving the liquid-liquid demixing of the CPC. Their work also shows that crystal packing information of protein molecules, where available, can provide valuable insight into likely factors driving liquid-liquid demixing.

      Major comments:

      [#1] A previous study by Trivedi et al., NCB 2019 identified an unstructured region in Borealin (aa residues 139-160) as the main region driving the phase separation of CPC. Interestingly, this region only shows a moderate reduction in HX upon liquid-liquid demixing. But no experiments or discussions related to this observation are presented in the current version of the manuscript.

      In the Trivedi et al. paper, the authors were careful to state that the region of Borealin between residues 139-160 contributed to phase separation, but there was clearly a remaining propensity to phase separate in vitro in the mutant. Thus, it is fully expected that there should be other regions in the complex that contribute to phase separation. It was satisfying that this region was independently identified in the hydrogen-deuterium exchange experiments, and we suggest that a “moderate” reduction is consistent with a protein condensate having liquid properties. Since this region was already characterized, we have focused our work in this paper on the new region identified by the hydrogen-deuterium exchange experiments.

      [#2] In the absence of cellular data on if and how these mutations (within the triple-helical bundle region) affect CPC's ability to phase separate in cells, the implication of this work is very limited - One can't say for sure these are interactions driving phase separation of CPC in a cellular environment. In the absence of any cellular data with the mutants described here, much of the discussion on the possible roles of CPC phase separation in cells does not appear relevant to this manuscript. I would suggest that the authors focus mainly on highlighting the power of using HXMS as a tool to characterise the molecular determinants of liquid-liquid demixing at a relatively high resolution.

      We have now added cellular data in the form of one of the key experiments used to explore CPC liquid-liquid demixing, utilizing the Cry2 optogenetic system for inducible dimerization. The results of testing WT Borealin versus the mutant we identified as defective in droplet formation are shown in the all-new Fig. 6. Some relation of our overall findings, encompassing observations made with purified components and now in cells, to the cellular function of the CPC is pertinent. In light of the reviewer comments, we have also reduced this aspect in the discussion (see the substantial edits on pg. 12).

      Minor comments:

      [#3] The authors should ensure that the introduction cites relevant literature thoroughly. For example, where the potential role of Borealin residues 139-160 in conferring phase separation properties to the CPC is mentioned, the authors failed to cite Abad et al., 2019, which showed the contribution of the same Borealin region in conferring nucleosome binding ability to the CPC.

      We have made this particular change on pg. 4 and also have gone through to ensure we are appropriately citing relevant literature.

      Reviewer #1 (Significance (Required)):

      This is a highly relevant and significant work, particularly considering the rapidly growing list of examples for Phase separation of proteins/protein assemblies and their potential biological roles (in spite of ongoing debates in the field about the cellular relevance of several phase separation claims). The data presented in this manuscript are solid and convincingly establish HXMS as a useful tool to characterise molecular interactions driving liquid-liquid demixing. Considering its applicability to characterise wide-ranging protein assemblies implicated in phase separation, this work will be of interest to a broad readership.

      We thank the reviewer for the strong praise of the significance of our study.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, using the technique of hydrogen/deuterium-exchange mass spectrometry (HXMS), the authors have tried to gain insights into the structure of the chromosomal passenger complex (CPC) within the phase separated chromatin body, known to regulate chromosome segregation in mitosis. The CPC phase separated compartment comprises three regulatory and targeting subunits, INCENP, Survivin, and Borealin, forming a three-helix bundle hetero-trimer. By measuring changes in the polypeptide backbone dynamics of this trimeric INCENP/Survivin/Borealin complex, in the liquid-liquid de-mixed state in comparison to its soluble state, using HXMS measurements, the paper puts forward high-resolution structural details of the phase separated CPC. Using a step-wise mutagenesis approach in conjunction with the information from HXMS measurements and previous crystallographic data, this work also identifies distinct regions/interfaces within this complex harboring crucial salt bridges, which directly contribute toward the liquid-liquid demixing of the CPC.

      Comments:

      1) "The three non-catalytic subunits of the CPC (INCENP1-58, Borealin, and Survivin) form soluble homotrimers that have a propensity to undergo liquid-liquid phase separation.8 " Do the authors mean the hetero-trimeric CPC?

      Yes, we meant heterotrimers. It is now corrected.

      2) For better clarity, the authors can indicate the residue numbers of each of the components INCENP, Survivin, and Borealin in the CPC trimeric helix-bundle crystallographic structure in Fig 1.

      These are included on the revised Figure 1A.

      3) "In the condition we identified, 90% +/- 5% of the ISB protein was found within the rapidly sedimenting droplet population (Fig. 1C)." The authors should include the time-point corresponding to the gel shown in Fig 1C.

      This information is now directly labeled in Fig. 1C.

      4) Prior to the HXMS experiments on the phase-separated ISB protein complex, were the samples subjected to sedimentation to separate the dispersed from the condensed droplet phase? Since several time points after formation of phase-separated ISB complex have been characterized to compare and contrast between the dispersed and the droplet phase, the authors can consider performing a time-dependent sedimentation assay to ascertain the fraction of the ISB complex in the droplet phase.

      The HXMS experiments were not performed on sedimented samples, so this complication in our HX workflow is not necessary. We note that the sedimentation that we include in our study (Figs. 1C, 5E, and S6), involves centrifugation for 10 minutes, and that length of time presents a substantial design challenge to our HX experimentation. We considered it at the outset of our study, but, in the end, our study was facilitated by our finding early on that this separation step was unnecessary. Further, we note that we report statistically significant differences at the earliest HX timepoints in the areas prominently protected from HX upon droplet formation (10 and 100 s; see Fig. 1C for an example). Indeed, we do not observe broadening of our HXMS spectra (examples shown for all timepoints, Fig. 2B,F) that would be expected if there were a large degree of mixed states (i.e. a large population of molecules in the free protein state and a large population of molecules in the droplet state) each having different HXMS rates. One can imagine that this sort of envelope broadening behavior (“EX1-like”) could be observed in other samples where there are multiple substantially populated states of a protein present at a particular timepoint, but this is not what we observe in the experiments we performed in this study.

      5) "At the 100 s timepoint, the most prominent differences between the soluble and droplet state were located within the three-helix bundle of the ISB, with long stretches in two subunits (INCENP and Borealin) and a small region at the N-terminal portion of the impacted α-helix in Survivin (Fig. 1F)" According to Fig 1F, at the 100 s time-point, there is also another small region in Survivin (approximately residues 12-20) that exhibits slower exchange rates in the droplet state. Can the authors comment on whether this region undergoes any conformational change or if it exhibits homotypic interactions retarding the hydrogen/deuterium exchange rates in the droplet phase?

      Our general approach in the Black lab over the past decade-plus of HXMS has been to restrict our conclusions, whenever practical, to the consensus behavior. This permits multiple partially overlapping peptides to be used to generate confidence in the changes that drive our conclusions. The reviewer carefully recognizes the behavior of a single peptide (in 2 different charge states) that might have actual changes relative to some of the longer peptides that it partially overlaps with, and smaller changes can yield larger percentage changes on small peptides. We have chosen not to include this single peptide in the text describing our main conclusions from the work, to be consistent with our longstanding strategy for rigorous interpretation of HXMS data. Our conclusion is that this region is not substantially changed upon droplet formation.

      6) The authors mention that: "By the latest timepoint, 3000 s, there was some diminution in the number of droplets which may indicate the start of a transition of the droplets to a more solid state (i.e., gel-like)." As a result of this time points beyond 3000 s have not been used for comparing Hydrogen/Deuterium exchange rates in the condensed droplet phase with the soluble state. Can the authors comment on what happens to the nature of these specific interactions between the components of the CPC in the 'gel-like state'? A combination of both non-specific weak interactions as well as strong site-specific interactions between macromolecular components has been widely known to contribute towards the formation of several phase-separated compartments. It will be interesting to know the perspective of the authors on what sort of interactions get populated within these compartments to give rise to a more solid gel-like state. At this later time points, do the droplets exhibit reversibility under higher ionic strength conditions? Do the authors have some data to show how the material property of these droplets evolve as a function of time?

      We offered the idea of a transition to a more solid state to the reader because it was a reasonable conclusion, although one that is challenging to prove (the Stukenberg lab is actively working on this; see our response to point #9, below). The vast majority of the conclusions in the paper, and essentially all of those we emphasize as important, are based on earlier timepoints where this is not an issue. Thus, we consider an extended study of the late-developing features of our droplets more appropriate for separate studies outside the scope of the current one.

      7) "Examination of the entire time course shows that during intermediate levels of HX (i.e., between 100-1000 s), this region takes about three times as long to undergo the same amount of exchange when the ISB is in the droplet state relative to when it's in the free protein state (Figs. 2B, C and Supplemental Fig. 2). Upon droplet formation, HX protection within Borealin is primarily located in the interacting α-helix and is less pronounced at any given peptide when compared to INCENP peptides (Fig. 2E). Nonetheless, similar to INCENP peptides, it still takes about twice as long to achieve the same level of deuteration for this region of Borealin in the droplet state as compared to the free state." How do the hydrogen/deuterium exchange rates and extent of deuteration in the N-terminal part (residues 98-142) of the Survivin polypeptide chain, constituting the three-helix bundle core, evolve as a function of time? Also, how do the exchange rates for peptides in this region compare with those of the other protein subunits Borealin and INCENP and what inference can be drawn from these differences?

      The peptides from a.a. 98-142 of Survivin exhibit HX protection throughout the timecourse (both before and after droplet formation) consistent with a folded α-helix, and comparable to the overall HX behavior of the other helices in the 3-helix bundle of the ISB (Fig. S2). There is subtly slower HX in the droplet state for this portion of Survivin at later timepoints (Fig. S4), and this is explicitly highlighted in the Results section on pg. 6.

      8) The authors mention that mutating either all the glutamate residues or combinations of these residues on the acidic patch on the INCENP subunit, to positively charged residues, causes a decrease in the propensity of phase separation, as formation of salt bridges with Borealin subunit from adjacent hetero-trimeric complexes appears to be the major driving force for phase separation. Can the authors elaborate on how the reduction in the phase separation propensity of these salt-bridge inhibiting mutants might be directly affecting the subsequent localization of the CPC to the inner centromeres? Can the authors supplement their existing in vitro data with further in vivo characterization of CPC recruitment or localization to the centromeres, for each of the constructs exhibiting reduced propensity of phase separation?

      As we state in the introduction, the recruitment to centromeres requires established ‘conventional’ targeting via the specific histone marks to which we refer. We also cite the correlations demonstrated between prior mutations in Borealin (impacting aa 139-160) that both disrupt phase separation in vitro and reduce CPC levels at the centromere. In our revision, we have added what we feel are the most critical cell-based experiments to relate to our HX studies in the new Fig. 6. We are preparing for future studies to study mutants arising from our HX studies, and our plans are to pursue gene replacement approaches that will rigorously test the impact on the mitotic function of the CPC. In the process of these future studies, the impact on localization will be measured, too. As others in the field are investigating the correlations between observations made with purified components and those made in the cell, and where there are nuances at play in how the actual experiments are conducted, we are certain our cell-based studies will extend far beyond the timeframe appropriate for our HX-focused study. Rigorous cell-based studies of mitotic functions are what is needed, however, and we have made our plans with that in mind.

      9) It might be really interesting for the authors to look at the recent preprint from Hedtfeld et al. 2023 Molecular Cell, (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4472737). In this preprint they have recombinantly purified a stoichiometric trimer (referred to as CPC-TARGWT) comprising full length survivin, borealin, and a 1-350 residue fragment of INCENP (instead of 1-58 used in this study) and have tried to assess if any correlation exists between the in-vitro phase behaviour of CPC-TARGWT mutants and their corresponding recruitment to the inner centromere, to form a phase separated compartment. Targeting residues in the BIR domain of Survivin involved in interactions with the N-terminus of the Histone H3, Shugosin 1 or in the recognition of H3T3phos, and substituting them with Alanine or completely deleting C-terminal domain of Borealin (a region implicated in CPC dimerization and centromere recruitment), was found to result in poor centromere localization, although the in vitro phase separation properties of these constructs were found to be indistinguishable, suggesting no evident correlation between the two phenomena. Thus it might be a useful piece of data to correlate the phase separation propensities of the ISB complex variants used in this current study with the extents of their in vivo recruitment to the inner centromere. This maybe beyond the scope of the paper, but it would be good to comment on this.

      For the correlation studies, please refer to our response to point #8, above. From our reading of the June 2023 preprint that the reviewer mentions, the main concern raised by the authors is whether the region in Borealin first identified in the Trivedi et al. paper (aa 139-160) has a role in phase separation. As the reviewer noted, Hedtfeld et al. report using a complex that includes more of the INCENP protein than was used in the Trivedi et al. study, complicating the direct comparison between studies. Using the data in figure 5E of the Hedtfeld et al. preprint, the authors suggest that their version of the in vitro complex containing the Borealin mutant Δ139-160 has phase separation properties similar to those of the wild type. However, we note that in our inspection of these data we see numerous differences. The mutant forms rounder and larger condensates than WT, and these have a reduced concentration of protein (lower intensity). Finally, only the WT protein has a “grape bunch” morphology. We note that unpublished data in the Stukenberg lab show these same differences can represent a defect in liquid demixing properties of a version of the purified CPC. While it is intuitive that larger condensates represent more phase separation, the unpublished data mentioned above suggest the opposite is true for the CPC. In particular, the data from the Stukenberg lab suggest the size of a droplet is mostly governed by the amount of droplet fusion in the first minutes after dilution and thus is limited by relatively rapid hardening of the complex. We note that in the course of discussions with the corresponding author of the preprint mentioned by the reviewer, we did apprise them of the unpublished observations mentioned above, in case they saw fit to include in their ongoing studies what would seem to be critical measurements (e.g. measuring circularity, droplet size, droplet intensity, and FRAP) to assess our suspicion that their construct contains a portion of INCENP that can accelerate condensate formation. If true, the Hedtfeld et al. data are fully consistent with the Borealin mutant Δ139-160 having a significantly different condensate formation potential from the WT protein.

      10[A]) "Our data also provide an important clue about the previously identified region on Borealin that is required for liquid-demixing in vitro and proper CPC assembly in cells 8. Specifically, our data (Fig. 1F, Supplementary Figs. 2, 4A) suggest this region of Borealin adopts secondary structure that undergoes additional HX protection in the liquid-liquid demixed state" This data fits perfectly with previous studies from Trivedi et al. (2019), which states that deletion of the Borealin 139-160 fragment obliterates its phase separation in vitro and also reduces the accumulation of CPC at the centromere. On the contrary, in the recent preprint from Hedtfeld et al. 2023 Molecular Cell, they have shown that the phase separation behaviour of their reconstituted CPC-TARGWT harboring the Borealin 139-160 deletion mutant was found to be indistinguishable from the WT. Can the authors comment on what might be the reason for this difference? Is it possible that this central Borealin region is involved in interactions with the additional fragment of INCENP subunit used in the helical bundle reconstitution, or with other centromere component proteins, whereby the deletion of region is causing inefficient recruitment to the inner centromere? This can be elaborated in the discussion section of the manuscript.

      This is discussed in the response to #9, above. Through this format (the Review Commons procedure for public posting of author responses before submission of the study to a journal), our comments herein will be made public for those with the most interest in comparing our data to what has been posted on preprint servers. We think that is the most appropriate course for now, with more surely to come when the aforementioned results from the Stukenberg lab are posted/published and, hopefully, when there is more information about the nature of the droplets reported in the Hedtfeld et al. study.

      10 [B]) It is also well known that in addition to these electrostatic interactions, the core of the ISB helical bundle is formed by an extensive network of hydrophobic interactions. Have the authors ever looked into how perturbing any of these intra-trimeric complex hydrophobic interactions affect their ability to phase separate and perform their subsequent function?

      We think there is some confusion, here. The electrostatics we focus on are between heterotrimers rather than within them. We certainly would predict that disrupting the hydrophobic surface that generates a stable heterotrimer would, in turn, disrupt individual heterotrimers. Our study assumes a stable heterotrimer as a starting point, so we view this type of perturbation as unrelated to our conclusions.

      11) The phase separated CPC compartment is known to enrich several other inner centromere proteins such as the Histone H3, Sgo1, the histone H3T3phos, among others. Have the authors tried to increase the complexity of the reconstituted CPC scaffold by incorporating more components to look into whether that changes any of the interaction interfaces between the ISB trimeric complexes within the condensed phase? Can this CPC compartment be reconstituted using a bottom-up approach?

      We are glad that our studies with a reductionist biochemical reconstitution approach have inspired the questions that require increased complexity. They are now warranted based on the advance we have made in the present study, and hopefully will form the basis for future, separate studies.

      Overall, this paper brings forward a useful technique to probe the conformational landscape of proteins in the condensed droplet phase and compare it with its dispersed phase. This paper serves as an interesting read showing how specific salt-bridge interactions between multiple stoichiometric protein complexes can be the driving force for phase separation.

      Reviewer #2 (Significance (Required)):

      In this manuscript, using the technique of hydrogen/deuterium-exchange mass spectrometry (HXMS), the authors have tried to gain insights into the structure of the chromosomal passenger complex (CPC) within the phase separated chromatin body, known to regulate chromosome segregation in mitosis. The CPC phase separated compartment comprises three regulatory and targeting subunits, INCENP, Survivin, and Borealin, forming a three-helix bundle hetero-trimer. By measuring changes in the polypeptide backbone dynamics of this trimeric INCENP/Survivin/Borealin complex, in the liquid-liquid de-mixed state in comparison to its soluble state, using HXMS measurements, the paper puts forward high-resolution structural details of the phase separated CPC. Using a step-wise mutagenesis approach in conjunction with the information from HXMS measurements and previous crystallographic data, this work also identifies distinct regions/interfaces within this complex harboring crucial salt bridges, which directly contribute toward the liquid-liquid demixing of the CPC.

      Overall, this paper brings forward a useful technique to probe the conformational landscape of proteins in the condensed droplet phase and compare it with its dispersed phase. This paper serves as an interesting read showing how specific salt-bridge interactions between multiple stoichiometric protein complexes can be the driving force for phase separation.

      We thank the reviewer for the positive comments on the significance of our study.

    1. Image: Residents crossing between islands during a rising tide on Majuro, Marshall Islands, in 2015. Majuro is home to former residents of Bikini Atoll who were relocated in the 1940s. Credit: Josh Haner/The New York Times

      By Pete McKenzie, May 3, 2023

      The golden sand of Bikini Atoll is laced with plutonium. The freshwater is poisoned with strontium. The coconut crabs contain hazardous levels of cesium.

      In the 1940s and ’50s, the U.S. government used this coral reef, in the Pacific nation of the Marshall Islands, for testing nuclear weapons. Radioactive residue has left Bikini uninhabitable to this day, forcing those whose families once lived on the atoll into exile on a handful of other Marshallese islands and in the United States.

      Recognizing the damage its testing caused, the U.S. government established two trust funds in the 1980s to help pay for Bikinians’ health care, build housing and cover living costs. In 2017, after a campaign by Bikini leaders for greater autonomy, the Trump administration announced that the government would lift withdrawal limits and stop auditing the main fund, then worth $59 million.

      Six years later, only about $100,000 remains, and the Bikini community is in crisis.

      Anderson Jibas, the mayor of the council that oversees the displaced Bikini community, made a series of questionable purchases on Bikini’s behalf, including of a large plot of land in Hawaii and a fleet of new vehicles. He has defended some of the purchases as investments against climate change, as necessary to support isolated Bikinians and as attempts at revenue-generating projects.

      Mr. Jibas has also acknowledged using trust fund money for personal expenses and has been accused by a top Marshall Islands official of receiving kickbacks from an investment manager, a charge Mr. Jibas denies.

      Image: A U.S. nuclear bomb test at Bikini Atoll in 1946. Credit: Universal Images Group, via Getty Images

      With the fund virtually depleted, the council’s roughly 350 employees are no longer being paid. Monthly payments of about $150 each to the community’s 6,800 members, a vital lifeline that helped cover food and rent among a population with high rates of poverty, have ceased.

      The emergency highlights the lasting consequences of decades of U.S. nuclear testing in the Pacific, including lingering questions about the American commitment to address that legacy, an undertaking made more difficult by pervasive fraud and mismanagement in the region.

      “It’s a disaster,” said Tommy Jibok, a former member of the Bikini council who challenged Mr. Jibas in an election in 2019. “They told us we would be sitting and sleeping on money. Look what is happening now. We’re sleeping on nothing.”

      In 1946, the United States relocated the 167 inhabitants of Bikini to clear the way for nuclear tests that it said would “end all world wars.” It then left them virtually alone on a small, desolate island, where many nearly starved. In 1948, the islanders were moved again.

      Over 12 years, the United States tested 23 nuclear bombs in Bikini. In 1968, President Lyndon B. Johnson announced that the Bikinians would return home. But after scientists found that radiation levels remained dangerously high, the United States in 1978 evacuated the almost 150 people who had chosen to go back. The Marshall Islands gained independence from the United States the next year.

      In 1982, the American government established a $25 million resettlement fund to clean up Bikini and support its people. In 1987, it created a second fund to provide annual payments directly to Bikinians. A year later, it contributed an additional $90 million to the resettlement fund. American officials administered the money and could veto withdrawals.

      Bikini representatives argued that the resettlement fund contained too little money to remedy the atoll’s radioactivity. They used the funds instead to support the exiled Bikinians.

      Image: Mike Pompeo, then the secretary of state, visiting in the Marshall Islands in 2019. With him is Hilda Heine, the Marshallese president from 2016 to 2020. Credit: Jonathan Ernst/Agence France-Presse, Getty Images

      But the Bikini leaders were frustrated by American officials’ refusal to release more than a few million dollars each year. The struggle culminated in 2016 with the election of Mr. Jibas, who promised to take control of the resettlement fund. (The other fund is overseen by independent trustees.)

      During a 2017 congressional hearing, Mr. Jibas explained that Bikinians “know far better than the intermediaries or distant agencies of the United States what is needed to make the lives of the displaced population more bearable.”

      Douglas Domenech, at the time an assistant interior secretary, announced that the Interior Department would relinquish control of the resettlement fund to “restore trust and ensure that sovereignty means something.”

      Mr. Jibok, the former Bikini council member, had a different interpretation: that U.S. officials wanted to “wash their hands clean” of responsibility for Bikinians.

      Whatever the motivation, the result was a rapid increase in council spending under Mr. Jibas, from $7.6 million in 2016 to $25.7 million in 2018, according to audits from the time. Bank statements provided by Gordon Benjamin, a lawyer for the council, show that the fund, worth $59 million in 2017, was down to just $100,041 in March of this year.

      Many of the council’s purchases were popular, including of a small aircraft and two cargo ships to help supply isolated Bikinians, as well as construction equipment to build protections against rising seas that threaten low-lying Pacific islands because of climate change.

      But there were also more dubious purchases: $4.8 million for 283 acres of land in Hawaii; $1.3 million for an apartment complex in the Marshall Islands’ capital, Majuro; and multiple new vehicles for the personal use of Bikini council members, according to Mr. Benjamin. Mr. Jibas also introduced an annual $100,000 “representation package” to fund his regular trips to the United States.

      Image: Isles that form part of Majuro, the Marshall Islands’ capital. One of the purchases made with the resettlement fund was an apartment complex in Majuro. Credit: Josh Haner/The New York Times

      Mr. Jibas has said he wants to develop housing in Hawaii for rent or sale, but no development has taken place yet. The Majuro apartment complex was purchased as an investment property, but it appears to be losing money so far.

      Lani Kramer, a Bikinian who previously worked as the council’s city manager and is now challenging Mr. Jibas for the mayoralty, said Mr. Jibas and council members had used public funds for personal spending. “They were bringing receipts for diapers, chewing gum,” Ms. Kramer said. “It was obviously not for the people, it was for their own grocery shopping.”

      The Marshall Islands’ banking commissioner has also accused Mr. Jibas of accepting $50,000 from a local bank manager who is being prosecuted on suspicion of unlawfully investing Bikini funds and laundering money. The Marshallese auditor general did not respond to requests for comment about the allegations.

      Starting in 2018, Mr. Jibas refused to disclose council finances to the Marshall Islands’ auditor general, prompting the police to seize council documents in 2021. Late last month, a spokesman for the Interior Department said it had written to bank officials seeking information about the fund and to Mr. Jibas requesting the council’s recent budgets.

      That request came after Jack Niedenthal, an American expatriate who served as the Marshallese health secretary, wrote to the Interior Department warning about the depleted trust fund and asking the department to intervene. He was subsequently fired for breaching diplomatic protocol by circumventing the Marshallese foreign ministry and the American Embassy.

      Mr. Jibas acknowledged in an interview that he occasionally used his representation package to buy food and other items for his family, which he said council staff members were aware of and had approved, but he denied taking money from the bank manager.

      Image: Collecting laundry on Ejit, an isle in Majuro. The money from the resettlement fund is nearly gone, and the Bikini community is in crisis. Credit: Josh Haner/The New York Times

      Mr. Jibas said in the interview that he was trying to access the independently controlled second fund, which now holds $28 million, to sustain council spending.

      According to Mr. Benjamin, starting in October 2021 the trustees of that fund permitted the council to withdraw roughly $13 million to fund its spending, but reversed their stance earlier this year and halted all payments out of the fund, including the regular living payments to Bikinians, to avoid further depletion. In the interview, Mr. Jibas said he also hoped to tap into new American funding to replenish the main fund.

      Earlier this year, the Biden administration promised to provide the Marshall Islands $700 million in one-time aid and to continue underwriting much of the government’s budget. Under a treaty, the United States controls the country’s defense policy, which the American government considers crucial to countering China in the region. The aid has not yet been approved, meaning Bikinians’ future remains uncertain.

      In a statement on behalf of Mr. Jibas, Mr. Benjamin said that the mayor’s critics were not pushing the United States hard enough for more funding.

      Mr. Jibok, who as a council member opposed Mr. Jibas’s efforts to gain control of the fund, said that the United States had done little to facilitate self-sufficiency in the Bikini community, leaving few financial safeguards in place.

      “I didn’t think we were ready,” Mr. Jibok said, “because I knew that we didn’t have anything in place to control” mismanagement or fraud.

      A version of this article appears in print on May 4, 2023, Section A, Page 4 of the New York edition with the headline: Bikini Atoll Leaders Blew Through Millions From U.S.
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Centrioles are small cylindrical structures with roles in cell division, motility, and signaling. Typically, centrioles are highly stable structures which can persist for many cell generations. However, in some cells, such as the female germ line of many species, centrioles are programmed for elimination. This process is essential for maintaining centriole number from one generation to the next in sexually reproducing organisms, yet in nearly all species the molecular mechanisms underlying how centrioles are eliminated are unknown. The current study utilizes the nematode C. elegans to explore how centriole architecture changes during the elimination program in the female germ line. Using a suite of light microscopy techniques, the authors provide a stunning visual perspective of how centrioles are disassembled during oogenesis and show that removal of the central tube component SAS-1, a key regulator of centriole stability, is an early event in elimination. I have no major objections to the work and enthusiastically endorse its publication with the following minor revisions.

      Page 9 line 200: In the pcmd-1 mutant, the authors state that centriolar foci devoid of nuclei are present in rachis, but they do not mention in the text that there are also nuclei that lack centriole foci in early pachytene. This is mentioned in the figure legend, but I felt it was important enough to mention in the text.

      As per the reviewer’s suggestion, we will provide this information in the main text as well.

      Page 9 line 211. The authors found that in the absence of dynein heavy or light chain, centrioles remain associated with the nuclear envelope (rather than moving to the periphery). To me this was striking as dynein depletion in the embryo results in the opposite phenotype with centrioles losing attachment to the nuclear envelope and moving to the cell periphery (Gonczy et al. 1999 JCB 147:135). It might be worth pointing this out somewhere in the manuscript and speculating about the reasons for this difference.

      We will expand the Discussion section to better explain the difference of dynein’s involvement in the oocyte versus the embryo.

      Page 11 line 277: The authors state that elimination timing is not affected by the loss of SPD-5. This is a small but important point. It really is the absence of PCMD-1 and not SPD-5, as SPD-5 is still present in the cell. An alternative would be to say "in the absence of PCM" or "in absence of a pericentriolar accumulation of SAS-5".

      Fully agreed, we will modify the text accordingly.

      Figure 4D: Why does loss of PCMD-1 result in a delay in oocyte maturation as judged by RME-2 accumulation? This is not mentioned in the paper. Is this a general response to a loss of PCM or is this specific to a loss of PCMD-1?

      We realize that we were not sufficiently clear in explaining that RME-2 accumulation reflects the maturation state of oocytes. In the revised manuscript, we will clarify this point further and mention that a mild developmental delay (such as in pcmd-1(t3421ts) mutant animals) can impact the number of maturing oocytes present in the proximal gonad, and thereby lead to a slight shift in RME‑2::GFP distribution. See also related minor comment 2 of reviewer 2, and major comment 1 of reviewer 3.

      Figure 7 E and F. The authors measure the tubulin and SAS-4 intensity in wild-type and sas-1(t1521) embryos and conclude that microtubules and SAS-4 signals decay faster in the sas-1 mutant than in the control. To me, this is convincingly the case with microtubules in panel E but I am not so sure this is the case with SAS-4 as shown in panel F. The differences in SAS-4 levels are much smaller between mutant and control. Could the authors provide statistical analysis to show how significant the differences are?

      We will provide the requested statistical analysis (which indeed shows significance).

      Page 15 line 363. I think this sentence should be reworded to: "Finally, we demonstrate that the central tube protein SAS-1 is the first of the factors analyzed here to leave centrioles..."

      In response to this suggestion and to the related comment of reviewer 2 (see below), we will rephrase this sentence to read “among the centriolar components analyzed to date, SAS-1 is the first to depart”.

      Reviewer #1 (Significance (Required)):

      The work contained in this manuscript represents a fundamental step forward in understanding the process of centriole elimination. The authors have carefully described the stepwise disassembly of the centriole including changes in the architecture during oogenesis. They have identified loss of the centriole stability factor SAS-1 as an early event in the elimination program and have found that in a sas-1 mutant, the centriole disassembles prematurely. They have also shown that loss of SAS-1 is followed by expansion of the centriole and ultimately loss of structural integrity. This work should be of interest to a broad range of scientists including those interested in centrosome dynamics, germ line development, and more generally cell biologists.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: In this manuscript Pierron et al. explore the mechanisms of centriole elimination during oogenesis in C. elegans. Centriole elimination is a common feature of oogenesis in many species, but it is relatively poorly understood and understudied. Here, the authors characterise the kinetics with which several key centriole and centrosome proteins are lost during this process in living worms, and they correlate this with EM and expansion microscopy (U-Ex-STED) analyses of fixed tissues. They conclude that centriole elimination begins with the loss of SAS-1 from the central region of the centrioles, which correlates with the widening of the structure and the loss of the centriole MTs. A remnant structure containing several core centriole proteins remains, however, and this often ultimately detaches from the nuclear envelope and moves towards the plasma membrane in a MT-motor-dependent fashion before it dissipates (although detachment from the nucleus does not seem to be required for the eventual elimination of this residual structure). Intriguingly, centriole loss in this system does not appear to require the down-regulation of PLK activity, which is in contrast to the situation in Drosophila oogenesis.

      The manuscript is generally well written and the data is of a high quality and is logically and clearly presented. Although the ultimate mechanisms regulating centriole elimination remain obscure (i.e. what triggers the loss of SAS-1, and how is this regulated?), the data presented here will be of significant interest to the centriole/centrosome field and I am supportive of publication. I have a few points that the authors should consider prior to publication.

      Major comment:

      In the EM shown in Figure 5F the authors claim that the central tube of the centriole is disrupted, but the other elements (inner tube, MTs and paddlewheel) are not. I don't think this is as clear cut as the authors claim, at least from comparing the images of the one normal centriole (5E) and one centriole that is starting to be eliminated (5F). It seems much harder to distinguish the MTs and the inner tube in the image in 5F. Perhaps this is obvious to the authors as they have compared many more images, but I think they need to find some way of showing this more convincingly (a montage of multiple centrioles)?

      We understand that Figure 5F alone may have left the reviewer wondering whether the central tube is truly the first element to be disrupted during centriole elimination. We plan on strengthening this point by providing additional EM images as a Supplemental Figure.

      This same issue is compounded in Figure 6D where, using a different technique (U-Ex-STED), the authors claim that the centriolar distribution of SAS-1 is gradually disrupted as centriole elimination proceeds. It does look like the amount of SAS-1 has decreased from early prophase to late pachytene, but the central tube it stains doesn't look particularly disrupted and, if anything, the MTs look more disrupted (and also possibly of lower intensity, perhaps explaining why the ratio of SAS-1/tubulin doesn't change very much over these stages, as shown in Figure 6G).

      As the reviewer correctly noticed, there is some variability in central tube removal during oogenesis. In some cases, such as in the centriole on the right of the late pachytene panel in Fig. 6D, SAS-1 signal intensity diminishes uniformly, without apparent holes in the central tube. By contrast, in other cases, such as in the centriole on the left of the late pachytene panel, SAS-1 signal intensity diminution is accompanied by a loss of central tube continuity. We will clarify the writing and qualify our findings on this important point in the revised manuscript.

      These points are important, as throughout the manuscript the authors assume it as a fact that SAS-1 leaves the centriole early (which is clear), and that this leads to the specific loss of the central tube (which, at least on the basis of this data, is not so clear).

      As mentioned above, we will make certain that the results linking SAS-1 departure and central tube loss are explained in a clear and balanced manner in the revised manuscript.

      Minor comments:

      1. The authors state that the kinetics of GFP-SAS-7 or SAS-4 loss were not altered in pcmd-1 mutants (Figure 4A-C; Figure S3E,F). This doesn't look correct to me, as both proteins seem to stay brighter for longer in the mutant embryos (and this is quite easy to see on the quantification graph for SAS-7 in Figure 4C). It looks similar for SAS-4 from the pictures shown in Figure S3E,F, although this data is not quantified (and is there any reason why this data is not quantified?).

      As mentioned in response to reviewers 1 and 3, we will mention in the revised manuscript that a mild developmental delay can impact the number of maturing oocytes present in the proximal gonad, thereby leading to this slight shift in GFP::SAS-7 and GFP::SAS-4 persistence.

      1. The authors state that they demonstrate that SAS-1 is the first component to leave the disassembling centrioles. I would rephrase as they can't know this for sure (i.e. there could be some untested component that leaves earlier).

      In response to this suggestion and to the related comment of reviewer 1 (see above), we will rephrase this sentence to read “among the centriolar components analyzed to date, SAS-1 is the first to depart”.

      In the latter part of the Discussion the authors state that SAS-1 is critical for centriole elimination. I would rephrase, as this seems to suggest it is required for centriole elimination, which is not the case. It might also be worth discussing that the elimination machinery clearly seems to target SAS-1 early on, but we don't yet know what this machinery is or how it is regulated.

      We thank the reviewer for raising this important point, which we will implement in the Discussion accordingly.

      Reviewer #2 (Significance (Required)):

      The manuscript is generally well written and the data is of a high quality and is logically and clearly presented. Although the ultimate mechanisms regulating centriole elimination remain obscure (i.e. what triggers the loss of SAS-1, and how is this regulated?), the data presented here will be of significant interest to the centriole/centrosome field and I am supportive of publication. I have a few points that the authors should consider prior to publication.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Pierron et al. use C. elegans oocytes to tackle a fundamental, yet heavily under-studied question in developmental biology: how are centrioles eliminated during gamete formation/maturation? The paper's main conclusion is that SAS-1 (a key protein that makes up the central tube in C. elegans centrioles) plays a critical part in regulating the timing of centriole elimination. I congratulate the authors on all the experiments related to the SAS-1 part of their story, as they are done meticulously and in unprecedented detail (particularly all the fascinating EM and expansion microscopy data!).

      The paper also concludes that the Polo-like kinase family does not have a central role in this process, in stark contrast to a previous report demonstrating their importance for centriole elimination in Drosophila oogenesis (Pimenta-Marques et al. 2016 Science). Unfortunately, I am less convinced about this part of the paper, and half of my major comments below relate to the experiments/analyses in this regard. I was similarly not very enthusiastic about a part of story that I didn't find very relevant to the main point of the paper: half of the centrioles detach from the nucleus and translocate to plasma membrane prior to their elimination. I find the observations here quite epiphenomenal and lacking a direct/mechanistic relevance to either the PLK or SAS-1 part of the story. In my view, the authors should consider taking this part out.

      Regarding this last suggestion: we think that even if the movement of centriole remnants is not essential for final removal, an account of this process provides important information about cellular dynamics during oocyte maturation. We note also that the two other reviewers did not raise this point, but leave the final decision to the editor.

      Overall, the piece is well written and organized, however it suffers from several shortcomings that preclude it from publication in its current form. I list my criticisms and suggestions below.

      Major comments:

      1. The authors state firmly at several places in the text that PCM components do not contribute to the timing of centriole elimination (e.g., lines 420-421), particularly given their experiments with Polo kinase paralogs. In my view, the data speaks otherwise. The centriole elimination process appears strikingly premature when SPD-5 [1] (another PCM component) is overexpressed with the fluorescent transgene (Figure 1I). The opposite is also true - when another PCM component, PCMD-1, is knocked down by a temperature sensitive allele, the centriole elimination process is severely delayed [2] (Figure 4C). Even more extremely in the epistatic Polo mutant conditions (Fig. S3B), the centrioles do not appear to be eliminated at all [3] (though the authors prefer to interpret this result differently in line 260-263, which could be flawed per my second comment below). How do the authors explain all these intriguing results? (underlining and numbering added above to clarify our responses point by point hereafter)

      1 > We respectfully disagree, since our quantifications show clearly that the SAS-7 signal disappears with an analogous timing in the line expressing RFP::SPD-5 (Fig. 1J) when compared to the other lines (Fig. 1D, 1F and 1H). The image shown currently for RFP::SPD-5 (Fig. 1I) is somewhat of an outlier compared to the others (Fig. 1C, 1E and 1G), and we will therefore provide a more representative specimen in the revised manuscript to avoid confusion.

      2 > As mentioned also in response to reviewers 1 and 2, we realize that we were not sufficiently clear in explaining that RME-2 accumulation reflects the maturation state of oocytes. In the revised manuscript, we will clarify this point and mention that a mild developmental delay (such as in pcmd-1(t3421ts) mutant animals) can impact the number of maturing oocytes present in the proximal gonad, and thereby lead to a slight shift in RME‑2::GFP distribution (as opposed to representing a delay in centriole elimination in pcmd-1(t3421ts) mutant animals).

      3 > We used plk-1(or683ts); plk-2(ok1936) double mutants to further test whether there might be premature elimination in this strong reduction-of-function condition compared to RNAi-mediated depletion. Although centriolar foci appear to remain for a longer time, these gonads are extremely disorganized, so that our conclusions regarding PLK-1 and PLK-2 are based primarily on the combined data shown in Fig. 3 and Fig. S3, which do not exhibit premature centriole elimination. We will rectify the writing to clarify these points.

      Also, I believe these claims (on the PCM components and their role in centriole elimination) will benefit from more nuanced statements. For instance, although Plk paralogs may not be necessary for the centriole elimination process, some other centrosome components clearly are. Paradoxically, the effects observed here (when disrupting or promoting PCM formation) are the total opposite of those observed in Pimenta-Marques et al. 2016 Science. The 2016 piece claimed that the loss of PCM renders centrioles more vulnerable to losing their stability (which makes sense). How do the authors interpret their own results (i.e. that a disturbed PCM leads to slower centriole elimination, and vice versa)?

      As suggested by the reviewer, we will consider toning down claims regarding the role of PCM components in centriole elimination. Moreover, we will expand the section in the Discussion comparing our results with the published work of Pimenta-Marques et al. in Drosophila. That said, as mentioned above, our findings do not suggest that removing the PCM (in pcmd-1(t3421ts) mutant animals) alters centriole elimination timing in C. elegans.

      I invite the authors to more carefully tread these nuances throughout their manuscript, which otherwise may cast major doubt on their claims.

      See point above.

      2. When investigating the role of Polo-like kinases, the authors assume that centriole elimination must follow (or correlate with) the dynamics of RME-2 (as a proxy for oocyte maturation). What guarantees that the centriole elimination process has to follow oocyte maturation? As far as I could tell, there is no direct evidence presented in the paper about this point. Do the authors have direct data (or reference to another work) that this trend must hold true at all times? I can readily see several places in the paper where this correlation doesn't appear to hold (e.g., in Fig. 4D the centriole elimination precedes the oocyte maturation under pcmd-1 condition).

      We will provide further data supporting the view that oocyte maturation and centriole elimination are correlated, whereby premature oocyte maturation mutants, such as let-60(ga89ts) and kin-18(ok395), exhibit precocious elimination.

      To correctly interpret their results on the epistatic Polo mutants, the authors could examine centriole elimination timing with mutants that can pre-maturely trigger or delay oocyte maturation (and do so without affecting the centriole biology itself).

      See above point.

      3. Lines 155-159 on the dimness of the SAS-6 signal make me worried about how successfully the transgenes were generated. Could the authors comment on, or perhaps extend in detail in the Methods section, through what assays the transgenes were validated? For example, did the authors try to rescue a SAS-6-/- with a SAS-6::GFP transgene? I would like to see further support for their validities.

      We will explicitly explain in the Material and Methods section that the SAS-6::GFP transgene indeed rescues the sas-6 null phenotype.

      If the authors can demonstrate the validity of their transgenes more reliably, could they possibly comment on the bunch of seemingly random SAS-6::GFP foci in Fig. 1G?

      We will comment on the presence of small SAS-6::GFP foci in the most mature oocytes, which correspond to potential precursors of centriolar elements later assembled in the embryo.

      4. Starting from line 204, the authors use the percentage of oocytes with detached centrioles (from the nucleus) as a proxy for movement to plasma membrane. This can be very confounding in my view (due to erroneous detachments etc.). As the authors explicitly state that the detachment is a process followed by a directed movement (with a defined velocity) towards the plasma membrane, this calls for a much better measurement in general. The authors should directly measure how far the centrioles are from the closest plasma membrane region in each condition they are examining (and should do this as a function of the "time progression" in different oocytes as they get closer to fertilization).

      As mentioned above, we think that an account of the movement of centriole remnants provides important information about cellular dynamics during oocyte maturation. However, given that this movement is not essential for the elimination of such remnants, it appears that providing additional complex 3D analysis as suggested by the reviewer will not benefit the present manuscript.

      Do the authors observe any propensity in sas1(t1521ts) oocytes as to where the centrioles are being degraded more prominently in the cytoplasm (i.e., when attached to the nucleus vs. when near the plasma membrane)? They could perform analyses à la their assessments in Fig. S2 and see whether they can extract some more information about this. In other words, I am wondering whether SAS-1 regulates the centriole elimination process more prominently near the nucleus or near the plasma membrane.

      Centriole elimination occurs during pachytene in sas-1(t1521) mutant animals, when nuclei are packed in the gonad and surrounded by little cytoplasm. Therefore, even if foci were to detach from nuclei at this stage, we would not be able to quantify it with certainty. We will discuss these points in the revised manuscript.

      I ask this because the section about "centrioles moving to plasma membrane" appears epiphenomenal and rather random (i.e., the chances of a centriole moving to the plasma membrane appear to be 50-50 under some control conditions; see control RNAi in Fig. 2G, for example). Could the authors explore their existing data more closely (as suggested above) to see whether they could find intriguing correlations that tell us a little more about whether centriole elimination at these two places is achieved differently? Otherwise, I frankly do not think this section contributes significantly to the essence of the story.

      We apologize for the confusion our writing seems to have generated. The chances of moving to the plasma membrane are not 50-50. The actual figure is 78.7% (reported as ~80% in the manuscript, line 187), and stems from the live imaging experiments where every travelling event can be monitored. By contrast, the analysis of fixed specimens is an underestimate as it provides only a snapshot of a dynamic process. We will expand the writing in the revised manuscript to clarify this point.

      Finally, the statements about a deterministic function for the plasma membrane re-localization should be toned down, because unlike what the authors claim in the paper (that ~80% of the centrioles move to the plasma membrane), the control data (in Fig. 2B) clearly demonstrate that this number is more like ~60% (hence close to a 50-50 chance).

      Please see response just above.

      The paper carefully quantifies most of the data (for which I sincerely congratulate the authors!), however the experiments in Fig. S3 fall short of this. It would be nice if the authors could do the same here for completion.

      We will provide quantifications for Fig. S3E and S3F. However, due to the high disorganization of plk-1(or683ts); plk-2(ok1936) gonads, the presence of centriolar foci relative to oocyte position cannot be quantified accurately in this case.

      Minor comments:

      1. Sentence in lines 110-113 is too long and perturbs the flow. This should be shortened or be broken into better clauses. Perhaps the following way? "Prior analysis of centriole elimination in C. elegans oogenesis uncovered that this process takes place during diplotene..."

      The text will be modified accordingly.

      What are the orange arrowheads in the figure panels? They are not stated explicitly in the figure legends. My prediction was that they point to regions where centrioles are in another plane (though the overview is depicted from a different slice in the stack). Is this right? Either way, it will be useful to over-guide the reader on these orange arrowheads.

      The meaning of the orange arrowheads is explained in lines 520-521.

      If I am not wrong, the data/graph in Figures S2G and 2E are essentially the same (i.e., the data are duplicated). I couldn't find any statement in the figure legends indicating this. This should be added.

      Apologies for this oversight; the reviewer is correct, and we will mention this redundancy in the legend of Fig. S2.

      Some may consider the discussion on C2CD3 a little far-fetched, as this protein localizes to the distal end of centrioles (completely unlike SAS-1). Also, unlike C. elegans centrioles, mammalian centrioles do not contain a discernible central tube, casting doubt on the speculations made in the Discussion section. I suggest removing this paragraph and instead explicitly stating that it is unclear whether the SAS-1-dependent mechanism could be applicable to other species.

      We will nuance these thoughts, further stressing their speculative nature, but intend to maintain them in some form as they provide a potential parallel that will be of interest to the human cell biology community.

      Could the authors add in their Discussion section some comment/thought on what the remaining GFP::SAS-7 pool (line 300-302) might possibly be? Curiously, there doesn't seem to be any structure associated with it in their EM tomograms, so it would be helpful to guide the reader further on this interesting finding.

      Although we would love to comment on this further, the remaining GFP::SAS-7 foci lack ultrastructural organization and do not exhibit recognizable electron densities. That this is the case will be stated explicitly in the revised manuscript.

      Reviewer #3 (Significance (Required)):

      General Assessment: This paper's strength is in its rigorous cell biology approaches to tackle a fundamental developmental biology problem. However, some of their conclusions are too firm while not being well-supported by the data, so the paper requires major revision before its publication.

      Advance: Discovery of a new molecular player in the centriole elimination process in worm oocytes, which can pave the way for future discoveries of centriole elimination mechanisms in other species. It is not yet clear whether the results will be broadly applicable, as some of the findings presented are in stark contrast to previous studies published on centriole elimination processes in Drosophila oocytes (e.g., Pimenta-Marques et al. 2016 Science). However, as summarized in the above section, these conclusions require further experimental evidence/support.

      Audience: Centriole elimination mechanisms are not widely studied, so I am not entirely sure whether this piece will be of immediate interest to the broad cell biology community. It will certainly be of general interest to several groups studying centriole elimination mechanisms, as well as developmental biologists trying to understand the oocyte maturation process.

      My expertise: Molecular and cellular mechanisms of cytoplasmic organization in development

    1. There’s tremendous value in coming into yourself as a person. Why wouldn’t that be true online, too? Recognizing that my online self was lacking, I decided to learn how to be myself on the internet.

      It is impossible to present yourself truly on the internet, to come into yourself as a person, when everything is highly self-conscious and selective, as well as limited and misleading. In person we struggle to understand each other. This may be because of the internet, so I have no frame of reference, but how is the internet any better? Maybe because your inner dreams and thoughts can be shared alongside pictures of you. I am realizing that what I know of internet representation of people is basically Instagram and Snapchat, so I can't imagine a different reality. To accurately represent oneself you must be honest, a quality we are all incapable of to an extent, and I think the internet and its way of falsely representing things might create so much insecurity that this only pushes us further from honesty. You can't hide nearly as much when you are in front of people.

    1. Reviewer #1 (Public Review):

      Summary of what the authors were trying to achieve.

      This paper studies the possible effects of tACS on the detection of silence gaps in an FM-modulated noise stimulus. Both FM modulation of the sound and the tACS are at 2Hz, and the phase of the two is varied to determine possible interactions between the auditory and electric stimulation. Additionally, two different electrode montages are used to determine if variation in electric field distribution across the brain may be related to the effects of tACS on behavioral performance in individual subjects.

      Major strengths and weaknesses of the methods and results.

      The study appears to be well-powered to detect modulation of behavioral performance with N=42 subjects. There is a clear and reproducible modulation of behavioral effects with the phase of the FM sound modulation. The study was also well designed, combining fMRI, current flow modeling, montage optimization targeting, and behavioral analysis. A particular merit of this study is to have repeated the sessions for most subjects in order to test repeat-reliability, which is so often missing in human experiments. The results and methods are generally well-described and well-conceived. The portion of the analysis related to behavior alone is excellent. The analysis of the tACS results is also generally well described, candidly highlighting how variable results are across subjects and sessions. The figures are all of high quality and clear. One weakness of the experimental design is that no effort was made to control for sensation effects. tACS at 2Hz causes prominent skin sensations which could have interacted with auditory perception and thus, detection performance.

      Appraisal of whether the authors achieved their aims, and whether the results support their conclusions.

      Unfortunately, the main effects described for tACS are encumbered by a lack of clarity in the analysis. It does appear that the tACS effects reported here could be an artifact of the analysis approach. Without further clarification, the main findings on the tACS effects may not be supported by the data.

      Likely impact of the work on the field, and the utility of the methods and data to the community.

      The central claim is that tACS modulates behavioral detection performance across the 0.5s cycle of stimulation. However, neither the phase nor the strength of this effect reproduces across subjects or sessions. Some of these individual variations may be explainable by individual current distribution. If these results hold, they could be of interest to investigators in the tACS field.

      The additional context you think would help readers interpret or understand the significance of the work.

      The following are more detailed comments on specific sections of the paper, including details on the concerns with the statistical analysis of the tACS effects.

      The introduction is well-balanced, discussing the promise and limitations of previous results with tACS. The objectives are well-defined.

      The analysis surrounding behavioral performance and its dependence on the phase of the FM modulation (Figure 3) is masterfully executed and explained. It appears that it reproduces previous studies and points to a very robust behavioral task that may be of use in other studies.

      There is a definition of tACS(+) vs tACS(-) based on the relative phase of tACS that may be problematic for the subsequent analysis of Figures 4 and 5. It seems that phase 0 is adjusted to each subject/session. For argument's sake, let's assume the curves in Fig. 3E are random fluctuations. Then aligning them to the best-fitting cosine will trivially generate an FM-amplitude fluctuation with a cosine shape, as shown in Fig. 4a. Selecting the positive and negative phase of that will trivially be larger and smaller than sham, respectively, as shown in Fig. 4b. If this is correct, and the authors would like to keep this way of showing results, then one would need to demonstrate that this difference is larger than expected by chance. Perhaps one could randomize the 6 phase bins in each subject/session and execute the same process (fit a cosine to the curves in 3e, realign as in 4a, and summarize as in 4b). That will give a distribution under the null, which may be used to determine whether the contrast currently shown in 4b is indeed statistically significant.
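      The randomization proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the function names, the use of the first Fourier component as the cosine fit, and the definition of the tACS(+)/tACS(-) contrast as the mean of the three bins nearest the fitted peak minus the mean of the three bins farthest from it are all hypothetical choices made for the example.

```python
import numpy as np

N_BINS = 6
PHASES = 2 * np.pi * np.arange(N_BINS) / N_BINS  # audio-tACS lags (radians)
# Assumed definition: bins nearest the fitted peak vs. bins farthest from it.
POS, NEG = [0, 1, 5], [2, 3, 4]


def aligned_contrast(curve):
    """Align one 6-bin curve to its best-fitting cosine and return
    the mean of the (+) half-cycle minus the mean of the (-) half-cycle."""
    # For equally spaced phases, the first Fourier component is the
    # least-squares fit of a single-cycle cosine; its angle is the peak phase.
    phi0 = np.angle(np.sum(curve * np.exp(1j * PHASES)))
    peak_bin = int(np.round(phi0 / (2 * np.pi / N_BINS))) % N_BINS
    aligned = np.roll(curve, -peak_bin)  # fitted peak moves to bin 0
    return aligned[POS].mean() - aligned[NEG].mean()


def group_contrast(curves):
    """Average aligned contrast across subjects/sessions."""
    return float(np.mean([aligned_contrast(c) for c in curves]))


def permutation_test(curves, n_perm=500, seed=0):
    """Null distribution obtained by shuffling the phase bins within each
    subject/session and repeating the fit-align-summarize procedure."""
    rng = np.random.default_rng(seed)
    curves = np.asarray(curves, dtype=float)
    observed = group_contrast(curves)
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = np.array([rng.permutation(c) for c in curves])
        null[i] = group_contrast(shuffled)
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, null, float(p_value)
```

      On pure noise, the observed contrast and the null mean are both positive, which illustrates the concern: the realignment alone produces an apparent tACS(+) > tACS(-) difference, so only the comparison against the shuffled null is informative.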

      Results of Fig 5a and 5b seem consistent with the concern raised above about the results of Fig. 4. It appears we are looking at an artifact of the realignment procedure, on otherwise random noise. In fact, the drop in "tACS-amplitude" in Fig. 5c is entirely consistent with a random noise effect.

      "To better understand what factors might be influencing inter-session variability in tACS effects, we estimated multiple linear models ..." This post hoc analysis does not seem to have been corrected for multiple comparisons across these "multiple linear models". It is not clear how many different things were tried. The fact that one of them has a p-value of 0.007 for some factors with amplitude-difference, but these factors did not play a role in the amplitude-phase, suggests again that we are not looking at a lawful behavior in these data.

      "So far, our results demonstrate that FM-stimulus driven behavioral modulation of gap detection (FM-amplitude) was significantly affected by the phase lag between the FM-stimulus and the tACS signal (Audio-tACS lag) ..." There appears to be nothing in the preceding section (Figures 4 and 5) to show that the modulation seen in 3e is not just noise. Maybe something can be said about 3b on an individual subject/session basis that makes these results statistically significant on their own. Maybe these modulations are strong and statistically significant, but just not reproducible across subjects and sessions?

      "Inter-individual variability in the simulated E-field predicts tACS effects" The authors here are attempting to predict a property of the subjects that was just shown not to be a reliable property of the subject. The authors are picking 9 possible features for this, testing 33 possible models with N=34 data points. Under these circumstances, it is not hard to find something that correlates by chance. Some of the models tested also had interaction terms, possibly further increasing the number of comparisons. The results reported in this section do not seem to be robust, unless all this was corrected for multiple comparisons and it was simply not made clear.
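      To make the scale of the multiple-comparisons concern concrete, here is a minimal sketch. It assumes, purely for illustration, that the 33 models can be treated as independent tests at alpha = 0.05; the helper names are hypothetical, and Holm's step-down procedure is shown as one standard correction, not as the analysis the authors ran.

```python
def familywise_error(n_tests, alpha=0.05):
    """Probability of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure; returns reject decisions in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the rank-th smallest p-value against alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject


if __name__ == "__main__":
    print(f"Family-wise error for 33 tests: {familywise_error(33):.3f}")
    print(f"Bonferroni threshold for 33 tests: {bonferroni_threshold(33):.5f}")
```

      Under these assumptions, the chance of at least one spurious "significant" model among 33 is roughly 0.8, and a single model would need p below about 0.0015 to survive Bonferroni correction.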

      "Can we reduce inter-individual variability in tACS effects ..." This section seems even more speculative and with mixed results.

      Given the concerns with the statistical analysis above, there are concerns about the following statements in the summary of the Discussion:

      "2) does modulate the amplitude of the FM-stimulus induced behavioral modulation (FM-amplitude)" This seems to be based on Figure 4, which leaves one with significant concerns.

      "4) individual variability in tACS effect size was partially explained by two interactions: between the normal component of the E-field and the field focality, and between the normal component of the E-field and the distance between the peak of the electric field and the functional target ROIs." The complexity of this statement alone may be a good indication that this could be the result of false discovery due to multiple comparisons.

      For the same reasons as stated above, the following statements in the Abstract do not appear to have adequate support in the data: "We observed that tACS modulated the strength of behavioral entrainment to the FM sound in a phase-lag specific manner. ... Inter-individual variability of tACS effects was best explained by the strength of the inward electric field, depending on the field focality and proximity to the target brain region. Spatially optimizing the electrode montage reduced inter-individual variability compared to a standard montage group." In particular, the evidence in support of the last sentence is unclear. The only finding that seems related is that "the variance test was significant only for tACS(-) in session 2". This is a very narrow result to be able to make such a general statement in the Abstract. But perhaps this can be made more clear.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We would like to thank reviewers for their insightful comments.

      Overall, there were two major concerns/suggestions:

      • Applicability to humans of the increase of BTC in non-alcoholic steatohepatitis (NASH) and mechanisms of downregulation of BTC by omega-3. We have now analyzed 3 additional human gene expression datasets and show that BTC is not only increased in human NASH (as we have already shown for the liver cancer meta-analysis), but is also decreased in livers of patients who received omega-3.

      • One of the reviewers suggested investigating a potential mechanism of how BTC is regulated by omega-3 fatty acids. Although a complete answer to this question would require entirely new studies, we performed the additional investigation that was possible within a reasonable timeframe. We found that the transcription factor FOXO3 (a well-known inhibitor of carcinogenesis) is a highly probable mediator of the DHA inhibitory effect on BTC.

      See all details of items 1 and 2, as well as answers to other (less critical) concerns, below after each specific question.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This work by Padiadpu and colleagues investigates the mechanism by which PUFAs of the n-3 series (mostly DHA) may influence NAFLD progression, using systems biology and multi-omics analysis. The work is interesting and may provide a novel view of the topic. However, there are a number of issues the authors may wish to consider in order to improve their manuscript.

      Major issues: Clarity: Since the authors refer to previously published experiments, they must refer to this work in the figure legends and improve the clarity of such legends. Here is a list of issues that must be fixed:

      Fig.1: First panel is not clear. What does the table tell the reader? What are the effects of the different diets on NAFLD?

      All the transcriptomic data are newly generated from the samples of previously published studies. The table shows the number of features changed by DHA and/or EPA in each of the -omics and phenotypic data used in the analysis.

      I understand that the results are published elsewhere, but the authors must provide information regarding the NAFLD/ NASH scores.

      We now added a supplementary table 1a showing the scores.

      Fig.4: Why is there sometimes a DHA diet, sometimes DHA and EPA? The legend is not clear. What does WD+O mean? I guess it is olive oil... But the legend must be improved.

      We added details in the legend for more clarity. Specifically, WD+O means WD + olive oil, added as a control for WD+DHA and WD+EPA. As described in the 2nd paragraph of the results, when both EPA and DHA had similar and significant effects in reversing the WD effect, the parameter was assigned to the “EPA&DHA category”. When only WD+DHA or WD+EPA was significantly changed vs WD+O, the parameter was assigned to the “DHA category” or “EPA category”, respectively.

      One issue the authors may consider trying to fix is the specificity of the effect of DHA on BTC.

      Is it really specific? It seems to me that EPA has more or less the same effect. If the effect is DHA-specific, then make this clearer throughout the text.

      Although BTC expression was reduced by both DHA and EPA compared with WD, DHA had a statistically significantly stronger effect than EPA (Fig. 3D).

      Another issue the authors may wish to investigate is the relationship between W3 consumption and BTC expression in studies performed by other labs (if available on Gene expression omnibus?).

      Thanks for the suggestion. We used publicly available data from human and mouse studies, which showed a significant increase in liver BTC gene expression in NASH in multiple datasets, while a human trial with omega-3 treatment for one year showed a significant reduction (Figures 3F - human data, S3G - mouse data).

      Finally, a key issue would be to identify the mechanism by which DHA inhibits BTC expression? How does this happen? could such inhibition be induced by other fatty acids of the W3 series? I understand that this is not easy to address but it would significantly strengthen the manuscript.

      Thanks to your question, we investigated and found at least one potential mechanism contributing to how “DHA inhibits BTC expression”. See details in the answer to the next question. As for “other fatty acids”, while we agree this is an important question, it is outside the scope of the current study but will be investigated in future studies.

      Moreover, it might be possible to identify the set of genes highly co-regulated with BTC expression and to investigate the possible transcription factors at play in the control of such gene set.

      We really appreciate this question, as our efforts in this direction provided one potential mechanism. A direct screen of transcription factor (TF) motifs in genes co-regulated with BTC did not provide any clear results. Therefore, we combined network analysis and a screen for motifs in the BTC gene with the in vivo and in vitro treatment results and found FOXO3 as a candidate TF regulated by DHA upstream of BTC.

      See details of the analysis and results in a new Supplementary Figure S6 and corresponding text located at the end of the results.

      Minor: the authors use the term "beneficial" transcriptome alterations by DHA.

      I do not think it is correct to use "beneficial".

      We agree and removed the word "beneficial".

      Reviewer #1 (Significance (Required)):

      Strength: This paper uses new approaches to investigate the relationship between W3 consumption and liver gene expression and its relevance to chronic metabolic liver diseases.

      The experiments and data set used to perform systems biology are from an excellent lab (the authors' lab) that has published many important and reproducible discoveries in the field of regulation of gene expression by dietary fatty acids.

      The work has high translational relevance in medicine / hepatology / metabolism.

      I am not a qualified reviewer to assess the systems biology that has been done.

      Limitation: The mechanistic link between DHA consumption and BTC expression is not very clear. The specificity of this effect could also be tested (DHA vs other W3 and/or W6).

      Although BTC expression was reduced by both DHA and EPA compared with WD, DHA had a significantly stronger effect than EPA (Fig. 3D). Other omega fatty acids were not tested, but this can be done in future studies.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors submit a manuscript describing the impact of the suppression of betacellulin as a key mechanism to counteract fibrosis and inflammation in NASH by modulating fatty acids in WD-fed mice.

      Major Comments: (i) No histological analysis was presented and indeed this is of clinical relevance for NASH since diagnosis is still based on biopsy.

      While histological evaluation was presented in the originally published papers (PMID: 28422962, 23303872), it is now provided in Supplementary Table S1a.

      (ii) Human comparative analysis: is done with HCC not with NASH patients.

      This cancer-related dataset is most likely obtained from different etiologies.

      I would suggest comparing these mouse datasets with GSE48452 (human NAFLD-NASH spectra).

      Thanks for this important question. We have now analyzed available human NASH data and show a significant increase of BTC expression in two datasets, while a human trial with omega-3 treatment for one year showed a significant reduction of BTC expression (Figure 3F), resembling our observations in mice.

      (iii) to compare the inflammation and fibrosis (also lipid metabolism), one can compare these mouse datasets with GSE222576 and cite this preprint (https://doi.org/10.21203/rs.3.rs-2009380/v1)

      Using the suggested dataset (of a chemically induced liver fibrosis), we first observed that Btc gene expression was significantly increased over 10 weeks of the model and now included this result in Fig. S3G.

      We also queried the 66 genes from the network modules described by the authors to check their changes in our NASH model. We observed that 28 genes were differentially expressed in NASH with 14 of them belonging to the module that authors named as “Pathways in Cancer”. Other genes were from the lipid metabolism (4 genes), immunity (2) and inflammation (2 genes). In addition, we observed that several genes we found regulated by omega-3 and changed in this fibrosis model contained other inflammatory genes such as classical macrophage genes (Mmp12, Lgals3, Cd68, Trem2), fibrosis (Col4a1, Col27a1, Itga2b, Itga8) and lipid metabolism (Scd2, Lpl, Soat1). Of note, the preprint has been published and we now cite the corresponding article.

      Minor comments:

      (i) The heatmap in Figure 1B and another heatmap should show all mice not the average to see the variability

      A supplementary heatmap with all the individual mouse data has been added to show the variability and similarity (Figure S1D).

      Reviewer #2 (Significance (Required)): The authors submit a manuscript describing the impact of the suppression of betacellulin as a key mechanism to counteract fibrosis and inflammation in NASH by modulating fatty acids.

      This is a well-designed experiment, and the results are of interest to hepatologists and should indeed be published after consideration of the following points:

      Strength is multiOMICs approach.

      Weakness is human applicability.

      We improved human applicability by investigating 3 additional human datasets of NASH (Fig. 3F) and finding consistent changes in BTC expression closely resembling our observations in mouse NASH model, including one trial with omega-3 treatment of patients for one year showing significant reduction in BTC gene expression.

    1. Author Response

      Reviewer #1 (Public Review):

      This study demonstrates that a hybrid measurement method increases the resolution of mouse USV localization threefold. This increased resolution makes it possible to revise previous occurrence frequency measures for female vocalizations and establishes the existence of vocal dominance in triadic interactions. The method is well described and its efficiency is carefully quantified. A limitation of the study is the absence of ground truth data, which could eventually have been generated with miniaturized loudspeakers in mouse puppets. However, a careful error estimation partially compensates for the absence of these likely challenging calibrations. In addition, the conclusions take into account this uncertainty. The gain in accuracy with respect to previous methods is clear and the impact of localisation accuracy on biological conclusions about vocalisation behavior is clearly exemplified. This study demonstrates the impact of the new method for understanding vocal interactions in the mouse model, which should be of tremendous interest for the growing community studying social interactions in mice.

      We have performed the requested additional ground truth measurement using a movable miniature speaker; for more details, see point 2 of Reviewer 2 and the new supplementary figure.

      Reviewer #2 (Public Review):

      Past systems for identifying and tracking rodent vocalizations have relied on triangulating positions using only a few high-quality ultrasonic microphones. There are also large arrays of less sensitive microphones, called acoustic cameras, that don't capture the detail of the sounds, but do more accurately locate the sound in 3D space. Therefore the key innovation here is that the authors combine these two technologies by primarily using the acoustic camera to accurately find the emitter of each vocalization, and matching it to the high-resolution audio and video recordings. They show that this strategy (HyVL) is more accurate than other methods for identifying vocalizing mice and also has greater spatial precision. They go on to use this setup to make some novel and interesting observations. The technology and the study are timely, important, and have the potential to be very useful. As machine learning approaches to behavior become more widespread in use, it is easy to imagine this being incorporated and lowering entry costs for more investigators to begin looking at rodent vocalizations. I have a few comments.

      1) What is the relationship of the current manuscript to this: https://www.biorxiv.org/content/10.1101/2021.10.22.464496v1 which has a number of very similar figures and presents a SLIM-only method that reportedly has lower precision than the current HyVL approach. Is this superseded by the submitted paper?

      The referred manuscript (now published in Scientific Reports) is indeed related to the current work: The currently presented system is based on the integration between SLIM (based on 4 high quality microphones) and Beamforming (based on the 64-channel microphone array). The accuracy of SLIM is generally lower than that of HyVL, but it makes essential contributions to the overall accuracy of HyVL through the integration of the complementary strengths of the two methods/microphone arrays (see Fig. 3A, L-shape of errors). To our knowledge, SLIM was the previously most accurate technique (based on 4 microphones, see comparison in the Discussion), but HyVL exceeds this by a substantial margin. Some figures appear similar mostly due to related code in the underlying analysis pipeline and visualization scripts (e.g. the half-disc densities). However, the set of dyadic and triadic recordings was collected specifically for the present study, and all top-level analyses were performed separately. The single mouse (C57Bl/6 WT) ground truth dataset is shared between the two studies, where in the SLIM paper only the USM4/SLIM part was evaluated (leading to a correspondingly lower, single animal accuracy).

      We felt that the level of detail above would probably impede the reading of the manuscript, and we have therefore added a subset of the above clarifications to the methods and the first time the other study is mentioned.

      2) Can the authors provide any data showing the accuracy of their system in localizing sounds emitted from speakers as a function of position and amplitude? I am imagining that it would be relatively easy to place multiple speakers around the arena as ground truth emitting devices to quantify the capabilities of the system.

      Ground truth data is critical for any meaningful comparison. First, we would like to highlight that we already provided ground truth data in the previous version of the manuscript: In Fig. 3C, we analyzed vocalization data from trials with (1) just a single mouse as well as (2) vocalizations at times when all mice were far apart in relation to the accuracy of HyVL (>100 mm, i.e. >25x the accuracy of HyVL), where the chances of erroneous assignment are negligible. We think that these tests are the most relevant, as they are conducted with the relevant sounds, at their actual intensity, spectral profile and emitter acoustics.

      In addition, we have now conducted a series of tests with sounds produced by a miniature speaker placed in 25 different locations to demonstrate the lower-bound of accuracy achievable with the system. The tests indicate an accuracy of MAE < 1mm under these ideal conditions, i.e. without the absorption of the mouse bodies, varying direction of emission of the mouse snout, varying intensity, varying spectral content, duration, etc. Exploring the dependence on all these parameters is in itself interesting, but requires a detailed study in itself. The detailed experimental conditions and results are now provided in Supplementary Fig. 4, including a quantification of the dependence on amplitude.
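      As a rough illustration of how such a lower-bound accuracy figure could be computed, the sketch below assumes that MAE denotes the mean Euclidean distance between estimated and true speaker positions in the arena plane. That definition, the function name, and the example grid are assumptions made for the example; the exact procedure is the one described in Supplementary Fig. 4.

```python
import numpy as np


def localization_mae_mm(estimated_xy, true_xy):
    """Mean absolute localization error in mm, taken here (as an assumption)
    to be the mean Euclidean distance between estimated and true positions."""
    est = np.asarray(estimated_xy, dtype=float)
    tru = np.asarray(true_xy, dtype=float)
    return float(np.mean(np.linalg.norm(est - tru, axis=1)))


if __name__ == "__main__":
    # Hypothetical 5 x 5 grid of speaker positions (mm), 25 locations in total.
    gx, gy = np.meshgrid(np.linspace(0, 400, 5), np.linspace(0, 400, 5))
    true_xy = np.column_stack([gx.ravel(), gy.ravel()])
    # Hypothetical estimates with a constant (0.3, 0.4) mm offset.
    estimated_xy = true_xy + np.array([0.3, 0.4])
    print(localization_mae_mm(estimated_xy, true_xy))
```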

      3) How is the system's performance affected by overlapping vocalizations? It might be useful to compare the accuracy of caller identification for periods where only one animal is calling at a time vs. periods where multiple animals are simultaneously calling.

      This is an excellent question. Our current code for detecting vocalizations cannot automatically determine if one or multiple vocalizations are concurrently present. We have therefore manually checked all vocalizations for overlapping instances, including those in triadic recordings with two males, where this would be expected to occur most frequently.

      We considered vocalizations to be overlapping if the overlapping constituent time-frequency traces did not form a harmonic stack. Overall, overlaps were surprisingly rare. We did find a couple of cases (<0.1%) where our detection algorithm produced a longer vocalization interval that contained multiple, differently shaped vocalization traces that, when re-analyzed in shortened time-frequency bins with beamforming, belonged to two different males. Note here that beamforming is separately performed from the onset to the end of each vocalization, so the cumulative heatmap can change depending on these onset and end times, which are normally determined by our detection algorithm.

      However, although the identity of the assigned vocalizer could shift in these very rare cases depending on which time bin was re-analyzed, the system’s localization performance remained in principle unaffected: as mentioned above, shorter time bins on non-overlapping parts correctly show the origin of the vocalizations in this case. A solution to this issue could therefore be a USV detection algorithm that is able to detect the overlap based on the spectral shapes and parse the vocalizations apart. During the beamforming, each vocalization can then be separately localized by restricting the beamforming to the corresponding time and frequency range. Further, the analysis could be refined so that multiple salient peaks can be detected in the soundfield estimate. This would, however, substantially change the analysis approach, i.e. rather than a single estimate per USV, a sequence of soundfield estimates would have to be computed and later fused again. Such a procedure uses less data per single estimate and therefore also increases the possibility of false positives, which, in the current situation with very few overlaps in time, would likely reduce the overall accuracy of the system. We therefore decided not to modify the algorithm in this direction, but we agree that ideally a joint approach, combining separation on the spectrogram and soundfield level, should be pursued. For the present data, if a time window was analyzed such that the intensity map of the sound field contains multiple hotspots of an approximately equal magnitude, the USV would likely remain unassigned, because the within-soundfield uncertainty would be higher than for a single peak, and this would reduce the MPI. However, given the rarity of these cases in our dataset, we do not think that their exclusion would change the results appreciably. This information was added as a paragraph to the Discussion.

      It is worth noting that HyVL is very robust: There were a number of cases (<5%) where environmental dampening in combination with harmonic stacking produced interesting time-frequency traces in some of the USM4 microphones, but our system did not have any issue spatially localizing what appears to be a smeared vocalization trace. We provide a few examples of this kind in a short video (see Rebuttal Video 2 and the legend at the bottom of this document), where the overlap is also reflected in the intensity map of the sound field, overlaid onto the platform.

      4) Can the authors comment on how sound shadows cast by animals standing between the caller and a USM4 affect either the accuracy of identification or the fidelity of the vocal recording?

      An important point to raise. Sound scattering and dampening caused by the conspecifics of the vocalizing animal can impede the accuracy of any sound localization system, but can unfortunately not be avoided in a social setting. To address this issue, we raised all USM4 microphones by ~12 cm above the interaction platform to minimize the instances of sound blocked by the mice. Further, the Cam64 device should largely be unaffected by sound shadows as it is centrally located above the platform. We have added a modified version of the above comment to the discussion under the heading "Current limitations and future improvements of the presented system".

      5) I'm a bit confused about how the algorithm uses the information from the video camera. Reading through the methods, it seems like they primarily calculate competing location estimates by the two types of microphone data and then make sure that a mouse is in close proximity to one location, discarding the call if there isn't. Why did the authors choose this procedure rather than use the tracked position of the snouts as constrained candidate locations and use the microphone data to arbitrate between them? Do they think that their tracking data are not reliable or accurate enough?

      Thanks for this important suggestion, which we have actually grappled with a lot during the analysis. First of all, the visual tracking data, in particular the manual data, is in our opinion (based on human visual identification) near perfect (within the limits of the video resolution, pixel resolution = 0.8 mm), i.e. on the order of 1-2 mm, and is therefore not the source of any unattributable vocalizations. If we understand the reviewer correctly, then we indeed perform the attribution as they indicate based on the tracked snouts of all mice, specifically by measuring the MPIs of both acoustic location estimates for all mice and then choosing the most reliable one. Specifically, the attributions can be grouped into 3 cases: (i) estimated origin close to one snout, and snouts rather far apart, (ii) estimated origin close to one snout and snouts close, and (iii) estimated origin not close to either snout. (i) is easy to address, (ii) is appropriately handled by the mouse probability index, but (iii) is tricky. Since the vocalization has to come from one of the mice, this already indicates that the localization is not working well in this case. Therefore, we found it prudent (similar to Neunuebel et al. 2015) to not assign in these cases. Interestingly, the MPI is not useful in these cases: due to the exponential dependence of the normal density on distance, a case with a distance of 50 mm to one snout and 60 mm to another snout could, for example, lead to an MPI close to 1, which is likely not trustworthy. We have described this in the Methods as follows:

      "This distance threshold mainly serves to compensate for a deficiency of the 𝑀𝑃𝐼: if all mice are far from the estimate, all 𝑃𝑘 are extremely small, however, the 𝑀𝑃𝐼𝑘 will often exceed 0.95."

      Due to the inherent limit for localizing very quiet, short USVs by any system, we think this kind of selection (introduced originally by Neunuebel et al. 2015) is a valuable and necessary step in the processing to avoid confusions (which are of course already substantially reduced through HyVL here).

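      The deficiency of the MPI described above can be illustrated with a minimal sketch. It assumes that each per-mouse probability is an unnormalized normal density of the snout-to-estimate distance and that the MPI is that probability divided by the sum over all mice; the exact definition in the manuscript may differ. The width `sigma_mm`, the distance cutoff `max_dist_mm` and the function names are hypothetical; only the 0.95 MPI criterion and the 50 mm / 60 mm example are taken from the response.

```python
import math


def mouse_probability_index(distances_mm, sigma_mm):
    """Return one normalized index per mouse (assumed form: P_k / sum_j P_j).

    P_k is taken to be exp(-d_k^2 / (2 sigma^2)). The computation is done in
    log space so that very large distances do not underflow to 0/0.
    """
    log_p = [-(d ** 2) / (2.0 * sigma_mm ** 2) for d in distances_mm]
    shift = max(log_p)
    weights = [math.exp(lp - shift) for lp in log_p]
    total = sum(weights)
    return [w / total for w in weights]


def assign_vocalizer(distances_mm, sigma_mm=5.0, mpi_min=0.95, max_dist_mm=30.0):
    """Return the index of the assigned mouse, or None if the USV stays unassigned.

    sigma_mm and max_dist_mm are illustrative values, not the published ones.
    """
    mpi = mouse_probability_index(distances_mm, sigma_mm)
    best = max(range(len(mpi)), key=mpi.__getitem__)
    # The distance threshold guards against a high MPI when all mice are far away.
    if distances_mm[best] > max_dist_mm or mpi[best] < mpi_min:
        return None
    return best
```

      Under these assumptions, distances of 50 mm and 60 mm give an MPI for the nearer mouse that is numerically indistinguishable from 1, yet `assign_vocalizer([50.0, 60.0])` returns `None` because the nearest snout lies beyond the distance cutoff.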
      6) I guess the authors have code that we can run, but I couldn't access it. The manuscript describes the algorithms and equations that are used to calculate the location, but this doesn't really give me a feel for how it works. If you want to have the broadest impact possible, I think you would do well to make the code user-friendly (maybe it is, I don't know). In pursuit of that goal, I would suggest that the authors devote some of the paper to a guided example of how to use it.

      While the code was made available to the reviewers via the link at the beginning of the manuscript (p2, before abstract), we completely agree that this method of distribution is not very accessible. We have therefore created a publicly available GitHub repository (https://github.com/benglitz/HyVL) which hosts the code and details its use on the basis of a sample data set (which is available to the reviewers in the repository link, and later to the public under https://doi.org/10.34973/7kgc-ta72). While we do provide a sample video and analysis workflow there, our data analysis pipeline is quite integrated and other labs will likely use different pipelines. We have therefore tried to make the core functions independent of our pipeline and thus easy to integrate by others into their analysis pipelines.

      Reviewer #3 (Public Review):

      The present manuscript describes a new method to identify the emitter of ultrasonic vocalisations during social interactions between 2 or 3 mice. The method combines two technologies (an "acoustic camera" and a set of four microphones) and succeeds in increasing the spatial precision and the attribution of USV emission to one of the mice. The manuscript describes the characteristics and advantages of each method and the advantages of using both to optimize the identification of USV emitter. The authors used the method to confirm that females are also vocalising during male-female interactions and that females emit USV mostly during nose-nose contact while this was not the case for males. Interestingly, the authors identified that the vocal behaviour of two competing males was strongly asymmetric when facing a female. This was not the case for two females facing one male.

      The method is really promising since the identification of the emitter of USVs during mouse social interactions is a necessary step to speed up our understanding of this communication modality. The increase in spatial precision and in the proportion of attributed vocalisations is non-negligible and will be of great utility in the future.

      We would like to thank the reviewer for this positive perspective on the future utility of our system.

      Generally, the statistical analyses should be adjusted. Indeed, the statistical analyses do not consider the fact that the same individuals were recorded several times (if we understood well the methods). Each point was considered independent (in non-parametric Wilcoxon tests), while this is not the case given the repetitions with the same individuals (the number of repeated encounters per individual should be given in the methods section, by the way). We strongly recommend revising the statistical analyses of the results in Figures 4 and 5. In addition, it could be interesting to check whether the vocal behaviour is stable within each individual (i.e., a male that is vocalising frequently in one situation vocalises always frequently in other situations).

      We generally agree with this suggestion: In order to properly conduct the analysis for individuals as you suggest, a balanced dataset should be used. We had initially collected such a balanced dataset, which was previously not detailed in the manuscript, as the focus was on USV localization/attribution and hence only the recordings containing USVs were analyzed (detailed now in the beginning of Results and Methods). However, overall, the probability of a recording containing vocalizations at all is low: in our balanced set only 23/112 recordings contained vocalizations. We had therefore collected additional recordings with the best vocalizers, which created the previously analyzed set of 83 recordings containing USVs recorded with all microphones. This dataset is therefore dominated by recordings from mice that are active vocalizers. While this does not raise any issue for the estimation of the accuracy of the method (Figure 3) or the female vocalizations (Figure 4, because recordings were always randomized across female mice), it precludes an encompassing analysis of individual differences in Figure 5, i.e. the dyadic-triadic comparison. In the new Figure 5, we address the reviewer's question for the dyadic recordings, finding that the current set of recordings does not provide sufficient evidence that individual male mice had significantly different vocalization rates. We would, however, like to point out that this is likely a consequence of the n=4 recordings that are compared here. For the female mice, we also did not find differences in vocalization rates, which is based on n=14 recordings and thus a more reliable result (p=0.16, 1-way ANOVA with factor individual).

      For the triadic recordings, however, a limitation in the experiment execution means that we unfortunately do not have the complete information available on an experiment level: the video stream was accidentally started after all mice were placed on the platform, and since same-sex animals are visually not separable (while the female mice are separable from the males, based on a slightly shaved region on their head), we cannot completely assess this question in triadic recordings based on the available data. When additionally including the triadic recordings and assuming a single vocalizer (combining all male USVs, see below for why the males could not be assigned in the triadic condition), the male individual comparison can be approximately performed with n=8 recordings, and then the dependence on individual becomes borderline significant (p=0.028, 2-way ANOVA with factors individual and condition).

      For the comparison of vocalization rates in the previous Figure 5 that the reviewer was referring to, we cannot perform a rigorous analysis on the individual level, due to the lack of balance. While we thus agree that differences between individual mice can contribute to the differences observed, we do not think that this would change the conclusion that one of the mice dominates the vocal emissions. If the reviewers agree, we would thus leave Figure 6 (old Fig. 5) and new Figure 7 (behavioral confirmation of dominant/subordinate division) as part of the manuscript, with a clear cautioning about the possible contribution of individual differences to the observed differences. If the reviewers find it inappropriate to leave the results based on the unbalanced dataset in, all results after Figure 5 could also be excluded (although we would find this unfortunate, given the additional time and effort we have invested in these).

      It is not easy to understand the rationale behind testing animals in pairs and in triads from the beginning of the manuscript. The authors should better introduce this aspect in the manuscript, especially given the fact that biological results deal with this aspect in Figure 5. The authors might strengthen the parts of the biological results extracted from their new method.

      Thank you for pointing out the need for clarification regarding the rationale behind testing animals in pairs and in triads. Because courtship interactions are particularly vocal and social, they are of interest to many fields, e.g. neurodevelopmental disorders (refs. 3, 4). Due to the natural competitiveness between mice during courtship interactions, high accuracy is particularly beneficial in this regard because it allows disentangling USVs at close distances. We adapted the introduction to better reflect this reasoning and included an extra paragraph in the introduction and also where the biological results from old Fig. 5 / new Fig. 6 are summarized.

      More specifically, the fact that one male takes over the vocal behaviour within a triad is of high interest. Nevertheless, some behavioural data would be needed to strengthen these findings.

      We agree that this is an interesting finding and also agree that some additional behavioral analysis is useful to complement it. In order to arrive at this analysis, we performed all-frame, 3-animal tracking on the 14 triadic recordings with two males. This required switching to skeleton tracking with SLEAP (ref. 5) in addition to manual post-processing to ensure that no identity switches occur. In each recording the dominant male was then defined as the one that emitted more vocalizations, and then the vocalization-independent spatial interaction histogram was computed, similar to the ones in Fig. 4, but now separating between the dominant and the subordinate males (see new Figure 7). The results are consistent with the most typical location of vocalization of the male, in proximity to the female abdomen: The dominant male's spatial interaction histogram (Fig. 7A) was more clearly peaked in the location of the female abdomen very close to the male's snout, in comparison with the subordinate male's histogram (Fig. 7B), which shows up very clearly in the difference between the normalized histograms (Fig. 7C). Significance analysis was performed using 100x bootstrapping on the relative spatial positions to estimate 99% confidence bounds around the histograms of the dominant and subordinate male, respectively. Significance at a level of p<0.01 highlights multiple relative spatial positions (Fig. 7D), including the one proximal to the snout which has the largest absolute difference (Fig. 7C). Note that these analyses were conducted on the basis of the non-balanced dataset, which contained enough vocalizations to assess the dominant male based on the vocalization rates, and thus individual traits of certain animals remain as a possible confound.
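
      The bootstrap comparison of the two spatial interaction histograms could look roughly like the following sketch. It is one possible reading of the procedure, not the authors' code: positions are resampled with replacement, a normalized 2D histogram is computed per resample, per-bin percentile bounds are taken, and a bin is flagged when the intervals of the two males do not overlap. The function names, the square binning grid and the non-overlap criterion are assumptions made for illustration.

```python
import math
import random


def normalized_histogram(points, bins, extent):
    """Bin (x, y) points on a bins x bins grid over [lo, hi) and normalize to sum 1."""
    lo, hi = extent
    width = (hi - lo) / bins
    counts = [[0] * bins for _ in range(bins)]
    total = 0
    for x, y in points:
        i = math.floor((x - lo) / width)
        j = math.floor((y - lo) / width)
        if 0 <= i < bins and 0 <= j < bins:
            counts[i][j] += 1
            total += 1
    return [[c / total if total else 0.0 for c in row] for row in counts]


def bootstrap_bounds(points, bins, extent, n_boot=100, level=0.99, seed=0):
    """Per-bin percentile bounds from resampling the positions with replacement."""
    rng = random.Random(seed)
    boots = [
        normalized_histogram(rng.choices(points, k=len(points)), bins, extent)
        for _ in range(n_boot)
    ]
    k = int((1.0 - level) / 2.0 * (n_boot - 1))  # index of the lower percentile
    lower = [[0.0] * bins for _ in range(bins)]
    upper = [[0.0] * bins for _ in range(bins)]
    for i in range(bins):
        for j in range(bins):
            values = sorted(b[i][j] for b in boots)
            lower[i][j] = values[k]
            upper[i][j] = values[n_boot - 1 - k]
    return lower, upper


def significant_bins(dominant, subordinate, bins, extent, **kwargs):
    """Flag bins whose bootstrap intervals for the two males do not overlap."""
    d_lo, d_hi = bootstrap_bounds(dominant, bins, extent, **kwargs)
    s_lo, s_hi = bootstrap_bounds(subordinate, bins, extent, **kwargs)
    return [
        [d_lo[i][j] > s_hi[i][j] or s_lo[i][j] > d_hi[i][j] for j in range(bins)]
        for i in range(bins)
    ]
```

      With only 100 resamples, the 99% bounds in this sketch reduce to the minimum and maximum of the bootstrap values per bin, so a larger number of resamples would be needed for finer percentile estimates.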

      A small proportion of USVs was not assigned. The authors did not discuss the potential reason for this failure (Were the USVs too soft? Did they include specific acoustic characteristics that render them difficult to localise?). These points could be of interest when testing other mouse strains or other species.

      Good point, we agree that it is interesting to know the reasons for failure. As so often, there is not a single property that makes localization hard, but multiple factors contribute. In the SLIM paper, we already identified duration and intensity as important contributors (Fig. 3E/F), and in the speaker test (see new Supplementary Fig. 4) we again demonstrated the influence of intensity. In addition, frequency bandwidth and acoustic occlusion are two other main contributors that each influence the availability of the information/signal-to-noise ratio at the microphones:

      • Frequency bandwidth: In signals that are very narrowband, there are more opportunities for phase ambiguity, in particular for very high-frequency signals. These are avoided/reduced for more wideband signals.

      • Acoustic occlusion: As ultrasonic sounds can be quite directional, if an animal is vocalizing away from a microphone, which in addition would put its body in the way of the sounds to the microphone, then this can reduce the intensity at the microphone to a level that is insufficient to utilize information from this microphone. This mostly influences the 4 microphones surrounding the platform, while the Cam64 overhead will likely not be affected by acoustic occlusion in the plane.

      We have added a brief version of this explanation to the discussion under the heading: "Current limitations and future improvements of the presented system"

    1. Author Response

      Reviewer #1 (Public Review):

      Hoang, Tsutsumi and colleagues use 2-photon calcium imaging to study the activity of Purkinje cells during a Go/No-go task and related this activity to their location in Aldolase-C bands. Tensor component analysis revealed that a substantial part of the calcium responses can be linked to four functional components. The manuscript addresses an important question with an elegant technical approach and careful analysis. There are a few points that I think could be addressed to further improve the quality of the manuscript.

      1) The authors should be careful not to overstate the goal and results. For instance, in the abstract it is stated that dynamical functional organization is necessary for dimension reduction. However, the statement that the 4 TCs together account for about half of the variance (line 220) indicates that dimensionality may not be reduced that much. I would suggest revising the first and last sentence of the abstract accordingly.

      Dynamic functional organization of TC1 and TC2 by synchronization is the major finding of this study and we believe that it is one of the most efficient mechanisms of dimension reduction, given the unique anatomy of the cerebellum. In the revised manuscript, we added a supplemental result showing that the dimensionality of TC1 and TC2 neurons decreased and increased, respectively, in accordance with bi-directional changes in their synchronization (Figure 3 – figure supplement 1DE). Dimension reduction was further confirmed by conventional PCA (Figure 6 – figure supplement 1). However, we agree that the statement that the cerebellum reduces dimensions by self-organization of components is speculative, and we revised the abstract accordingly.

      At the end of the introduction, the authors refer to "the first evidence supporting the two major theories of cerebellar function", but which two theories are referred to and how this manuscript supports them is not very obvious. Similarly, they state that "This study unveiled the secret of cerebellar functional architecture", which I would consider to be an unnecessary overstatement of the impact of the work described.

      In the revised Introduction, we explicitly stated that TC1 and TC2 are related to timing control and cognitive error learning, respectively, with some indirect causal evidence. We also revised the last paragraph of the Introduction to emphasize that this study provides the first evidence to support the view that distinct cerebellar components may serve divergent cerebellar functions in a single task. The statement "This study unveiled the secret of cerebellar functional architecture" was removed.

      In the title, the authors use the word modular. In the consensus paper on cerebellar modules (Apps et al., 2018) an attempt is made to unify the terms used to describe cerebellar anatomical structures. Here "module" is used for the longitudinal zone of interconnected PCs, CN neurons and olivary neurons. As the authors only studied PC activity (and indirectly the IO), I would suggest using band, stripe or subpopulation instead.

      Because we used TCA to identify functional components underlying the Go/No-go data, we changed the word “module” to “component” in the title.

      Finally, the term "CF firing" or "CF activity" is used when referring to the recorded signals. However, the authors measure postsynaptic calcium responses that are indeed likely driven by CF inputs, but could also be influenced by PF inputs. At the very least, because Purkinje cells and not climbing fibers are being imaged, "complex spike" should be used instead. It would be more accurate still to use the more general "calcium response" and make less of an assumption about the origin of the calcium response.

      In this study, CF-dependent dendritic Ca2+ signals in adjacent AldC compartments were recorded by two-photon imaging. The HA_time algorithm (Hoang et al. 2020) was then applied to extract spike timings from the recorded signals. In the revised manuscript, we used the terms “calcium responses” and “complex spikes” when referring to the recorded Ca2+ signals and the estimated spikes, respectively.

      2) For some figure panels and statements in the manuscript error bars or confidence intervals and statistics are missing. This is the case for, for example, the changes in fraction correct, lick latency, fraction incorrect, etc. (Fig 1B, 2E-F, TC levels in 3, 4D-E and 5A-C). Including these is particularly relevant in Fig 4E as this is a key result, mentioned also in the abstract. Please indicate clearly if these plots are cumulative for all mice or per mouse and averaged. I advise the authors to statistically support the claim that the changes are significant and in opposite direction as this element of the study is referred to in the abstract and discussion (summary).

      We added the error bars / confidence intervals to the related figures. Most importantly, we added histograms of synchrony strength for TC1/TC2 neurons (Figure 4E) and conducted statistical tests to strengthen the claim of bi-directional changes in synchronization of TC1/TC2.

      3) Data presentation sometimes does not do the work justice. For example, the data in Figure 6 are very interesting, but hard to read because of the design of the figure. It is clear how the components are mostly confined to Aldolase-C domains, but within the domains the distribution is not clear. I would advise to also more clearly indicate what the locations of the colors within the bands refer to. The spatial distribution of the selected top 300 cells for each TC could be added.

      We added pie-chart plots for the fraction of TC1-4 neurons in each Ald-C zone and learning stage. We also indicated in the figure legend that the location of a single-color bar referred to the geographic distance of the corresponding neuron relative to Ald-C boundaries. We included spatial distribution of the selected neurons in Figure 4 – figure supplement 1D.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript reports a study to investigate the reporting practices in three top cardiovascular research journals for articles published in 2019. The study was preregistered, which makes the intent and methodology transparent, and the authors also make their materials, data, and code open. While the preregistration and sampling strategy are a strength, the study suffers from a higher than expected number of non-empirical articles, decreasing the sample size and thus the inference that can be drawn. The authors' focus was mainly on transparency of reporting and not on the actual reproducibility or replicability of the articles; however, the accessibility of data, code, materials, and methods is a prerequisite. While the authors were still able to draw inferences to their main objectives, they could not perform some of their proposed analyses because of a small sample size (due partly to the less than half empirical articles in their sample as well as the low number of papers with accessible information to code). One of the descriptive analyses they performed, the country level scores (Figure 6), in particular suffers from the small sample size, and while the authors indicate this in their manuscript, I do not think it would be reasonable to include, as it has the potential to be misinterpreted since so many are based on an n=1. Overall, I found the authors' presentation and discussion clear and concise; however, a lack of a more in-depth discussion is an area to improve the current manuscript. The manuscript outlines opportunities for researchers, journals, funders, and institutions to improve the way cardiovascular research is reported to enable discovery, reuse, and reproducibility.

      We appreciate the reviewer’s recognition of our pre-registration, methodology, and resource sharing and also their feedback regarding the small sample size of empirical research articles and need for a more in-depth discussion of the impacts of our study. We have now increased the number of empirical studies to a total of 393 out of 639 articles screened. We also agree that our study focuses more on transparency than reproducibility and replicability, and we have changed our title to reflect this. While the sample size of empirical papers has increased, a comparison of accessibility scores across countries continued to suffer from small sample size and we have removed it based on the recommendation of the reviewers. We have updated the Materials and Methods section to reflect our updated analyses, as well as included additional paragraphs on Limitations and Future Work in our Discussion to acknowledge future improvements that could be made to the accessibility score used in our study.

      Reviewer #2 (Public Review):

      This is a descriptive paper in the field of metascience, which documents levels of accessibility and reproducible research practices in the field of cardiovascular science. As such, it does not make a theoretical contribution, but it argues, first, that there is a problem for this field, and second, it provides a baseline against which the impact of future initiatives to improve reproducibility can be assessed. The study was pre-registered and the methods and data are clearly documented. This kind of study is extremely labour-intensive and represents a great deal of work.

      I have a major concern about the analysis. It is stated that to be fully reproducible, publications must include sufficient resources (materials, methods, data and analysis scripts). But how about cases where materials are not required to reproduce the work? In line 128-129 it is noted that the materials criterion was omitted for meta-analyses, but what about other types of study where materials may be either described adequately in the text, readily available (e.g. published questionnaires), or impossible to share (e.g. experimental animals)?

      To see how valid these concerns might be, I looked at the first 4 papers in the deposited 'EmpricalResearchOnly.csv' file. Two had been coded as 'No Materials availability statement' and for two the value was blank.

      Study 1 used registry data and was coded as missing a Materials statement. The only materials that I could think might be useful to have might be 'standardized case report forms' that were referred to. But the authors did note that the Registry methods were fully documented elsewhere (I am not sure if that is the case).

      Study 2 was a short surgical case report - for this one the Materials field was left blank by the coder.

      Study 3 was a meta-analysis; the Materials field was left blank by the coder.

      Study 4 was again coded as lacking a Materials statement. It presented a model predicting outcome for cardiac arrhythmias. The definitions of the predictor variables were provided in supplementary materials. I am not clear what other materials might be needed.

      These four cases suggest to me that it is rather misleading to treat lack of a Materials statement as contributing to an index of irreproducibility. Certainly, there are many studies where this is the case, but it will vary from study to study depending on the nature of the research. Indeed, this may also be true for other components of the irreproducibility index: for instance, in a case study, there may be no analysis script because no statistical analysis was done. And in some papers, the raw data may all be present in the text already - that may be less common, but it is likely to be so for case studies, for instance.

      A related point concerns the criteria for selecting papers for screening: it was surprising that the requirement for studies to have empirical data was not imposed at the outset: it should be possible to screen these out early on by specifying 'publication type'; instead, they were included and that means that the numbers used for the actual analysis are well below 400. The large number of non-empirical papers is not of particular relevance for the research questions considered here. In the Discussion, the authors expressed surprise at the large number of non-empirical papers they found; I felt it would have been reasonable for them to depart from their pre-registered plan on discovering this, and to review further papers to bring the number up to 400, restricting consideration to empirical papers only - also excluding case reports, which pose their own problems in this kind of analysis.

      A more minor point is that some of the analyses could be dropped. The analysis of authorship by country had too few cases for many countries to allow for sensible analysis.

      Overall, my concern is that the analysis presented here may create a backlash against metascientific analyses like this because it appears unfair on authors to use a metric based on criteria that may not apply to their study. I am strongly in favour of open, reproducible science, and agree it is important to document the state of the science for different disciplines. But what this study demonstrates to me is that if you are going to evaluate papers as to whether they include things like materials/data availability statements, then you need to have an N/A option. Unfortunately, I suspect it may not be possible to rely on authors' self-evaluation of N/A and that means that metascientists doing an evaluation would need to read enough of the paper to judge whether such a statement should apply.

      We thank the reviewer for the time taken to review our paper, the appreciation of the work we conducted, and for the suggestions for improving our research methods. To address the initial concern about our analytical approach, the definition for fully reproducible publications that we used was only applicable to research that utilized empirical research methods. We recognize that publications such as editorials and reviews are not inherently reproducible experimental studies; thus, such papers were not provided with an accessibility score, were only screened for components such as funding and conflict of interest information, and were only compared amongst each other. Additionally, articles such as meta-analyses and systematic reviews that do not include materials had adjusted accessibility scores. We expanded our Methods and Discussion sections to further explain our screening process and our assumption that all empirical research articles contain methods, data, and analysis scripts and to acknowledge the limitations of our approach. We also agree that screening more empirical research articles is more in line with the intent of our pre-registration and we expanded the number of empirical research articles screened to 393. We also agree with the reviewer that the analysis by country should be excluded because of the small sample size for most countries, and we have adjusted the manuscript accordingly.

    1. Reviewer #1 (Public Review):

      The authors present a back-of-the-envelope exploration of various possible resource allocation strategies for ITNs. They identify two optimal strategies based on two slightly different objective functions and compare 3 simple strategies to the outcomes of the optimal strategies and to each other. The authors consider both P falciparum and P vivax and explore this question at the country level, using 2000 prevalence estimates to stratify countries into 4 burden categories.

      This is a relevant question from a global funder perspective, though somewhat less relevant for individual countries since countries are not making decisions at the global scale. The authors have made various simplifications to enable the identification of optimal strategies, so much so that I question what exactly was learned. It is not surprising that strategies that prioritize high-burden settings would avert more cases. Generally, I found much of the text confusing and some concepts were barely explained, such that the logic was difficult to follow.

      I am not sure why the authors chose to stratify countries by 2000 PfPR estimates and in essence explore a counterfactual set of resource allocation strategies rather than begin with the present and compare strategies moving forward. I would think that beginning in 2020 and modeling forward would be far more relevant, as we can't change the past. Furthermore, there was no comparison with allocations and funding decisions that were actually made between 2000 and 2020ish so the decision to begin at 2000 is rather confusing.

      I realize this is a back-of-the-envelope assessment (although it is presented to be less approximate than it is, and the title does not reveal that the only intervention strategy considered is ITNs) but the number and scope of modeling assumptions made are simply enormous. First, that modeling is done at the national scale, when transmission within countries is incredibly heterogeneous. The authors note a differential impact of ITNs at various transmission levels and I wonder how the assumption of an intermediate average PfPR vs modeling higher and lower PfPR areas separately might impact the effect of the ITNs. Second, the effect of ITNs will differ across countries due to variations in vector and human behavior and variation in insecticide resistance and susceptibility to the ITNs. The authors note this as a limitation but it is a little mind-boggling that they chose not to account for either factor since estimates are available for the historical period over which they are modeling. Third, the assumption that elimination is permanent and nothing is needed to prevent resurgence is, as the authors know, a vast oversimplification. Since resources will be needed to prevent resurgence, it appears this assumption may have a substantial impact on the authors' results.

      The decision to group all settings with EIR > 7 together as "high transmission" may perhaps be driven by WHO definitions but at a practical level this groups together countries with EIR 10 and EIR 500. Why not further subdivide this group, which makes sense from a technical perspective when thinking about optimal allocation strategies?

      The relevance of this analysis for elimination is a little questionable since no one eliminates with ITNs alone, to the best of my understanding.

    1. Author Response

      Reviewer #3 (Public Review):

      Because of the position of pigeon embryos in eggs, light exposure will only stimulate the right eye, leading to lateralisation of brain responses and behaviour. Lorenzi and colleagues injected manganese chloride into pigeon eggs, to assess neuronal activation in the embryonic brain. While the eggs were placed in the light or dark, manganese ions accumulated in neurons that were activated (in cell bodies and axons), which was then visualized with MRI of the embryos before hatching. The authors report lateralisation of neuronal activity in three brain regions, which could potentially be important for our understanding of experience-dependent development of lateralised neural activation.

      The tectofugal pathway in pigeons projects from the retina to the optic tectum, then to the nucleus rotundus in the thalamus, and then to the entopallium. The thalamofugal pathway projects from the retina to the GLd in the thalamus, and then to the wulst in the hyperpallium. The two pathways involve different thalamic nuclei (e.g., Deng 2006). In the methods and throughout the manuscript it should be specified which thalamic region is used as ROI.

      Here we refer to the GLd in the thalamofugal visual pathway; we did not estimate activity in the n. rotundus. We have now clarified this point in the revised MS (ll. 54, 80, 86).

      This manuscript only describes neural activity, but the MEMRI technique should also be used to assess the effect of experimental manipulations on axonal connectivity. It is important to learn about the asymmetry of contralateral projections in the light vs dark groups for answering the research question.

      Here we used systemic administration of Mn through the CAM. The blood-brain barrier at this embryonic stage is not completely developed, and its permeability to ions and small molecules is far higher in the embryo than in later stages of development (Engelhardt, B. (2003). Development of the blood-brain barrier. Cell and tissue research, 314(1), 119-129.). Other studies involving direct, local injection in selected brain regions are more apt to investigate connectivity, but this is not the protocol used here. We appreciate the reviewer’s suggestion, and this will be the object of future experiments. However, we would like to disseminate the current protocol and the results it led to at an early stage to enable and encourage its use by other researchers in the field.

      There is an overinterpretation of post-hoc statistics that are reported without correction for multiple testing. The wulst light group lateralization is probably not actually different from zero (uncorrected p=0.04).

      We considered the reviewer's observation regarding the need for improvements in the statistical methods. In response, we have made amendments to the relevant section of the manuscript, explicitly stating that significant findings were obtained using a two-way ANOVA. For comparisons between conditions within specific brain regions, we conducted two-sample t-tests, and the results were corrected for Type I errors using the false discovery rate (FDR) method. Post-hoc one-sample t-tests were employed to assess lateralization across brain regions and conditions, and the corresponding p-values were reported without correction for multiple comparisons (as explicitly reported in the text, to avoid any confusion).
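
      The false discovery rate correction mentioned above can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg procedure, which is one common FDR method; the response does not state which variant was used. The p-values below are hypothetical placeholders, not values from the manuscript.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here only as a representative FDR method.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical uncorrected p-values for three between-condition comparisons.
hypothetical_p = [0.01, 0.04, 0.03]
adjusted_p = benjamini_hochberg(hypothetical_p)
print(adjusted_p)
```

      With such a correction, a comparison that is nominally significant on its own can move closer to the threshold once the number of tests is taken into account, which is the concern the reviewer raises for the uncorrected post-hoc tests.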

      The first line in the discussion states that there is thalamofugal lateralization, but no lateralization in the tectofugal pathway. To my understanding, previous literature reported it the other way around: in altricial pigeons, light exposure in the egg mainly affected the tectofugal pathway (Deng & Rogers 2002), while the thalamofugal pathway in pigeons was not lateralized (Strockens et al., 2013). The manuscript should compare the current findings with the literature and discuss differences.

      We are aware of the substantial differences in brain lateralization of the two visual pathways between pigeons and chicks after embryonic light exposure. However, in the present work we employed chick embryos (Gallus gallus domesticus), and the space limitations of a Brief Communication do not allow for an in-depth discussion of these differences between avian species.

      Moreover, the tectum is the only region shown here from the tectofugal pathway. However, lateralization of contralateral connections is expected from tectum to the nucleus rotundus in the thalamus, and thus lateralization of activation may only arise in downstream brain regions from the optic tectum. Therefore, the conclusion that there is no lateralization in the tectofugal pathway is not supported by the data.

      In conclusion, I think it is interesting and worthwhile that the authors assessed neural activity in response to visual stimulation in the embryo prior to hatching, but multiple methodological weaknesses and unclarities should be addressed.

      The ROI that we here named Thalamus does not include the nucleus rotundus, but refers to the nucleus geniculatus lateralis (GLd). We have now clarified this point in the revised MS (ll. 54, 80, 86), and we now refer only to the tectum, without generalizing to the entire tectofugal pathway, which will be the subject of future investigations.

    1. Reviewer #3 (Public Review):

      There has been a long-standing link between the biology of sulfur-containing molecules (e.g., hydrogen sulfide gas, the amino acid cysteine, and its close relative cystine, et cetera) and the biology of hypoxia, yet we have a poor understanding of how and why these two biological processes are co-regulated. Here, the authors use C. elegans to explore the relationship between sulfur metabolism and hypoxia, examining the regulation of cysteine dioxygenase (CDO1 in humans, CDO-1 in C. elegans), which is critical to cysteine catabolism, by the hypoxia inducible factor (HIF1 alpha in humans, HIF-1 in C. elegans), which is the key terminal effector of the hypoxia response pathway that maintains oxygen homeostasis. The authors are trying to demonstrate that (1) the hypoxia response pathway is a key regulator of cysteine homeostasis, specifically through the regulation of cysteine dioxygenase, and (2) that the pathway responds to changes in cysteine homeostasis in a mechanistically distinct way from how it responds to hypoxic stress.

      Briefly summarized here, the authors initiated this study by generating transgenic animals expressing a CDO-1::GFP protein chimera from the cdo-1 promoter so that they could identify regulators of CDO-1 expression through a forward genetic screen. This screen identified mutants with elevated CDO-1::GFP expression in two genes, egl-9 and rhy-1, whose wild-type products are negative regulators of HIF-1, raising the possibility that cdo-1 is a HIF-1 transcriptional target. Indeed, the authors provide data showing that cdo-1 regulation by EGL-9 and RHY-1 is dependent on HIF-1 and that regulation by RHY-1 is dependent on CYSL-1, as expected from other published findings of this pathway. The authors show that exogenous cysteine activates cdo-1 expression, reflective of what is known to occur in other systems. Moreover, they find that exogenous cysteine is toxic to worms lacking CYSL-1 or HIF-1 activity, but not CDO-1 activity, suggesting that HIF-1 mediates a survival response to toxic levels of cysteine and that this response requires more than just the regulation of CDO-1. The authors validate their expression studies using a GFP knockin at the cdo-1 locus, and they demonstrate that a key site of action for CDO-1 is the hypodermis. They present genetic epistasis analysis supporting a role for RHY-1, both as a regulator of HIF-1 and as a transcriptional target of HIF-1, in offsetting toxicity from aberrant sulfur metabolism. The authors use CRISPR/Cas9 editing to mutate a key amino acid in the prolyl hydroxylase domain of EGL-9, arguing that EGL-9 inhibits CDO-1 expression through a mechanism that is largely independent of the prolyl hydroxylase activity.

      Overall, the data seem rigorous, and the conclusions drawn from the data seem appropriate. The experiments test the hypothesis using logical and clever molecular genetic tools and design. The sample size is a bit lower than is typical for C. elegans papers; however, the experiments are clearly not underpowered, so this is not an issue. The paper is likely to drive many in the field (including the authors themselves) into deeper experiments on (1) how the pathway senses hypoxia and sulfur/cysteine/H2S using these distinct mechanisms/modalities, (2) how oxygen and sulfur/cysteine/H2S homeostasis influence one another, and (3) how this single pathway evolved to sense and respond to both of these stress modalities.

      Major strengths of the paper include (1) the use of the powerful whole animal C. elegans model to reveal results that have meaning in vivo, (2) the careful demonstration through mutant rescue experiments that key transgenes have functional activity, (3) the use of CRISPR/Cas9 editing to mutate a critical residue in the catalytic domain of the EGL-9 prolyl hydroxylase, (4) transgenic rescue experiments that show that CDO-1 operates in the hypodermis with regard to the larval arrest phenotype, and (5) the thorough epistatic analysis of different pathway mutants.

      Major weaknesses of the paper include (1) the over-reliance on genetic approaches, (2) the lack of novelty regarding prolyl hydroxylase-independent activities of EGL-9, and (3) the lack of biochemical approaches to probe the underlying mechanism of the prolyl hydroxylase-independent activity of EGL-9.

      Major Issues We Feel the Authors Should Address:

      1. One particularly glaring concern is that the authors really do not know the extent to which the prolyl hydroxylase activity is (or is not) impacted by the H487A mutation in egl-9(rae276). If there is a fair amount of enzymatic activity left in this mutant, then it complicates interpretation. The paper would be strengthened if the authors could show that the egl-9(rae276) eliminates most if not all prolyl hydroxylase activity. In addition, the authors may want to consider doing RNAi for egl-9 in the egl-9(rae276) mutant as a control, as this would support the claim that whatever non-hydroxylase activity EGL-9 may have is indeed the causative agent for the elevation of CDO-1::GFP. Without such experiments, readers are left with the nagging concern that this allele is simply a hypomorph for the single biochemical activity of EGL-9 (i.e., the prolyl hydroxylase activity) rather than the more interesting, hypothesized scenario that EGL-9 has multiple biochemical activities, only one of which is the prolyl hydroxylase activity.

      2. The authors observed that EGL-9 can inhibit HIF-1 and the expression of the HIF-1 target cdo-1 through a combination of activities that are (1) dependent on its prolyl hydroxylase activity (and subsequent VHL-1 activity that acts on the resulting hydroxylated prolines on HIF-1), and (2) independent of that activity. This is not a novel finding, as the authors themselves carefully note in their Discussion section, as this odd phenomenon has been observed for many HIF-1 target genes in multiple publications. While this manuscript adds to the description of this phenomenon, it does not really probe the underlying mechanism or shed light on how EGL-9 has these dual activities. This limits the overall impact and novelty of the paper.

      3. Cysteine dioxygenases like CDO-1 operate in an oxygen-dependent manner to generate sulfites from cysteine. CDO-1 activity is dependent upon availability of molecular oxygen; this is an unexpected characteristic of a HIF-1 target, as its very activation is dependent on low molecular oxygen. The authors address this neither in the text nor experimentally, and it seems a glaring omission.

      4. The authors determined that the hypodermis is the site of the most prominent CDO-1::GFP expression, relevant to Figure 4. This claim would be strengthened if a negative control tissue, in the animal with the knockin allele, were shown. The hypodermal specific expression is a highlight of this paper, so it would make this article even stronger if they could further substantiate this claim.

      Minor issues to note:

      Mutants for hif-1 and cysl-1 are sensitive to exogenous cysteine levels, yet loss of CDO-1 expression is not sufficient to explain this phenomenon, suggesting other targets of HIF-1 are involved. Given the findings the authors (and others) have had showing a role for RHY-1 in sulfur amino acid metabolism, shouldn't the authors consider testing rhy-1 mutants for sensitivity to exogenous cysteine?

      The cysteine exposure assay was performed by incubating nematodes overnight in liquid M9 media containing OP50 culture. The liquid culture approach adds two complications: (1) the worms are arguably starving or at least undernourished compared to animals grown on NGM plates, and (2) the worms are probably mildly hypoxic in the liquid cultures, which complicates the interpretation.

      An easily addressable concern is the wording of one of the main conclusions: that cdo-1 transcription is independent of the canonical prolyl hydroxylase function of EGL-9 and is instead dependent on one of EGL-9's non-canonical, non-characterized functions. There are several points in which the wording suggests that CDO-1 toxicity is independent of EGL-9. In their defense, the authors try to avoid this by saying, "EGL-9 PHD," to indicate that it is the prolyl hydroxylase function of EGL-9 that is not required for CDO-1 toxicity. However, this becomes confusing because much of the field uses PHD and EGL-9/EGLN as interchangeable protein names. The authors need to be clear about when they are describing the prolyl hydroxylase activity of EGL-9 rather than other (hypothesized) activities of EGL-9 that are independent of the prolyl hydroxylase activity.

      The authors state in the text, "the egl-9; suox-1 double mutants are extremely sick and slow growing." We appreciate that their "health" assay, based on the exhaustion of food from the plate, is qualitative. We also appreciate that it is a functional measure of many factors that contribute to how fast a population of worms can grow, reproduce, and consume that lawn of food. However, unless they do a lifespan assay and/or measure developmental timing and specifically determine that the double mutant animals themselves are developing and/or growing more slowly, we do not think it is appropriate to use the words "slow growing" to describe the population. As they point out, the rate of consumption of food on the plate in their health assay is determined by a multitude and indeed a confluence of factors; the growth rate is one specific one that is commonly measured and has an established meaning.

    1. Neither Spread of U.S. Slavery nor Invasion of America uses language explicitly condemning slavery or imperialism, allowing the map’s usage by potentially racist and xenophobic visitors. The objective, socially-neoliberal portrayal of data without subjectivity perpetuates color-blind racism and allows bigotry to take root.

      I think this may be precisely because these maps are scholarly maps. Members of academia tend to avoid making a "subjective" or "biased" argument, especially regarding historical matters. On the other hand, non-scholarly maps created bottom-up through community engagement (such as the Anti-Eviction Mapping Project referenced in Data Feminism) can more explicitly call out injustices. I want to learn more about the ways in which we can complement the limitations of scholarly mapping projects.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their comments. We have now addressed all the comments in a revised version of the manuscript, which we believe has strengthened our paper.

      1) Introduction LINE 60: the authors cite Funato et al 2016 as the paper first describing a role for Sik3 in sleep regulation. In fact, the role for this kinase was first identified nearly a decade earlier in C. elegans (Van der Linden et al, Genetics 2008 PMID 18832350).

      Thank you for pointing us to this reference. Van der Linden et al. demonstrated that the C. elegans homolog of Sik3 (KIN-29) regulates satiety quiescence, in which worms stop moving following feeding on high quality food. However, as pointed out in Trojanowski and Raizen “Call it Worm Sleep” (2016), not all of the behavioral criteria for sleep have been applied to C. elegans satiety quiescence, and we cannot find any references that unequivocally demonstrate satiety quiescence is a sleep state. As McClanahan et al. (2020) show, quiescent states following mild sensory arousal do not fulfill the sleep criteria of changes in arousal threshold and homeostatic regulation, so not all quiescent states in C. elegans are sleep. Then again, Grubbs et al. (2020) do demonstrate that KIN-29 regulates both developmentally timed and stress-induced sleep states in worms, suggesting that the observations in Van der Linden et al. were ahead of their time and that these behavioral states are possibly inter-related. We believe, though, that our line “the roles of… SIK3 kinase in modulating sleep homeostasis in mice (Funato et al. 2016) were identified in genetic screens” remains accurate.

      2) Introduction LINE 71: remove the word "known" from "...while some known human sleep/wake regulators, such as the..."

      Good idea. Done.

      3) I was confused regarding Supplemental data 1 describing the genes they targeted with their forward genetic screen. Am I understanding correctly from the "Summary stats" tab that 702 fish lines with virus insertions were screened behaviorally? In Figure S1, it looks like about 60 are shown in the histograms but in the text (in the Discussion) they say 25 were screened. Were all the genes listed under the Excel tabs (GPCRs, channels, etc) tested? Or was just a subset tested? Where are the sleep data for these lines? Negative results may be relevant to their manuscript since they listed (tested??) a number of ion channel genes under tab "channels" which appear to NOT have a sleep phenotype.

      We apologize for the confusion on these points. As highlighted in the legend to Supplementary Figure S1, we had planned a screening strategy with the following pipeline: Candidate mammalian gene → Zebrafish ortholog → ID viral insertion from “Zenemark” library → grow viral insertion lines from frozen sperm → phenotype F3 heterozygous and homozygous mutant generation. Unfortunately, the company, Znomics, which held the Zenemark library, could not reliably reconstitute the correct live fish from the sperm library, and of the 702 lines we planned to screen, we could only screen 26 lines (25 was a typo). We treated heterozygous and homozygous animals for each line independently, for a total of 52 screened lines in the histograms.

      To make this clearer, we have edited the main text as follows (lines 104-105): “For screening, we identified zebrafish sperm samples from the Zenemark collection (Varshney et al., 2013) that harboured viral insertions in genes of interest and used these samples for in vitro fertilization and the establishment of F2 families, which we were able to obtain for 26 lines.” And lines 111-112: “While most screened heterozygous and homozygous lines had minimal effects on sleep-wake behavioural parameters (Figure S1B-S1C),”

      We believe it is important to include the full set of Supplementary Data 1, even though the vast majority of these candidate lines were not tested.

      4) Results LINE 117: remove the word "prominent", which is subjective, from the sentence "...showed a prominent decrease in sleep during the..."

      Good point. Done.

      5) LINES 185-186: did you see any circadian variation in your dmist:GFP protein abundance or localization? Protein trafficking has been described as a mechanism of circadian regulation of excitability.

      For practical reasons, we imaged the membrane localization of Dmist:GFP in plasmid-injected embryos at 90% epiboly, which is about 9 hours after fertilization and when the cells remain large and in a relatively flat epithelium. Thus, we could not follow circadian fluctuations in abundance or localization. For circadian studies, we believe the best method will be to raise an antibody that recognizes Dmist.

      6) LINE 203: does the GFP-tagged Dmist rescue the loss-of-function phenotype? This is relevant to Figure 2E. It is also relevant to the issue of structure-function. If it rescues, then the C-terminus may not be essential to protein function.

      As noted, for practical reasons, we observed Dmist-GFP only transiently at early stages of development, expressed using a strong, ubiquitous promoter. A rescue experiment is a good idea for future experiments, where we carefully control the expression of Dmist in neurons.

      7) LINE 220: explain what you mean by "...consistent with nonsense-mediated decay." and/or give a reference.

      In zebrafish and other species including humans, mutant transcripts that have premature stop codons often undergo “nonsense mediated decay”, whereby the expression levels are largely reduced (Wittkopp et al., 2009). In the zebrafish community, this is often used as secondary evidence of a loss of function mutation, as relatively few antibodies are available to directly observe zebrafish proteins. We have added a reference that describes this phenomenon (Wittkopp et al., 2009).

      8) LINE 225: define "LME model"

      Now reads: “Linear mixed effects (LME).”

      9) LINES 227-229: could the vir/vir phenotype be explained by specific effects on protein structure? could vir/vir be a gain-of-function allele?

      We can’t rule this out formally, and vir/+ animals do show some sleep phenotypes, albeit weaker than those of vir/vir animals (Figure 1G). However, it is not uncommon for heterozygous mutants to show significant phenotypes that are weaker than those of their homozygous mutant siblings, and the strong suppression of dmist expression by the viral insertion (which is located in the dmist intron) is more consistent with a hypomorphic loss-of-function phenotype for the vir allele.

      10) LINES 229-230: I don't quite follow the argument for pursuing further studies only of i8/i8. i8/i8 seems to also be a hypomorphic allele based on your qPCR data.

      First, the dmist viral line was generated by an insertional mutagenesis method followed by sequencing, and each line has multiple other inserts in a background that does not match the background of the other animals reported in this paper. Second, the dmist vir allele is an insertion in the intron, leading to reduced, but not complete loss of expression. In contrast, the i8 allele was generated on the same background strain as our other existing and newly reported lines. Moreover, our i8 line is likely a loss-of-function allele and not a hypomorph. Yes, dmist expression is reduced in the i8 allele; however, this is likely due to nonsense mediated decay of dmist mRNA. The mutation introduces a frameshift in the dmist coding sequence, and as a result the amino acid sequence of the protein is altered after the N-terminal signal sequence.

      11) LINES 241-243: grammar.

      Fixed

      12) LINE 245: define "JackHMMR iterative search"

      We’ve added the phrase: “and seeding a hidden Markov model iterative search (JackHMMR)”

      13) LINE 246 is missing the word "we" prior to "...found distant homology between..."

      Added

      14) LINE 301: show data demonstrating deviation from Mendelian ratios. Also, comment on meaning of such data (embryonic lethality??).

      We have added this data in the line (301):

      “atp1a3b mutant larvae were not obtained at Mendelian ratios (55 wild type [52.5 expected], 142 [105] atp1a3b+/-, 13 [52.5] atp1a3b-/-; p<0.0001, Chi-squared) suggesting some impact on early stages of development leading to lethality.”
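
      The reported counts can be checked with a short worked calculation. This is a minimal sketch that assumes a goodness-of-fit test against a 1:2:1 Mendelian ratio with 2 degrees of freedom; the response does not spell out the exact test configuration. For 2 degrees of freedom the chi-squared survival function reduces to exp(-x/2), so no statistics library is needed.

```python
import math


def chi_squared_gof(observed, ratios):
    """Return expected counts and the chi-squared statistic for given ratios."""
    total = sum(observed)
    ratio_sum = sum(ratios)
    expected = [total * r / ratio_sum for r in ratios]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, stat


def chi2_sf_two_df(stat):
    """Upper-tail probability of a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


# Counts quoted in the response: wild type, atp1a3b+/-, atp1a3b-/-.
observed = [55, 142, 13]
expected, stat = chi_squared_gof(observed, [1, 2, 1])  # assumed 1:2:1 ratio
p_value = chi2_sf_two_df(stat)
print(expected, round(stat, 2), p_value < 0.0001)
```

      Under these assumptions the expected counts match the bracketed values (52.5, 105, 52.5), and the resulting p-value is consistent with the reported p<0.0001.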

      15) Discussion LINES 362-372: This paragraph seems to be of only tangential relevance to the paper. Consider removing.

      Our screening strategy was a large-scale reverse genetic screen, but the number of lines was limited by the technical issues described above. We think it is important to mention that the strategy, if employed today, could benefit from newer technologies.

      16) Discussion. Another model is that Dmist and the Na+/K+ pump have a developmental effect. Arguing against this developmental model is the ouabain experiment.

      This is an important point. We’ve added the line (454:457): “We also cannot exclude a role for Dmist and the Na+/K+ pump in developmental events that impact sleep, although our observation that ouabain treatment, which inhibits the pump acutely after early development is complete, also impacts sleep, argues against a developmental role.”

      17) FIGURE 1G: Are these significance cut offs corrected for multiple comparisons?

      Yes, all the data is corrected for multiple comparisons.

      18) performing neuronal activity measures, either via neural activity imaging or phospho-ERK labeling in different mutants at day or night conditions, to determine whether baseline neuronal activity brain-wide or in specific brain regions are altered.

      These are excellent experiments that we plan to perform in the future.

      19) Please check all Figure numbers for accuracy.

      We have double checked these.

      20) The authors emphasize the role of increased cellular sodium, but equally plausibly, the phenotypes could be due to decreased cellular potassium. The potassium channel shaker has been previously identified as a critical sleep regulator in Drosophila.

      We completely agree. We would like to highlight that we did devote an entire paragraph to the possibility of changes in extracellular potassium in the discussion: “A third possibility is that Dmist and the Na+,K+-ATPase regulate sleep not by modulation of neuronal activity per se but rather via modulation of extracellular ion concentrations. Recent work has demonstrated that interstitial ions fluctuate across the sleep/wake cycle in mice. For example, extracellular K+ is high during wakefulness, and cerebrospinal fluid containing the ion concentrations found during wakefulness directly applied to the brain can locally shift neuronal activity into wake-like states (Ding et al., 2016). Given that the Na+,K+-ATPase actively exchanges Na+ ions for K+ , the high intracellular Na+ levels we observe in atp1a3a and dmist mutants is likely accompanied by high extracellular K+. Although we can only speculate at this time, a model in which extracellular ions that accumulate during wakefulness and then directly signal onto sleep-regulatory neurons could provide a direct link between Na+,K+ ATPase activity, neuronal firing, and sleep homeostasis. Such a model could also explain why disruption of fxyd1 in non-neuronal cells also leads to a reduction in night-time sleep.”

      We also agree that Shaker may be an important component of this sleep regulatory mechanism. Indeed, we previously showed that another potassium channel in zebrafish regulates sleep (Rihel et al., 2010).

      We have emphasized sodium homeostasis in our title and paper only because we were able to directly observe intracellular sodium levels, so we are confident that these have been altered in our mutants. We can only presume that potassium levels have also been altered, but we could not directly observe this.

      21) The similar phenotype between dmist and Fxyd1 in sleep reduction yet very different expression patterns, with dmist being mostly neuronal while fxyd1 being mostly non-neuronal, raise many possible questions: 1) are the sleep phenotypes due to neuronal Na/K imbalance? Or 2) Are the sleep phenotypes due to extracellular Na/K imbalance? Or 3) both? Some feasible experiments may help achieve a better mechanistic understanding of the observed sleep defects.

      Yes, we think these are excellent studies for future work. As noted in the previous point (20), we did discuss the possibility that changes to extracellular potassium might be a parsimonious explanation for the similar phenotypes of fxyd1 and dmist mutants.

      Future experiment suggestions (not required)

      1) Perform a double mutant analysis of fxyd1 and atp1a3a, to determine whether an epistatic relationship similar to that of dmist and atp1a3a is observed in the case of fxyd1 and atp1a3a.

      This is a great experiment that we will do in the future. Unfortunately, the fxyd1 mutant had been sperm frozen during the COVID-19 pandemic, so we cannot do this experiment at this time.

      2) Given the differences in the sleep phenotypes between vir/vir and i8/i8 mutants, would be informative to see the phenotype of the vir/i8 trans-heterozygote.

      This is also a good experiment to perform in the future. Since obtaining the cleaner i8 allele, the dmistvir/vir lines were sperm frozen.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary: This study by Magalhaes et al sheds light on the molecular underpinnings of the relative resistance of children to severe COVID-19. The authors found that priming of epithelial cells by resident immune cells to express tonic levels of the PRR receptors MDA-5 and RIG-I predisposes the epithelial cells for a faster and more robust onset of IFN-beta production upon SARS-CoV-2 infection. The study uses a combination of in vitro and ex vivo models, as well as mining of scRNA-Seq datasets from clinical specimens.

      Major comments: The claims and conclusions are supported by the data and therefore no new experiments are needed.

      Optional

      1. The use of primary cells (i.e. human airway epithelial cultures cross talking to immune cells) would make this study more compelling, although I assume that the major findings would be recapitulated in such models.
      2. It is not clear how the use of Yersinia enterocolitica to trigger activation of PBMC is relevant to this story. Using different (commensal) pathogens to achieve PBMC activation may yield different and more physiologically relevant results.
      3. The manuscript would greatly benefit from improved structure and focus, particularly in the Abstract, Introduction and Results sections. The text is very dense, and makes it difficult for the reader to follow the flow and to distinguish important from less important information. Particularly, the introduction starts very broadly introducing COVID-19, which I think we are by now all familiar with. Directly starting with the burning question why kids get less sick with SARS-CoV-2 would capture the readers' attention better. Figure 1a is beautiful for a review but much too dense to help the reader as a graphical abstract. In the results section, for each experiment, leading with clearly stating the rationale of the specific question, the gap in knowledge and why the gap is there, then followed by the results, then summarizing the impact of said results, would make this a much more enjoyable read and help the reader evaluate the novelty and impact better, particularly for Figures 1, 2, and 3 (but also all others). The interaction wheel graphs (Figure 4) are amazing, but are not properly explained in the text (do I read this right that in adults, all the crosstalk is basically performed by proliferating T-cells?). In all, these scientific writing issues sell an otherwise beautiful story short.

      Referees cross-commenting

      I agree with reviewers 1 and 2 that the use of primary cells would significantly elevate the story. However, I think this should be "optional", as I do not think it would change the findings.

      Significance

      General assessment:

      The main strengths of the study are its topic and clearly relevant question: why do kids rarely get severe COVID-19? The main novelty is the answer to this question, that immune cell-epithelial crosstalk in children elevates the tonic expression of MDA5 and RIG-I via the IRF1 axis, leading to faster onset of IFN production and signaling upon SARS-CoV-2 challenge, which ultimately mounts an antiviral response detrimental to robust SARS-CoV-2 replication. The study uses an innovative combination of in vivo and ex vivo experiments and analysis of clinical specimens.

      The significant advance of this study to the field is clear to this reviewer, although it could be much better stated in the manuscript, as described at length above. The study is of great interest to the field of immunology and virology, and also has clinical and translational impact with respect to risk assessment for severe COVID-19 per age group, as well as epidemiological considerations for infection control.

    1. Author Response:

      We would like to thank the eLife reviewers for the considerable time and effort they have invested to review these manuscripts. We have also benefited from a previous round of review of the manuscript describing the proposed burial features, which underwent two rounds of revisions in a high-impact journal over a period of approximately 8 months during 2022 and early 2023. Both sets of reviews have reflected mixed responses to the evidence we have presented, with one reviewer recommending acceptance with minor editorial revisions, two recommending acceptance with minor revisions and the fourth recommending rejection based upon similar arguments to those reflected by some of the reviewers in this current round of reviews in eLife. Ultimately the managing editor of this first journal took the decision that the review process could not be completed in a timely manner and rejected the manuscript, although the submission here reflected our consideration of these reviewers' suggestions.

      We have chosen in this initial response to the eLife reviews to include some references to the previous anonymous reviews in order to illustrate differences of opinion and differences in revision suggestions within the review process. Our goal is to offer maximal insight into our decision-making process and to acknowledge the considerable time and effort put into the assessment of these manuscripts by reviewers (for eLife and in the case of the earlier review process). We hope that this approach will assist the readers, and reviewers, of our manuscripts in understanding why we are proceeding with certain decisions during the revision process.

      This is a new process for us and the reviewers, and one way in which it significantly differs from more traditional review is that both the reviews and our reply will be public well in advance of our revisions to the manuscript. Indeed, considering the scope of the reviews, some of those revisions may take considerable time, although many can be accomplished fairly easily. Thus, we are not in a position to say that we have solved every issue raised by the reviewers. Instead, we will examine what appear to be the key critical issues raised regarding the data and the analyses and how we propose to address these as we revise the papers. We will also address several philosophical and ethical issues raised by the reviews and our proposal for dealing with these. More specific editorial and citational recommendations will be dealt with on a case-by-case basis, and we do not address these point-by-point in this reply. Please note, this response to the reviewers is not the revision of the manuscript and is only the initial opinion of the corresponding authors with some guidance from the larger group of authors of all three papers. Our final submitted revision will reflect the input of all authors included on those submissions.

      We took the decision to submit three separate papers consciously. The two different categories of evidence, burials and engravings, involve different kinds of analysis and different (although overlapping) teams of researchers, and we recognized that each deserved its own presentation and assessment. Meanwhile, together they inform the context of H. naledi in a way that requires some synthetic discussion, in which both kinds of evidence are relevant, leading to a third paper. But the mutual relevance of these different kinds of evidence and their review by a common set of reviewers naturally raises cross-cutting issues, and the reviewers have cross-referenced the three articles. This has sometimes led to suggestions about one manuscript based on the contents of another. Considering the situation, we accepted the recommendation that it would be clearer to consider all three articles in a single reply. Thus, while each of the three papers will proceed separately during the revision process, it will occasionally be necessary to refer across all three papers in our responses.

      Scientific Issues:

      In reading the reviews, we feel there are 9 critical points/assertions raised by one or more of the reviewers that present a problem for, or challenge to, our hypothesis that the observed evidence (bone accumulations and engravings) described in the Dinaledi subsystem are of intentional naledigenic origin. These are:

      1. The evidence presented does not demonstrate a clear interruption of the floor sediments, thus failing to demonstrate excavated holes.

      2. The sediments infilling the holes where the skeletal remains are found have not been demonstrated to originate from the disruption of the floor sediments and thus could be part of a natural geological process (e.g. water movement, slumping) or carnivore accumulations.

      3. Previous geological interpretations by our research group have given alternative geological explanations for formation of the bony accumulations that contradict the evidence presented here and result in alternative hypotheses of origin.

      4. Burial cannot be effectively assessed without complete excavation of the features and site.

      5. The skeletal remains as presented do not conform clearly to typical body arrangement/positions associated with human (Homo sapiens) burials.

      6. There is no evidence of grave goods or lithic scatters that are typically associated with human burials.

      7. Humans may have been involved with the creation of either the Homo naledi bone accumulations, the engravings, or both.

      8. Without a date of the engravings, the null hypothesis should be the engravings were created by Homo sapiens.

      9. The null hypothesis for explanation of the skeletal remains in this situation should be “natural accumulation”.

      Our analysis of the Dinaledi Feature 1 leads us to accept that the laminated orange-red mudstone (LORM) sedimentary layer is interrupted, indicating a non-natural intervention, and that the hole created by the interruption was then filled by a fleshed body (and perhaps parts of other bodies), which was then covered by sediment that originated from the hole that was dug. We recognize that the four eLife reviewers are not convinced that our presentation is sufficient to establish this. Interestingly, this was not the universal opinion of earlier reviewers of the initial manuscript, several of whom felt we had adequately supported this hypothesis. The lack of clarity in this current version of the burial manuscript is our responsibility. In the upcoming revision of this paper to be submitted, we will take the reviewers’ critiques to heart and add additional figures that better illustrate the disruption of the LORM and clarify the sedimentological data showing that the material covering the skeletal remains in the hole is the disrupted sediment excavated from the same hole. We are proposing to isolate this most critical evidence for burial into a separate section in the revised submission based on the reviewers’ comments. The fact that the LORM layer is disrupted, a fleshed body was placed in the hole created by this disruption, and the body (and perhaps parts of other bodies) was/were then covered by the same sediments from the hole is the central feature of our hypothesis that the bone accumulations observed reflect a burial and not a natural process.

      The possibility of fluvial transport or involvement in the subsystem is a topic that we have addressed extensively in past work, and it is clear from these reviews that we must enhance our current manuscript to discuss this issue at greater length. Our previous work (Dirks et al. 2015; Dirks et al. 2017) emphasized that fluvial transport of whole bodies into the subsystem was precluded by several lines of sedimentological evidence. We excavated a rich accumulation of skeletal remains, including articulated limbs and other elements in subvertical orientations inconsistent with slow sedimentary infill, which were difficult to explain without positing either a large and dense pile of bodies and/or sediment movement. We encountered fractured chunks of laminated orange-red mudstone (LORM) in random orientations within our excavation area, within and among skeletal remains, which directly refuted that the remains were inundated with water at the time of burial, and this limited the possibility of fluvial transport. Water flow sufficient to displace bodies or complete skeletal evidence would also transport large and coarse sediment, which is absent from the subsystem, and would sort the commingled skeletal material that we found by size, which we do not observe. But our excavation covered less than a square meter at very limited depth, and this was the limit to our knowledge of subsurface sediment. We thus were left with uncertainty that led us to suggest the possibility of sediment slumping or movement into subsurface drains, although these were not observed near our excavation. Our current work expands our knowledge of the subsurface and presents an alternative explanation for the disposition of skeletal remains from our earlier excavation. But we acknowledge that this new explanation is vulnerable to our own previous published proposals, and we must do a better job of explaining how the new information addresses our previous suggestions. By not clearly creating a section where we explained how these previous hypotheses were now nullified by new evidence, we clearly confused the reviewers with our own previous work. We will revise the manuscript by enhancing the review of the significant geological evidence demonstrating that there is no significant fluvial action in the system and making it clear how the burial hypothesis provides a clearer explanation for the situation of skeletal remains from our previous excavation work.

      One of the central issues raised by reviewers has been a perceived need to excavate these features completely, totally exhuming all skeletal remains from them. Reviewers have written that it is necessary to identify every skeletal element that is present and account for any missing elements. On this point, we have both ethical and scientific differences from these reviewers. We express our ethical concerns first. Many of the best-preserved possible burials ever discovered by archaeologists were subjected to total excavation and exhumation. Cases like La Chapelle-aux-Saints, La Ferrassie, and Skhūl were fully excavated at a time when data recording and excavation methods did not include the range of spatial and geomorphological approaches that later became routine. The judgment of early investigators that these situations were intentional burials was challenged by later workers, and the kind of information that might enable better tests had been irrevocably lost (Gargett 1999; Dibble et al. 2015; Rendu et al. 2014).

      Later, improved excavation standards have not sufficed to remove uncertainty or debate about possible burials. For example, it was long presumed that well-preserved remains of young children were by themselves diagnostic of intentional burial, such as those from Dederiyeh, Border Cave, or Roc de Marsal. Such cases were also fully excavated, with adequate documentation of the positioning of skeletal remains and their surrounding stratigraphic situation, but such cases were later challenged on several bases and the complete exhumation of material has confused or precluded testing of new hypotheses (e.g. Gargett 1999). The case of Roc de Marsal is one in which data from the initial excavation combined with re-excavation and geoarchaeological analysis led to a naturalistic interpretation of the skeletal material (Sandgathe et al. 2011; Goldberg et al. 2017). But even in this case, the researchers erred in their interpretation of the skeleton’s situation due to a lack of identification of parts of the infant’s skeleton (Gómez-Olivencia and García-Martinez 2019). That is to say, it is not only the burial hypothesis but other hypotheses that suffer from complete excavation. Researchers concerned with preserving all possible information have sometimes taken extraordinary measures to remove and study possible burials at high resolution in the laboratory. Such was the case of the Shanidar IV burial removed from the site and transported in plaster jacket by Solecki, which led to the disruption and loss of internal stratigraphic information (Pomeroy et al. 2020). Arguably, the current state of the art is full excavation with partial preparation, such as that undertaken at Panga ya Saidi (Martinón-Torres et al. 2021). But again, any future attempt to reinterpret or test the hypothesis of burial must rely on the adequacy of documentation as the original context has been removed.

      In our decision to leave material in place as much as possible, we are expanding upon standard practice to leave witness sections and unexcavated areas for future research. The situation is novel, representing possible burials by a nonhuman species, and that makes it doubly important in our opinion to be conservative in not fully exhuming the skeletal material from its context. We anticipate that many other researchers, including future investigators, will suggest additional methods to further test the hypothesis of burial, something that would be impossible if we had excavated the features in their entirety prior to publishing a description of our work. We believe strongly that our ethical responsibility is to publish the work and the most likely interpretation while leaving as much evidence in place as possible to enable further testing and replication. We welcome the suggestions of additional methods/analyses to test the H. naledi burial hypothesis.

      This being said, we also observe that total exhumation would not resolve the concerns raised by the reviewers. The recommendation of total exhumation is in pursuit of a full account of all skeletal material present and its preservation and spatial situation, in order to demonstrate that they conform to body positions comparable to human burials. As has been highlighted in forensic casework, the excavation of an inhumation feature does not necessarily provide an accurate spatial or anatomical manifest of the stratigraphical relationships between the body, encapsulating matrix, and any cut present due to preservational, taphonomic and operational factors (Dirkmaat and Cabo, 2016; Hunter, 2014). In particular, in cases where skeletal elements are highly fragmented, friable, or degraded (such as through bioerosion) then complete excavation—even under controlled laboratory conditions—may destroy bone and severely limit skeletal identification (Henderson, 1997; Hochrein, 2002; Owsley and Compton, 1997), particularly in elements where the ratio of trabecular to cortical bone is high (Darwent and Lyman, 2002; Lyman, 1994). As such, non-invasive methods of 3D and 4D modelling (preservation in situ) are often considered preferable to complete necropsy or excavation (preservation by record) where appropriate (Bolliger and Thali, 2009; Dell’Unto and Landeschi, 2022; Randolph-Quinney et al., 2018; Silver, 2016). 

      The test of burial is not primarily positional, but taphonomic and geological. The position and number of bones can elaborate on process-driven questions of decay and destruction in the burial environment, or post-mortem modification, but are not singularly indicative of whether the remains were intentionally buried; the post-mortem narrative of all the processes affecting the cadaveric island is required (Knüsel and Robb, 2016). In previous cases, researchers have disputed or accepted the hypothesis of intentional hominin burial based upon assumptions about how modern humans or Neandertals would have positioned bodies, with the idea that some positions reflect ritual intent while others do not. But applying such assumptions is unjustifiable, particularly for a species like H. naledi, whose culture may have differed fundamentally from our own. Our work acknowledges that the present evidence does not enable a full reconstruction of the burial positions, but it does show that fleshed remains were encased in sediment prior to decomposition of soft tissue, and that subsequent spatial changes can be most parsimoniously explained by natural decomposition within sedimentary matrix contained within a burial feature (after Green, 2022; Mickleburgh and Wescott, 2018; Mickleburgh et al., 2022). If the argument is that extraordinary claims require extraordinary evidence, we feel that the evidence documents excavation and interment (and will do so more clearly in the revision), and the fact that the remains do not match a “typical” human burial in body positioning is not in itself evidence that these are not H. naledi burials.

      We feel that the reviewers (in keeping with many palaeoanthropologists) have a clear idea of what they “think” a burial should look like in an idealised sense, but this platonic ideal of burial form is not matched by the extensive literature in archaeothanatology, funerary archaeology and forensic science which indicates enormous variability in the activity, morphology and post-mortem system experienced by the human body in cases of interment and body disposal (e.g. Aspöck, 2008; Boulestin and Duday, 2005 and 2006; Connelly et al., 2005; Channing and Randolph-Quinney, 2006; Cherryson, 2008; Donnelly et al., 1995; Finley, 2000; Hunter, 2014; Parker Pearson, 1999; Randolph-Quinney, 2013). Decades of experience in the identification, recovery and interpretation of clandestine, deviant, and non-formal burials indicates the platonic ideal is rare, and in many contexts, the exception (Cherryson, 2008; Parker Pearson, 1999). This variability is particularly relevant to morphological traits in burial context, such as the informal nature of the grave cut in plan and section, shallow burial depth, and initial disposition of body (placement) during the early post-mortem period. These might run counter to the expectations of reviewers or others referencing the fossil hominin record, but are well accepted within the communities of researchers investigating Holocene archaeological sites and forensic contexts.

      It is encouraging to see reviewers beginning to incorporate the extensive (often experimentally derived) literature from archaeothanatology and forensic taphonomy in their deliberations, and we will be taking these comments on board going forward. In particular, we acknowledge reviewers’ comments and the need to construct a more detailed post-mortem narrative, accounting for joint disarticulation (labile versus persistent joints etc), displacement, and final disposition of elements within the burial space. As such we will incorporate the hierarchy of decomposition (rank order disarticulation), associations between regions of anatomical association, areas of disassociation, and the voids produced during decomposition (after Mickleburgh and Wescott, 2018; Mickleburgh et al., 2022) into our narrative. In doing so we acknowledge the tensions between the inductive archaeothanatological narrative-driven approach (e.g. Duday, 2005 & 2009) versus robust decomposition data derived from human forensic taphonomic experimentation recently articulated by Schotsmans and colleagues (2022) - noting that we will highlight comparative data based on forensic experimental casework and actualistic modelling over inductive intuitive approaches which come with significant evidential shortcomings (Bristow et al. 2011).

      Finally, from a taphonomic perspective it is worth pointing out to reviewers that we have already addressed the issue of lack of taphonomic evidence for carnivore involvement in the formation of the Dinaledi assemblage (Dirks, et al., 2016). Absence of any carnivore-induced bone surface modifications, patterns of skeletal part representation, and a total absence of any carnivore remains found within the Dinaledi chamber (following Kuhn and colleagues, 2010) lead us to reject carnivores as possible vectors of body accumulation within the Dinaledi Chamber and Hill Antechamber.

      Reviewers suggest that without a date derived from geochronological methods, the engravings cannot be associated with H. naledi, and that it is possible (or probable) that the engravings were done in the recent past by H. sapiens. This suggestion neglects the context of the site. We have previously documented the structure and extremely limited accessibility of the Dinaledi subsystem. This subsystem was not recorded on maps of the documented Rising Star Cave system prior to our work and its discovery by our teams. Furthermore, there is no evidence of prehistoric human activity in the areas of the cave related to possible subterranean entrances. There is no evidence that humans in the past typically ventured into such extreme spaces like those of Rising Star. It is clear from the presence of the remains of many individuals that H. naledi ventured into these spaces again and again. It is likely that H. naledi moved through these spaces more easily than humans do based on their physique. We show that the engravings overlay each other, suggesting multiple engraving events. These engravings took time and effort and the only evidence for use of the Dinaledi subsystem by any hominin is by H. naledi. The context leads to the null hypothesis that H. naledi made the marks. In our revision, we will elaborate on this argument to clarify the evidence for our stance on this hypothesis. Several reviewers took issue with the title of the engraving paper as we did not insert a qualifier in front of the suggested date range for the engravings. We deliberately left out qualifying language so that the title took the form of a testable hypothesis rather than a weak assertion. Should future work find the engravings were not produced within this time range, then we will restate this hypothesis.

      Finally, with regard to the engravings, we have chosen to report them because they exist. Not reporting the presence of engraved marks on the walls of a cave above hypothesized burials would be tantamount to leaving relevant evidence out of the description of an archaeological context. We recognize and state in our manuscript that these markings require substantial further study, including attempts at geochronological dating. But the current evidence is clearly relevant to the archaeological context of the subsystem. We take a similar stance with reporting the presence of the tool-shaped artefact near the hand of the H. naledi skeleton in the Hill Antechamber. It is evident that this object requires further study, as we stated in our manuscript, but again omitting it from our study would be leaving out relevant evidence.

      Some have suggested that the null hypothesis should be that all of these observed circumstances are of natural origin. Our team took this approach in our early investigation of the Dinaledi subsystem (Dirks et al. 2015). We adopted the null hypothesis that the geological processes involved in the accumulation of H. naledi skeletal remains were “natural” (e.g., non-naledigenic involvement), and we were able to reject many alternative explanations for the assemblage, including carnivore accumulation, “death trap” accumulation, and fluvial transport of bodies or bones (Dirks et al. 2015). This led us to the hypothesis that H. naledi were involved in bringing the bodies into the spaces where they were found. But we did not hypothesize their involvement in the formation of the deposit itself beyond bringing the bodies to the location.

      This approach seems conservative. It followed the traditional view that small-brained hominins do not engage in cultural practices. But we recognize in hindsight that this null hypothesis approach did harm to our analyses. It impeded us from recognizing within our initial excavations of the puzzle box area and other excavations between 2014 – 2017 that we might be encountering remains that were intrusive in the sedimentary floor of the chamber. If we had approached the accumulation of a large number of hominins from the perspective of the null hypothesis being that the situation was likely cultural, we perhaps would have collected evidence in a slightly different manner. We certainly note that if the Dinaledi system had been full of the remains of modern humans, there would have been little doubt that the null hypothesis would have been that this was a cultural space and not a “natural space”.  We therefore respectfully disagree with the reviewers who continue to support the idea that we should approach hominin excavations with the null hypothesis that they will be natural (specifically non-cultural) in origins. If excavations continue with this mindset we believe that potential cultural evidence is almost certain to be lost.

      There has been a gradient across paleoanthropological excavations, archaeological work, and forensic investigation, with increasing precision of context. The reality is that the recording precision and frame of approach is typically different in most paleontological excavations than in those related to contemporary human remains. If anything comes from the present discussion of whether the Dinaledi system is a burial site for H. naledi or not, we hope that by taking seriously the possibility of deep cultural dynamics of hominins, we will encourage other teams to meet the highest standards of excavation in order to preserve potential cultural evidence. Given H. naledi’s cranial capacity, we suggest that even very early hominin skeletal assemblages should be re-examined, if there is sufficient evidence or records available. These would include examples such as the A.L. 333 Au. afarensis site (the so-called First Family site in Hadar, Ethiopia), the Dikika infant skeleton, WT 15000 (Turkana Boy) and even A.L. 288 (Lucy), as such unusual taphonomic situations where skeletons are preserved cannot be simply explained away as “natural” in origin, based solely on the cranial capacity and assumed lack of cognitive and cultural complexity of the hominins, as emphasized by us in Fuentes et al. (2023). We are not the first to observe that some very early hominin situations may represent early mortuary activity (Pettitt 2013), but we would advocate a step further. We suggest it may be damaging to take “natural accumulation” as the standard null hypothesis for hominin paleoanthropology, and that it is more conservative in practice to engage remains with the null hypothesis of possible cultural formation.

      We are deeply grateful for the time and effort all 8 reviewers (across three reviews) have taken with this work. We also acknowledge the anonymous reviewers from previous submissions, whose opinions and comments will have made the final iterations of these manuscripts better. As this process is rather public and includes commentary outside of the eLife forum, we ask that the efforts of all 37 authors and 8 reviewers involved be respected and that the discourse remain professional in all venues as we study this fascinating and quite complex occurrence. We also appreciate the efforts of members of the public who have engaged with this relatively new process, in which preprints are posted prior to the reviews, allowing comments and interactions from colleagues and members of the public who are normally not part of the internal peer review process. We believe these interactions will make for better final papers. We feel we have met the standards of demonstrating burials in H. naledi and that the engravings are most likely associated with H. naledi. However, given the reviews, we see many areas where our clarity, context, and analyses were less strong than they can be. With the clarifications and additions taken on board through these review processes, the final papers will be stronger and clearer. We recognize that this is an ongoing process of scientific investigation and that further work will allow continued, and possibly better, evaluation of these hypotheses and others.

      Lee R Berger, Agustín Fuentes, John Hawks, Tebogo Makhubela

      Works cited:

      • Aspöck, E. (2008). What Actually is a ‘Deviant Burial’?: Comparing German-Language and Anglophone Research on ‘Deviant Burials.’ In E. M. Murphy (Ed.). Deviant Burial in the Archaeological Record. Oxford: Oxbow Books.  pp 17–34.

      • Bolliger, S.A. & Thali, M.J. (2009). Thanatology. In S.A. Bolliger and M.J. Thali (eds) Virtopsy Approach:  3D Optical and Radiological Scanning and Reconstruction in Forensic Medicine. Boca Raton: CRC Press. pp 187-218.

      • Boulestin, B. & Duday, H. (2005). Ethnologie et archéologie de la mort: de l’illusion des références à l’emploi d’un vocabulaire. In: C. Mordant and G. Depierre (eds) Les Pratiques Funéraires à l’Âge du Bronze en France. Actes de la table ronde de Sens-en-Bourgogne. Paris: Éditions du Comité des Travaux Historiques et Scientifiques. pp. 17–30.

      • Boulestin, B. & Duday, H. (2006). Ethnology and archaeology of death: from the illusion of references to the use of a terminology. Archaeologia Polona 44: 149–169.

      • Bristow, J., Simms, Z. & Randolph-Quinney, P.S. Taphonomy. In S. Black and E. Ferguson (eds.) Forensic Anthropology 2000-2010. Boca Raton, FL: CRC Press. pp 279-318.

      • Channing, J. & Randolph-Quinney, P.S. (2006). Death, decay and reconstruction: the archaeology of Ballykilmore Cemetery, County Westmeath. In J. O’Sullivan and M. Stanley (eds.) Settlement, Industry and Ritual: Archaeology. National Roads Authority Monograph Series No. 3. Dublin: NRA/Four Courts Press. pp 113-126.

      • Cherryson, A. K. (2008). Normal, Deviant and Atypical: Burial Variation in Late Saxon Wessex, c. AD 700–1100. In E. M. Murphy (Ed.). Deviant Burial in the Archaeological Record. Oxford: Oxbow Books. pp 115–130.

      • Connolly, M., F. Coyne & L. G. Lynch (2005). Underworld: Death and Burial in Cloghermore Cave, Co. Kerry. Bray, Co. Wicklow: Wordwell.

      • Darwent, C. M. & R. L. Lyman (2002). Detecting the postburial fragmentation of carpals, tarsals and phalanges. In M. H. Sorg and W. D. Haglund (eds). Advances in Forensic Taphonomy: Method, Theory and Archeological Perspectives. Boca Raton, FL, CRC Press. pp 355-378.

      • d’Errico, F., & Backwell, L. (2016). Earliest evidence of personal ornaments associated with burial: The Conus shells from Border Cave. Journal of Human Evolution, 93, 91–108.

      • De Villiers, H. (1973). Human skeletal remains from Border Cave, Ingwavuma District, KwaZulu, South Africa. Annals of the Transvaal Museum, 28(13), 229–246.

      • Dell’Unto, N. and Landeschi, G. (2022). Archaeological 3D GIS. London: Routledge.

      • Dibble, H. L., Aldeias, V., Goldberg, P., McPherron, S. P., Sandgathe, D., & Steele, T. E. (2015). A critical look at evidence from La Chapelle-aux-Saints supporting an intentional Neandertal burial. Journal of Archaeological Science, 53, 649–657.

      • Dirkmaat, D. C., & Cabo, L. L. (2016). Forensic archaeology and forensic taphonomy: basic considerations on how to properly process and interpret the outdoor forensic scene. Academic Forensic Pathology 6, 439–454.

      • Dirks, P. H., Berger, L. R., Roberts, E. M., Kramers, J. D., Hawks, J., Randolph-Quinney, P. S., Elliott, M., Musiba, C. M., Churchill, S. E., de Ruiter, D. J., Schmid, P., Backwell, L. R., Belyanin, G. A., Boshoff, P., Hunter, K. L., Feuerriegel, E. M., Gurtov, A., Harrison, J. du G., Hunter, R., … Tucker, S. (2015). Geological and taphonomic context for the new hominin species Homo naledi from the Dinaledi Chamber, South Africa. ELife, 4, e09561.

      • Dirks, P.H.G.M., Berger, L.R., Hawks, J., Randolph-Quinney, P.S., Backwell, L.R., and Roberts, E.M. (2016). Comment on “Deliberate body disposal by hominins in the Dinaledi Chamber, Cradle of Humankind, South Africa?” [J. Hum. Evol. 96 (2016) 145-148]. Journal of Human Evolution 96:  149-153.

      • Dirks, P. H., Roberts, E. M., Hilbert-Wolf, H., Kramers, J. D., Hawks, J., Dosseto, A., Duval, M., Elliott, M., Evans, M., Grün, R., Hellstrom, J., Herries, A. I., Joannes-Boyau, R., Makhubela, T. V., Placzek, C. J., Robbins, J., Spandler, C., Wiersma, J., Woodhead, J., & Berger, L. R. (2017). The age of Homo naledi and associated sediments in the Rising Star Cave, South Africa. ELife, 6, e24231.

      • Donnelly, S., C. Donnelly & E. Murphy (1999). The forgotten dead: The cíllíní and disused burial grounds of Ballintoy, County Antrim. Ulster Journal of Archaeology 58, 109-113.

      • Duday, H. (2005). L’archéothanatologie ou l’archéologie de la mort. In: O. Dutour, J.-J. Hublin and B. Vandermeersch (eds) Objets et Méthodes en Paléoanthropologie. Paris: Comité des Travaux Historiques et Scientifiques. pp. 153–215.

      • Duday, H. (2009). Archaeology of the Dead: Lectures in Archaeothanatology. Oxford: Oxbow Books.

      • Finley, N. (2000). Outside of life: Traditions of infant burial in Ireland from cillin to cist.  World Archaeology 31, 407-422.

      • Gargett, R. H. (1999). Middle Palaeolithic burial is not a dead issue: The view from Qafzeh, Saint-Césaire, Kebara, Amud, and Dederiyeh. Journal of Human Evolution, 37(1), 27–90.

      • Goldberg, P., Aldeias, V., Dibble, H., McPherron, S., Sandgathe, D., & Turq, A. (2017). Testing the Roc de Marsal Neandertal “Burial” with Geoarchaeology. Archaeological and Anthropological Sciences, 9(6), 1005–1015.

      • Gómez-Olivencia, A., & García-Martínez, D. (2019). New postcranial remains from the Roc de Marsal Neandertal child. PALEO. Revue d’archéologie Préhistorique, 30–1, 30–1.

      • Green, E.C. (2022). An archaeothanatological approach to the identification of late Anglo-Saxon burials in wooden containers. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 436-455.

      • Henderson, J. (1987). Factors determining the state of preservation of human remains. In A. Boddington, A. Garland and R. Janaway (eds). Death, Decay and Reconstruction: Approaches to Archaeology and Forensic Science. Manchester: Manchester University Press. pp 43-54.

      • Hunter, J. R. (2014). Human remains recovery: archaeological and forensic perspectives. In C. Smith (ed). Encyclopedia of Global Archaeology. New York: Springer New York. pp 3549-3556.

      • Hochrein, M. (2002). An Autopsy of the Grave: Recognizing, Collecting and Preserving Forensic Geotaphonomic Evidence. In M. H. Sorg and W. D. Haglund (eds). Advances in Forensic Taphonomy: Method, Theory and Archeological Perspectives. Boca Raton, FL, CRC Press: 45-70.

      • Knüsel, C.K. & Robb, J. (2016). Funerary taphonomy: An overview of goals and methods. Journal of Archaeological Science: Reports 10, 655-673.

      • Kuhn, B.F., Berger, L.R. & Skinner, J.D. (2010). Examining criteria for identifying and differentiating fossil faunal assemblages accumulated by hyenas and hominins using extant hyenid accumulations. International Journal of Osteoarchaeology 20, 15-35.

      • Lyman, R. (1994). Vertebrate Taphonomy. Cambridge, Cambridge University Press.

      • Martinón-Torres, M., d’Errico, F., Santos, E., Álvaro Gallo, A., Amano, N., Archer, W., Armitage, S. J., Arsuaga, J. L., Bermúdez de Castro, J. M., Blinkhorn, J., Crowther, A., Douka, K., Dubernet, S., Faulkner, P., Fernández-Colón, P., Kourampas, N., González García, J., Larreina, D., Le Bourdonnec, F.-X., … Petraglia, M. D. (2021). Earliest known human burial in Africa. Nature, 593(7857), 7857.

      • Mickleburgh, H.L & Wescott, D.J. (2018). Controlled experimental observations on joint disarticulation and bone displacement of a human body in an open pit: implications for funerary archaeology. Journal of Archaeological Science: Reports 20: 158-167.

      • Mickleburgh, H.L., Wescott, D.J., Gluschitz, S. & Klinkenberg, V.M. (2022). Exploring the use of actualistic forensic taphonomy in the study of (forensic) archaeological human burials: An actualistic experimental research programme at the Forensic Anthropology Center at Texas State University (FACTS), San Marcos, Texas. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 542-562.

      • Owsley, D. & B. Compton (1997). Preservation in late 19th Century iron coffin burials. In W. Haglund and M. Sorg (eds). Forensic Taphonomy: The Postmortem Fate of Human Remains. Boca Raton, FL, CRC Press: 511-526.

      • Parker Pearson, M. (1999). The Archaeology of Death and Burial. College Station: Texas A&M University Press.

      • Pettitt, P. (2013). The Palaeolithic Origins of Human Burial. Routledge.

      • Pomeroy, E., Bennett, P., Hunt, C. O., Reynolds, T., Farr, L., Frouin, M., Holman, J., Lane, R., French, C., & Barker, G. (2020). New Neanderthal remains associated with the ‘flower burial’ at Shanidar Cave. Antiquity, 94(373), 11–26.

      • Randolph-Quinney, P.S. (2013). From the cradle to the grave: the bioarchaeology of Clonfad 3 and Ballykilmore 6. In N. Brady, P. Stevens and J. Channing (eds.). Settlement and Community in the Fir Tulach Kingdom. Dublin: National Roads Authority Press. pp A2.1-48.

      • Randolph-Quinney, P.S., Haines, S. and Kruger, A. (2018). The use of three-dimensional scanning and surface capture methods in recording forensic taphonomic traces: issues of technology, visualisation, and validation. In: W.J. M. Groen and P. M. Barone (eds). Multidisciplinary Approaches to Forensic Archaeology. Berlin: Springer International Publishing, pp. 115-130.

      • Rendu, W., Beauval, C., Crevecoeur, I., Bayle, P., Balzeau, A., Bismuth, T., Bourguignon, L., Delfour, G., Faivre, J.-P., Lacrampe-Cuyaubère, F., Tavormina, C., Todisco, D., Turq, A., & Maureille, B. (2014). Evidence supporting an intentional Neandertal burial at La Chapelle-aux-Saints. Proceedings of the National Academy of Sciences, 111(1), 81–86.

      • Sandgathe, D. M., Dibble, H. L., Goldberg, P., & McPherron, S. P. (2011). The Roc de Marsal Neandertal child: A reassessment of its status as a deliberate burial. Journal of Human Evolution, 61(3), 243–253.

      • Silver, M. (2016). Conservation Techniques in Cultural Heritage. In E. Stylianidis and F. Remondino (eds) 3D Recording, Documentation and Management of Cultural Heritage. Dunbeath: Whittles Publishing. pp 15-106.

      • Schotsmans, E.M.J., Georges-Zimmermann, P., Ueland, M. and Dent, B.B. (2022). From flesh to bone: Building bridges between taphonomy, archaeothanatology and forensic science for a better understanding of mortuary practices. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 501-541.

    1. Reviewer #3 (Public Review):

      Lee Berger and colleagues argue here that markings they have found in a dark isolated space in the Rising Star Cave system are likely over a quarter of a million years old and were made intentionally by Homo naledi, whose remains nearby they have previously reported. As in a European and much later case they reference ('Neanderthal engraved 'art' from the Pyrenees'), the entangled issues of demonstrable intentionality, persuasive age and likely authorship will generate much debate among the academic community of rock art specialists. The title of the paper and the reference to 'intentional designs', however, leave no room for doubt as to where the authors stand, despite avoidance of the word art, entering a very disputed terrain. Iain Davidson's (2020) 'Marks, pictures and art: their contributions to revolutions in communication', also referenced here, forms a useful and clearly articulated evolutionary framework for this debate. The key questions are: 'are the markings artefactual or natural?', 'how old are they?' and 'who made them?', questions often intertwined and here, as in the Pyrenees, completely inseparable. I do not think that these questions are definitively answered in this paper and I guess from the language used by the authors (may, might, seem etc.) that they do not think so either.

      First, a few referencing issues: the key reference quoted for distinguishing natural from artefactual markings (Fernandez-Jalvo et al. 2014), whilst mentioned in the text, is not included in the references. In the acknowledgements, the claim that "permits to conduct research in the Rising Star Cave system are provided by the South African National Research Foundation" should perhaps refer rather to SAHRA? In the primary description of their own markings from Rising Star and their presumed significance, there are, oddly, several unacknowledged quotes from the abstract of one of the most significant European references (Rodriguez-Vidal et al. 2014). These need attention.

      Before considering the specific arguments of the authors to justify the claims of the title, we should recognise the shift in the academic climate of those concerned with 'ancient markings' that has taken place over the past two or three decades. Before those changes, most specialists would probably have expected all early intentional markings to have been made by Homo sapiens after the African diaspora as part of the explosion of innovative behaviours thought to characterise the 'origins of modern humans'. Now, claims for earlier manifestations of such innovations from a wider geographic range are more favourably received, albeit often fiercely challenged as the case for Pyrenean Neanderthal 'art' shows (White et al. 2020). This change in intellectual thinking does not, however, alter the strict requirements for a successful assertion of earlier intentionality by non-sapiens species. We should also note that stone, despite its ubiquity in early human evolutionary contexts, is a recalcitrant material not easily directly dated whether in the form of walling, artefact manufacture or potentially meaningful markings. The stakes are high but the demands are no less so.

      Why are the markings not natural? Berger and co-authors seem to find support for the artefactual nature of the markings in their location along a passage connecting chambers in the underground Rising Star Cave system. The presumption is that the hominins passed by the marked panel frequently. I recognise the thinking but the argument is weak. More confidently they note that "In previous work researchers have noted the limited depth of artificial lines, their manufacture from multiple parallel striations, and their association into clear arrangement or pattern as evidence of hominin manufacture (Fernandez-Jalvo et al. 2014)". The markings in the Rising Star Cave are said to be shallow, made by repeated grooving with a pointed stone tool that has left striations within the grooves and to form designs that are "geometric expressions" including crosshatching and cruciform shapes. "Composition and ordering" are said to be detectable in the set of grooved markings. Readers of this and their texts will no doubt have various opinions about these matters, mostly related to rather poorly defined or quantified terminology. I reserve judgement, but would draw little comfort from the similarities among equally unconvincing examples of early, especially very early, 'designs'. Two or even three half-convincing arguments do not add up to one convincing one.

      The authors draw our attention to one very interesting issue: given the extensive grooving into the dolomite bedrock by sharp stone objects, where are these objects? Only one potential 'lithic artefact' is reported, a "tool-shaped rock [that] does resemble tools from other contexts of more recent age in southern Africa, such as a silcrete tool with abstract ochre designs on it that was recovered from Blombos Cave (Henshilwood et al. 2018)", also figured by Berger and colleagues. A number of problems derive from this comparison. First, 'tool-shaped rock' is surely a meaningless term: in a modern toolshed 'tool-shaped' would surely need to be refined into 'saw-shaped', 'hammer-shaped' or 'chisel-shaped' to convey meaning? The authors here seem to mean that the Rising Star Cave object is shaped like the Blombos painted stone fragment. But the latter is a painted fragment, not a tool and so any formal similarity is surely superficial and offers no support to the 'tool-ness' of the Rising Star Cave object. Does this mean that Homo naledi took (several?) pointed stone tools down the dark passageways, used them extensively and, whether worn out or still usable, took them all out again when they left? Not impossible, of course. And the lighting?

      The authors rightly note that the circumstance of the markings "makes it challenging to assess whether the engravings are contemporary with the Homo naledi burial evidence from only a few metres away" and, more pertinently, whether the hominins did the markings. Despite this honest admission, they are prepared to hypothesise that the hominins made the markings, without, it seems, any convincing evidence. If archaeologists took juxtaposition to demonstrate authorship, there would be any number of unlikely claims for the authorship of rock paintings or even stone tools. The idea that there were no entries into this Cave system between the Homo naledi individuals and the last two decades is an assertion, not an observation, and the relationship between hominins and designs no less so. In fact, the only 'evidence' for the age of the markings is given by the age of the Homo naledi remains, as no attempt has been made at the admittedly very difficult, perhaps impossible, task of geochronological assessment.

      The claims relating to artificiality, age and authorship made here seem entangled, premature and speculative. Whilst there is no evidence to refute them, there isn't convincing evidence to confirm them.

      References:

      • Davidson, I. 2020. Marks, pictures and art: their contribution to revolutions in communication. Journal of Archaeological Method and Theory 27(3): 745-770.

      • Henshilwood, C.S. et al. 2018. An abstract drawing from the 73,000-year-old levels at Blombos Cave, South Africa. Nature 562: 115-118.

      • Rodriguez-Vidal, J. et al. 2014. A rock engraving made by Neanderthals in Gibraltar. Proceedings of the National Academy of Sciences.

      • White, Randall et al. 2020. Still no archaeological evidence that Neanderthals created Iberian cave art.